Data visualization is the presentation of data in graphical or pictorial form so that the human brain can easily process the information it contains, interpret it to reach useful conclusions, and see the effect of adjusting different variables. In machine learning, data visualization is one of the most important steps: the better you understand the data, the better you can choose and apply machine learning models. Simply put, data visualization helps us interpret the data and choose a suitable model to implement.
There are various tools for data visualization, but here we will focus on Matplotlib and Seaborn. There are also many kinds of plots we can use to visualize data, such as scatter plots, step plots, bar charts, fill-between plots, time series plots, box plots, histograms, pie charts, and many more.
Be sure to check out Data Preprocessing for Machine Learning Part I and Data Preprocessing for Machine Learning Part II first so that you get better results. We have already performed data preprocessing, and now we will perform data visualization.
Getting Started with Matplotlib
Matplotlib is a Python library for 2D plotting. It is a comprehensive library for creating static, animated, and interactive visualizations in Python, and it is easily customizable through its object-oriented classes such as Figure and Axes. We will perform all the operations in Google Colab, which comes with Matplotlib preinstalled, so we don't need to install it.
First, import the library; then we can start plotting.
```python
# Import the pyplot module of Matplotlib under the conventional alias plt
from matplotlib import pyplot as plt
```
As a basic example, we will plot a simple line graph and label it. Note that the points are connected in the order given, not sorted by x:
```python
# Plot a line graph through the points (x, y), drawn in red
plt.plot([1, 4, 2, 7, 3], [7, 5, 4, 3, 8], color='red')
# Label the x-axis
plt.xlabel('X-axis')
# Label the y-axis
plt.ylabel('Y-axis')
# Give the chart a title
plt.title("Simple Chart")
# Display what we plotted
plt.show()
```
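Because Matplotlib is built around classes, the same chart can also be drawn with its object-oriented interface: `plt.subplots()` returns a `Figure` and an `Axes` object, and the plotting calls become `Axes` methods. The sketch below reproduces the simple chart that way; it is one common style, not a replacement for the pyplot calls above.

```python
from matplotlib import pyplot as plt

# Create a Figure and a single Axes object explicitly
fig, ax = plt.subplots()

# Same points as the pyplot example, drawn on the Axes object
ax.plot([1, 4, 2, 7, 3], [7, 5, 4, 3, 8], color="red")

# Axes methods use a set_ prefix for labels and titles
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_title("Simple Chart")

plt.show()
```

Holding on to `fig` and `ax` makes it easier to customize one specific chart later, for example when a figure contains several subplots.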
Adding Styles to a Chart
Import the style module from Matplotlib and you are ready to style your graph. A style sheet changes the overall look of a chart (background, grid, colors, fonts) with a single line of code. We will style the same chart with the "ggplot" style and also add a legend.
```python
from matplotlib import style

# Apply the ggplot style sheet to all subsequent plots
style.use("ggplot")

# Plot the same line graph, with a label for the legend
plt.plot([1, 4, 2, 7, 3], [7, 5, 4, 3, 8], color='red', label="Line")
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title("Simple Chart")
# Show the legend built from the label above
plt.legend()
# Display what we plotted
plt.show()
```
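Matplotlib ships with several built-in style sheets besides "ggplot", and `plt.style.context` applies a style only temporarily instead of for every later plot. A minimal sketch, assuming a reasonably recent Matplotlib (the exact list of style names varies between versions):

```python
from matplotlib import pyplot as plt

# List the names of the style sheets available in this installation
print(plt.style.available)

# Apply a style only inside the with-block; the global style is restored afterwards
with plt.style.context("ggplot"):
    plt.plot([1, 4, 2, 7, 3], [7, 5, 4, 3, 8], color="red", label="Line")
    plt.title("Simple Chart (temporary ggplot style)")
    plt.legend()
    plt.show()
```

Using a context manager is handy in a notebook, where `style.use` would otherwise change the look of every chart drawn afterwards.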
Bar graphs are used to compare items across categories and to show how something changes over time. They have an x-axis (horizontal) and a y-axis (vertical), and the height of each bar represents its value.
```python
# Bar graph: x positions of the bars and their heights
plt.bar([1, 2, 3, 4, 5], [45, 69, 4, 60, 24])
plt.xlabel('bar number')
plt.ylabel('bar height')
plt.title("Bar Graph")
plt.show()
```
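To compare items across categories, bars are often grouped side by side with text labels on the x-axis. The sketch below uses hypothetical sales figures for three made-up products (`products`, `sales_2022` and `sales_2023` are illustrative names and numbers, not data from this post) and shifts each series by half a bar width:

```python
import numpy as np
from matplotlib import pyplot as plt

# Hypothetical example data: units sold of three products in two years
products = ["A", "B", "C"]
sales_2022 = [45, 69, 24]
sales_2023 = [50, 60, 30]

x = np.arange(len(products))  # one group position per product
width = 0.4                   # width of each individual bar

# Shift the two series left and right so their bars sit side by side
bars_2022 = plt.bar(x - width / 2, sales_2022, width, label="2022")
bars_2023 = plt.bar(x + width / 2, sales_2023, width, label="2023")

# Show the category names under each group instead of numeric positions
plt.xticks(x, products)
plt.xlabel("Product")
plt.ylabel("Units sold")
plt.title("Grouped Bar Graph")
plt.legend()
plt.show()
```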
A histogram is a graphical display of data using bars. It looks like a bar graph, but it shows the distribution of a continuous numeric variable: each bar groups values into a range (a bin), and taller bars show that more data falls in that range.
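```python
# Weekly earnings, one value per student
x = [21, 22, 23, 4, 5, 6, 77, 8, 9, 10, 31, 32, 33, 34, 60, 55,
     35, 36, 37, 18, 49, 50, 100]
# Number of equal-width bins to split the data range into
num_bins = 10
plt.hist(x, num_bins)
plt.xlabel("Weekly Earnings ($)")
plt.ylabel("No. of Students")
plt.title("Histogram")
plt.show()
```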
Here, the range of the data is divided into 10 equal-width bins; the number of values that fall into each bin is counted and displayed as the height of that bar.
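With an integer number of bins, `plt.hist` computes its bars with NumPy's `np.histogram`, so the same bin edges and counts can be inspected directly. A small sketch that prints each bin's range and how many earnings values fall in it:

```python
import numpy as np

x = [21, 22, 23, 4, 5, 6, 77, 8, 9, 10, 31, 32, 33, 34, 60, 55,
     35, 36, 37, 18, 49, 50, 100]

# Split [min(x), max(x)] into 10 equal-width bins and count the values in each
counts, edges = np.histogram(x, bins=10)

# Print each bin as "left to right: count"
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {count}")
```

For this data the bins run from 4 to 100 in steps of 9.6, and the counts are 6, 4, 2, 5, 2, 2, 0, 1, 0, 1, which are the bar heights of the histogram.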
A scatter plot uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axis indicates values for an individual data point. Scatter plots are used to observe relationships between variables.
```python
x = [1, 2, 3, 4, 5, 6]
y = [5, 7, 12, 20, 25, 50]
# Draw one red dot per (x, y) pair
plt.scatter(x, y, color='r')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Scatter Plot')
plt.show()
```
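To make the relationship in a scatter plot easier to judge, a common addition is a least-squares trend line together with a correlation coefficient. This sketch uses `np.polyfit` for a straight-line fit and `np.corrcoef` for Pearson's r on the same six points; assuming a straight line is a modeling choice here, not something the data guarantees.

```python
import numpy as np
from matplotlib import pyplot as plt

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([5, 7, 12, 20, 25, 50])

# Fit y = slope * x + intercept by least squares (degree-1 polynomial)
slope, intercept = np.polyfit(x, y, 1)

# Pearson correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

plt.scatter(x, y, color="r", label="data")
plt.plot(x, slope * x + intercept, color="b", label=f"linear fit (r = {r:.2f})")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Scatter Plot with Trend Line")
plt.legend()
plt.show()
```

For these points the fitted slope is 8.2 and r is about 0.92, a strong positive relationship. The last point (6, 50) sits well above the fitted line, a hint that the growth may not be linear.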
A stack plot (also called a stacked area plot) shows how each part contributes to a whole over time. The layers are stacked on top of each other, so the thickness of each layer shows one part and the top edge shows the total of all the parts.
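```python
years = [2000, 2005, 2010, 2015, 2020]
revenue = [140, 250, 300, 400, 350]
profit = [40, 50, 60, 70, 65]
# Stacking only makes sense for parts of one whole: revenue = profit + expenses,
# so stack profit and expenses, and the top edge of the stack equals revenue
expenses = [r - p for r, p in zip(revenue, profit)]
plt.stackplot(years, profit, expenses, labels=['Profit', 'Expenses'], colors=['r', 'g'])
plt.xlabel('Years')
plt.ylabel('Billions ($)')
plt.title('Stack or Area Plot')
plt.legend(loc='upper left')
plt.show()
```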
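A variant that focuses purely on the part-to-whole relationship is a 100% stacked area plot, where each layer is shown as a percentage of the total for that year. This sketch assumes revenue splits into profit plus expenses; the `expenses` series is derived from the post's revenue and profit numbers, not given in the post:

```python
import numpy as np
from matplotlib import pyplot as plt

years = [2000, 2005, 2010, 2015, 2020]
revenue = np.array([140, 250, 300, 400, 350])
profit = np.array([40, 50, 60, 70, 65])

# Assumption: revenue = profit + expenses, so the two layers form one whole
expenses = revenue - profit

# Express each layer as a percentage of that year's revenue
profit_pct = profit / revenue * 100
expenses_pct = expenses / revenue * 100

plt.stackplot(years, profit_pct, expenses_pct,
              labels=["Profit", "Expenses"], colors=["r", "g"])
plt.xlabel("Years")
plt.ylabel("Share of revenue (%)")
plt.title("100% Stacked Area Plot")
plt.legend(loc="upper left")
plt.show()
```

Because every column sums to 100%, the top edge is flat and only the boundary between the layers moves, which makes changes in the profit margin easy to spot.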
A pie chart is a circular statistical graphic divided into slices to illustrate numerical proportions. In a pie chart, the arc length of each slice (and therefore its angle and area) is proportional to the quantity it represents.
```python
students = [400, 1000, 1500, 700, 500]
interests = ['Computer Engineering', 'Civil Engineering', 'Mechanical Engineering',
             'Geomatics Engineering', 'Computer Science']
# One color per slice: red, blue, green, yellow, magenta
col = ['r', 'b', 'g', 'y', 'm']
plt.pie(students, labels=interests, colors=col)
plt.title('Pie Plot')
plt.show()
```
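Pie charts are easier to read when each slice shows its percentage. A short sketch using the `autopct` parameter of `plt.pie` with the same student counts; the `shares` list computes the same percentages by hand for comparison:

```python
from matplotlib import pyplot as plt

students = [400, 1000, 1500, 700, 500]
interests = ['Computer Engineering', 'Civil Engineering', 'Mechanical Engineering',
             'Geomatics Engineering', 'Computer Science']

# Each slice's share of the whole, which is what the slice angle encodes
total = sum(students)
shares = [100 * s / total for s in students]

# autopct writes each slice's percentage on it; startangle rotates the first slice
plt.pie(students, labels=interests, autopct="%1.1f%%", startangle=90)
plt.axis("equal")  # keep the pie circular
plt.title("Pie Plot with Percentages")
plt.show()
```

Mechanical Engineering, for example, has 1500 of the 4100 students, which `autopct` displays as 36.6%.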
Check out the next post, Data Visualization for Machine Learning with Seaborn, as well as Data Preprocessing for Machine Learning to learn how to clean your data.