Data Visualization for Machine Learning with Matplotlib

Data visualization is the presentation of data in graphical or pictorial form so that the human brain can easily process the information it contains, interpret it, and see the effect of adjusting different variables. In machine learning, data visualization is one of the most important steps: the better you understand the data, the better you can choose and implement a model. Simply put, data visualization lets us interpret the data and choose a suitable machine learning model.

There are various tools for data visualization, but we will be looking at Matplotlib and Seaborn. There are many chart types we can use to visualize data, including Scatter Plot, Step Plot, Bar Chart, Fill Between, Time Series, Box Plot, Histogram, Pie Chart and many more.

Data Plotting

Be sure to check out Data Preprocessing for Machine Learning Part I and Data Preprocessing for Machine Learning Part II first so that you get better results. We have performed data preprocessing and will now perform data visualization.

Getting Started with Matplotlib

Matplotlib is a Python library for 2D plotting. It is a comprehensive library for creating static, animated, and interactive visualizations in Python, and it is easily customizable through its classes. We will perform all operations in Google Colab, which has Matplotlib preinstalled, so we don’t need to install it.

First we import the library; then we can start plotting.

#import matplotlib
from matplotlib import pyplot as plt

We will plot a simple line graph and label it as a basic example. The code is:

#Plot a line graph
plt.plot([1,4,2,7,3],[7,5,4,3,8],color = 'red')
#Label X-axis
plt.xlabel('X-axis')
#Label Y-axis
plt.ylabel('Y-axis')
#Give the Title
plt.title("Simple Chart")
#Show what we plotted
plt.show()
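
The same chart can also be drawn through Matplotlib's object-oriented interface, which is one way to customize a plot through its classes. The following is a minimal sketch rather than the only way to do it; the variable names (x, y, fig, ax, line) are illustrative and the line-width and line-style changes are added here only to show the idea.

```python
from matplotlib import pyplot as plt

# Same sample data as the line chart above
x = [1, 4, 2, 7, 3]
y = [7, 5, 4, 3, 8]

# plt.subplots() returns a Figure and an Axes object
fig, ax = plt.subplots()

# Plot on the Axes object and label it through its methods
(line,) = ax.plot(x, y, color="red")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_title("Simple Chart")

# The returned Line2D object can still be adjusted after plotting
line.set_linewidth(2)
line.set_linestyle("--")

plt.show()
```

Working with fig and ax directly tends to be easier to manage once a notebook contains more than one chart, because every call names the chart it changes.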

Adding Styles to Chart

Import the style module from Matplotlib and you are ready to style your graph. Applying a style sheet such as ggplot changes the look of the whole chart. We will redraw the same chart with a style and add a legend.

from matplotlib import style 

style.use("ggplot")
#Plot a line graph with a label for the legend
plt.plot([1,4,2,7,3],[7,5,4,3,8],color = 'red',label = "Line")
#Label X-axis
plt.xlabel('X-axis')
#Label Y-axis
plt.ylabel('Y-axis')
#Give the Title
plt.title("Simple Chart")
#Show the legend
plt.legend()
#Show what we plotted
plt.show()
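
To see which styles are available besides ggplot, and to apply a style to a single chart instead of the whole notebook, something like the following sketch may help. It assumes a reasonably recent Matplotlib; the list that gets printed depends on the installed version.

```python
from matplotlib import pyplot as plt

# List the style sheets bundled with the installed Matplotlib version
print(plt.style.available)

# Apply "ggplot" only inside this block instead of globally
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([1, 4, 2, 7, 3], [7, 5, 4, 3, 8], color="red", label="Line")
    ax.set_xlabel("X-axis")
    ax.set_ylabel("Y-axis")
    ax.set_title("Simple Chart")
    ax.legend()
    plt.show()
```

Charts created after the with block go back to the style that was active before it.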

Bar Graph

Bar graphs can be used to show something that changes over time and to compare items. They have an x-axis (horizontal) and a y-axis (vertical).

Code:

#Bar Graph
plt.bar([1,2,3,4,5],[45,69,4,60,24])
plt.xlabel('bar number')
plt.ylabel('bar height')

plt.title("Bar Graph")
plt.show()
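
When the goal is to compare items, the bars can be given names instead of numbers. The sketch below reuses the heights from the example above; the item names A to E are hypothetical placeholders, and bar_label assumes Matplotlib 3.4 or newer.

```python
from matplotlib import pyplot as plt

# Hypothetical item names; the heights are the same values used above
items = ["A", "B", "C", "D", "E"]
heights = [45, 69, 4, 60, 24]

fig, ax = plt.subplots()

# Passing strings as x values draws one bar per category
bars = ax.bar(items, heights)

# Write each bar's height above it (requires Matplotlib 3.4 or newer)
ax.bar_label(bars)

ax.set_xlabel("item")
ax.set_ylabel("bar height")
ax.set_title("Bar Graph")
plt.show()
```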

Histogram

A histogram is a graphical display of data using bars. It looks like a bar graph, but it shows the distribution of a continuous variable: each bar groups numbers into a range, and taller bars show that more data falls in that range.

x = [21,22,23,4,5,6,77,8,9,10,31,32,33,34,60,55,35,36,37,18,49,50,100]
num_bins = 10

plt.hist(x, num_bins)

plt.xlabel("Weekly Earnings ($)")
plt.ylabel("No. of Students")

plt.title("Histogram")

#plt.legend()

plt.show()

Here, the number of bins is 10. The data points that fall in each interval are counted, and that count is displayed as the height of the bar.
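
To check what the histogram is counting, the values returned by plt.hist can be printed. This is a small sketch using the same data; the names counts, edges and bars are chosen here for illustration.

```python
from matplotlib import pyplot as plt

x = [21,22,23,4,5,6,77,8,9,10,31,32,33,34,60,55,35,36,37,18,49,50,100]
num_bins = 10

# plt.hist returns the count per bin, the bin edges, and the drawn bars
counts, edges, bars = plt.hist(x, bins=num_bins)

# Print each interval with the number of values that fall inside it
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {int(count)}")

plt.xlabel("Weekly Earnings ($)")
plt.ylabel("No. of Students")
plt.title("Histogram")
plt.show()
```

With 10 bins between the smallest value (4) and the largest (100), each interval is 9.6 wide, and the counts add up to the 23 values in x.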

Scatter Plot

A scatter plot uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axis indicates values for an individual data point. Scatter plots are used to observe relationships between variables.

Code:

x= [1,2,3,4,5,6]
y= [5,7,12,20,25,50]

plt.scatter(x,y, color = 'r')

plt.xlabel('x')
plt.ylabel('y')
plt.title('Scatter Plot')

plt.show()
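
One way to put a number on the relationship a scatter plot shows is the Pearson correlation coefficient. The sketch below is an addition to the example above, not part of it: it assumes NumPy is available (it is installed alongside Matplotlib) and uses np.corrcoef on the same x and y.

```python
import numpy as np
from matplotlib import pyplot as plt

x = [1, 2, 3, 4, 5, 6]
y = [5, 7, 12, 20, 25, 50]

# Pearson correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

plt.scatter(x, y, color="r")
plt.xlabel("x")
plt.ylabel("y")

# Show the coefficient in the title, rounded to two decimals
plt.title(f"Scatter Plot (r = {r:.2f})")
plt.show()
```

A value close to 1 suggests a strong positive linear relationship, although the plot itself is still worth reading: here the last point rises much faster than the others, which a single number does not show.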

Stack Plot

A stack plot shows the whole data set while making it easy to see how each part contributes to the whole. The constituents are stacked on top of each other, so the plot shows both the individual parts and the total.

Code:

years = [2000, 2005, 2010, 2015, 2020]
revenue = [140, 250, 300, 400, 350]
profit = [40, 50, 60, 70, 65]

plt.stackplot(years, profit, revenue, colors = ['r','g'])

plt.xlabel('Years')
plt.ylabel('Billions ($)')
plt.title('Stack or Area Plot')
plt.show()

In the resulting plot, the red area is Profit and the green area is Revenue.
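
Instead of describing the colors in text, the stack plot can carry its own legend. This is a minimal sketch using the labels parameter of stackplot with the same data; the legend position is an arbitrary choice.

```python
from matplotlib import pyplot as plt

years = [2000, 2005, 2010, 2015, 2020]
revenue = [140, 250, 300, 400, 350]
profit = [40, 50, 60, 70, 65]

fig, ax = plt.subplots()

# labels are matched to the stacked series in the order they are passed
ax.stackplot(years, profit, revenue, colors=["r", "g"],
             labels=["Profit", "Revenue"])

# Draw the legend in the upper-left corner, away from the tallest area
legend = ax.legend(loc="upper left")

ax.set_xlabel("Years")
ax.set_ylabel("Billions ($)")
ax.set_title("Stack or Area Plot")
plt.show()
```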

Pie Chart

A pie chart is a circular statistical graphic that is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice is proportional to the quantity it represents.

Code:

students = [400, 1000, 1500, 700, 500]
interests = ['Computer Engineering','Civil Engineering','Mechanical Engineering','Geomatics Engineering','Computer Science']
col= ['r','b','g','y','m']

plt.pie(students,labels=interests, colors= col)

plt.title('Pie Plot')

plt.show()
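
Since each slice stands for a proportion, it can help to print the percentage on the slice itself. The sketch below uses the autopct parameter of pie with the same data; the format string "%1.1f%%" (one decimal place) is just one possible choice.

```python
from matplotlib import pyplot as plt

students = [400, 1000, 1500, 700, 500]
interests = ['Computer Engineering', 'Civil Engineering',
             'Mechanical Engineering', 'Geomatics Engineering',
             'Computer Science']

fig, ax = plt.subplots()

# autopct formats each slice's share of the total as a percentage label
wedges, texts, autotexts = ax.pie(students, labels=interests,
                                  autopct="%1.1f%%")

ax.set_title("Pie Plot")
plt.show()
```

The percentages are computed from the total of 4100 students, so the first slice (400 students) is labelled with about 9.8%.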

Get Full code here.

Check out the next post, Data Visualization for Machine Learning with Seaborn, and see Data Preprocessing for Machine Learning to clean your data.
