Data Visualization for Machine Learning with Seaborn

Seaborn

Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures. Seaborn helps you explore and understand your data. Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements of your plots mean, rather than on the details of how to draw them.

Seaborn depends on NumPy, pandas, and Matplotlib, and it uses SciPy for some of its statistical features (recent releases treat SciPy as optional). Before installing seaborn, make sure these packages are installed.
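A quick way to check the environment before going further is sketched below. It only uses the standard library; the helper name missing_packages is made up for this example and is not part of seaborn.

```python
from importlib.util import find_spec

# Packages named in the text above, plus seaborn itself.
REQUIRED = ["numpy", "pandas", "matplotlib", "scipy", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("All packages are available.")
```

Anything reported as missing can usually be installed with pip or conda before moving on.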

Visit Seaborn Documentation

Advantages of Seaborn Over Matplotlib:

  • Provides a variety of visualization patterns.
  • Requires less code for common statistical plots.
  • Ships with attractive default themes.
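As a small illustration of the theme point, the sketch below switches on one of seaborn's built-in styles with a single call. The three data points are made up purely to have something to draw.

```python
import matplotlib as mpl
import seaborn as sns

# Apply a built-in seaborn style to every plot drawn afterwards.
sns.set_theme(style="whitegrid")

# Made-up points, only to show the styled axes.
ax = sns.scatterplot(x=[1, 2, 3], y=[2, 4, 3])

# The style works by changing matplotlib's settings, e.g. it turns the grid on.
print(mpl.rcParams["axes.grid"])
```

Calling sns.reset_orig() afterwards should restore the matplotlib settings that were active when seaborn was imported.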

Getting Started with Seaborn

Seaborn ships with a set of example datasets, and we will use two of them in this session: flights and tips. Both are loaded with sns.load_dataset.

Let's import the flights dataset along with the necessary libraries.

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the flights dataset (downloaded on first use, then cached)
flights = sns.load_dataset("flights")

# Scatter plot of passenger counts against month
sns.relplot(x="passengers", y="month", data=flights)
This is the format of the dataset.
Plotted using the Seaborn Graphical Library.
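To look at the format of a dataset without a screenshot, the usual pandas inspection calls are enough. The sketch below uses a few hand-typed rows with the same three columns as flights, so that it runs without downloading anything; with the real data you would call the same methods on the frame returned by sns.load_dataset("flights").

```python
import pandas as pd

# Hand-typed stand-in with the same columns as the flights dataset.
flights_sample = pd.DataFrame(
    {
        "year": [1949, 1949, 1949, 1950, 1950, 1950],
        "month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
        "passengers": [112, 118, 132, 115, 126, 141],
    }
)

print(flights_sample.head())   # first rows
print(flights_sample.dtypes)   # type of each column
print(flights_sample.shape)    # (rows, columns)
```

Each row is one observation (one month of one year), which is the long-form layout that seaborn's plotting functions expect.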

Next we will load the second dataset and try the different kinds of plots available in seaborn. Seaborn is easy to use, and most plots take only a line or two of code.

The second example dataset is named tips, and it looks like this.

Seaborn Tips dataset
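# Load the tips dataset
b = sns.load_dataset("tips")
# Line plot of tip against meal time; rows sharing a time value are averaged
sns.relplot(x="time", y="tip", data=b, kind="line")
Line plot of the mean tip for each meal time (Lunch and Dinner)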
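When several rows share the same x value, seaborn's line plot draws their mean by default, with a band for the confidence interval. The sketch below reproduces just the averaging step with pandas, on made-up rows shaped like the tips data, to show which numbers the line connects.

```python
import pandas as pd

# Made-up rows in the shape of the tips dataset (illustrative values only).
tips_sample = pd.DataFrame(
    {
        "time": ["Lunch", "Lunch", "Lunch", "Dinner", "Dinner", "Dinner"],
        "tip": [2.0, 3.0, 4.0, 3.0, 5.0, 4.0],
    }
)

# One mean per x value: these are the points a line plot would connect.
mean_tip = tips_sample.groupby("time")["tip"].mean()
print(mean_tip)
```

Because time has only two values here, the line has just two points; a bar or point plot may read more naturally for a categorical x axis.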

Categorical Plot

A categorical plot shows how a numeric variable is distributed within each category. Here we plot the total bill for each category of day.

#Categorical Plot
sns.catplot(x="day", y="total_bill", data=b)

Violin Plot

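A violin plot helps identify skewness and outliers. The wide part of each violin marks the range where most observations fall, and the thin tails mark the rare values. This gives a general idea of where a model will have plenty of data to learn from and where it will have little.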

sns.catplot(x="day", y="total_bill", kind= "violin", data=b)

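The shape of each violin shows the skew of that category. In the blue violin, for example, the long thin tail stretches toward the higher bills, so that distribution is right-skewed. We can then choose our transformations or models to suit the skew.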
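Skewness read off a violin can be checked numerically. The sketch below computes the sample skewness per day with pandas on made-up numbers: a positive value means a long tail toward high values (right skew) and a negative value means a long tail toward low values (left skew).

```python
import pandas as pd

# Made-up bills: "Thur" has one large value, "Fri" has one small value.
sample = pd.DataFrame(
    {
        "day": ["Thur"] * 5 + ["Fri"] * 5,
        "total_bill": [10, 11, 12, 13, 30, 10, 27, 28, 29, 30],
    }
)

# Sample skewness of total_bill within each day.
skew_by_day = sample.groupby("day")["total_bill"].apply(lambda s: s.skew())
print(skew_by_day)
```

The same groupby can be run on the real tips frame to confirm what the violins suggest.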

Box Plot

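This is also a categorical plot. Passing kind="boxen" draws an enhanced box plot (also called a letter-value plot), which adds extra boxes to describe the tails; kind="box" draws the standard box plot.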

Code:

sns.catplot(x = "day", y = "total_bill", data = b, kind = "boxen")

Multi-Plot Grids

Graphs are plotted side by side using the same scale and axes to aid comparison.

This helps developers and researchers take in a large amount of data at a glance. Let us analyze who gives more tips: male or female customers, smokers or non-smokers, and so on.

# Histogram of tips, one panel per value of "sex"
b = sns.load_dataset("tips")
c = sns.FacetGrid(b, col="sex")
c.map(plt.hist, "tip")
Male customers account for more of the tips than female customers.
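A histogram counts bills, so a taller panel can simply mean more customers in that group rather than larger tips. A per-group summary separates the two. The sketch below uses made-up rows in which one group pays more often and the other pays more per bill.

```python
import pandas as pd

# Made-up rows shaped like the tips data (illustrative values only).
tips_sample = pd.DataFrame(
    {
        "sex": ["Male", "Male", "Male", "Male", "Female", "Female"],
        "tip": [2.0, 3.0, 4.0, 3.0, 3.5, 4.5],
    }
)

# Number of tips, average tip, and total tipped, for each group.
summary = tips_sample.groupby("sex")["tip"].agg(["count", "mean", "sum"])
print(summary)
```

Running the same aggregation on the real tips frame shows whether a difference between panels comes from the count, the average, or both.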

# Same histogram, one panel per value of "smoker"
b = sns.load_dataset("tips")
c = sns.FacetGrid(b, col="smoker")
c.map(plt.hist, "tip")
Non-smokers are kind-hearted.
Saturday is the best day to get tips. Work hard on Saturday.
People having dinner will pay you more.

In this way you can visualize your data and understand it better. You can get the full code Here. Comment if you have any suggestions or queries. Read Data Preprocessing for Machine Learning to learn how to clean your data.
