Simple Linear Regression

Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the strength and character of the relationship between a dependent variable (say, Y) and one or more independent variables. Simple linear regression is the special case with exactly one dependent variable and one independent variable.

Generally, it is represented by

 Y = α + βX or Y = mX + c

which describes a line with slope β (or m) and y-intercept α (or c). In general, such a relationship may not hold exactly for the largely unobserved population of values of the independent and dependent variables; the unobserved deviations from the equation above are called the errors. Before fitting a simple regression, it helps to remove outliers, since they can pull the fitted line away from the bulk of the data. You can simply remove any outliers in the data pre-processing phase.
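One common way to remove outliers in pre-processing is the interquartile-range (IQR) rule. A minimal sketch with hypothetical values (the threshold of 1.5 × IQR is a convention, not something fixed by this tutorial):

```python
import pandas as pd

# Hypothetical head-size values with one obvious outlier
values = pd.Series([3600, 3650, 3700, 3800, 9999])

# IQR rule: drop points more than 1.5 * IQR outside the quartiles
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
filtered = values[(values >= q1 - 1.5 * iqr) & (values <= q3 + 1.5 * iqr)]
print(filtered.tolist())  # the extreme value 9999 is dropped
```

Here the bounds work out to [3425, 4025], so only the extreme value is removed while the rest of the data survives.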

Fig: Representation of Simple Linear Regression

Residual

Residuals are the differences between the actual and estimated values. For an observation yᵢ with prediction ŷᵢ, the residual is generally represented as:

eᵢ = yᵢ − ŷᵢ

Uses of Linear Regressions:

>They are mostly used in estimating sales and trends.

>They are used in analyzing the impact of price changes.

>Risk prediction in financial institutions.

Let’s see the implementation of linear regression. Throughout this series we will learn every machine learning model through an example, which should help you understand the models more effectively. Today we will practice linear regression by predicting the weight of the brain from the size of the head. Be ready to know the weight of the brain.

Are you ready to know the weight of your brain?

First, we will import the necessary libraries and then go through each step of the machine learning process.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import math
from sklearn.metrics import mean_squared_error

Then load the data into the notebook. You can use a Jupyter notebook or Google Colab. Get the data here.

data = pd.read_csv("/content/drive/MyDrive/Data/headbrain.csv")
data.head()

Let us analyze the data and see which factors matter before visualizing it. From the table below we can see that Head Size and Brain Weight are the useful columns for making a prediction. Using all the independent variables would likely give a more accurate prediction, but since we are learning simple linear regression we must choose only one independent variable, and Head Size makes the most sense here.

Fig: Data Samples

Now, let us separate the data into the dependent variable Y and the independent variable X.

size = data['Head Size(cm^3)']
weight = data['Brain Weight(grams)']
print(size)
print(weight)

Now let’s visualize the data using the following code. From the visualization we can see that linear regression could be a good fit. If the pattern in the data were not linear, or the variance were unusually high, linear regression would not be appropriate.
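The plotting snippet is not reproduced here, so the following is a minimal sketch of a scatter plot of head size against brain weight, using hypothetical stand-in values in place of the columns loaded from headbrain.csv:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np

# Stand-in values; in the tutorial these come from headbrain.csv
size = np.array([3500, 3700, 3900, 4100, 4300])
weight = np.array([1250, 1320, 1380, 1410, 1480])

plt.scatter(size, weight, color='blue')
plt.title('Head Size vs Brain Weight')
plt.xlabel('Head Size(cm^3)')
plt.ylabel('Brain Weight(grams)')
plt.savefig('scatter.png')
```

In a notebook you would call plt.show() instead of savefig to display the figure inline.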

Fig: Visualization

Then we have to look at the data formats. pandas returns each column as a one-dimensional Series, but scikit-learn expects the feature matrix to be a 2-D array. So, we have to reshape the data into the right array shapes.
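A minimal sketch of that conversion, using hypothetical stand-in Series in place of the CSV columns: the feature column is reshaped to (n_samples, 1) with reshape(-1, 1), while the target can stay one-dimensional.

```python
import numpy as np
import pandas as pd

# Stand-ins for data['Head Size(cm^3)'] and data['Brain Weight(grams)']
size = pd.Series([3500, 3700, 3900, 4100])
weight = pd.Series([1250, 1320, 1380, 1410])

# scikit-learn expects a 2-D feature matrix: reshape X to (n_samples, 1)
x = size.values.reshape(-1, 1)
y = weight.values

print(x.shape, y.shape)  # (4, 1) and (4,)
```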

All the pre-processing tasks are done, so now we can train the model. We will use models from sklearn throughout the series. We have already imported LinearRegression from sklearn; simply define the model and fit the data to it.
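The fitting step can be sketched as follows, again with hypothetical stand-in data for x and y; fit() estimates the slope (coef_) and intercept (intercept_) of the line:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-ins; in the tutorial x is head size, y is brain weight
x = np.array([3500, 3700, 3900, 4100]).reshape(-1, 1)
y = np.array([1250, 1320, 1380, 1410])

model = LinearRegression()
model.fit(x, y)

# Slope (β) and intercept (α) of the fitted line
print(model.coef_, model.intercept_)
```

After fitting, model.predict(x) returns the estimated brain weights for the given head sizes.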

Errors:

Let us look at some error metrics that help us understand how the model performed. There are many ways to evaluate a model’s performance, which will be discussed in later blogs.

Let’s look at the Root Mean Squared Error and the R-squared score.

The mean squared error (MSE), or mean squared deviation, of an estimator measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values. MSE is a risk function corresponding to the expected value of the squared error loss. The square root of the mean squared error is the Root Mean Squared Error (RMSE), which expresses the loss in the same units as the target variable.
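The definition above can be checked by hand. A small sketch with hypothetical residuals: square each residual, average the squares to get the MSE, then take the square root for the RMSE.

```python
import math

# Hypothetical residuals (actual minus predicted)
residuals = [20.0, -20.0, 10.0]

mse = sum(r ** 2 for r in residuals) / len(residuals)  # mean of squared errors
rmse = math.sqrt(mse)                                  # same units as the target
print(mse, rmse)  # 300.0 and about 17.32
```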

R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. R-squared measures the strength of the relationship between your model and the dependent variable on a convenient 0–100% scale. If the score is high, we can conclude that the linear regression model fits the observations well.

You can calculate the errors using the following code:

regression_model_mse = mean_squared_error(y, model.predict(x))
print('Root Mean Squared Error:\t', math.sqrt(regression_model_mse))
print('R squared value\t\t', model.score(x, y))

Now you can visualize how the model performs on your data using matplotlib.

plt.scatter(x, y, color='blue')
plt.plot(x, model.predict(x), color='black')
plt.title('Trained Linear Regression')
plt.xlabel('Size')
plt.ylabel('Weight')
plt.show()

Fig: Visualization of Output

Hey, it’s time to predict the output. Have you measured the size of a head so you can find the weight of the brain?

predicted_weight = model.predict([[550]])
print('The predicted brain weight is', predicted_weight, 'for head size 550')

Hope you enjoyed learning. You can get the full code here. Do leave a comment if you have any questions.
