Skip to content
geeksforgeeks
  • Tutorials
    • Python
    • Java
    • Data Structures & Algorithms
    • ML & Data Science
    • Interview Corner
    • Programming Languages
    • Web Development
    • CS Subjects
    • DevOps And Linux
    • School Learning
    • Practice Coding Problems
  • Courses
    • DSA to Development
    • Get IBM Certification
    • Newly Launched!
      • Master Django Framework
      • Become AWS Certified
    • For Working Professionals
      • Interview 101: DSA & System Design
      • Data Science Training Program
      • JAVA Backend Development (Live)
      • DevOps Engineering (LIVE)
      • Data Structures & Algorithms in Python
    • For Students
      • Placement Preparation Course
      • Data Science (Live)
      • Data Structure & Algorithm-Self Paced (C++/JAVA)
      • Master Competitive Programming (Live)
      • Full Stack Development with React & Node JS (Live)
    • Full Stack Development
    • Data Science Program
    • All Courses
  • Data Science
  • Data Science Projects
  • Data Analysis
  • Data Visualization
  • Machine Learning
  • ML Projects
  • Deep Learning
  • NLP
  • Computer Vision
  • Artificial Intelligence
Open In App
Next Article:
Simple Linear Regression in Python
Next article icon

Simple Linear Regression in Python

Last Updated : 16 Jan, 2025
Comments
Improve
Suggest changes
Like Article
Like
Report

Simple linear regression models the relationship between a dependent variable and a single independent variable. In this article, we will explore simple linear regression and it's implementation in Python using libraries such as NumPy, Pandas, and scikit-learn.

Understanding Simple Linear Regression

Simple Linear Regression aims to describe how one variable i.e the dependent variable changes in relation with reference to the independent variable. For example consider a scenario where a company wants to predict sales based on advertising expenditure. By using simple linear regression the company can determine if an increase in advertising leads to higher sales or not.

The below graph explains the relationship between advertising expenditure and sales using simple linear regression:

simplelinearregression
Simple Linear Regression

The relationship between the dependent and independent variables is represented by the simple linear equation:

y=mx+b

Here:

  • y is the predicted value (dependent variable).
  • m is the slope of the line
  • x is the independent variable.
  • b is the y-intercept (the value of y when x is 0).

In this equation m signifies the slope of the line indicating how much y changes for a one-unit increase in x, a positive m suggests a direct relationship while a negative m indicates an inverse relationship.

To better understand this relationship we can express it in a more statistical context using the linear regression formula:

Y = β_0 + β_1x

In simple linear regression the parameters \beta_0 \space \text {and} \space \beta_1​ play crucial roles in defining the relationship between the independent variable X and the dependent variable Y in the regression equation.

Intercept \beta_0:

  • The intercept \beta_0 represents the predicted value of the dependent variable Y when the independent variable X is zero. In other words it is the point where the regression line crosses the y-axis.
  • It​ provides a baseline value for Y and helps us understand the expected outcome when there is no influence from X. For example if you were predicting sales based on advertising expenditure it​ would indicate the estimated sales when no money is spent on advertising.
intercept
Simple Linear Regression

Slope \beta_1:

  • A positive \beta_1 value suggests that as X increases, Y also increases, indicating a direct relationship.
  • Conversely, a negative \beta_1​ value indicates that an increase in X leads to a decrease in Y, indicating an inverse relationship.

For instance, if \beta_1 = 2, this would mean that for every additional unit spent on advertising, sales are expected to increase by 2 units.

slope
Simple Linear Regression

Implementing Simple Linear Regression in Python

In order to use Simple Linear Regression in Python we must first install Python and a few necessary libraries. The fundamental setup tasks are listed below:

  • Install Python: You can download and install Python from the official Python website.
  • Install required libraries: Use pip (Python’s package installer) to install the necessary libraries, such as numpy, pandas, matplotlib, and scikit-learn.

pip install numpy pandas matplotlib scikit-learn

This configuration will set up the environment for Python machine learning modelling, data processing, and visualization.

We will use an actual dataset to demonstrate how to use basic linear regression. We'll be using the Boston Housing Dataset. This dataset provides details on Boston real estate costs as well as room counts, crime rates and other attributes. Based on the number of rooms we will forecast the cost of the house.

Step 1: Import Required Libraries

Python
# Import necessary libraries import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from sklearn.linear_model import LinearRegression from sklearn.metrics import mean_squared_error, r2_score from sklearn.datasets import load_boston 

Step 2: Load and Explore the Dataset

We will load the dataset check its structure and examine a few sample records.

Python
# Load the Boston Housing dataset boston = load_boston() df = pd.DataFrame(boston.data, columns=boston.feature_names)  # Add the target variable (house prices) to the DataFrame df['PRICE'] = boston.target  # Display the first few rows of the dataset print(df.head())  # Check for missing values print(df.isnull().sum()) 

Output:

      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  \
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0


PTRATIO B LSTAT PRICE
0 15.3 396.90 4.98 24.0
1 17.8 396.90 9.14 21.6
2 17.8 392.83 4.03 34.7
3 18.7 394.63 2.94 33.4
4 18.7 396.90 5.33 36.2
CRIM 0
ZN 0
INDUS 0
CHAS 0
NOX 0
RM 0
AGE 0
DIS 0
RAD 0
TAX 0
PTRATIO 0
B 0
LSTAT 0
PRICE 0
dtype: int64

In this dataset:

  • CRIM: per capita crime rate by town
  • ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
  • RM: average number of rooms per dwelling
  • PRICE: median value of owner-occupied homes in $1000s (this is our target variable)

Step 3: Selecting Features and Splitting the Data

We will use the number of rooms (RM) as the independent variable and predict house prices (PRICE). Next, we'll split the data into training and testing sets.

Python
# Define the feature (independent variable) and target (dependent variable) X = df[['RM']]  # Number of rooms y = df['PRICE']  # House prices  # Split the data into training and testing sets (80% train, 20% test) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  print(f"Training set size: {X_train.shape[0]}") print(f"Testing set size: {X_test.shape[0]}") 

Output:

Training set size: 404
Testing set size: 102

Step 4: Train the Simple Linear Regression Model

Now we can create a Linear Regression model using scikit-learn and train it on the training data. The model will calculate the intercept (𝛽0) and coefficient (𝛽1) of the linear equation.

Python
# Create a Linear Regression model model = LinearRegression()  # Train the model on the training data model.fit(X_train, y_train)  # Print the intercept and coefficient print(f"Intercept: {model.intercept_}") print(f"Coefficient: {model.coef_}") 

Output:

Intercept: -36.24631889813792
Coefficient: [9.34830141]

Step 5: Make Predictions

Once the model is trained, we can use it to make predictions on the test data.

Python
# Predict house prices for the test set y_pred = model.predict(X_test)  # Display the first few predictions alongside the actual values predictions = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred}) print(predictions.head()) 

Output:

     Actual  Predicted
173 23.6 23.732383
274 32.4 26.929502
491 13.6 19.684568
72 22.8 20.451129
452 16.1 22.619935

Step 6: Visualize the Regression Line

A good way to understand the relationship between the predicted and actual data is to visualize it. We'll plot the regression line along with the actual data points.

Python
# Plot the actual data points plt.scatter(X_test, y_test, color='blue', label='Actual')  # Plot the regression line plt.plot(X_test, y_pred, color='red', label='Regression Line')  # Add labels and title plt.xlabel('Number of Rooms (RM)') plt.ylabel('House Price ($1000s)') plt.title('Simple Linear Regression: Number of Rooms vs. House Price') plt.legend() plt.show() 

Output:

Screenshot-2024-09-15-153524
Simple Linear Regression in Python

Step 7: Evaluate the Model

Finally, we will evaluate the model's performance using metrics such as Mean Squared Error (MSE) and R-squared score.

Python
# Calculate Mean Squared Error (MSE) mse = mean_squared_error(y_test, y_pred) print(f"Mean Squared Error: {mse}")  # Calculate R-squared score r2 = r2_score(y_test, y_pred) print(f"R-squared score: {r2}") 

Output:

Mean Squared Error: 46.144775347317264
R-squared score: 0.3707569232254778

In this tutorial we used the scikit-learn framework and Python to develop Simple Linear Regression on the Boston Housing Dataset. We used measures like Mean Squared Error (MSE) and R-squared score to assess the model's performance and forecast house prices based on the number of rooms.


Next Article
Simple Linear Regression in Python

D

daswanta_kumar_routhu
Improve
Article Tags :
  • Machine Learning
  • AI-ML-DS
  • Regression
  • ML-Regression
  • AI-ML-DS With Python
Practice Tags :
  • Machine Learning

Similar Reads

    Solving Linear Regression in Python
    Linear regression is a widely used statistical method to find the relationship between dependent variable and one or more independent variables. It is used to make predictions by finding a line that best fits the data we have. The most common approach to best fit a linear regression model is least-s
    3 min read
    Python | Linear Regression using sklearn
    Linear Regression is a machine learning algorithm based on supervised learning. It performs a regression task. Regression models a target prediction value based on independent variables. It is mostly used for finding out the relationship between variables and forecasting. Different regression models
    3 min read
    Linear Regression in Python using Statsmodels
    In this article, we will discuss how to use statsmodels using Linear Regression in Python. Linear regression analysis is a statistical technique for predicting the value of one variable(dependent variable) based on the value of another(independent variable). The dependent variable is the variable th
    4 min read
    Linear Regression using PyTorch
    Linear Regression is a very commonly used statistical method that allows us to determine and study the relationship between two continuous variables. The various properties of linear regression and its Python implementation have been covered in this article previously. Now, we shall find out how to
    4 min read
    Linear Regression for Single Prediction
    Linear regression is a statistical method and machine learning foundation used to model relationship between a dependent variable and one or more independent variables. The primary goal is to predict the value of the dependent variable based on the values of the independent variables.Predicting a Si
    6 min read
geeksforgeeks-footer-logo
Corporate & Communications Address:
A-143, 7th Floor, Sovereign Corporate Tower, Sector- 136, Noida, Uttar Pradesh (201305)
Registered Address:
K 061, Tower K, Gulshan Vivante Apartment, Sector 137, Noida, Gautam Buddh Nagar, Uttar Pradesh, 201305
GFG App on Play Store GFG App on App Store
Advertise with us
  • Company
  • About Us
  • Legal
  • Privacy Policy
  • In Media
  • Contact Us
  • Advertise with us
  • GFG Corporate Solution
  • Placement Training Program
  • Languages
  • Python
  • Java
  • C++
  • PHP
  • GoLang
  • SQL
  • R Language
  • Android Tutorial
  • Tutorials Archive
  • DSA
  • Data Structures
  • Algorithms
  • DSA for Beginners
  • Basic DSA Problems
  • DSA Roadmap
  • Top 100 DSA Interview Problems
  • DSA Roadmap by Sandeep Jain
  • All Cheat Sheets
  • Data Science & ML
  • Data Science With Python
  • Data Science For Beginner
  • Machine Learning
  • ML Maths
  • Data Visualisation
  • Pandas
  • NumPy
  • NLP
  • Deep Learning
  • Web Technologies
  • HTML
  • CSS
  • JavaScript
  • TypeScript
  • ReactJS
  • NextJS
  • Bootstrap
  • Web Design
  • Python Tutorial
  • Python Programming Examples
  • Python Projects
  • Python Tkinter
  • Python Web Scraping
  • OpenCV Tutorial
  • Python Interview Question
  • Django
  • Computer Science
  • Operating Systems
  • Computer Network
  • Database Management System
  • Software Engineering
  • Digital Logic Design
  • Engineering Maths
  • Software Development
  • Software Testing
  • DevOps
  • Git
  • Linux
  • AWS
  • Docker
  • Kubernetes
  • Azure
  • GCP
  • DevOps Roadmap
  • System Design
  • High Level Design
  • Low Level Design
  • UML Diagrams
  • Interview Guide
  • Design Patterns
  • OOAD
  • System Design Bootcamp
  • Interview Questions
  • Inteview Preparation
  • Competitive Programming
  • Top DS or Algo for CP
  • Company-Wise Recruitment Process
  • Company-Wise Preparation
  • Aptitude Preparation
  • Puzzles
  • School Subjects
  • Mathematics
  • Physics
  • Chemistry
  • Biology
  • Social Science
  • English Grammar
  • Commerce
  • World GK
  • GeeksforGeeks Videos
  • DSA
  • Python
  • Java
  • C++
  • Web Development
  • Data Science
  • CS Subjects
@GeeksforGeeks, Sanchhaya Education Private Limited, All rights reserved
We use cookies to ensure you have the best browsing experience on our website. By using our site, you acknowledge that you have read and understood our Cookie Policy & Privacy Policy
Lightbox
Improvement
Suggest Changes
Help us improve. Share your suggestions to enhance the article. Contribute your expertise and make a difference in the GeeksforGeeks portal.
geeksforgeeks-suggest-icon
Create Improvement
Enhance the article with your expertise. Contribute to the GeeksforGeeks community and help create better learning resources for all.
geeksforgeeks-improvement-icon
Suggest Changes
min 4 words, max Words Limit:1000

Thank You!

Your suggestions are valuable to us.

What kind of Experience do you want to share?

Interview Experiences
Admission Experiences
Career Journeys
Work Experiences
Campus Experiences
Competitive Exam Experiences