Linear Regression in Python using Statsmodels

Last Updated : 22 Dec, 2022

In this article, we will discuss how to perform linear regression in Python using statsmodels.

Linear regression analysis is a statistical technique for predicting the value of one variable (the dependent variable) based on the value of another (the independent variable). The dependent variable is the one we want to predict or forecast; the independent variable is the one used to make that forecast. In simple linear regression, a single independent variable is used to predict a single dependent variable. In multiple linear regression, there is more than one independent variable. In statsmodels, linear regression is performed with the statsmodels.regression.linear_model.OLS class. A simple linear equation has the form:

y = mx + c
m = slope
c = constant
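To make the equation concrete, here is a minimal sketch that recovers m and c from a handful of points using the closed-form least-squares formulas. The data points are made up for illustration and lie exactly on y = 2x + 1; only NumPy is assumed.

```python
import numpy as np

# Synthetic points that lie exactly on y = 2x + 1 (made up for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Closed-form least-squares estimates for a single predictor:
#   m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   c = mean_y - m * mean_x
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()

print(f"slope m = {m:.2f}, constant c = {c:.2f}")
# prints: slope m = 2.00, constant c = 1.00
```

Ordinary least squares chooses m and c so that the sum of squared vertical distances between the points and the line is as small as possible.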

Syntax: statsmodels.regression.linear_model.OLS(endog, exog=None, missing=’none’, hasconst=None, **kwargs)

Parameters: 

  • endog: array-like. The dependent (response) variable.
  • exog: array-like. The independent variables. An intercept is not included by default and has to be added by the user, for example with statsmodels.api.add_constant().
  • missing: str. The available options are 'none', 'drop', and 'raise'. If 'none', no NaN checking is done. If 'drop', any observations with NaNs are dropped. If 'raise', an error is raised. The default is 'none'.
  • hasconst: None or bool. Indicates whether the RHS includes a user-supplied constant. If True, a constant is not checked for, k_constant is set to 1, and all result statistics are calculated as if a constant is present. If False, a constant is not checked for and k_constant is set to 0.
  • **kwargs: Extra arguments that are used to set model properties when using the formula interface.

Return: An OLS model object. Calling its fit() method returns the regression results.

Installation 

pip install numpy
pip install pandas
pip install statsmodels

Stepwise Implementation

Step 1: Import packages.

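Importing the required packages is the first step of modeling. The pandas, NumPy, and statsmodels packages are imported. The statsmodels formula API is imported as smf because it is used to fit the model in Step 4.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf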

Step 2: Loading data.

The dataset is stored in a CSV file (headbrain1.csv here), which is read using the pandas.read_csv() method. The head() method returns the first five rows of the dataset. The dataset has two columns: head size and brain weight.

Python3

df = pd.read_csv('headbrain1.csv')
df.head()
                      
                       

The head of the data frame looks like this:

 

Visualizing the data:

By using the matplotlib and seaborn packages, we visualize the data. sns.regplot() function helps us create a regression plot.

Python3

# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
 
df = pd.read_csv('headbrain1.csv')
# x and y must be passed as keyword arguments in recent seaborn versions
sns.regplot(x='Head Size(cm^3)', y='Brain Weight(grams)', data=df)
 
plt.show()
                      
                       

Output:

[Figure: scatter plot of Head Size(cm^3) against Brain Weight(grams) with the fitted regression line]

 

Step 3: Setting a hypothesis.

  • Null hypothesis (H0): There is no relationship between head size and brain weight.
  • Alternative hypothesis (Ha): There is a relationship between head size and brain weight.

Step 4: Fitting the model

The ols() function from statsmodels.formula.api (imported as smf) creates an ordinary least squares model from a formula string and a data frame, and the fit() method fits the model to the data. We provide the dependent and independent columns in this format:

dependent_column ~ independent_columns

The left side of the ~ operator contains the name of the dependent variable (the predicted column), and the right side contains the independent variables. Here, brain weight is predicted from head size.

Python3

df.columns = ['Head_size', 'Brain_weight']
model = smf.ols(formula='Brain_weight ~ Head_size', data=df).fit()
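In a statsmodels formula, the variable on the left of ~ is the response and the variables on the right are the predictors. The following sketch shows this on synthetic stand-in data, since the CSV file is not bundled here. The numbers are illustrative only and do not come from the article's dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data; the values are made up for illustration.
rng = np.random.default_rng(42)
head_size = rng.uniform(3000, 4500, size=100)
brain_weight = 300 + 0.25 * head_size + rng.normal(scale=40, size=100)
df = pd.DataFrame({'Head_size': head_size, 'Brain_weight': brain_weight})

# Response on the left of ~, predictor on the right.
# The formula interface adds the intercept term automatically.
model = smf.ols(formula='Brain_weight ~ Head_size', data=df).fit()

# params is a pandas Series indexed by 'Intercept' and 'Head_size'.
print(model.params)
```

Unlike the array interface, the formula interface does not need add_constant(), because the intercept is part of the formula by default.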
                      
                       

Step 5: Summary of the model.

The model.summary() method returns all the summary statistics of the fitted linear regression model. It reports the coefficients, their p-values, R-squared, and many other statistics, which are used to draw conclusions about the data.

Python3

print(model.summary())
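The individual numbers shown in the summary table are also available as attributes of the fitted results object, and predictions for new values come from predict(). This is a small sketch on synthetic data; the column names x and y are placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration: y = 5 + 2x plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=80)
y = 5.0 + 2.0 * x + rng.normal(scale=1.0, size=80)
df = pd.DataFrame({'x': x, 'y': y})

results = smf.ols('y ~ x', data=df).fit()

print(results.params)      # estimated intercept and slope
print(results.pvalues)     # p-value for each coefficient
print(results.rsquared)    # coefficient of determination
print(results.conf_int())  # 95% confidence intervals by default

# Predictions for new predictor values come from predict(), not summary().
new_data = pd.DataFrame({'x': [0.0, 5.0]})
predictions = results.predict(new_data)
print(predictions)
```

Reading values from attributes avoids parsing the printed table when the results feed into further code.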
                      
                       

Code Implementation:

Python3

# import packages
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
 
# loading the csv file
df = pd.read_csv('headbrain1.csv')
print(df.head())
 
# fitting the model (formula syntax: dependent ~ independent)
df.columns = ['Head_size', 'Brain_weight']
model = smf.ols(formula='Brain_weight ~ Head_size', data=df).fit()
 
# model summary
print(model.summary())
                      
                       

Output:

[Figure: OLS regression results table printed by model.summary()]

 

Description of some of the terms in the table :

  • R-squared: The R-squared value ranges between 0 and 1 and gives the proportion of the variance in the dependent variable that is explained by the independent variable(s). An R-squared of 1 (100 percent) indicates a perfect fit. In our example, the R-squared value is 0.638, so about 63.8 percent of the variance is explained by the model.
  • F-statistic: The F-statistic tests whether the independent variables, taken together, have any effect, that is, whether all slope coefficients are zero. Reject the null hypothesis if the p-value of the F-statistic is less than your alpha level.
  • coef: The estimated coefficients of the regression equation, that is, the intercept and the slope of each independent variable.

Our conclusion:

If we take our significance level (alpha) to be 0.05, we reject the null hypothesis and accept the alternative hypothesis because p < 0.05. So, we can say that there is a relationship between head size and brain weight.
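The decision rule can be written as a small helper. The function name reject_null is hypothetical and the data is synthetic; the sketch only shows how the slope's p-value is compared with alpha.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def reject_null(p_value, alpha=0.05):
    """Return True when the p-value is below the significance level."""
    return bool(p_value < alpha)


# Synthetic data with a strong linear relationship (illustration only).
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=80)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=80)
df = pd.DataFrame({'x': x, 'y': y})

results = smf.ols('y ~ x', data=df).fit()

# The p-value of the slope tests H0: no relationship between x and y.
decision = reject_null(results.pvalues['x'], alpha=0.05)
print("Reject H0" if decision else "Fail to reject H0")
```

Failing to reject the null hypothesis does not prove that no relationship exists; it only means the data does not provide enough evidence at the chosen alpha.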



author
isitapol2002