
ML | Rainfall prediction using Linear regression

Last Updated : 05 Apr, 2025

Predicting rainfall is a vital part of weather forecasting, agricultural planning and water resource management. In this article we will use linear regression, an algorithm that models the relationship between one dependent variable (rainfall) and one or more independent variables (such as temperature and humidity). The fitted model estimates how many inches of rainfall we can expect for a given set of weather conditions.
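As a brief sketch of what the model looks like in symbols (the notation below is an assumption made for illustration, not taken from the article): with p input features x_1, ..., x_p and n training rows, linear regression predicts rainfall as a weighted sum of the features plus an intercept, and ordinary least squares chooses the weights that minimize the sum of squared errors.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p,
\qquad
(\hat{\beta}_0, \dots, \hat{\beta}_p)
  = \arg\min_{\beta_0, \dots, \beta_p} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

Here y_i is the observed rainfall for row i and \hat{y}_i is the model's prediction for that row.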

Step 1: Importing the required libraries

Here we will use pandas, NumPy, Matplotlib and scikit-learn.

Python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

Step 2: Data Collection and Loading

Gather historical weather data, including rainfall, temperature, humidity and other relevant factors, and load it into a DataFrame. Reliable data leads to better model accuracy. You can download the dataset from here: Dataset.

Python
data = pd.read_csv("Austin-2019-01-01-to-2023-07-22.csv") 
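Before preprocessing, it can help to confirm that the expected columns are present and to count missing values. The sketch below is only an illustration: it reads a made-up three-row sample from memory instead of the real file, and the `datetime` column and all values are assumptions, not taken from the dataset.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "datetime,tempmax,tempmin,humidity,dew,precip\n"
    "2019-01-01,55.1,40.2,70.5,38.0,0.00\n"
    "2019-01-02,48.3,39.9,,44.1,0.35\n"
    "2019-01-03,52.0,41.7,88.2,47.5,\n"
)
sample = pd.read_csv(sample_csv)

# Columns the rest of the tutorial relies on.
required = ['tempmax', 'tempmin', 'humidity', 'dew', 'precip']

# Any required column that is absent from the file.
missing_columns = [c for c in required if c not in sample.columns]

# Number of NaN values in each required column.
missing_counts = sample[required].isna().sum()

print("Missing columns:", missing_columns)
print(missing_counts)
```

The same two checks can be run on the real `data` DataFrame by replacing `sample` with `data`.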

Step 3: Data Preprocessing

Clean and preprocess the data by handling missing values and, where needed, removing outliers and scaling variables. Preprocessing keeps incomplete or inconsistent records from biasing the model, which leads to more reliable predictions. In this article we only drop rows with missing values; the split into training and testing sets is done in Step 5.

  • data.dropna(): Removes rows that contain missing (NaN) values in the columns listed in subset (here the features and the target). Handling missing data up front avoids errors during model training.
Python
features = ['tempmax', 'tempmin', 'humidity', 'dew']
target = 'precip'
data = data.dropna(subset=features + [target])
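To make the effect of the subset argument concrete, here is a small sketch on a toy DataFrame. The values and the `windspeed` column are made up for illustration; the point is that a row is dropped only when a NaN appears in one of the listed columns.

```python
import numpy as np
import pandas as pd

# Toy data: every value here is invented for the example.
toy = pd.DataFrame({
    'tempmax':   [90.0, 85.0, np.nan, 80.0],
    'humidity':  [60.0, 75.0, 70.0, 82.0],
    'windspeed': [np.nan, 5.0, 7.0, 3.0],   # hypothetical column, not in the subset
    'precip':    [0.0, 0.2, 0.1, np.nan],
})

# Drop rows with a NaN in any of the listed columns only.
cleaned = toy.dropna(subset=['tempmax', 'humidity', 'precip'])

print(f"{len(toy)} rows before, {len(cleaned)} rows after")
print(cleaned)
```

Row 0 survives even though its `windspeed` is missing, because that column is not part of the subset; rows 2 and 3 are dropped because `tempmax` and `precip` are missing there.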

Step 4: Feature Selection

Identify which weather variables, i.e. features, are most correlated with rainfall. For example, humidity might have a stronger correlation with rainfall than temperature does. Selecting relevant features can improve model performance and reduces computational cost by focusing on the important variables. Here we use the four columns chosen in Step 3 as features and precip as the target.

Python
X = data[features]
y = data[target]
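One simple way to check which features are most correlated with rainfall is to rank them by the absolute value of their Pearson correlation with the target. The sketch below uses a made-up five-row DataFrame, so the resulting numbers say nothing about the real dataset; it is an illustration of the idea, not the article's own selection procedure.

```python
import pandas as pd

# Invented values, chosen so that humidity tracks precip exactly.
toy = pd.DataFrame({
    'tempmax':  [90.0, 85.0, 88.0, 80.0, 83.0],
    'humidity': [50.0, 60.0, 70.0, 80.0, 90.0],
    'precip':   [0.0, 0.1, 0.2, 0.3, 0.4],
})
features = ['tempmax', 'humidity']
target = 'precip'

# Pearson correlation of every feature with the target column.
correlations = toy[features + [target]].corr()[target].drop(target)

# Rank by strength of the relationship, ignoring its sign.
ranked = correlations.abs().sort_values(ascending=False)
print(ranked)
```

Correlation only measures pairwise linear association, so a low value does not prove that a feature is useless in combination with others.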

Step 5: Model Training

Use the training dataset to fit a linear regression model. The model learns the relationship between the independent variables (temperature, humidity, dew point) and rainfall.

  • train_test_split(): This function splits the dataset into training and testing sets.
  • test_size=0.2 indicates that 20% of the data will be used for testing and the remaining 80% will be used for training.
  • random_state=42 ensures that the split is reproducible.
  • model.fit(): Trains the linear regression model on the training data. The model learns the relationship between the features and the target variable.
Python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

Output:

[Figure: Linear Regression model]
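To see what fitting actually produces, the following sketch repeats the split and fit on synthetic data generated from a known linear rule. The data, the coefficients 2 and 3 and the intercept 1 are assumptions chosen for the example; because the data contain no noise, the fitted model should recover them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, two features, generated from a known rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] + 1.0   # no noise added

# Same split settings as in the article: 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

print("Train/test sizes:", len(X_train), len(X_test))
print("Coefficients:", model.coef_)     # one weight per feature
print("Intercept:", model.intercept_)
```

On the rainfall data, `model.coef_` would hold one weight per feature in the order of the `features` list, which gives a rough sense of how each variable moves the prediction.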

Step 6: Model Evaluation

Test the model using the testing dataset and evaluate its performance using metrics like Mean Squared Error (MSE) or R-squared.

  • model.predict(): Uses the trained model to predict the target variable (y_pred) for the test data (X_test). The predicted values of rainfall are stored in y_pred.
Python
y_pred = model.predict(X_test) 

Step 7: Prediction and Visualizing Results

Input new data into the trained model to predict rainfall. For instance, given a specific temperature and humidity, the model forecasts rainfall levels. Prediction is the ultimate goal, enabling actionable insights, such as preparing for heavy rainfall or managing agricultural schedules.
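As a minimal sketch of predicting for a single new day: the training rows and the new observation below are invented so that the example runs on its own, and clipping negative outputs to zero is an added assumption rather than part of the article's code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ['tempmax', 'tempmin', 'humidity', 'dew']

# Made-up training rows, only so that the example is self-contained.
train = pd.DataFrame({
    'tempmax':  [95.0, 88.0, 76.0, 70.0, 82.0, 65.0],
    'tempmin':  [75.0, 70.0, 62.0, 58.0, 66.0, 50.0],
    'humidity': [45.0, 60.0, 85.0, 90.0, 70.0, 55.0],
    'dew':      [60.0, 65.0, 68.0, 66.0, 64.0, 45.0],
    'precip':   [0.0, 0.05, 0.6, 0.9, 0.2, 0.0],
})
model = LinearRegression().fit(train[features], train['precip'])

# A hypothetical new day, with the same columns in the same order.
new_day = pd.DataFrame([{'tempmax': 78.0, 'tempmin': 63.0,
                         'humidity': 82.0, 'dew': 67.0}])[features]

raw = model.predict(new_day)

# A linear model can output negative values; rainfall cannot be negative.
predicted = np.clip(raw, 0.0, None)
print(f"Predicted rainfall: {predicted[0]:.2f}")
```

With the model trained in Step 5, only the `new_day` and `predict` lines would be needed.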

  • mean_squared_error(): Calculates the Mean Squared Error (MSE), which measures the average squared differences between actual and predicted values. A lower MSE indicates better model performance.
  • np.sqrt(): Computes the Root Mean Squared Error (RMSE), which is the square root of MSE. It gives an error metric in the same unit as the target variable (rainfall).
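  • r2_score(): Calculates the R-squared value, which indicates how much of the variance in the target the model explains. The best possible value is 1, and higher values indicate a better fit; on test data it can be negative when the model performs worse than always predicting the mean.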
Python
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"Root Mean Squared Error: {rmse}")
print(f"R-squared: {r2}")

plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.6)
plt.plot([y.min(), y.max()], [y.min(), y.max()], color='red', linestyle='--')
plt.title('Actual vs Predicted Rainfall')
plt.xlabel('Actual Rainfall')
plt.ylabel('Predicted Rainfall')
plt.grid()
plt.show()

residuals = y_test - y_pred
plt.figure(figsize=(10, 6))
plt.scatter(y_pred, residuals, alpha=0.6)
plt.axhline(y=0, color='red', linestyle='--')
plt.title('Residual Plot')
plt.xlabel('Predicted Rainfall')
plt.ylabel('Residuals')
plt.grid()
plt.show()

Output :

Mean Squared Error: 0.04974770851826499
Root Mean Squared Error: 0.22304194340586478
R-squared: 0.1661984442789477

[Figure: Actual vs Predicted Rainfall]

[Figure: Residual Plot]
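The three metrics reported above can also be computed by hand, which makes their definitions explicit. The sketch below uses only the standard library; the helper name `regression_metrics` and the small lists of values are made up for illustration. It also shows that R-squared drops below zero when predictions are worse than always guessing the mean.

```python
import math

def regression_metrics(actual, predicted):
    """Return (mse, rmse, r2) for two equal-length lists of numbers."""
    n = len(actual)
    # Sum of squared residuals: how far predictions are from the truth.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: how far the truth is from its own mean.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    mse = ss_res / n
    return mse, math.sqrt(mse), 1.0 - ss_res / ss_tot

# Invented rainfall values (inches) and two sets of predictions.
actual = [0.0, 0.0, 0.5, 1.0]
good = [0.1, 0.2, 0.4, 0.7]
bad = [1.0, 1.0, 0.0, 0.0]

mse_good, rmse_good, r2_good = regression_metrics(actual, good)
mse_bad, rmse_bad, r2_bad = regression_metrics(actual, bad)

print(f"good: MSE={mse_good:.4f} RMSE={rmse_good:.4f} R2={r2_good:.4f}")
print(f"bad:  MSE={mse_bad:.4f} RMSE={rmse_bad:.4f} R2={r2_bad:.4f}")
```

Because R-squared compares the model against a constant prediction of the mean, a value near 0 means the model is barely better than that baseline.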

In this project, we used linear regression to predict rainfall from weather-related features: maximum and minimum temperature, humidity and dew point. The model reached a Root Mean Squared Error (RMSE) of about 0.22 and an R-squared value of about 0.17, so it explains only around 17% of the variance in rainfall: some predictive capability, but substantial room for improvement. Visualizations like the Actual vs Predicted Rainfall plot and the Residual Plot helped analyze model accuracy and identify where predictions deviated from actual values.

This analysis demonstrates the potential of linear regression for basic rainfall prediction while highlighting the need for more complex models or additional features to enhance accuracy.

You can download the source code from here.




Adith Bharadwaj