SQL for Data Analysis Cheat Sheet
Last Updated : 24 Feb, 2025
SQL (Structured Query Language) is essential for data analysis because it enables efficient data retrieval, manipulation, and transformation. It lets analysts filter, sort, group, and aggregate large datasets, which supports data-driven decision-making. SQL also integrates with business intelligence tools such as Tableau and Power BI, and with Python libraries, for visualization and reporting.
Why SQL is Important for Data Analysis
SQL is widely used in data analysis because it provides a powerful and efficient way to interact with structured data. Here are some reasons why SQL is essential for data analysts:
- Efficient Data Retrieval – SQL allows users to fetch specific data from large datasets quickly using simple queries.
- Data Manipulation – SQL supports various operations such as filtering, sorting, grouping, and aggregating data to prepare datasets for analysis.
- Scalability – SQL can handle vast amounts of data efficiently, making it suitable for enterprise-level analytics.
- Integration with BI Tools – SQL works with business intelligence tools such as Tableau and Power BI, and with Python libraries (Pandas, SQLAlchemy), for advanced analysis and visualization.
- Data Cleaning and Transformation – SQL functions help clean and normalize data, removing duplicates and handling missing values.
- Decision-Making Support – Businesses use SQL queries to extract insights that inform their decisions.
- Automation – SQL scripts can be scheduled and automated for regular reporting and data processing.
1. Data Retrieval
SELECT Statement: Used to fetch data from a database.
SELECT column1, column2
FROM table_name;
DISTINCT Keyword: Eliminates duplicate records.
SELECT DISTINCT column1
FROM table_name;
WHERE Clause: Filters records based on specified conditions.
SELECT column1
FROM table_name
WHERE condition;
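To see the three clauses above on concrete rows, here is a minimal sketch that runs them against an in-memory SQLite database through Python's standard sqlite3 module. The `orders` table, its columns, and its rows are hypothetical and exist only for illustration; `ORDER BY` is added just to make the output order predictable.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 120.0), (2, "bob", 40.0), (3, "alice", 75.0), (4, "carol", 200.0)],
)

# SELECT + WHERE: keep only the orders with amount above 50.
large_orders = conn.execute(
    "SELECT id, customer FROM orders WHERE amount > 50 ORDER BY id"
).fetchall()

# DISTINCT: each customer appears once, although alice has two orders.
customers = conn.execute(
    "SELECT DISTINCT customer FROM orders ORDER BY customer"
).fetchall()

print(large_orders)  # [(1, 'alice'), (3, 'alice'), (4, 'carol')]
print(customers)     # [('alice',), ('bob',), ('carol',)]
```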
2. Sorting and Limiting Results
ORDER BY Clause: Sorts the result set.
SELECT column1
FROM table_name
ORDER BY column1 [ASC|DESC];
LIMIT Clause: Limits the number of returned records. LIMIT is supported by MySQL, PostgreSQL, and SQLite; SQL Server uses TOP instead.
SELECT column1
FROM table_name
LIMIT number;
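Combining the two clauses gives the common "top N" query. The sketch below assumes a hypothetical `products` table and uses SQLite via Python's sqlite3 module; other databases may spell the row limit differently.

```python
import sqlite3

# Hypothetical table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("pen", 1.5), ("laptop", 900.0), ("desk", 250.0), ("lamp", 30.0)],
)

# Sort by price from highest to lowest, then keep the first two rows.
top_two = conn.execute(
    "SELECT name, price FROM products ORDER BY price DESC LIMIT 2"
).fetchall()

print(top_two)  # [('laptop', 900.0), ('desk', 250.0)]
```

Without ORDER BY, the rows that LIMIT returns are not guaranteed to be in any particular order.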
3. Aggregate Functions
COUNT(): Returns the number of rows.
SELECT COUNT(*)
FROM table_name;
SUM(): Calculates the total sum of a numeric column.
SELECT SUM(column1)
FROM table_name;
AVG(): Calculates the average value.
SELECT AVG(column1)
FROM table_name;
MIN() and MAX(): Retrieve the minimum and maximum values.
SELECT MIN(column1), MAX(column1)
FROM table_name;
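The aggregate functions above can be combined in one query. The following sketch uses a hypothetical `scores` table with one missing value to show a detail that is easy to overlook: `COUNT(*)` counts every row, while `COUNT(column)`, `SUM`, `AVG`, `MIN`, and `MAX` skip NULLs.

```python
import sqlite3

# Hypothetical table; one score is missing (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("ann", 80), ("ben", None), ("cy", 100)],
)

# COUNT(*) counts all rows; the other aggregates ignore the NULL score.
summary = conn.execute(
    """
    SELECT COUNT(*), COUNT(score), SUM(score), AVG(score), MIN(score), MAX(score)
    FROM scores
    """
).fetchone()

print(summary)  # (3, 2, 180, 90.0, 80, 100)
```

Here the average is 90 rather than 60 because the NULL row is excluded from both the sum and the count.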
4. Grouping Data
GROUP BY: Group rows that have the same values in specified columns into summary rows.
SELECT column1, COUNT(*)
FROM table_name
GROUP BY column1;
HAVING: Filter groups based on aggregate functions.
SELECT column1, COUNT(*)
FROM table_name
GROUP BY column1
HAVING COUNT(*) > 1;
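As a worked example of grouping, the sketch below summarizes a hypothetical `sales` table by region and then uses HAVING to keep only regions with more than one sale. WHERE could not do this filtering, because it is evaluated on individual rows before the groups are formed.

```python
import sqlite3

# Hypothetical table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100), ("north", 50), ("south", 70),
     ("east", 20), ("east", 30), ("east", 10)],
)

# One summary row per region; HAVING drops regions with a single sale.
busy_regions = conn.execute(
    """
    SELECT region, COUNT(*) AS n_sales, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING COUNT(*) > 1
    ORDER BY region
    """
).fetchall()

print(busy_regions)  # [('east', 3, 60), ('north', 2, 150)]
```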
5. Joining Tables
INNER JOIN: Select records with matching values in both tables.
SELECT a.column1, b.column2
FROM table1 a INNER JOIN table2 b
ON a.common_field = b.common_field;
LEFT JOIN: Include all records from the left table and matched records from the right table; fill with NULLs if no match.
SELECT a.column1, b.column2
FROM table1 a LEFT JOIN table2 b
ON a.common_field = b.common_field;
RIGHT JOIN: Include all records from the right table and matched records from the left table; fill with NULLs if no match.
SELECT a.column1, b.column2
FROM table1 a RIGHT JOIN table2 b
ON a.common_field = b.common_field;
FULL OUTER JOIN: Return all records from both tables, matching rows where possible and filling with NULLs where there is no match.
SELECT a.column1, b.column2
FROM table1 a FULL OUTER JOIN table2 b
ON a.common_field = b.common_field;
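The difference between join types is easiest to see on a small dataset. This sketch compares INNER JOIN and LEFT JOIN on two hypothetical tables, `customers` and `orders`, using SQLite through Python's sqlite3 module. RIGHT and FULL OUTER JOIN are left out because SQLite only added them in version 3.39, so they may not run on older installations.

```python
import sqlite3

# Hypothetical tables: bob has no orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 50), (1, 20), (3, 70)])

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute(
    """
    SELECT c.name, o.amount
    FROM customers c INNER JOIN orders o ON c.id = o.customer_id
    ORDER BY c.name, o.amount
    """
).fetchall()

# LEFT JOIN: every customer; amount is NULL (None) when there is no order.
left_rows = conn.execute(
    """
    SELECT c.name, o.amount
    FROM customers c LEFT JOIN orders o ON c.id = o.customer_id
    ORDER BY c.name, o.amount
    """
).fetchall()

print(inner_rows)  # [('alice', 20), ('alice', 50), ('carol', 70)]
print(left_rows)   # [('alice', 20), ('alice', 50), ('bob', None), ('carol', 70)]
```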
6. Subqueries
Subquery in WHERE Clause: Use a subquery to filter results.
SELECT column1
FROM table_name
WHERE column2 IN (SELECT column2 FROM another_table WHERE condition);
Subquery in FROM Clause: Use a subquery as a temporary table.
SELECT a.column1, b.column2
FROM (SELECT column1, common_field FROM table_name WHERE condition) a
JOIN another_table b ON a.common_field = b.common_field;
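Both subquery forms can be tried on sample data. In the sketch below, the `employees` and `departments` tables are hypothetical. The first query filters with a subquery in WHERE, and the second joins against a subquery in FROM that pre-aggregates salaries per department.

```python
import sqlite3

# Hypothetical tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER, salary INTEGER)")
conn.execute("CREATE TABLE departments (id INTEGER, name TEXT, active INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("ann", 1, 100), ("ben", 2, 60), ("cy", 1, 80), ("di", 3, 90)])
conn.executemany("INSERT INTO departments VALUES (?, ?, ?)",
                 [(1, "eng", 1), (2, "ops", 0), (3, "hr", 1)])

# Subquery in WHERE: employees who belong to an active department.
active_staff = [row[0] for row in conn.execute(
    """
    SELECT name FROM employees
    WHERE dept_id IN (SELECT id FROM departments WHERE active = 1)
    ORDER BY name
    """
)]

# Subquery in FROM: aggregate first, then join the result like a table.
totals = conn.execute(
    """
    SELECT d.name, t.total
    FROM (SELECT dept_id, SUM(salary) AS total
          FROM employees GROUP BY dept_id) t
    JOIN departments d ON t.dept_id = d.id
    ORDER BY d.name
    """
).fetchall()

print(active_staff)  # ['ann', 'cy', 'di']
print(totals)        # [('eng', 180), ('hr', 90), ('ops', 60)]
```

Note that the subquery in FROM must select every column the outer query uses, including the join key.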
7. Set Operations
UNION: Combine the result sets of two queries and remove duplicates.
SELECT column1
FROM table1
UNION
SELECT column1
FROM table2;
UNION ALL: Combine the result sets of two queries, including duplicates.
SELECT column1
FROM table1
UNION ALL
SELECT column1
FROM table2;
INTERSECT: Return the common records from two queries.
SELECT column1
FROM table1
INTERSECT
SELECT column1
FROM table2;
EXCEPT: Return records from the first query that are not in the second query.
SELECT column1 FROM table1
EXCEPT
SELECT column1 FROM table2;
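The four set operations can be compared side by side on the same inputs. This is a small sketch with two hypothetical single-column tables, `t1` and `t2`, run in SQLite via Python's sqlite3 module; the `run` helper is only a convenience for this example.

```python
import sqlite3

# Hypothetical tables: t1 holds 1, 2, 2, 3 and t2 holds 2, 3, 4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (x INTEGER)")
conn.execute("CREATE TABLE t2 (x INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (2,), (3,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(2,), (3,), (4,)])

def run(operator):
    """Apply one set operator to t1 and t2 and return the sorted values."""
    sql = f"SELECT x FROM t1 {operator} SELECT x FROM t2 ORDER BY x"
    return [row[0] for row in conn.execute(sql)]

union = run("UNION")          # duplicates removed
union_all = run("UNION ALL")  # duplicates kept
intersect = run("INTERSECT")  # values present in both tables
except_ = run("EXCEPT")       # values in t1 that are not in t2

print(union)      # [1, 2, 3, 4]
print(union_all)  # [1, 2, 2, 2, 3, 3, 4]
print(intersect)  # [2, 3]
print(except_)    # [1]
```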
8. Window Functions
ROW_NUMBER(): Assign a unique sequential integer to rows within a partition.
SELECT column1,
ROW_NUMBER() OVER (PARTITION BY column2 ORDER BY column3) AS row_num
FROM table_name;
RANK(): Assign a rank to rows within a partition, with gaps for ties.
SELECT column1,
RANK() OVER (PARTITION BY column2 ORDER BY column3) AS rank_num
FROM table_name;
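The difference between ROW_NUMBER() and RANK() only shows up when there are ties. The sketch below uses a hypothetical `scores` table in which two players on the same team have equal points. It assumes a SQLite build with window function support (version 3.25 or later), and the extra `player` sort key in ROW_NUMBER() is there only to make the numbering of tied rows deterministic.

```python
import sqlite3

# Hypothetical table: players a and b on team red are tied at 30 points.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (team TEXT, player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [("red", "a", 30), ("red", "b", 30), ("red", "c", 10),
     ("blue", "d", 5), ("blue", "e", 7)],
)

# Number and rank players within each team, highest points first.
ranked = conn.execute(
    """
    SELECT team, player,
           ROW_NUMBER() OVER (PARTITION BY team ORDER BY points DESC, player) AS row_num,
           RANK()       OVER (PARTITION BY team ORDER BY points DESC) AS rank_num
    FROM scores
    ORDER BY team, row_num
    """
).fetchall()

for row in ranked:
    print(row)
# ('blue', 'e', 1, 1)
# ('blue', 'd', 2, 2)
# ('red', 'a', 1, 1)
# ('red', 'b', 2, 1)
# ('red', 'c', 3, 3)
```

The tied players get distinct row numbers (1 and 2) but the same rank (1), and the next rank is 3, which is the gap mentioned above.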
9. Date and Time Functions
CURRENT_DATE and CURRENT_TIME: Retrieve the current date and time.
SELECT CURRENT_DATE, CURRENT_TIME;
NOW(): Returns the current timestamp (date and time).
SELECT NOW();
EXTRACT(): Extracts a specific part of a date (e.g., year, month, day).
SELECT EXTRACT(YEAR FROM date_column) AS year_value
FROM table_name;
DATE_ADD() and DATE_SUB(): Add or subtract an interval from a date. This is MySQL syntax; PostgreSQL, for example, uses date arithmetic with INTERVAL instead.
SELECT DATE_ADD(date_column, INTERVAL 7 DAY) AS next_week
FROM table_name;
SELECT DATE_SUB(date_column, INTERVAL 1 MONTH) AS previous_month
FROM table_name;
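DATEDIFF(): Returns the number of days between two dates. The two-argument form shown here is MySQL syntax; SQL Server's DATEDIFF takes the unit as its first argument.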
SELECT DATEDIFF(end_date, start_date) AS days_difference
FROM table_name;
DATE_FORMAT(): Formats a date in a specific pattern (MySQL syntax; PostgreSQL uses TO_CHAR for the same purpose).
SELECT DATE_FORMAT(date_column, '%Y-%m-%d') AS formatted_date
FROM table_name;
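Date functions vary more between databases than most of SQL. As an illustration of the same operations in another dialect, the sketch below uses SQLite through Python's sqlite3 module. SQLite has no DATE_ADD, DATEDIFF, or DATE_FORMAT, so the example relies on its own `date()`, `julianday()`, and `strftime()` functions; the literal dates are arbitrary sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite counterparts of the operations above, applied to fixed sample dates.
next_week, previous_month, days_difference, year_value, formatted = conn.execute(
    """
    SELECT date('2025-01-10', '+7 days'),                 -- add an interval
           date('2025-03-15', '-1 month'),                -- subtract an interval
           CAST(julianday('2025-01-31')
                - julianday('2025-01-01') AS INTEGER),    -- difference in days
           strftime('%Y', '2025-01-10'),                  -- extract the year
           strftime('%d/%m/%Y', '2025-01-10')             -- custom format
    """
).fetchone()

print(next_week)        # 2025-01-17
print(previous_month)   # 2025-02-15
print(days_difference)  # 30
print(year_value)       # 2025
print(formatted)        # 10/01/2025
```

Note that `strftime('%Y', ...)` returns the year as text, so it may need a cast before numeric comparisons.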
Conclusion
SQL plays a vital role in data analysis by providing tools for querying, manipulating, and transforming structured data. Its ability to retrieve specific data efficiently, perform aggregations, and join multiple tables makes it indispensable for analysts. SQL integrates with BI tools and programming languages for visualization and reporting, and it supports automation and scalable processing of large datasets. Mastering SQL helps professionals extract insights, support decision-making, and improve data management.