Practical Data Analysis with SQL
SQL databases have been around since the 1970s. Some of the smartest people in the world worked on making it easy to slice, dice, fetch and manipulate data quickly and efficiently. SQL databases have come such a long way that many developers and data scientists have lost track of what they can do with plain SQL.
Understanding advanced concepts and techniques for analyzing data using SQL is key to unlocking the benefits of relational databases for data analysis.
Producing descriptive statistics, breaking a large query into multiple steps, cleaning and preparing data for analysis, produce aggregate results and subtotals, analyze a time series using cumulative and window frames, how to handle and fill missing data, and how to produce histograms are all tasks that can easily be done directly in the database.
In an attempt to educate my team and readers I created the interactive class Practical Data Analysis with SQL on Educative. In this class I cover some of the following topics.
Basic SQL for Data Analysis
The basic anatomy of a query and how to identify simple issues in SQL. How to use common table expressions to break large queries into smaller, more readable chunks, how to generate data for testing and benchmarking, and how to reliably and correctly produce random data and sample a large dataset.
The class includes an interactive exercise of a very common task in data analysis: splitting a large dataset for training and testing purposes.
Check out Basic SQL for Data Analysis
Descriptive Statistics
When we get a fresh data set, the first thing we usually want to do is get familiar with it. Some call this exploratory data analysis (EDA).
SQL provides functions to produce descriptive statistics such as min, max, average, variance and standard deviation. But, SQL also includes other lesser known functions to find the most common value in a series, percentiles and so on.
In this lesson you will learn how to produce descriptive statistics in SQL for different types of data series.
Check out Describing a Series using SQL
Grouping and Subtotals
When working with big data, it is useful to aggregate results at different levels. For example, counting the number of students per class, finding the maximum price for each category, and so on.
In this lesson, you’ll learn how to use SQL to group results, compute aggregates, what are conditional aggregates and how to produce subtotals at multiple levels. In addition to all that, you will also learn some advances features in SQL to make you more productive and make your life easier.
Check out grouping and totals
Running and Cumulative Aggregation
If you are working with SQL long enough you probably use GROUP BY
to calculate aggregates such as min, max, sum, and count at different levels. However, there is another way to calculate aggregates in SQL without using GROUP BY
.
Using aggregate expressions, we can calculate an aggregate over a group of rows without reducing the number of rows. This unlocks many different types of analysis such as running and cumulative aggregations, ranking, operations on window frames and so on.
Check out Running and Cumulative Aggregation.
Interpolation
Before we start analyzing data, we usually need to clean it first. Data cleaning is an important part of this process, and handling missing values is a large part of that. There are several ways to handle missing data, for example, replacing missing data with a constant, forward and backward filling missing data with adjacent data points or using interpolation.
Check out Interpolation.
Binning
Binning or bucketing, is a technique to divide a series of values into ranges for analysis or for visualization. For each range, we calculate the frequency — the number of values from the series that fit in this range. The ranges are often referred to as bins, buckets, or groups.
There are usually two options to choose from, either equal-height binning or equal-width binning. In this lesson you’ll learn how to produce both and decide when each is appropriate.
Check out Binning.
Conclusion
SQL is more than just syntax. SQL is about being able to access data and gain insight from it. In this course you will learn and practice different ways to analyze data and turn it into actionable information.
Check out Practical Data Analysis with SQL on Educative.