Creating a Histogram in SQL – A Practical Approach with Subqueries, CTEs, and CASE WHEN

7 December 2024

Histogram in SQL – Subqueries CTEs and CASE WHEN

Histograms are one of the most popular tools for analyzing data distribution. While we often associate them with visual charts, their logic and analytical value can also be replicated in SQL. In this article, I’ll show you step by step how to create a simple histogram in SQL using subqueries, Common Table Expressions (CTEs), and CASE WHEN.

1. What is a Histogram and How to Use It in SQL?

A histogram is a way of representing the distribution of data, allowing us to analyze the frequency of specific values in a dataset. In SQL, we can retrieve this information by properly aggregating data. For example, we can understand the structure of customer orders: do most customers place only one order, or do we have many loyal customers with frequent transactions?

2. First Steps – Grouping Data

To start the analysis, we can calculate how many orders each customer has placed. This is done using GROUP BY and the COUNT function:

SELECT 
    customer_id,
    COUNT(DISTINCT order_id) AS order_cnt
FROM 
    orders
GROUP BY 
    customer_id;

The result is a table showing the number of orders placed by each customer.

3. Subqueries – Counting Customers in Ranges

The next step is to count how many customers placed a specific number of orders. To achieve this, we use a subquery:

SELECT 
    order_cnt,
    COUNT(DISTINCT customer_id) AS customer_cnt
FROM (
    SELECT 
        customer_id,
        COUNT(DISTINCT order_id) AS order_cnt
    FROM 
        orders
    GROUP BY 
        customer_id
) s
GROUP BY 
    order_cnt;

At this stage, we have a table showing how many customers placed each specific number of orders. This is our raw histogram, but it still needs simplification.

4. Creating Buckets with CASE WHEN

For large datasets, the number of unique order counts might be too high, making the results hard to read. To address this, we group customers into ranges (buckets) using the CASE WHEN construct. This allows us to create ranges like 1 order, 2-5 orders, 6-10 orders, etc.

Here’s the SQL code:

WITH order_counts AS (
    SELECT 
        customer_id,
        COUNT(DISTINCT order_id) AS order_cnt
    FROM 
        orders
    GROUP BY 
        customer_id
),
customer_counts AS (
    SELECT 
        order_cnt,
        COUNT(DISTINCT customer_id) AS customer_cnt
    FROM 
        order_counts
    GROUP BY 
        order_cnt
)
SELECT 
    CASE 
        WHEN order_cnt = 1 THEN '1'
        WHEN order_cnt < 6 THEN '2 - 5'
        WHEN order_cnt < 11 THEN '6 - 10'
        ELSE '>10'
    END AS customer_group,
    SUM(customer_cnt) AS customer_group_cnt
FROM 
    customer_counts
GROUP BY 
    customer_group;

5. Result Interpretation

The output table might look like this:

Order RangeNumber of Customers
112
2 – 5317
6 – 10415
>1049

From this table, we can see that most customers place between 2 and 10 orders, while only a small number have more than 10 orders. Such an analysis helps understand the customer structure and can be useful, for example, in loyalty segmentation.

6. Conclusion

This method is an effective way to analyze data distribution in SQL. Using subqueries, CTEs, and CASE WHEN, we can not only efficiently group data but also present results in readable categories. If you want to learn more techniques like this, check out my courses on KajoData!

Prefer to read in Polish? No problem!

Other interesting articles:

That’s all on this topic. Analyze in peace!

Did you like this article 🙂?
Share it on Social Media 📱
>>> You can share it on LinkedIn and show that you learn something new every day.
>>> You can throw it on Facebook – and perhaps help a friend of yours who is looking for this.
>>> And remember to bookmark this page, you never know if it won’t come handy in in the future.

You prefer to watch 📺 – no problem
>>> Subscribe and watch my English channel on YouTube.

Ja Ci ją z przyjemnością wyślę. Za darmo. Bez spamu.

Poradnik Początkującego Analityka

Video - jak szukać pracy w IT

Regularne dawki darmowej wiedzy, bez spamu.