How LAG works in SQL? Best LAG examples

How LAG works in SQL? Best LAG examples

When working with SQL, there are times when you need to access data from a previous row in a result set. This is exactly what the LAG function is designed for. It’s a powerful window function that allows you to retrieve values from a preceding row without the need for self-joins or complicated subqueries. In this article, I’ll break down how LAG works, show some practical examples, and explain why it’s such a valuable tool.

Understanding the LAG Function

The LAG function is a window function introduced in SQL:2008 that helps you access data from the previous row relative to the current one. The basic syntax is as follows:


LAG(column_name, offset, default_value) OVER (PARTITION BY partition_column ORDER BY order_column)

Here’s what each part means:

  • column_name: The column from which you want to retrieve the previous value.
  • offset: The number of rows behind the current row to look at (default is 1).
  • default_value: The value to return if there is no previous row (optional).
  • OVER (PARTITION BY … ORDER BY …): Defines how the data is grouped and sorted.

Basic Example of LAG

Let’s take a simple table tracking daily sales of a store:


CREATE TABLE sales (
    sales_date DATE,
    product VARCHAR(50),
    revenue DECIMAL(10,2)
);

INSERT INTO sales VALUES
('2024-06-01', 'Laptop', 1000),
('2024-06-02', 'Laptop', 1200),
('2024-06-03', 'Laptop', 800),
('2024-06-04', 'Laptop', 1500);

Now, if I want to compare each day’s revenue with the previous day, I can use LAG:


SELECT 
    sales_date, 
    product, 
    revenue, 
    LAG(revenue, 1, 0) OVER (PARTITION BY product ORDER BY sales_date) AS previous_day_revenue
FROM sales;

The result will look like this:

sales_date product revenue previous_day_revenue
2024-06-01 Laptop 1000 0
2024-06-02 Laptop 1200 1000
2024-06-03 Laptop 800 1200
2024-06-04 Laptop 1500 800

Notice how the previous_day_revenue column shows the revenue from the previous row.

Using LAG with Partitioning

The power of LAG really comes into play when we include partitions. Let’s say we have multiple products tracked in our sales data:


INSERT INTO sales VALUES
('2024-06-01', 'Phone', 600),
('2024-06-02', 'Phone', 700),
('2024-06-03', 'Phone', 650),
('2024-06-04', 'Phone', 800);

Now, if I apply LAG with partitioning per product, each product’s history will be analyzed separately:


SELECT 
    sales_date, 
    product, 
    revenue, 
    LAG(revenue) OVER (PARTITION BY product ORDER BY sales_date) AS previous_day_revenue
FROM sales;

This ensures that the revenue comparison is done per product and not across all products.

Customizing LAG Offset

By default, LAG retrieves the value from one row behind, but you can change that. Let’s say I want to compare the revenue with two days ago:


SELECT 
    sales_date, 
    product, 
    revenue, 
    LAG(revenue, 2, 0) OVER (PARTITION BY product ORDER BY sales_date) AS revenue_two_days_ago
FROM sales;

This would return revenue from two days before, filling in 0 where there’s no data available.

Difference Between LAG and LEAD

One common question is how LAG differs from LEAD. Essentially:

  • LAG fetches data from a previous row.
  • LEAD fetches data from a future row.

If I wanted to get the revenue for the next day instead of the previous day, I would simply use:


SELECT 
    sales_date, 
    product, 
    revenue, 
    LEAD(revenue) OVER (PARTITION BY product ORDER BY sales_date) AS next_day_revenue
FROM sales;

Real-World Use Cases for LAG

Understanding past data points can be crucial in multiple scenarios. Here are some real-world applications:

  1. Sales trend analysis – Compare current and past performance per product or region.
  2. Stock market analysis – Track price changes compared to previous trading days.
  3. Tracking user behavior – Identify when a user last logged in or performed an action.
  4. Financial reporting – Calculate deltas and trends in financial statements.

Final Thoughts

The LAG function is an incredibly useful tool when dealing with sequential data, especially when you need historical comparisons. Whether it’s for financial analysis, sales tracking, or behavioral data, LAG simplifies queries that would otherwise involve complex self-joins. If you haven’t used LAG before, I highly recommend experimenting with it in your datasets to see how much easier it makes historical data analysis.

 

Other interesting article:

How LEAD works in SQL? Best LEAD examples