
When working with SQL, there are times when you need to access data from a previous row in a result set. This is exactly what the LAG
function is designed for. It’s a powerful window function that allows you to retrieve values from a preceding row without the need for self-joins or complicated subqueries. In this article, I’ll break down how LAG
works, show some practical examples, and explain why it’s such a valuable tool.
Understanding the LAG Function
The LAG
function is a window function introduced in SQL:2008 that helps you access data from the previous row relative to the current one. The basic syntax is as follows:
LAG(column_name, offset, default_value) OVER (PARTITION BY partition_column ORDER BY order_column)
Here’s what each part means:
- column_name: The column from which you want to retrieve the previous value.
- offset: The number of rows behind the current row to look at (default is 1).
- default_value: The value to return if there is no previous row (optional).
- OVER (PARTITION BY … ORDER BY …): Defines how the data is grouped and sorted.
Basic Example of LAG
Let’s take a simple table tracking daily sales of a store:
CREATE TABLE sales (
sales_date DATE,
product VARCHAR(50),
revenue DECIMAL(10,2)
);
INSERT INTO sales VALUES
('2024-06-01', 'Laptop', 1000),
('2024-06-02', 'Laptop', 1200),
('2024-06-03', 'Laptop', 800),
('2024-06-04', 'Laptop', 1500);
Now, if I want to compare each day’s revenue with the previous day, I can use LAG
:
SELECT
sales_date,
product,
revenue,
LAG(revenue, 1, 0) OVER (PARTITION BY product ORDER BY sales_date) AS previous_day_revenue
FROM sales;
The result will look like this:
sales_date | product | revenue | previous_day_revenue |
---|---|---|---|
2024-06-01 | Laptop | 1000 | 0 |
2024-06-02 | Laptop | 1200 | 1000 |
2024-06-03 | Laptop | 800 | 1200 |
2024-06-04 | Laptop | 1500 | 800 |
Notice how the previous_day_revenue
column shows the revenue from the previous row.
Using LAG with Partitioning
The power of LAG
really comes into play when we include partitions. Let’s say we have multiple products tracked in our sales data:
INSERT INTO sales VALUES
('2024-06-01', 'Phone', 600),
('2024-06-02', 'Phone', 700),
('2024-06-03', 'Phone', 650),
('2024-06-04', 'Phone', 800);
Now, if I apply LAG
with partitioning per product, each product’s history will be analyzed separately:
SELECT
sales_date,
product,
revenue,
LAG(revenue) OVER (PARTITION BY product ORDER BY sales_date) AS previous_day_revenue
FROM sales;
This ensures that the revenue comparison is done per product and not across all products.
Customizing LAG Offset
By default, LAG
retrieves the value from one row behind, but you can change that. Let’s say I want to compare the revenue with two days ago:
SELECT
sales_date,
product,
revenue,
LAG(revenue, 2, 0) OVER (PARTITION BY product ORDER BY sales_date) AS revenue_two_days_ago
FROM sales;
This would return revenue from two days before, filling in 0
where there’s no data available.
Difference Between LAG and LEAD
One common question is how LAG
differs from LEAD
. Essentially:
- LAG fetches data from a previous row.
- LEAD fetches data from a future row.
If I wanted to get the revenue for the next day instead of the previous day, I would simply use:
SELECT
sales_date,
product,
revenue,
LEAD(revenue) OVER (PARTITION BY product ORDER BY sales_date) AS next_day_revenue
FROM sales;
Real-World Use Cases for LAG
Understanding past data points can be crucial in multiple scenarios. Here are some real-world applications:
- Sales trend analysis – Compare current and past performance per product or region.
- Stock market analysis – Track price changes compared to previous trading days.
- Tracking user behavior – Identify when a user last logged in or performed an action.
- Financial reporting – Calculate deltas and trends in financial statements.
Final Thoughts
The LAG
function is an incredibly useful tool when dealing with sequential data, especially when you need historical comparisons. Whether it’s for financial analysis, sales tracking, or behavioral data, LAG
simplifies queries that would otherwise involve complex self-joins. If you haven’t used LAG
before, I highly recommend experimenting with it in your datasets to see how much easier it makes historical data analysis.
Other interesting article:
How LEAD works in SQL? Best LEAD examples