How does FIRST_VALUE work in SQL? Best FIRST_VALUE examples

When working with SQL analytics, one of the most useful window functions is FIRST_VALUE. It returns the first value of a specified column within a defined window, which helps with ranking, reporting, and trend analysis. I’ll explain how FIRST_VALUE works, walk through examples, and show different ways to use it efficiently.

Understanding FIRST_VALUE in SQL

FIRST_VALUE is one of the window functions available in SQL. It returns the value of a specified column (or expression) from the first row of the window frame. With an ORDER BY and the default frame, that is the first row of the partition in the given order. Unlike aggregate functions such as MIN or MAX used with GROUP BY, FIRST_VALUE does not collapse the result set: every input row is kept, and each one carries the first value of its partition.

Basic Syntax of FIRST_VALUE

Here’s the basic syntax of the FIRST_VALUE function:

FIRST_VALUE(column_name) OVER (
    PARTITION BY partition_column 
    ORDER BY order_column
)

Let’s break it down:

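FIRST_VALUE(column_name) – Specifies the column (or expression) from which the first value is retrieved.
PARTITION BY partition_column – (Optional) Divides the dataset into partitions. Without it, the whole result set is treated as a single partition.
ORDER BY order_column – Determines the order of the rows within each partition, and therefore which row counts as the first. If it is omitted, the choice of first row is not guaranteed.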

Example 1: Simple FIRST_VALUE Use Case

Consider a table orders storing customer transactions:

order_id	customer_id	order_date	amount
1	101	2023-01-10	150
2	101	2023-02-15	200
3	102	2023-01-20	300
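
To run the examples yourself, the sample table can be created with a short setup script. This is a minimal sketch in generic SQL (the multi-row INSERT is accepted by PostgreSQL, MySQL, and SQLite, among others); the column types are assumptions, since only the column names and sample rows are given above.

```sql
-- Sample schema; the column types are assumptions, adjust them for your database.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    order_date  DATE NOT NULL,
    amount      DECIMAL(10, 2)  -- left nullable so the NULL-handling example applies
);

-- The three sample rows from the table above.
INSERT INTO orders (order_id, customer_id, order_date, amount) VALUES
    (1, 101, '2023-01-10', 150),
    (2, 101, '2023-02-15', 200),
    (3, 102, '2023-01-20', 300);
```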

Using FIRST_VALUE to find each customer’s first transaction amount:

SELECT 
    order_id, 
    customer_id, 
    order_date, 
    amount, 
    FIRST_VALUE(amount) OVER (
        PARTITION BY customer_id ORDER BY order_date
    ) AS first_order_amount
FROM orders;

Output:

order_id	customer_id	order_date	amount	first_order_amount
1	101	2023-01-10	150	150
2	101	2023-02-15	200	150
3	102	2023-01-20	300	300

In this example, we partitioned the data by customer_id and sorted each partition by order_date, so every row carries the amount of that customer’s earliest order.
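
When an ORDER BY is present and no frame is given, standard SQL applies a default frame that runs from the start of the partition to the current row, which is why FIRST_VALUE sees the partition’s first row. The sketch below spells that frame out and adds order_id as a tie-breaker, on the assumption that two orders could share the same order_date. On the sample data it returns the same first_order_amount values as the query above.

```sql
SELECT
    order_id,
    customer_id,
    order_date,
    amount,
    FIRST_VALUE(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date, order_id  -- order_id breaks ties on the same date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS first_order_amount
FROM orders;
```

Without a tie-breaker, rows with equal order_date values have no defined relative order, so the value returned for them may vary between runs.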

Example 2: Handling NULL Values

By default, FIRST_VALUE does not skip NULLs. If the first row in the window order has a NULL amount, that NULL is returned. We can wrap the call in COALESCE to substitute a fallback value:

SELECT 
    order_id, 
    customer_id, 
    amount, 
    COALESCE(FIRST_VALUE(amount) OVER (
        PARTITION BY customer_id ORDER BY order_date
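    ), 0) AS first_amount_or_zero
FROM orders;

This returns 0 whenever a customer’s first order has a NULL amount. Note that COALESCE only substitutes a default; it does not skip ahead to the first non-NULL amount.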
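
If what you actually want is the first non-NULL amount, some databases (Oracle, BigQuery, Snowflake, and SQL Server 2022, for example) accept an IGNORE NULLS option on FIRST_VALUE, but the syntax and availability vary. A more portable sketch is to sort NULL rows to the end of each partition with a CASE expression, so that the first row is the earliest one that has a value:

```sql
SELECT
    order_id,
    customer_id,
    amount,
    FIRST_VALUE(amount) OVER (
        PARTITION BY customer_id
        ORDER BY
            CASE WHEN amount IS NULL THEN 1 ELSE 0 END,  -- rows with a value sort first
            order_date                                   -- then earliest date wins
    ) AS first_non_null_amount
FROM orders;
```

If every amount for a customer is NULL, the result for that customer is still NULL, so COALESCE can be combined with this approach when a default is needed.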

Example 3: Using Different ORDER BY Clauses

The ORDER BY clause decides which row counts as the first, so ordering by a different column changes the result. For example, sorting by amount in descending order returns each customer’s highest amount:

SELECT 
    order_id, 
    customer_id, 
    amount, 
    FIRST_VALUE(amount) OVER (
        PARTITION BY customer_id ORDER BY amount DESC
    ) AS first_max_amount
FROM orders;

Here, we retrieve the highest amount per customer instead of the first chronological transaction.
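
When the returned column is the same as the ordering column, MAX(amount) as a window aggregate gives the same answer. FIRST_VALUE becomes more useful when you want a different column from the top-ranked row. The sketch below returns the date of each customer’s largest order; the aliases max_amount and max_amount_date are illustrative names, and order_date is used as a tie-breaker by assumption.

```sql
SELECT
    order_id,
    customer_id,
    amount,
    MAX(amount) OVER (PARTITION BY customer_id) AS max_amount,
    FIRST_VALUE(order_date) OVER (
        PARTITION BY customer_id
        ORDER BY amount DESC, order_date  -- earliest date wins among equal amounts
    ) AS max_amount_date
FROM orders;
```

Where NULL amounts land in a descending sort differs between databases (PostgreSQL and Oracle place them first by default), so filter them out or sort them last explicitly if the column can contain NULLs.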

When to Use FIRST_VALUE?

The FIRST_VALUE function is incredibly useful in various scenarios:

Finding First Transactions: Identifying the first purchase for each customer.
Getting First Interaction Details: Analyzing user engagement based on the first login or interaction.
Evaluating Performance Trends: Comparing initial vs. later data points (e.g., stock prices, sales metrics).
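
As a small sketch of the last scenario, the query below compares every order with the same customer’s first order by subtracting the FIRST_VALUE result from the current amount. It reuses the orders table from the examples; the alias change_since_first_order is an illustrative name.

```sql
SELECT
    order_id,
    customer_id,
    order_date,
    amount,
    amount - FIRST_VALUE(amount) OVER (
        PARTITION BY customer_id ORDER BY order_date
    ) AS change_since_first_order  -- 0 on each customer's first order
FROM orders;
```

On the sample data, customer 101’s second order shows a change of 50 (200 minus 150), and both first orders show 0.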

Performance Considerations

While FIRST_VALUE is powerful, it may have performance implications with large datasets:

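Indexing: An index on the PARTITION BY columns followed by the ORDER BY columns can let the database read rows in the required order instead of sorting them for the window.
Window Size: Large partitions can slow down performance, because sorting them needs more memory and may spill to disk.
Alternative Approaches: Consider using JOIN with subqueries when the window version does not perform well, and compare the execution plans before switching.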
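
The first and last points can be sketched as follows. The index name idx_orders_customer_date is hypothetical, and whether the optimizer actually uses the index for the window sort depends on the database. The join-based query is one possible rewrite of Example 1; it assumes each customer has at most one order per order_date, otherwise it returns duplicate rows.

```sql
-- Hypothetical index: partition column first, then the ordering column.
CREATE INDEX idx_orders_customer_date
    ON orders (customer_id, order_date);

-- Join-based alternative to Example 1 (assumes one order per customer per date).
SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    o.amount,
    f.amount AS first_order_amount
FROM orders o
JOIN (
    -- Earliest order date for each customer.
    SELECT customer_id, MIN(order_date) AS first_date
    FROM orders
    GROUP BY customer_id
) m
    ON m.customer_id = o.customer_id
JOIN orders f
    -- Look up the row that holds that earliest order.
    ON f.customer_id = m.customer_id
   AND f.order_date  = m.first_date;
```

Neither form is guaranteed to be faster; checking the execution plan on realistic data volumes is the reliable way to decide.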

Conclusion

The FIRST_VALUE function in SQL is an excellent tool for retrieving the first value within a partitioned set. By setting clear ORDER BY rules, we can extract meaningful insights from datasets without losing row-level detail. Understanding how to use FIRST_VALUE effectively can greatly enhance reporting and analytics in SQL.
