How ROW_NUMBER works in SQL? Best ROW_NUMBER examples

How ROW_NUMBER works in SQL? Best ROW_NUMBER examples

When working with SQL, you’ll often need to assign a unique sequential integer to rows in a result set. That’s where the ROW_NUMBER() function comes into play. It’s a powerful window function that allows you to rank rows based on a specific order. Let’s dive into how ROW_NUMBER() works in SQL and explore the best use cases for it.

What Is the ROW_NUMBER() Function?

The ROW_NUMBER() function is a window function that assigns a unique, sequential number to each row within a specified partition of a result set. It resets the numbering each time a new partition is encountered. If no partitioning is specified, the numbers are assigned for the entire result set.

Understanding the Syntax

Here’s the general syntax of ROW_NUMBER() in SQL:

ROW_NUMBER() OVER ([PARTITION BY partition_expression] ORDER BY order_expression)

Breaking it down:

  • PARTITION BY partition_expression – Determines how the rows should be partitioned before numbering. If omitted, the entire result set is considered one partition.
  • ORDER BY order_expression – Specifies the column(s) that define the order in which row numbers should be assigned.

Basic Usage of ROW_NUMBER()

Let’s look at a simple example using ROW_NUMBER() in a basic query. Suppose we have a table called employees:

employee_id name department salary
1 Alice HR 5000
2 Bob HR 4800
3 Charlie IT 6000
4 David IT 6200

If I want to assign row numbers to employees ordering them by salary in descending order, I’d write:

SELECT 
    employee_id, 
    name, 
    department, 
    salary, 
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;

This assigns a unique row number to each employee based on their salary ranking from highest to lowest.

Using PARTITION BY with ROW_NUMBER()

What if I want to generate row numbers separately for each department? This is where PARTITION BY comes in:

SELECT 
    employee_id, 
    name, 
    department, 
    salary, 
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;

Now, employees are ranked within each department based on their salaries.

Practical Use Cases for ROW_NUMBER()

The ROW_NUMBER() function is useful in multiple scenarios. Here are a few:

  1. Removing Duplicates: Find duplicate records and keep only one entry.
  2. Pagination: Select a specific set of rows for page-based queries.
  3. Top N Per Group: Retrieve the highest-earning employee in each department.

Example: Deleting Duplicate Records

To remove duplicate entries while keeping only one, I can use a common table expression (CTE) with ROW_NUMBER():

WITH CTE AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY employee_id) AS row_num
    FROM employees
)
DELETE FROM employees WHERE employee_id IN (SELECT employee_id FROM CTE WHERE row_num > 1);

Example: Paginating Results

Implementing pagination with ROW_NUMBER() is straightforward. For example, if I want to retrieve rows 11-20 of a large query result:

WITH CTE AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) AS row_num
    FROM employees
)
SELECT * FROM CTE WHERE row_num BETWEEN 11 AND 20;

Final Thoughts

The ROW_NUMBER() function is an invaluable tool in SQL when working with ranking, partitioning data, and pagination. By understanding its syntax and use cases, you can optimize queries for better performance and data organization.

 

Other interesting article:

How DENSE_RANK works in SQL? Best DENSE_RANK examples