
When working with SQL, sometimes you need to split your dataset into equal groups, whether for ranking, percentiles, or statistical analysis. This is where the NTILE
function comes into play. It’s a powerful window function that divides rows into a specified number of groups and assigns a unique bucket number to each row. In this article, I’ll walk you through how NTILE
works in SQL, practical use cases, and some of the best examples to clarify its application.
Understanding NTILE in SQL
The NTILE
function is used with the OVER()
clause, which defines how the dataset should be partitioned and ordered before the grouping process. The syntax is as follows:
NTILE(n) OVER (PARTITION BY column_name ORDER BY column_name)
- n – Specifies the number of groups you want to divide the total rows into.
- PARTITION BY – (Optional) Divides the result set into partitions before applying
NTILE
. - ORDER BY – Determines the order in which the rows are assigned their bucket numbers.
How NTILE Works Internally
NTILE evenly assigns rows across defined groups. The way it distributes rows depends on the total number of rows and the specified number of tiles. Here’s a step-by-step breakdown:
- SQL determines the total number of rows in the dataset.
- It divides that total by the specified number of groups (
n
). - If the division is even, each group has the same number of rows.
- If not, the first few groups will have one extra row to account for any remainder.
Let’s look at a practical example.
SQL NTILE Example
Consider a table Employees
with the following data:
EmployeeID | Name | Salary |
---|---|---|
1 | Alice | 70000 |
2 | Bob | 85000 |
3 | Charlie | 60000 |
4 | David | 90000 |
5 | Eve | 75000 |
6 | Frank | 95000 |
7 | Grace | 62000 |
8 | Hank | 88000 |
Now, let’s apply NTILE(4)
to divide these employees into 4 salary-based groups.
SELECT
EmployeeID,
Name,
Salary,
NTILE(4) OVER (ORDER BY Salary DESC) AS SalaryGroup
FROM Employees;
The result would look something like this:
EmployeeID | Name | Salary | SalaryGroup |
---|---|---|---|
6 | Frank | 95000 | 1 |
4 | David | 90000 | 1 |
8 | Hank | 88000 | 2 |
2 | Bob | 85000 | 2 |
5 | Eve | 75000 | 3 |
1 | Alice | 70000 | 3 |
7 | Grace | 62000 | 4 |
3 | Charlie | 60000 | 4 |
Practical Use Cases of NTILE
Now that we’ve seen how NTILE works, let’s explore its practical applications:
- Generating quartiles for salary analysis – Divide employees into 4 equal salary bands.
- Creating percentiles – Group rows into 10 equal parts to generate deciles.
- Distributing workload evenly – Assign an equal number of employees to teams.
- Performance categorization – Rank students, employees, or customers into performance bands.
NTILE vs. Other Ranking Functions
There are other ranking functions in SQL, and it’s essential to understand their differences:
Function | Description |
---|---|
RANK() |
Assigns a unique rank to values, but skips numbers if ties exist. |
DENSE_RANK() |
Assigns ranks similar to RANK() but without skipping numbers. |
ROW_NUMBER() |
Provides a unique row number to each row based on ordering. |
The biggest difference here is that NTILE
strictly splits data into evenly distributed groups, while the others provide ranking behavior based on specific values.
Conclusion
The NTILE
function is an invaluable tool when you need to divide rows into equal groups dynamically. Whether for statistical grouping, defining percentiles, or workload distribution, it simplifies complex segmentation tasks. By understanding how NTILE
works in SQL and leveraging its power correctly, you can enhance your data analysis and reporting significantly.
Other interesting article:
How ROW_NUMBER works in SQL? Best ROW_NUMBER examples