Summary: in this tutorial, you will learn how to use the MySQL PERCENT_RANK()
function to calculate the percentile ranking of a row within a partition or result set.
The PERCENT_RANK()
is a window function that calculates the percentile rank of a row within a partition or result set.
The following shows the syntax of the PERCENT_RANK()
function:
PERCENT_RANK() OVER (
PARTITION BY partition_expression
ORDER BY
sort_expression [ASC | DESC]
)
Code language: SQL (Structured Query Language) (sql)
The PERCENT_RANK()
function returns a number that ranges from zero to one.
For a specified row, PERCENT_RANK()
calculates the rank of that row minus one, divided by 1 less than the number of rows in the evaluated partition or query result set:
(rank - 1) / (total_rows - 1)
Code language: SQL (Structured Query Language) (sql)
In this formula, rank
is the rank of a specified row and total_rows
is the number of rows being evaluated.
The PERCENT_RANK()
function always returns zero for the first row in a partition or result set. The repeated column values will receive the same PERCENT_RANK()
value.
Similar to other window functions, the PARTITION BY
clause distributes the rows into partitions and the ORDER BY
clause specifies the logical order of rows in each partition. The PERCENT_RANK()
function is calculated for each ordered partition independently.
Both PARTITION BY
and ORDER BY
clauses are optional. However, the PERCENT_RANK()
is an order-sensitive function, therefore, you should always use the ORDER BY
clause.
MySQL PERCENT_RANK() function examples
Let’s create a new table named productLineSales
based on the orders
, orderDetails
, and products
tables from the sample database:
CREATE TABLE productLineSales
SELECT
productLine,
YEAR(orderDate) orderYear,
quantityOrdered * priceEach orderValue
FROM
orderDetails
INNER JOIN
orders USING (orderNumber)
INNER JOIN
products USING (productCode)
GROUP BY
productLine ,
YEAR(orderDate),
orderValue;
Code language: SQL (Structured Query Language) (sql)
The productLineSales
table stores the summary of the sales data including product line, order year, and order value.
1) Using MySQL PERCENT_RANK() over the query result set
The following query finds the percentile rank of every product line by order values:
WITH t AS (
SELECT
productLine,
SUM(orderValue) orderValue
FROM
productLineSales
GROUP BY
productLine
)
SELECT
productLine,
orderValue,
ROUND(
PERCENT_RANK() OVER (
ORDER BY orderValue
)
,2) percentile_rank
FROM
t;
Code language: SQL (Structured Query Language) (sql)
In this example:
- First, we used a common table expression to summarize the order values by product lines.
- Second, we used the
PERCENT_RANK()
to calculate the percentile rank of the order value of each product. In addition, we used theROUND()
function to round the values to 2 decimals for a better representation.
Here is the output:
+------------------+------------+-----------------+
| productLine | orderValue | percentile_rank |
+------------------+------------+-----------------+
| Trains | 188532.92 | 0 |
| Ships | 661622.34 | 0.17 |
| Planes | 944295.37 | 0.33 |
| Trucks and Buses | 1014787.52 | 0.5 |
| Motorcycles | 1114192.13 | 0.67 |
| Vintage Cars | 1782234.23 | 0.83 |
| Classic Cars | 3782065.50 | 1 |
+------------------+------------+-----------------+
7 rows in set (0.00 sec)
Code language: JavaScript (javascript)
Here are some analyses from the output:
- The order values of
Trains
were not better than any other product lines, which was represented with a zero. -
Vintage Cars
performed better than 50% of other products. -
Classic Cars
performed better than any other product line so its percent rank is 1 or 100%
2) Using MySQL PERCENT_RANK() over the partition
The following statement returns the percentile ranking of product lines by order values in each year:
SELECT
productLine,
orderYear,
orderValue,
ROUND(
PERCENT_RANK()
OVER (
PARTITION BY orderYear
ORDER BY orderValue
),2) percentile_rank
FROM
productLineSales;
Code language: SQL (Structured Query Language) (sql)
Here is the output:
In this example, we divided the order values of the product lines by order year. The PERCENT_RANK()
then applied to each partition.
For example, in 2003 Vintage Cars
performed better than 50% of other product lines while in 2004 Ships performed better than 50% of other products.
Summary
- Use the MySQL
PERCENT_RANK()
function to calculate the percentile rank of a row within a partition or result set.