MySQL CUME_DIST() Function Explained By Examples

Summary: in this tutorial, you will learn how to use the MySQL CUME_DIST() function to calculate the cumulative distribution of a value.

Overview of MySQL CUME_DIST() Function

CUME_DIST() is a window function that returns the cumulative distribution of a value within a set of values. For each row, it returns the number of rows that come before it or tie with it in the sort order, divided by the total number of rows. With an ascending sort, that is the number of rows with values less than or equal to that row’s value.

The value returned by the CUME_DIST() function is greater than zero and less than or equal to one (0 < CUME_DIST() <= 1). Rows with equal values receive the same CUME_DIST() value.

The following shows the syntax of the CUME_DIST() function:

CUME_DIST() 
OVER (
   [PARTITION BY expr]
   ORDER BY expr [ASC | DESC]
)

In this syntax:

PARTITION BY

The PARTITION BY clause divides the result set into partitions to which the CUME_DIST() function is applied independently. If you omit the PARTITION BY clause, the function is applied to the whole result set.

ORDER BY

The ORDER BY clause specifies the order of the rows in each partition or the whole result set in case the PARTITION BY is omitted.

The CUME_DIST() function calculates the cumulative distribution value of each row based on its order in the partition.
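To illustrate how PARTITION BY restarts the calculation for each group, here is a minimal sketch. It assumes MySQL 8.0 or later. The class_scores data set, its class column, and its rows are hypothetical and exist only for this illustration.

```sql
-- Hypothetical inline data set: two classes with their own scores
WITH class_scores (class, name, score) AS (
    SELECT 'A', 'Smith', 81 UNION ALL
    SELECT 'A', 'Jones', 55 UNION ALL
    SELECT 'A', 'Taylor', 62 UNION ALL
    SELECT 'B', 'Davies', 84 UNION ALL
    SELECT 'B', 'Evans', 87
)
SELECT
    class,
    name,
    score,
    -- The distribution is computed separately within each class
    CUME_DIST() OVER (
        PARTITION BY class
        ORDER BY score
    ) AS cume_dist_val
FROM class_scores
ORDER BY class, score;
```

Under these assumptions, the highest score in each class should receive 1, because every partition is divided by its own row count rather than by the size of the whole result set.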

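The CUME_DIST() function computes the following for each row:

(number of rows that precede or tie with the current row) / total_rows

If the ORDER BY values are all distinct, this is the same as ROW_NUMBER() / total_rows. With ties, every tied row uses the count up to the last row of its tie group.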

MySQL CUME_DIST() function example

Let’s create a table called scores and populate it with some data for the demonstration:

CREATE TABLE scores (
    name VARCHAR(20) PRIMARY KEY,
    score INT NOT NULL
);

INSERT INTO
	scores(name, score)
VALUES
	('Smith',81),
	('Jones',55),
	('Williams',55),
	('Taylor',62),
	('Brown',62),
	('Davies',84),
	('Evans',87),
	('Wilson',72),
	('Thomas',72),
	('Johnson',100);

The following statement uses the CUME_DIST() function to find the cumulative distribution of the score in the result set:

SELECT 
  name, 
  score, 
  ROW_NUMBER() OVER (
    ORDER BY 
      score
  ) row_num, 
  CUME_DIST() OVER (
    ORDER BY 
      score
  ) cume_dist_val 
FROM 
  scores;

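Here is the output (the relative order of rows with equal scores, and therefore their row_num values, is not guaranteed):

+----------+-------+---------+---------------+
| name     | score | row_num | cume_dist_val |
+----------+-------+---------+---------------+
| Jones    |    55 |       1 |           0.2 |
| Williams |    55 |       2 |           0.2 |
| Brown    |    62 |       3 |           0.4 |
| Taylor   |    62 |       4 |           0.4 |
| Thomas   |    72 |       5 |           0.6 |
| Wilson   |    72 |       6 |           0.6 |
| Smith    |    81 |       7 |           0.7 |
| Davies   |    84 |       8 |           0.8 |
| Evans    |    87 |       9 |           0.9 |
| Johnson  |   100 |      10 |             1 |
+----------+-------+---------+---------------+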

In this example, the scores are sorted in ascending order from 55 to 100. The ROW_NUMBER() function is included only for reference.

So how does the CUME_DIST() function perform the calculation?

For the first row, the function counts the rows in the result set whose score is less than or equal to 55. There are two such rows. The CUME_DIST() function then divides this count (2) by the total number of rows in the set (10): 2/10. The result is 0.2, or 20%. The second row has the same score, so the same count applies and it also receives 0.2.

For the third row, the CUME_DIST() function finds four rows in the result set whose score is less than or equal to 62. It then divides this count (4) by the total number of rows (10): 4/10. The result is 0.4, or 40%.

The same calculation logic is applied to the remaining rows.
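The counting steps above can be reproduced with plain aggregate subqueries, which is a handy way to check the result. The following is a sketch rather than a recommended implementation. It assumes the scores table created earlier and MySQL 8.0 or later; the aliases s and t and the column name manual_val are chosen only for this illustration.

```sql
SELECT
    s.name,
    s.score,
    -- Built-in window function
    CUME_DIST() OVER (ORDER BY s.score) AS cume_dist_val,
    -- Manual equivalent: rows with a score <= the current score,
    -- divided by the total number of rows
    (SELECT COUNT(*) FROM scores t WHERE t.score <= s.score)
        / (SELECT COUNT(*) FROM scores) AS manual_val
FROM scores s
ORDER BY s.score;
```

The two columns should agree numerically for every row. Their formatting may differ, because dividing two integer counts with the / operator produces a fixed-point DECIMAL in MySQL, while CUME_DIST() returns a floating-point value.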

Summary

Use the MySQL CUME_DIST() function to calculate the cumulative distribution of a value in a set of values.
