Summary: in this tutorial, you will learn how to use MySQL standard deviation functions to calculate the population standard deviation and the sample standard deviation.
Introduction to standard deviation
Standard deviation is a measure of how spread out the values in a data set are. The standard deviation shows how much variation exists from the average (mean).
A low standard deviation shows that the values in the data set are close to the mean. A high standard deviation indicates that the values are spread out over a large range.
The standard deviation is the square root of the variance, which can be calculated by using the following steps:
- Step 1. Calculate the mean (average) of all values in the data set. For example, if the data set consists of 1, 2, and 3, the mean is (1+2+3)/3 = 2.
- Step 2. Subtract the mean from each value and square the result: (1-2)² = (-1)² = 1, (2-2)² = 0² = 0, (3-2)² = 1² = 1.
- Step 3. Calculate the average of the squared differences from step 2, which gives the variance. Then take the square root of the variance to get the standard deviation of all values in the data set: square root of ((1 + 0 + 1)/3) = 0.816497.
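The three steps above can be sketched directly in SQL and compared with the built-in STD function. This is a minimal illustration, not a recommended way to compute the value: the inline data set (1, 2, 3) and the aliases v, d, m, manual_std, and builtin_std are made up for the example and are not part of the sample database.

```sql
-- Hypothetical inline data set (1, 2, 3), built with UNION ALL.
SELECT
    SQRT(AVG(POW(v.x - m.mean, 2))) AS manual_std,  -- steps 2 and 3
    STD(v.x)                        AS builtin_std  -- built-in population std
FROM (SELECT 1 AS x UNION ALL SELECT 2 UNION ALL SELECT 3) AS v
CROSS JOIN (
    -- Step 1: the mean of all values (2 for this data set).
    SELECT AVG(x) AS mean
    FROM (SELECT 1 AS x UNION ALL SELECT 2 UNION ALL SELECT 3) AS d
) AS m;
```

Both columns should come out close to 0.816497, the value derived by hand in step 3.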
Population standard deviation vs. sample standard deviation
If the data set contains every value in the population you are interested in, the result is called the population standard deviation. However, if the data set is only a subset, or sample, of that population, the result is called the sample standard deviation.
A sigma letter (σ) represents the standard deviation. The following equations illustrate how to calculate population standard deviation and sample standard deviation:
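Population standard deviation:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
Sample standard deviation:
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
Here $x_i$ is the i-th value, $\mu$ is the population mean, $\bar{x}$ is the sample mean, and $N$ is the number of values. The sample standard deviation is conventionally written $s$.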
The two calculations differ only slightly. When calculating the variance for the sample standard deviation, divide the sum of the squared differences by N-1 instead of N, where N is the number of values in the data set.
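As a worked illustration, take the data set 1, 2, 3 from the steps earlier (mean 2, squared differences 1, 0, and 1). Writing s for the sample standard deviation, which is a common convention assumed here, the only change between the two results is the divisor:

```latex
\sigma = \sqrt{\frac{1 + 0 + 1}{3}} = \sqrt{\tfrac{2}{3}} \approx 0.816497
\qquad
s = \sqrt{\frac{1 + 0 + 1}{3 - 1}} = \sqrt{1} = 1
```

Because N-1 is smaller than N, the sample standard deviation is never smaller than the population standard deviation computed from the same values.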
MySQL standard deviation functions
MySQL makes it easy for you to calculate the population standard deviation and sample standard deviation.
To calculate population standard deviation, you use one of the following functions:
- STD(expression) – returns the population standard deviation of the expression. The STD function returns NULL if there is no matching row.
- STDDEV(expression) – is equivalent to the STD function. It is provided only for compatibility with Oracle Database.
- STDDEV_POP(expression) – is equivalent to the STD function. This is the standard SQL name.
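To see that the population functions agree, and what the NULL case looks like, here is a small sketch. The inline data set and the column aliases are hypothetical and only serve the illustration.

```sql
-- Hypothetical inline data set; swap in a real column and table as needed.
SELECT
    STD(x)        AS std_result,
    STDDEV(x)     AS stddev_result,
    STDDEV_POP(x) AS stddev_pop_result
FROM (SELECT 1 AS x UNION ALL SELECT 2 UNION ALL SELECT 3) AS t;

-- No matching rows: the aggregate still returns one row, containing NULL.
SELECT STD(x) AS std_empty
FROM (SELECT 1 AS x UNION ALL SELECT 2 UNION ALL SELECT 3) AS t
WHERE x > 100;
```

The first query should return the same value in all three columns. The second shows why it can be worth wrapping the result in COALESCE or IFNULL when a filter may match nothing.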
To calculate the sample standard deviation, you use the STDDEV_SAMP(expression) function.
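The following sketch puts STDDEV_SAMP next to STDDEV_POP on the same hypothetical inline data set (1, 2, 3), so the effect of the N-1 divisor is visible. The aliases are illustrative only.

```sql
SELECT
    STDDEV_POP(x)  AS pop_std,   -- divides by N
    STDDEV_SAMP(x) AS samp_std   -- divides by N - 1
FROM (SELECT 1 AS x UNION ALL SELECT 2 UNION ALL SELECT 3) AS t;

-- Edge case: a single row leaves N - 1 = 0, so the sample formula is undefined.
SELECT STDDEV_SAMP(x) AS samp_std_single
FROM (SELECT 5 AS x) AS t;
```

For the first query, pop_std should be about 0.816497 and samp_std should be 1. For the single-row query, MySQL is expected to return NULL rather than raise an error, although this is worth verifying on your server version.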
MySQL also provides some functions for population variance and sample variance calculation:
- VAR_POP(expression) – calculates the population variance of the expression.
- VARIANCE(expression) – is equivalent to the VAR_POP function.
- VAR_SAMP(expression) – calculates the sample variance of the expression.
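Since the standard deviation is the square root of the variance, the variance functions can be cross-checked against the standard deviation functions. A minimal sketch on a hypothetical inline data set (1, 2, 3), with made-up aliases:

```sql
SELECT
    VAR_POP(x)       AS pop_variance,       -- (1 + 0 + 1) / 3
    VARIANCE(x)      AS variance_synonym,   -- same as VAR_POP
    VAR_SAMP(x)      AS samp_variance,      -- (1 + 0 + 1) / 2
    SQRT(VAR_POP(x)) AS sqrt_pop_variance,  -- should match pop_std
    STDDEV_POP(x)    AS pop_std
FROM (SELECT 1 AS x UNION ALL SELECT 2 UNION ALL SELECT 3) AS t;
```

The last two columns should agree, up to floating-point rounding.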
Examples of MySQL standard deviation functions
Let’s take a look at the orders table in the sample database.
Examples of population standard deviation functions
First, the following query returns the customer numbers and the number of orders from the orders table:
SELECT customerNumber,
       COUNT(*) AS orderCount
FROM orders
GROUP BY customerNumber;
Second, the following statement calculates the population standard deviation of the number of orders per customer:
SELECT FORMAT(STD(orderCount), 2)
FROM (SELECT customerNumber, COUNT(*) AS orderCount
      FROM orders
      GROUP BY customerNumber) t;
Notice that the FORMAT function formats the result of the STD function, rounding it to two decimal places.
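FORMAT returns a string and inserts thousands separators, which matters if the result is used in further arithmetic or sorting. As a sketch of an alternative, assuming a numeric result is preferred, the same query can use ROUND instead. The std_orders alias is illustrative and not part of the original query.

```sql
-- ROUND keeps the result numeric; FORMAT would return a formatted string.
SELECT ROUND(STD(orderCount), 2) AS std_orders
FROM (
    SELECT customerNumber, COUNT(*) AS orderCount
    FROM orders
    GROUP BY customerNumber
) AS t;
```

FORMAT remains a reasonable choice when the value is only displayed.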
Examples of sample standard deviation functions
Suppose you only want to evaluate shipped orders in the orders table.
First, the following query returns the customer numbers and the number of shipped orders:
SELECT customerNumber, COUNT(*) AS orderCount
FROM orders
WHERE status = 'Shipped'
GROUP BY customerNumber;
Second, the following query uses the STDDEV_SAMP function to calculate the sample standard deviation:
SELECT FORMAT(STDDEV_SAMP(orderCount), 2)
FROM (SELECT customerNumber, COUNT(*) AS orderCount
      FROM orders
      WHERE status = 'Shipped'
      GROUP BY customerNumber) t;
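To compare the two measures on the same rows, both functions can be called in a single query. This is a sketch under the same assumptions as the queries above (the orders table with customerNumber and status columns). The aliases customers, pop_std, and samp_std are made up for the example.

```sql
SELECT
    COUNT(*)                           AS customers,  -- N, the number of groups
    FORMAT(STDDEV_POP(orderCount), 2)  AS pop_std,    -- divides by N
    FORMAT(STDDEV_SAMP(orderCount), 2) AS samp_std    -- divides by N - 1
FROM (
    SELECT customerNumber, COUNT(*) AS orderCount
    FROM orders
    WHERE status = 'Shipped'
    GROUP BY customerNumber
) AS t;
```

The sample value is never smaller than the population value, and the gap between them shrinks as the number of customers grows, so on a large table the two columns may match after rounding.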
In this tutorial, we have introduced you to the standard deviation concept. Then, we showed you how to use the MySQL standard deviation functions to calculate the population standard deviation and sample standard deviation of an expression.