Summary: in this tutorial, you will learn how to use the MySQL VARIANCE()
function to calculate the population variance of a set of values in a column of a table.
Introduction to the MySQL VARIANCE() function
By definition, a variance is an average of squared differences from the mean. To calculate the variance, you follow these steps:
- First, calculate the means of the numbers.
- Second, subtract the mean from the number and square the result. The result is called a squared difference.
- Finally, calculate the average of the squared differences.
In MySQL, you use the VARIANCE()
function to calculate the population variance of numbers in a column of a table.
Here’s the syntax of the VARIANCE()
function:
SELECT VARIANCE(numeric_expression)
FROM table_name;
Code language: SQL (Structured Query Language) (sql)
In this syntax:
numeric_exprssion
: The variance accepts an argument that can be a numeric expression or a numeric column.table_name
: The name of the table that contains the column.
If the numeric_expression
is null or the column has no rows, the VARIANCE()
function returns NULL
.
Note that the VARIANCE()
function is equivalent to the VAR_POP()
function. The VAR_POP()
function is a standard SQL, whereas the VARIANCE()
function is not.
MySQL VARIANCE examples
Let’s take some examples of using the VARIANCE()
function.
1) Simple VARIANCE function example
First, create a new table called apples
that has three columns id
, color
, and weight
:
CREATE TABLE apples(
id INT AUTO_INCREMENT,
color VARCHAR(255) NOT NULL,
weight DECIMAL(6,2) NOT NULL,
PRIMARY KEY(id)
);
Code language: SQL (Structured Query Language) (sql)
Second, insert some rows into the apples
table:
INSERT INTO apples (color, weight) VALUES
('Red', 0.6),
('Green', 0.4),
('Yellow', 0.35),
('Red', 0.28),
('Green', 0.42),
('Orange', 0.38),
('Red', 0.31),
('Purple', 0.45),
('Green', 0.37),
('Yellow', 0.33);
Code language: SQL (Structured Query Language) (sql)
Third, query data from the apples
table:
SELECT * FROM apples;
Code language: SQL (Structured Query Language) (sql)
Output:
+----+--------+--------+
| id | color | weight |
+----+--------+--------+
| 1 | Red | 0.60 |
| 2 | Green | 0.40 |
| 3 | Yellow | 0.35 |
| 4 | Red | 0.28 |
| 5 | Green | 0.42 |
| 6 | Orange | 0.38 |
| 7 | Red | 0.31 |
| 8 | Purple | 0.45 |
| 9 | Green | 0.37 |
| 10 | Yellow | 0.33 |
+----+--------+--------+
Code language: SQL (Structured Query Language) (sql)
Finally, calculate the population variance of the weight
of the apples
table:
SELECT color, VARIANCE(weight)
FROM apples
GROUP BY color;
Code language: SQL (Structured Query Language) (sql)
Output:
+--------+------------------------+
| color | VARIANCE(weight) |
+--------+------------------------+
| Red | 0.020822222222222218 |
| Green | 0.0004222222222222221 |
| Yellow | 0.00009999999999999934 |
| Orange | 0 |
| Purple | 0 |
+--------+------------------------+
5 rows in set (0.00 sec)
Code language: SQL (Structured Query Language) (sql)
Here is how you interpret the variance:
- The low variance indicates that the weights are close to the mean suggesting less variability (yellow and green).
- The high variance suggests that the weights are spread out from the mean indicating greater variability (red).
- Zero means there is no variance. The orange and purple apples have a single row, so the variance is zero.
2) Using the VARIANCE function to calculate the variance of quantity in stock of products
We’ll use the products
table from the sample database:
The following query uses the VARIANCE
function to calculate the variance of the quantity in stock by product line:
SELECT
productLine,
CAST(
VARIANCE (quantityInStock) as DECIMAL(10, 2)
) var
FROM
products
GROUP BY
productLine
ORDER BY
var;
Code language: SQL (Structured Query Language) (sql)
Output:
+------------------+-------------+
| productLine | var |
+------------------+-------------+
| Ships | 4243681.14 |
| Trucks and Buses | 4855676.15 |
| Planes | 7335654.08 |
| Classic Cars | 8079830.03 |
| Vintage Cars | 8136973.06 |
| Trains | 8455640.22 |
| Motorcycles | 10638768.40 |
+------------------+-------------+
7 rows in set (0.00 sec)
Code language: SQL (Structured Query Language) (sql)
The output indicates that the stocks for Ships
are fairly consistent or close to the average value for that product line while the stocks for Motorcycles
have the greatest variability.
Summary
- Use the MySQL
VARIANCE()
function to calculate the variance of a set of values in a column of a table.