Summary: in this tutorial, you will learn how to find duplicate values of one or more columns in MySQL.
Data duplication happens because of many reasons. Finding duplicate values is one of the important tasks that you must deal with when working with databases.
Setting up a sample table
First, create a table named contacts
with four columns: id
, first_name
, last_name
, and email
.
CREATE TABLE contacts (
id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(255) NOT NULL
);
Code language: SQL (Structured Query Language) (sql)
Second, inserts rows into the contacts
table:
INSERT INTO contacts (first_name,last_name,email)
VALUES ('Carine ','Schmitt','[email protected]'),
('Jean','King','[email protected]'),
('Peter','Ferguson','[email protected]'),
('Janine ','Labrune','[email protected]'),
('Jonas ','Bergulfsen','[email protected]'),
('Janine ','Labrune','[email protected]'),
('Susan','Nelson','[email protected]'),
('Zbyszek ','Piestrzeniewicz','[email protected]'),
('Roland','Keitel','[email protected]'),
('Julie','Murphy','[email protected]'),
('Kwai','Lee','[email protected]'),
('Jean','King','[email protected]'),
('Susan','Nelson','[email protected]'),
('Roland','Keitel','[email protected]');
Code language: SQL (Structured Query Language) (sql)
Third, query data from the contacts
table:
SELECT * FROM contacts
ORDER BY email;
Code language: SQL (Structured Query Language) (sql)
In the contacts
table, we have some rows that have duplicate values in the first_name
, last_name
, and email
columns. Let’s learn how to find them.
Find duplicate values in one column
The find duplicate values in one column of a table, you follow these steps:
- First, use the
GROUP BY
clause to group all rows by the target column, which is the column that you want to check duplicate. - Then, use the
COUNT()
function in theHAVING
clause to check if any group has more than 1 element. These groups are duplicates.
The following query illustrates the idea:
SELECT
col,
COUNT(col)
FROM
table_name
GROUP BY col
HAVING COUNT(col) > 1;
Code language: SQL (Structured Query Language) (sql)
By using this query template, you can find rows that have duplicate emails in the contacts
table as follows:
SELECT
email,
COUNT(email)
FROM
contacts
GROUP BY email
HAVING COUNT(email) > 1;
Code language: SQL (Structured Query Language) (sql)
This picture shows the output of the query that shows the duplicate emails:
Find duplicate values in multiple columns
Sometimes, you want to find duplicate rows based on multiple columns instead of one. In this case, you can use the following query:
SELECT
col1, COUNT(col1),
col2, COUNT(col2),
...
FROM
table_name
GROUP BY
col1,
col2, ...
HAVING
(COUNT(col1) > 1) AND
(COUNT(col2) > 1) AND
...
Code language: SQL (Structured Query Language) (sql)
Rows are considered duplicate only when the combination of columns is duplicate therefore we used the AND
operator in the HAVING
clause.
For example, to find rows in the contacts
table with duplicate values in first_name
, last_name
, and email
column, you use the following query:
SELECT
first_name, COUNT(first_name),
last_name, COUNT(last_name),
email, COUNT(email)
FROM
contacts
GROUP BY
first_name ,
last_name ,
email
HAVING COUNT(first_name) > 1
AND COUNT(last_name) > 1
AND COUNT(email) > 1;
Code language: SQL (Structured Query Language) (sql)
The following illustrates the output of the query:
In this tutorial, you have learned how to find duplicate rows based on the value of one or more columns in MySQL.