Delete duplicate rows ​​in SQL

Sometimes we have one, two or more pieces of data that are repeated in
our tables. Even sometimes they are not only repeated 2 times, but 3 or
more. So today we will see how to remove these duplicates.

Solution

Basically we have to add some indexes to the table. Since, if we
remember, indexes are indexes because they are not repeated (and because
they help make searches faster).

Then, if we add indexes in the fields that we do not want to be repeated,
they will be eliminated in such a way that, of all the repeated ones, only
one will remain.

The query is:

ALTER IGNORE TABLE your_table ADD UNIQUE INDEX(column_1, column_2);

Where your_table is the name of the table with repeated data. In this
case it is supposed to be only 2 columns, but we can add many columns.

We use IGNORE so that it does not give us any error, since it will try to
warn us that there is repeated data and that it will be eliminated. But
since we already know that there are duplicates, we ignore the error.

After doing so, we can remove the index using:

ALTER TABLE your_table DROP INDEX column_1;

Here it is important to mention that the name of the index is formed by
the name of the first column. So if the first column was ‘name’ then the
index will be ‘name’ no matter how many columns you have indexed.

Practical example

In a table called ‘qualifications’ created with the following statement:

CREATE TABLE `qualifications` (
`idnumber` INT(9) NULL DEFAULT NULL,
`name` VARCHAR(100) NULL DEFAULT NULL,
`email` VARCHAR(100) NULL DEFAULT NULL,
`qualification` INT(2) NULL DEFAULT NULL,
`period` VARCHAR(50) NULL DEFAULT NULL
);

And we fill it with the data:

ID, Name, Email1, Note, Period
45121547,user1,[email protected],10,2022-1
45121548,user2,[email protected],11,2022-2
45121549,user3,[email protected],12,2022-3
45121550,user4,[email protected],13,2022-4
45121547,user1,[email protected],10,2022-1
45121548,user2,[email protected],11,2022-2
45121549,user3,[email protected],12,2022-3
45121550,user4,[email protected],13,2022-4
45121547,user1,[email protected],10,2022-1
45121548,user2,[email protected],11,2022-2
45121549,user3,[email protected],12,2022-3
45121550,user4,[email protected],13,2022-4

Which you must save in a CSV extension file and then load it to the table,
using the following code:

LOAD DATA LOW_PRIORITY LOCAL INFILE ‘~\\your_file.csv’ INTO TABLE
`alumni`.`qualifications` CHARACTER SET utf8 FIELDS TERMINATED BY ‘,’
OPTIONALLY ENCLOSED BY ‘,’ ESCAPED BY ‘,’ IGNORE 1 LINES (`idnumber`,
`name`, `email`, `qualification`, `period`);

It can be seen that all the records are repeated.

Eliminating duplicates

Now you must execute the command:

ALTER IGNORE TABLE qualifications ADD UNIQUE INDEX(name);

I will display the data again:

SELECT * FROM qualifications;

Being that way:

qualifications

idnumber name email qualification period
45121547 user1 [email protected] 10 2022-1
45121548 user2 [email protected] 11 2022-2
45121549 user3 [email protected] 12 2022-3
45121550 user4 [email protected] 13 2022-4

Delete Index

Finally, if we don’t want the indexes to be there, we can use:

ALTER TABLE qualifications DROP INDEX name;

It is necessary to notice that the name of the index is taken from the
first column that we specify when indexing.

Since in this case it was ‘name’, then that was the name of the index.

Leave a Reply

Your email address will not be published.