Remove duplicates

De-duplication of data is a common need, especially when dealing with data-quality issues in environments that end up with duplicate rows due to lack of enforcement of uniqueness with constraints. As an example, the following code prepares sample data with duplicate orders in a table called MyItem:

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'MyItems')
        DROP TABLE dbo.MyItems
GO
               
               
CREATE TABLE dbo.MyItems
(
        IdItem INT IDENTITY(1,1),
        Name VARCHAR(50),
        Description VARCHAR(256)
)
GO

INSERT INTO dbo.MyItems(Name, Description)
VALUES
        ('Item1', 'Description 1'),
        ('Item1', 'Description 1'),
        ('Item2', 'Description 2'),
        ('Item3', 'Description 3'),
        ('Item3', 'Description 3'),
        ('Item3', 'Description 3 - different'),
        ('Item4', 'Description 4')
       
SELECT *
FROM dbo.MyItems
GO

An option is to filter either just the distinct rows or all but the distinct rows. You compute both ROW_NUMBER and RANK based on Name + Description ordering, like so:

SELECT
        IdItem, Name, Description,
        ROW_NUMBER() OVER (ORDER BY Name, Description) as row_num,
        RANK() OVER (ORDER BY Name, Description) as rnk
FROM dbo.MyItems
GO

Suppose that you need to de-duplicate the data, keeping only one occurrence of each unique Name+Description combination.

WITH cte
AS
(
        SELECT
                IdItem, Name, Description,
                ROW_NUMBER() OVER (ORDER BY Name, Description) as row_num,
                RANK() OVER (ORDER BY Name, Description) as rnk
        FROM dbo.MyItems
)
DELETE FROM cte
WHERE row_num != rnk

SELECT IdItem, Name, Description
FROM dbo.MyItems

When you’re done, run the following code for cleanup:

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'MyItems')
        DROP TABLE dbo.MyItems
GO

Is this useful?