Often there are situations where duplicate rows exist in a table and we need to get those that are unique or delete the duplicates. Earlier this year i had a post about Partitioning, now let's see how it works. In Microsoft SQL Server 2005 has been added the Row_Number() Over(Partition By...Order by...) feature and it can be used efficiently for such situations.

First let's build the scenario.
Create the table
create table Emp_Details 
( Emp_Name varchar(10)
, Company varchar(15)
, Join_Date datetime
, Resigned_Date datetime
)
go
and insert sample rows
insert into Emp_Details (Emp_Name, Company, Join_Date, Resigned_Date)
values ('John', 'Software', '20060101', '20061231')
,('John', 'Software', '20060101', '20061231')
,('John', 'Software', '20060101', '20061231')
,('John', 'SuperSoft', '20070101', '20071231')
,('John', 'UltraSoft', '20070201', '20080131')
,('John', 'ImproSoft', '20080201', '20081231')
,('John', 'ImproSoft', '20080201', '20081231')
,('Mary', 'Software', '20060101', '20081231')
,('Mary', 'SuperSoft', '20090101', '20090531')
,('Mary', 'SuperSoft', '20090101', '20090531')
,('Mary', 'UltraSoft', '20090601', '20100531')
,('Mary', 'UltraSoft', '20090601', '20100531')

So, what effect would have using Row_Number() Over() with the Partition By clause? The Row_Number() Over() function is looking for rows with the same values of Emp_Name, Company, Join_Date and Resigned_Date columns in the Emp_Details table. The first occurrence of this combination of columns is being allocated a RowNumber=1. The subsequent occurrences of the same combination of data are being allocated RowNumber of 2, 3, etc. When a new combination of Emp_Name, Company, Join_Date and Resigned_Date columns is encountered, that set is treated as a new partition and the RowNumber starts from 1 again thanks to the Partition By clause. In essence, the columns in the Partition By clause are being grouped together as per the Partition By clause and then ordered using the Order By clause:

select Emp_Name
,Company
,Join_Date
,Resigned_Date
,ROW_NUMBER() over (partition by Emp_Name, Company, Join_Date
,Resigned_Date
order by Emp_Name, Company, Join_Date
,Resigned_Date) RowNumber
from Emp_Details
And the result is:
Emp_Name Company      Join_Date      Resigned_Date  RowNumber
-------- ------------ -------------- -------------- ----------
John ImproSoft 2008-02-01 2008-12-31 1
John ImproSoft 2008-02-01 2008-12-31 2
John Software 2006-01-01 2006-12-31 1
John Software 2006-01-01 2006-12-31 2
John Software 2006-01-01 2006-12-31 3
John SuperSoft 2007-01-01 2007-12-31 1
John UltraSoft 2007-02-01 2008-01-31 1
Mary Software 2006-01-01 2008-12-31 1
Mary SuperSoft 2009-01-01 2009-05-31 1
Mary SuperSoft 2009-01-01 2009-05-31 2
Mary UltraSoft 2009-06-01 2010-05-31 1
Mary UltraSoft 2009-06-01 2010-05-31 2
If we need to get unique rows:
select a.Emp_Name, a.Company, a.Join_Date, a.Resigned_Date, a.RowNumber
from
(select Emp_Name
,Company
,Join_Date
,Resigned_Date
,ROW_NUMBER() over (partition by Emp_Name, Company, Join_Date
,Resigned_Date
order by Emp_Name, Company, Join_Date
,Resigned_Date) RowNumber
from Emp_Details) a
where a.RowNumber = 1
To see the rows with duplicates:
select a.Emp_Name, a.Company, a.Join_Date, a.Resigned_Date, a.RowNumber
from
(select Emp_Name
,Company
,Join_Date
,Resigned_Date
,ROW_NUMBER() over (partition by Emp_Name, Company, Join_Date
,Resigned_Date
order by Emp_Name, Company, Join_Date
,Resigned_Date) RowNumber
from Emp_Details) a
where a.RowNumber > 1
And finally to remove the duplicates:
delete from a
from
(select Emp_Name, Company, Join_Date, Resigned_Date
,ROW_NUMBER() over (partition by Emp_Name, Company, Join_Date
,Resigned_Date
order by Emp_Name, Company, Join_Date
,Resigned_Date) RowNumber
from Emp_Details) a
where a.RowNumber > 1

Enjoy!