Output (the last two rows are duplicates: row 6 is a duplicate of row 1 and row 5 is a duplicate of row 3)
id name class1 mark gender
0 1 John Four 75 female
1 2 Max Three 85 male
2 3 Arnold Three 55 male
3 4 Krish Four 60 female
4 5 John Four 60 female
5 4 Krish Four 60 female
6 2 Max Three 85 male
We can check a single column for duplicate values and store the result in a new column.
df['status']=df['class1'].duplicated()
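As a minimal sketch of the line above, the DataFrame can be rebuilt from the output table (the values below are copied from that table; the construction itself is an assumption about how the original df was created). Every repeat of a class1 value after its first appearance is flagged True.

```python
import pandas as pd

# Sample data reconstructed from the output table above (assumed construction)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 4, 2],
    "name": ["John", "Max", "Arnold", "Krish", "John", "Krish", "Max"],
    "class1": ["Four", "Three", "Three", "Four", "Four", "Four", "Three"],
    "mark": [75, 85, 55, 60, 60, 60, 85],
    "gender": ["female", "male", "male", "female", "female", "female", "male"],
})

# Series.duplicated() marks each value that has already appeared earlier
df["status"] = df["class1"].duplicated()

print(df[["class1", "status"]])
```

Only the first 'Four' and the first 'Three' get False, since class1 has just two distinct values.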
Syntax
DataFrame.duplicated(subset=None, keep='first')
keep
Optional, default 'first'.
'first' : all duplicates are marked True except the first occurrence.
'last' : all duplicates are marked True except the last occurrence.
False : all duplicates are marked True.
subset
Optional. Column label or list of labels to consider when identifying duplicates. By default all columns are used.
Returns a boolean Series that indicates duplicate values.
For a DataFrame, the Series indicates duplicate rows (the check can be limited to selected columns by using subset).
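The effect of keep and subset can be compared side by side. This is only an illustrative sketch: the DataFrame is rebuilt from the sample output shown earlier, and the variable names (dup_first, dup_last, dup_all, dup_name) are made up for this example.

```python
import pandas as pd

# Sample data reconstructed from the output table (assumed construction)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 4, 2],
    "name": ["John", "Max", "Arnold", "Krish", "John", "Krish", "Max"],
    "class1": ["Four", "Three", "Three", "Four", "Four", "Four", "Three"],
    "mark": [75, 85, 55, 60, 60, 60, 85],
    "gender": ["female", "male", "male", "female", "female", "female", "male"],
})

dup_first = df.duplicated()                 # default keep='first'
dup_last = df.duplicated(keep="last")       # last occurrence is not marked
dup_all = df.duplicated(keep=False)         # every copy is marked
dup_name = df.duplicated(subset=["name"])   # compare the name column only

# Show the four boolean masks next to each other
print(pd.DataFrame({
    "first": dup_first,
    "last": dup_last,
    "all": dup_all,
    "name_only": dup_name,
}))
```

With subset=['name'] the second John (row 4) is also marked, although the full row is not a duplicate because id and mark differ.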
Display duplicate rows only
print(df[df.duplicated()])
Output
id name class1 mark gender
5 4 Krish Four 60 female
6 2 Max Three 85 male
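Using keep='last' gives similar output: the row values are the same, but the index labels are 1 and 3 because the earlier occurrences are now the ones marked as duplicates.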
print(df[df.duplicated(keep='last')])
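The difference between the two filters is only in which copy is returned. A small sketch, assuming the same sample data as in the output table (first_kept, last_kept and same_values are illustrative names), compares the index labels and then the values alone:

```python
import pandas as pd

# Sample data reconstructed from the output table (assumed construction)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 4, 2],
    "name": ["John", "Max", "Arnold", "Krish", "John", "Krish", "Max"],
    "class1": ["Four", "Three", "Three", "Four", "Four", "Four", "Three"],
    "mark": [75, 85, 55, 60, 60, 60, 85],
    "gender": ["female", "male", "male", "female", "female", "female", "male"],
})

first_kept = df[df.duplicated()]             # later copies are returned
last_kept = df[df.duplicated(keep="last")]   # earlier copies are returned

print(list(first_kept.index), list(last_kept.index))

# Ignore the index labels and row order, then compare the values only
same_values = (
    first_kept.sort_values("id").reset_index(drop=True)
    .equals(last_kept.sort_values("id").reset_index(drop=True))
)
print(same_values)
```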
Display unique rows (the first occurrence of each duplicate is kept)
print(df[~df.duplicated()])
Output (without rows 5 and 6)
id name class1 mark gender
0 1 John Four 75 female
1 2 Max Three 85 male
2 3 Arnold Three 55 male
3 4 Krish Four 60 female
4 5 John Four 60 female
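For comparison, pandas also provides DataFrame.drop_duplicates(), which is not used in this tutorial. With default arguments it should give the same rows as the inverted mask. The sketch below checks this on the sample data, rebuilt here from the output table as an assumption.

```python
import pandas as pd

# Sample data reconstructed from the output table (assumed construction)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 4, 2],
    "name": ["John", "Max", "Arnold", "Krish", "John", "Krish", "Max"],
    "class1": ["Four", "Three", "Three", "Four", "Four", "Four", "Three"],
    "mark": [75, 85, 55, 60, 60, 60, 85],
    "gender": ["female", "male", "male", "female", "female", "female", "male"],
})

unique_by_mask = df[~df.duplicated()]   # invert the duplicate mask
unique_by_drop = df.drop_duplicates()   # default keep='first'

# Both keep the first occurrence of each row and preserve the index labels
print(unique_by_mask.equals(unique_by_drop))
```

The mask form is handy when the boolean Series is needed for something else as well, for example to count duplicates with df.duplicated().sum().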