import pandas as pd
my_dict={
'id':[1,2,3,4,5,4,2],
'name':['John','Max','Arnold','Krish','John','Krish','Max'],
'class1':['Four','Three','Three','Four','Four','Four','Three'],
'mark':[75,85,55,60,60,60,85],
'gender':['female','male','male','female','female','female','male']
}
df = pd.DataFrame(data=my_dict)
print(df)
Output (the last two rows are duplicates: the row at index 6 repeats index 1, and the row at index 5 repeats index 3)
id name class1 mark gender
0 1 John Four 75 female
1 2 Max Three 85 male
2 3 Arnold Three 55 male
3 4 Krish Four 60 female
4 5 John Four 60 female
5 4 Krish Four 60 female
6 2 Max Three 85 male
print(df.duplicated())
Output
0 False
1 False
2 False
3 False
4 False
5 True
6 True
dtype: bool
We can store this result in a new column.
df['status']=df.duplicated()
To flag duplicates based on a single column, call duplicated() on that column.
df['status']=df['class1'].duplicated()
The outputs below were produced from the original DataFrame, without the status column.
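As a minimal sketch of the two assignments above, the example below writes both flags to a copy, so the original DataFrame keeps its five columns. The names flagged and class_dup are illustrative assumptions and do not come from the tutorial.

```python
import pandas as pd

# Same sample data as in the tutorial.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

# Work on a copy (hypothetical name) so df itself is not modified.
flagged = df.copy()
flagged['status'] = df.duplicated()               # whole-row duplicates
flagged['class_dup'] = df['class1'].duplicated()  # repeats within class1 only

print(flagged[['id', 'class1', 'status', 'class_dup']])
print(int(flagged['status'].sum()))     # count of whole-row duplicates
print(int(flagged['class_dup'].sum()))  # count of repeated class1 values
```

Summing a boolean column counts its True values, which gives a quick total of the duplicates found.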
Syntax
DataFrame.duplicated(subset=None, keep='first')
keep | Optional. 'first' (default): all duplicates are marked True except the first occurrence. 'last': all duplicates are marked True except the last occurrence. False (the boolean, not a string): all duplicates are marked True. |
subset | Optional. Columns to be considered for identifying duplicates; by default all columns are used. |
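To make the three keep options concrete, here is a small sketch that prints the flags each one produces for the sample data. The variable names first, last and every are illustrative only.

```python
import pandas as pd

# Same sample data as in the tutorial.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

first = df.duplicated(keep='first').tolist()  # later copies are flagged
last = df.duplicated(keep='last').tolist()    # earlier copies are flagged
every = df.duplicated(keep=False).tolist()    # every copy is flagged

print(first)  # [False, False, False, False, False, True, True]
print(last)   # [False, True, False, True, False, False, False]
print(every)  # [False, True, False, True, False, True, True]
```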
To display only the duplicate rows, use the result as a filter.
print(df[df.duplicated()])
Output
id name class1 mark gender
5 4 Krish Four 60 female
6 2 Max Three 85 male
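With keep='last' the last occurrence is treated as the original, so the earlier copies are returned instead: the same row values, but at index 1 and 3.
print(df[df.duplicated(keep='last')])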
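When it helps to see each duplicate next to its original, one possible approach is keep=False followed by a sort. The sketch below assumes that sorting by id is enough to group the matching rows in this data set. The names earlier_copies and all_copies are hypothetical.

```python
import pandas as pd

# Same sample data as in the tutorial.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

# keep='last' returns the earlier copies of each duplicated row.
earlier_copies = df[df.duplicated(keep='last')]
print(earlier_copies.index.tolist())  # [1, 3]

# keep=False returns every copy; a stable sort keeps the original order within each id.
all_copies = df[df.duplicated(keep=False)].sort_values('id', kind='mergesort')
print(all_copies)
```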
To list the rows without their duplicates, negate the result with ~.
print(df[~df.duplicated()])
Output (rows at index 5 and 6 are excluded)
id name class1 mark gender
0 1 John Four 75 female
1 2 Max Three 85 male
2 3 Arnold Three 55 male
3 4 Krish Four 60 female
4 5 John Four 60 female
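The tutorial does not cover it, but pandas also provides drop_duplicates(), which should return the same rows as the negated filter above. The sketch below checks that for the sample data. The names by_mask and by_method are illustrative.

```python
import pandas as pd

# Same sample data as in the tutorial.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

by_mask = df[~df.duplicated()]   # boolean filter, as in the tutorial
by_method = df.drop_duplicates() # built-in method, keep='first' by default

print(by_method.index.tolist())   # [0, 1, 2, 3, 4]
print(by_mask.equals(by_method))  # True
```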
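With subset, only the listed columns are compared. Here every row whose class1 value appears again later is marked.
print(df[df.duplicated(keep='last',subset=['class1'])])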
Output
id name class1 mark gender
0 1 John Four 75 female
1 2 Max Three 85 male
2 3 Arnold Three 55 male
3 4 Krish Four 60 female
4 5 John Four 60 female
We can use more than one column also.
print(df[df.duplicated(keep='last',subset=['class1','gender'])])
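The statement above has no output listed, so here is a small sketch that prints the matching index labels. The second subset, ['name','mark'], is an assumption added for contrast and is not part of the tutorial.

```python
import pandas as pd

# Same sample data as in the tutorial.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

# Rows whose (class1, gender) pair appears again further down.
class_gender = df[df.duplicated(keep='last', subset=['class1', 'gender'])]
print(class_gender.index.tolist())  # [0, 1, 2, 3, 4]

# A different pair of columns (illustrative choice) gives a different result.
name_mark = df[df.duplicated(keep='last', subset=['name', 'mark'])]
print(name_mark.index.tolist())  # [1, 3]
```

In this data every class1 value pairs with a single gender, so adding gender to the subset does not change the result of the class1-only example.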
Without using subset
print(df[~df['class1'].duplicated(keep='first')])
Output
id name class1 mark gender
0 1 John Four 75 female
1 2 Max Three 85 male
We can also select several columns first and call duplicated() on that selection.
print(df[df[['class1','mark']].duplicated()])
Output
id name class1 mark gender
4 5 John Four 60 female
5 4 Krish Four 60 female
6 2 Max Three 85 male
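Selecting columns before calling duplicated() and passing the same columns through subset should produce the same flags. The sketch below checks this for the sample data. The names by_selection and by_subset are illustrative.

```python
import pandas as pd

# Same sample data as in the tutorial.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

by_selection = df[['class1', 'mark']].duplicated()      # select columns, then check
by_subset = df.duplicated(subset=['class1', 'mark'])    # check via the subset parameter

print(by_selection.equals(by_subset))    # True
print(df[by_subset].index.tolist())      # [4, 5, 6]
```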