drop_duplicates() : delete duplicate rows

Using DataFrame

Here is a sample DataFrame.
import pandas as pd
my_dict={
  'id':[1,2,3,4,5,4,2],
  'name':['John','Max','Arnold','Krish','John','Krish','Max'],
  'class1':['Four','Three','Three','Four','Four','Four','Three'],
  'mark':[75,85,55,60,60,60,85],
  'gender':['female','male','male','female','female','female','male']
}
df = pd.DataFrame(data=my_dict)
print(df)
Output (the last two rows are duplicates: row 6 repeats row 1, and row 5 repeats row 3)
   id    name class1  mark  gender
0   1    John   Four    75  female
1   2     Max  Three    85    male
2   3  Arnold  Three    55    male
3   4   Krish   Four    60  female
4   5    John   Four    60  female
5   4   Krish   Four    60  female
6   2     Max  Three    85    male



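Syntax
DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
keep : Optional, decides which of the duplicate rows are retained.
'first' (default), delete all duplicate rows except the first occurrence
'last', delete all duplicate rows except the last occurrence
False, delete all duplicate rows (this is the boolean False, not the string 'False')
subset, inplace and ignore_index are optional and are explained below.
Series : deletes duplicate values.
DataFrame : deletes duplicate rows (can compare only some of the columns).
Series.drop_duplicates()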
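
To see how the three keep options differ, here is a minimal sketch on a Series. The Series s and its values are made up for illustration and are not part of the sample DataFrame.

```python
import pandas as pd

# Illustrative Series (hypothetical values, not the sample DataFrame)
s = pd.Series(['Four', 'Three', 'Three', 'Four', 'Five'])

first = s.drop_duplicates()             # keep='first' is the default
last = s.drop_duplicates(keep='last')   # keep the last occurrence of each value
none = s.drop_duplicates(keep=False)    # drop every value that is repeated

print(first.tolist())  # ['Four', 'Three', 'Five']
print(last.tolist())   # ['Three', 'Four', 'Five']
print(none.tolist())   # ['Five']
```

The original index labels are kept in each result, so first has labels 0, 1, 4 and last has labels 2, 3, 4.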

Delete duplicate rows after first occurrence

Remove all duplicate rows but keep the first occurrence. Since keep='first' is the default value, passing it explicitly is optional.
import pandas as pd
my_dict={
  'id':[1,2,3,4,5,4,2],
  'name':['John','Max','Arnold','Krish','John','Krish','Max'],
  'class1':['Four','Three','Three','Four','Four','Four','Three'],
  'mark':[75,85,55,60,60,60,85],
  'gender':['female','male','male','female','female','female','male']
}
df = pd.DataFrame(data=my_dict)
df=df.drop_duplicates(keep='first')
print(df)
Output (the result is assigned back to df because inplace=False by default, so drop_duplicates() returns a new DataFrame; explained below)
   id    name class1  mark  gender
0   1    John   Four    75  female
1   2     Max  Three    85    male
2   3  Arnold  Three    55    male
3   4   Krish   Four    60  female
4   5    John   Four    60  female

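Delete duplicate rows but keep last occurrence

Use keep='last' on the original sample DataFrame (create df again if you have already run the previous example).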
df=df.drop_duplicates(keep='last')
print(df)
Output
   id    name class1  mark  gender
0   1    John   Four    75  female
2   3  Arnold  Three    55    male
4   5    John   Four    60  female
5   4   Krish   Four    60  female
6   2     Max  Three    85    male

Delete duplicate rows in all places

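Use keep=False on the original sample DataFrame to delete every row that has a duplicate, including its first occurrence.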
df=df.drop_duplicates(keep=False)
print(df)
Output (every row that has a duplicate is deleted, including its first occurrence)
   id    name class1  mark  gender
0   1    John   Four    75  female
2   3  Arnold  Three    55    male
4   5    John   Four    60  female
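
Before deleting rows with keep=False it can help to see which rows are affected. The sketch below uses DataFrame.duplicated(), which returns a boolean mask; the names mask and unique_only are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

# True for every row that has at least one identical row elsewhere
mask = df.duplicated(keep=False)
print(df[mask])          # rows 1, 3, 5 and 6

# Selecting the unmarked rows gives the same result as drop_duplicates(keep=False)
unique_only = df[~mask]
print(unique_only)       # rows 0, 2 and 4
```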

inplace=True

By default inplace=False, so drop_duplicates() returns a new DataFrame and our main DataFrame df is not altered. That is why the examples above assign the result back to df. By using inplace=True we modify our main DataFrame df directly, and no assignment is needed.
df.drop_duplicates(inplace=True)
print(df)
Output
   id    name class1  mark  gender
0   1    John   Four    75  female
1   2     Max  Three    85    male
2   3  Arnold  Three    55    male
3   4   Krish   Four    60  female
4   5    John   Four    60  female
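
A common mistake is to combine inplace=True with an assignment. The short sketch below, using a small made-up DataFrame, shows that the method returns None in that case while df itself is changed.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data)
df = pd.DataFrame({'id': [1, 2, 2], 'mark': [75, 85, 85]})

returned = df.drop_duplicates(inplace=True)

print(returned)   # None, because df was modified directly
print(len(df))    # 2
```

So writing df = df.drop_duplicates(inplace=True) would replace df with None; use either the assignment or inplace=True, not both.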

subset

Pass a list of column names to subset to compare only those columns when identifying duplicates. By default all of the columns are compared.
import pandas as pd
my_dict={
  'id':[1,2,3,4,5,4,2],
  'name':['John','Max','Arnold','Krish','John','Krish','Max'],
  'class1':['Four','Three','Three','Four','Four','Four','Three'],
  'mark':[75,85,55,60,60,60,85],
  'gender':['female','male','male','female','female','female','male']
}
df = pd.DataFrame(data=my_dict)
df.drop_duplicates(subset=['class1','mark','gender'],inplace=True)
print(df)
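Output (on these three columns rows 4 and 5 match row 3, and row 6 matches row 1, so only the first occurrences remain)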
   id    name class1  mark  gender
0   1    John   Four    75  female
1   2     Max  Three    85    male
2   3  Arnold  Three    55    male
3   4   Krish   Four    60  female
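
subset can be combined with keep. As a sketch under the same sample data, the code below compares only the name column and keeps the last row for each name; the variable by_name is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

# A single column label is accepted as well as a list of labels
by_name = df.drop_duplicates(subset='name', keep='last')

print(by_name.index.tolist())   # [2, 4, 5, 6]
print(by_name['id'].tolist())   # [3, 5, 4, 2]
```

Here the two John rows (id 1 and id 5) count as duplicates because only name is compared, so row 0 is dropped even though its id and mark differ from row 4.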

ignore_index

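If True, the rows of the result are relabeled 0, 1, …, n - 1 instead of keeping their original index labels. The default value is False. The example below starts again from the original sample DataFrame.
df.drop_duplicates(keep='last',inplace=True,ignore_index=True)
print(df)
Output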
   id    name class1  mark  gender
0   1    John   Four    75  female
1   3  Arnold  Three    55    male
2   5    John   Four    60  female
3   4   Krish   Four    60  female
4   2     Max  Three    85    male
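
As far as the result is concerned, ignore_index=True should match calling reset_index(drop=True) on the deduplicated DataFrame. The sketch below compares the two on the sample data; the names a and b are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

# Relabel the rows while dropping duplicates
a = df.drop_duplicates(keep='last', ignore_index=True)

# Drop duplicates first, then discard the old index labels
b = df.drop_duplicates(keep='last').reset_index(drop=True)

print(a.index.tolist())   # [0, 1, 2, 3, 4]
print(a.equals(b))        # True
```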
Subhendu Mohapatra — author at plus2net