drop_duplicates() : deleting duplicate values


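Series.drop_duplicates(keep='first', inplace=False)
keep : Optional. Decides which occurrence of a duplicated value is kept.
'first' (default) : all duplicates are dropped except the first occurrence
'last' : all duplicates are dropped except the last occurrence
False (the boolean, not the string 'False') : every value that has a duplicate is dropped, including its first occurrence
inplace : Optional, default False. If True, the object is modified directly and None is returned.
Series : Deletes duplicate values.
DataFrame : Deletes duplicate rows.
dataframe.drop_duplicates()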
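The row-wise behaviour on a DataFrame can be sketched as below. The column names and values are made-up sample data, not from this page, and the subset argument (a standard parameter of DataFrame.drop_duplicates) is shown only as an optional extra.

```python
import pandas as pd

# Hypothetical sample data: row 2 is an exact copy of row 0
frame = pd.DataFrame({
    'name': ['Ravi', 'Alex', 'Ravi', 'Alex'],
    'mark': [60, 70, 60, 80],
})

# A row is dropped only when every column matches an earlier row
unique_rows = frame.drop_duplicates()
print(unique_rows)

# With subset, only the listed columns are compared
unique_names = frame.drop_duplicates(subset=['name'])
print(unique_names)
```

Here rows 0, 1 and 3 survive the first call, because row 3 differs from row 1 in the mark column. When only name is compared, rows 0 and 1 remain.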

Using Series

In the Series below, one value ( 'Two' ) appears twice, at index 1 and index 3.
import pandas as pd
my_data=pd.Series(['One','Two','Three','Two','Four'])
df=my_data.drop_duplicates()
print(df)
Output ( keep='first' is the default, so the first 'Two' at index 1 is kept )
0      One
1      Two
2    Three
4     Four
dtype: object
With keep='last', duplicate values are dropped except the last occurrence, so the 'Two' at index 3 is kept
df=my_data.drop_duplicates(keep='last')
print(df)
Output
0      One
2    Three
3      Two
4     Four
dtype: object
With keep=False, every value that has a duplicate is dropped, so neither 'Two' remains
df=my_data.drop_duplicates(keep=False)
print(df)
Output
0      One
2    Three
4     Four
dtype: object
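Note that in all the outputs above the surviving values keep their original index labels, so the index has gaps. If a continuous index is needed, one possible approach (a sketch, not the only way) is to chain reset_index(drop=True) after drop_duplicates(). The variable names unique_values and renumbered are chosen here for illustration.

```python
import pandas as pd

my_data = pd.Series(['One', 'Two', 'Three', 'Two', 'Four'])

# Index labels of the kept values are unchanged: 0, 1, 2, 4
unique_values = my_data.drop_duplicates()

# drop=True discards the old labels instead of adding them as a column
renumbered = unique_values.reset_index(drop=True)
print(renumbered)
```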

inplace=True

By default inplace=False, so our original Series my_data is not altered when we use drop_duplicates(). That is why the examples above store the returned Series in another variable, df. By using inplace=True we can modify my_data itself. In that case drop_duplicates() returns None, so its result should not be assigned to a variable.
import pandas as pd
my_data=pd.Series(['One','Two','Three','Two','Four'])
my_data.drop_duplicates(keep='last',inplace=True)
print(my_data)
Output
0      One
2    Three
3      Two
4     Four
dtype: object
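A common mistake is to assign the result of an inplace=True call back to a variable. The short sketch below illustrates why this fails; the name result is used here only for illustration.

```python
import pandas as pd

my_data = pd.Series(['One', 'Two', 'Three', 'Two', 'Four'])

# With inplace=True the Series is changed directly and nothing is returned
result = my_data.drop_duplicates(keep='last', inplace=True)

print(result)            # None
print(my_data.tolist())  # ['One', 'Three', 'Two', 'Four']
```

If the returned object is needed, leaving inplace at its default and writing my_data = my_data.drop_duplicates(keep='last') gives the same final values.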
plus2net.com