duplicated() : getting duplicate values


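Series.duplicated(keep='first')
keep : Optional, decides which occurrence of a repeated value is treated as the original.
'first' (default) : all duplicates are marked True except the first occurrence
'last' : all duplicates are marked True except the last occurrence
False (the boolean, not the string 'False') : every occurrence of a repeated value is marked True
Returns a boolean Series.
Series : marks duplicate values.
DataFrame : marks duplicate rows (optionally comparing only some columns), see dataframe.duplicated()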
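The DataFrame case mentioned above can be sketched as follows. The column names and values here are made up for illustration; subset is the DataFrame.duplicated() parameter that limits the comparison to the listed columns.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 are identical,
# rows 1 and 3 share only the name
df = pd.DataFrame({
    'name': ['Alex', 'Ron', 'Alex', 'Ron'],
    'mark': [50, 60, 50, 75],
})

# Compare whole rows: only row 2 repeats an earlier row
full_row = df.duplicated()
print(full_row.tolist())

# Compare only the 'name' column: rows 2 and 3 repeat an earlier name
by_name = df.duplicated(subset=['name'])
print(by_name.tolist())
```

The keep parameter works the same way here as it does for a Series.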

Using Series

In the series below, one value ('Two') appears twice.
import pandas as pd
my_data=pd.Series(['One','Two','Three','Two','Four'])
print(my_data.duplicated())
Output (the default is keep='first', so only the second 'Two' is marked True)
0    False
1    False
2    False
3     True
4    False
dtype: bool
With keep='last', all duplicates are marked True except the last occurrence.
import pandas as pd
my_data=pd.Series(['One','Two','Three','Two','Four'])
print(my_data.duplicated(keep='last'))
Output
0    False
1     True
2    False
3    False
4    False
dtype: bool
With keep=False, every occurrence of a repeated value is marked True.
import pandas as pd
my_data=pd.Series(['One','Two','Three','Two','Four'])
print(my_data.duplicated(keep=False))
Output
0    False
1     True
2    False
3     True
4    False
dtype: bool
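To compare the three keep settings shown above row by row, one option is to collect the results in a small DataFrame. This is only a sketch; the column names keep_first, keep_last and keep_False are chosen here for readability and are not part of pandas.

```python
import pandas as pd

my_data = pd.Series(['One', 'Two', 'Three', 'Two', 'Four'])

# One column per keep option so the flags can be read side by side
comparison = pd.DataFrame({
    'value': my_data,
    'keep_first': my_data.duplicated(keep='first'),
    'keep_last': my_data.duplicated(keep='last'),
    'keep_False': my_data.duplicated(keep=False),
})
print(comparison)
```

Only the two 'Two' rows differ between the columns; values that occur once are False in all three.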

Displaying only duplicate data

Use the boolean result as a mask to show only the entries flagged as duplicates.
print(my_data[my_data.duplicated()])
Output
3    Two
dtype: object
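The mask above uses the default keep='first', so the first 'Two' is not listed. If every copy of a repeated value is wanted, a possible variation is to build the mask with keep=False. The variable names all_copies and extra_count are illustrative.

```python
import pandas as pd

my_data = pd.Series(['One', 'Two', 'Three', 'Two', 'Four'])

# keep=False flags every occurrence, so both 'Two' entries are selected
all_copies = my_data[my_data.duplicated(keep=False)]
print(all_copies)

# Summing the boolean result counts the entries that repeat an earlier one
extra_count = int(my_data.duplicated().sum())
print(extra_count)
```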

Displaying unique data

Let us remove the entries having duplicate values by inverting the mask with ~.
print(my_data[~my_data.duplicated()])
Output
0      One
1      Two
2    Three
4     Four
dtype: object
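We can also use unique(), which returns the distinct values as an array in order of first appearance (the original index is not kept).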
print(my_data.unique())
Output
['One' 'Two' 'Three' 'Four']
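As a quick check, the inverted mask can be compared with Series.drop_duplicates(), which with its default keep='first' should give the same result. The names by_mask and by_method are illustrative.

```python
import pandas as pd

my_data = pd.Series(['One', 'Two', 'Three', 'Two', 'Four'])

# Filter with the inverted duplicated() mask
by_mask = my_data[~my_data.duplicated()]

# Built-in method that keeps the first occurrence by default
by_method = my_data.drop_duplicates()

# Both keep the original index labels 0, 1, 2, 4
print(by_mask.equals(by_method))
```

Both approaches keep the original index, while unique() returns only the values.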


plus2net.com



©2000-2020 plus2net.com All rights reserved worldwide