Series.duplicated(keep)
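keep | Optional. Decides which occurrence of a repeated value is left unmarked.
'first' (default): all duplicates are marked True except the first occurrence.
'last': all duplicates are marked True except the last occurrence.
False (the boolean, not the string 'False'): all duplicates are marked True.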
Series: indicates duplicate values.
DataFrame: indicates duplicate rows (the comparison can be limited to some columns).
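To make the Series / DataFrame difference concrete, here is a minimal sketch. The DataFrame, its column names (name, mark) and its values are made up for illustration and are not part of the original example; subset is the DataFrame.duplicated() parameter that restricts the comparison to the listed columns.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
my_df = pd.DataFrame({
    'name': ['Alex', 'Ron', 'Alex', 'Ron'],
    'mark': [50, 60, 50, 70],
})

# Whole rows are compared: only row 2 repeats an earlier row (row 0)
row_dups = my_df.duplicated()

# Only the 'name' column is compared: rows 2 and 3 repeat earlier names
name_dups = my_df.duplicated(subset=['name'])

print(row_dups.tolist())   # [False, False, True, False]
print(name_dups.tolist())  # [False, False, True, True]
```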
Using Series
In the Series below, one value ('Two') appears twice.
import pandas as pd
my_data=pd.Series(['One','Two','Three','Two','Four'])
my_data.duplicated()
Output (the default is keep='first')
0 False
1 False
2 False
3 True
4 False
dtype: bool
With keep='last', duplicate values are marked True except the last one.
import pandas as pd
my_data=pd.Series(['One','Two','Three','Two','Four'])
my_data.duplicated(keep='last')
Output
0 False
1 True
2 False
3 False
4 False
dtype: bool
With keep=False, all duplicate values are marked True.
import pandas as pd
my_data=pd.Series(['One','Two','Three','Two','Four'])
my_data.duplicated(keep=False)
Output
0 False
1 True
2 False
3 True
4 False
dtype: bool
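Because duplicated() returns a boolean Series, the result can also be summarised instead of printed row by row. The sketch below is one way to do that; the variable names (has_dups, extra_copies, rows_involved) are illustrative and not from the original.

```python
import pandas as pd

my_data = pd.Series(['One', 'Two', 'Three', 'Two', 'Four'])

# True if at least one value is repeated
has_dups = my_data.duplicated().any()

# Number of repeats beyond the first occurrence (keep='first' is the default)
extra_copies = my_data.duplicated().sum()

# Number of elements that belong to a repeated value (every occurrence counted)
rows_involved = my_data.duplicated(keep=False).sum()

print(has_dups, extra_copies, rows_involved)
```

For this Series there is one extra copy ('Two' at index 3) and two elements involved in a repeat (index 1 and index 3).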
Displaying only duplicate data
Use the boolean result as a mask to show only the values marked as duplicates.
print(my_data[my_data.duplicated()])
Output
3 Two
dtype: object
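The mask above uses the default keep='first', so the first 'Two' (index 1) is not listed. If every occurrence of a repeated value is wanted, one option is to build the mask with keep=False, as in this small sketch (all_dups is an illustrative name).

```python
import pandas as pd

my_data = pd.Series(['One', 'Two', 'Three', 'Two', 'Four'])

# keep=False marks every occurrence of a repeated value,
# so both 'Two' entries pass the mask
all_dups = my_data[my_data.duplicated(keep=False)]

print(all_dups.index.tolist())  # [1, 3]
print(all_dups.tolist())        # ['Two', 'Two']
```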
Displaying unique data
Let us remove the values marked as duplicates by inverting the mask with ~.
print(my_data[~my_data.duplicated()])
Output
0 One
1 Two
2 Three
4 Four
dtype: object
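The inverted mask gives the same result as Series.drop_duplicates() with its default keep='first'. The comparison below is a short sketch of that equivalence; the names masked and dropped are illustrative.

```python
import pandas as pd

my_data = pd.Series(['One', 'Two', 'Three', 'Two', 'Four'])

# Keep only the rows NOT marked as duplicates
masked = my_data[~my_data.duplicated()]

# drop_duplicates() keeps the first occurrence by default
dropped = my_data.drop_duplicates()

print(masked.equals(dropped))   # True
print(dropped.index.tolist())   # [0, 1, 2, 4]
```

Both approaches keep the original index labels, so index 3 is simply missing from the result.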
We can also use unique() to get the distinct values.
print(my_data.unique())
Output
['One' 'Two' 'Three' 'Four']
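Note that unique() returns an array of values rather than a Series, so the original index labels are not carried over. The sketch below contrasts it with the mask approach; the names values and kept are illustrative, and the exact array type returned may depend on the pandas version and the dtype of the Series.

```python
import pandas as pd

my_data = pd.Series(['One', 'Two', 'Three', 'Two', 'Four'])

# Distinct values in order of first appearance, without the index
values = my_data.unique()

# The mask approach returns a Series and keeps the original index labels
kept = my_data[~my_data.duplicated()]

print(list(values))         # ['One', 'Two', 'Three', 'Four']
print(kept.index.tolist())  # [0, 1, 2, 4]
```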