duplicated() : getting duplicate rows


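DataFrame.duplicated(subset=None, keep='first')
keep : Optional, decides which occurrence of a repeated value is left unmarked.
'first' : Default, all duplicates are marked True except the first one.
'last' : All duplicates are marked True except the last one.
False : All duplicates are marked True (this is the boolean False, not the string 'False').
subset : Optional, column label or list of labels to compare. By default all columns are compared.
Returns a boolean Series.
Series : indicates duplicate values.
DataFrame : indicates duplicate rows (the check can be limited to some columns).
Series.duplicated(keep='first')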
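To make the three keep options described above concrete, here is a minimal sketch on a small Series. The Series s and the variable names are made up for illustration and are not part of the sample data used later.

```python
import pandas as pd

# Hypothetical sample values: 'a' appears three times, 'b' twice, 'c' once
s = pd.Series(['a', 'b', 'a', 'c', 'b', 'a'])

first = s.duplicated()              # keep='first' is the default
last = s.duplicated(keep='last')    # only the last occurrence stays False
all_dups = s.duplicated(keep=False) # every repeated value is marked True

# Show the three masks side by side with the original values
print(pd.DataFrame({'value': s, 'first': first, 'last': last, 'False': all_dups}))
```

Only 'c' stays False in all three columns because it is the only value that never repeats.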

Using DataFrame

Here is a sample DataFrame.
import pandas as pd 
my_dict={
  'id':[1,2,3,4,5,4,2],
  'name':['John','Max','Arnold','Krish','John','Krish','Max'],
  'class1':['Four','Three','Three','Four','Four','Four','Three'],
  'mark':[75,85,55,60,60,60,85],
  'sex':['female','male','male','female','female','female','male']
}
my_data = pd.DataFrame(data=my_dict)
print(my_data)
Output (the last two rows are duplicates: row 6 is a duplicate of row 1 and row 5 is a duplicate of row 3)
   id    name class1  mark     sex
0   1    John   Four    75  female
1   2     Max  Three    85    male
2   3  Arnold  Three    55    male
3   4   Krish   Four    60  female
4   5    John   Four    60  female
5   4   Krish   Four    60  female
6   2     Max  Three    85    male

Display rows indicating duplicates

print(my_data.duplicated())
Output
0    False
1    False
2    False
3    False
4    False
5     True
6     True
dtype: bool
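Since the result is a boolean Series, it can also be summarised. The sketch below counts the duplicate rows with sum() and checks whether any exist with any(); the names flags, n_dup and has_dup are chosen here for illustration.

```python
import pandas as pd

# Same sample data as above
my_dict = {
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'sex': ['female', 'male', 'male', 'female', 'female', 'female', 'male']
}
my_data = pd.DataFrame(data=my_dict)

flags = my_data.duplicated()   # boolean Series, True for repeated rows
n_dup = int(flags.sum())       # True counts as 1, so this is the number of duplicates
has_dup = bool(flags.any())    # True if at least one duplicate row exists

print(n_dup)    # 2
print(has_dup)  # True
```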
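We can add a column holding the status returned by duplicated(). Note that duplicated() compares every column, including this new one, so the examples below use my_data without the status column.
my_data['status']=my_data.duplicated()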

Display duplicate rows only

print(my_data[my_data.duplicated()])
Output
   id   name class1  mark     sex
5   4  Krish   Four    60  female
6   2    Max  Three    85    male
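The output above lists only the repeated copies. If every copy of a duplicated row is wanted, including its first occurrence, keep=False can be used. This is a minimal sketch on the same sample data; the name all_copies is chosen here for illustration.

```python
import pandas as pd

# Same sample data as above
my_dict = {
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'sex': ['female', 'male', 'male', 'female', 'female', 'female', 'male']
}
my_data = pd.DataFrame(data=my_dict)

# keep=False marks every row that has at least one identical row elsewhere
all_copies = my_data[my_data.duplicated(keep=False)]
print(all_copies)  # rows 1, 3, 5 and 6
```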

Display unique rows

print(my_data[~my_data.duplicated()])
Output (rows 5 and 6 are excluded)
   id    name class1  mark     sex
0   1    John   Four    75  female
1   2     Max  Three    85    male
2   3  Arnold  Three    55    male
3   4   Krish   Four    60  female
4   5    John   Four    60  female
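The same set of rows can also be obtained with drop_duplicates(). The sketch below compares both approaches on the sample data; the names unique_mask and unique_drop are chosen here for illustration.

```python
import pandas as pd

# Same sample data as above
my_dict = {
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'sex': ['female', 'male', 'male', 'female', 'female', 'female', 'male']
}
my_data = pd.DataFrame(data=my_dict)

unique_mask = my_data[~my_data.duplicated()]  # filter with the inverted mask
unique_drop = my_data.drop_duplicates()       # let pandas drop the repeats

same = unique_mask.equals(unique_drop)        # compare values and index labels
print(same)
```

The mask form is handy when the boolean Series is needed for something else as well, while drop_duplicates() is shorter when only the cleaned DataFrame matters.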

Display rows based on unique values of a column

Using only the class1 column, we keep the first row for each distinct value and display those rows.
print(my_data[~my_data['class1'].duplicated(keep='first')])
Output
   id  name class1  mark     sex
0   1  John   Four    75  female
1   2   Max  Three    85    male
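Instead of calling duplicated() on a single column, the subset parameter of DataFrame.duplicated() can limit the comparison to one or more columns. This is a sketch on the same sample data; the names by_class and by_name_class are chosen here for illustration.

```python
import pandas as pd

# Same sample data as above
my_dict = {
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'sex': ['female', 'male', 'male', 'female', 'female', 'female', 'male']
}
my_data = pd.DataFrame(data=my_dict)

# Compare on class1 only: same rows as the Series-based example above
by_class = my_data[~my_data.duplicated(subset=['class1'])]

# Compare on the combination of name and class1
by_name_class = my_data[~my_data.duplicated(subset=['name', 'class1'])]

print(by_class)       # rows 0 and 1
print(by_name_class)  # rows 0, 1, 2 and 3
```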


plus2net.com


