DataFrame.set_index() to create an index using one or more columns in Pandas

Create an index from one or more columns of a DataFrame.

options

keys : a single column label or a list of column labels to use as the index
drop : bool, default True. Remove the column(s) from the DataFrame once they become the index
append : bool, default False. Whether to add the column(s) to the existing index instead of replacing it
inplace : bool, default False. Whether to modify the existing DataFrame instead of returning a new one
verify_integrity : bool, default False. Check the new index for duplicates

keys

import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,70,30],
         'ENGLISH':[80,70,40,50,60,30]}
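my_data = pd.DataFrame(data=my_dict)
# set_index() returns a new DataFrame; the result is not stored here,
# so my_data keeps its default integer index
my_data.set_index('NAME')
print(my_data)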

Output

   NAME  ID  MATH  ENGLISH
0  Ravi   1    80       80
1  Raju   2    40       70
2  Alex   3    70       40
3   Ron   4    70       50
4  King   5    70       60
5  Jack   6    30       30
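
The keys parameter also accepts a list of columns, which the example above does not show. The following is a minimal sketch that reuses the same sample data; the choice of NAME and ID as the pair of index columns, and the variable name my_data_multi, are assumptions made only for illustration.

```python
import pandas as pd

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', 'Ron', 'King', 'Jack'],
           'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'ENGLISH': [80, 70, 40, 50, 60, 30]}
my_data = pd.DataFrame(data=my_dict)

# Passing a list of columns builds a MultiIndex with one level per column
my_data_multi = my_data.set_index(['NAME', 'ID'])
print(my_data_multi)

# A row is then selected with a tuple holding one value per index level
print(my_data_multi.loc[('Alex', 3)])
```

Both columns leave the column list because drop defaults to True, so only MATH and ENGLISH remain as data columns.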

inplace

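With inplace=True the existing DataFrame is modified and nothing is returned. With the default inplace=False a new DataFrame is returned and the original is left unchanged.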

my_data.set_index('NAME',inplace=True)
print(my_data)

Output

      ID  MATH  ENGLISH
NAME                   
Ravi   1    80       80
Raju   2    40       70
Alex   3    70       40
Ron    4    70       50
King   5    70       60
Jack   6    30       30
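
The difference between the two modes can be checked by looking at what each call returns. This is a small sketch on a shortened, hypothetical two-row DataFrame (df, new_df and result are illustrative names, not part of the example above).

```python
import pandas as pd

df = pd.DataFrame({'NAME': ['Ravi', 'Raju'], 'ID': [1, 2]})

# Default inplace=False: a new DataFrame is returned, df is untouched
new_df = df.set_index('NAME')
print(df.index.name)      # None
print(new_df.index.name)  # NAME

# inplace=True: df itself is modified and the call returns None
result = df.set_index('NAME', inplace=True)
print(result)             # None
print(df.index.name)      # NAME
```

Because the in-place call returns None, assigning its result back to the variable (df = df.set_index('NAME', inplace=True)) would lose the DataFrame.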

drop

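By default ( drop=True ) the column is removed from the DataFrame once it becomes the index. With drop=False the column is also kept as a regular column. The example starts again from the original my_data, where NAME is still a column.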

my_data_mod=my_data.set_index('NAME',drop=False)
print(my_data_mod)

Output

      NAME  ID  MATH  ENGLISH
NAME                         
Ravi  Ravi   1    80       80
Raju  Raju   2    40       70
Alex  Alex   3    70       40
Ron    Ron   4    70       50
King  King   5    70       60
Jack  Jack   6    30       30

verify_integrity

For this example the NAME in the last row of the DataFrame was changed from Jack to Ron, so the NAME column now contains a duplicate value. If we set verify_integrity=True, set_index() raises a ValueError like this:

ValueError: Index has duplicate keys: Index(['Ron'], dtype='object', name='NAME')

With verify_integrity=False (the default) no check is made, so the duplicate is accepted and the index is created without an error.

my_data_mod=my_data.set_index('NAME',verify_integrity=False)
print(my_data_mod)

Output

      ID  MATH  ENGLISH
NAME                   
Ravi   1    80       80
Raju   2    40       70
Alex   3    70       40
Ron    4    70       50
King   5    70       60
Ron    6    30       30
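
The duplicate check can also be handled in code instead of stopping the script. The sketch below rebuilds the sample data with the last NAME set to Ron, as described above; the try/except handling and the flag error_raised are illustrative additions.

```python
import pandas as pd

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', 'Ron', 'King', 'Ron'],
           'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'ENGLISH': [80, 70, 40, 50, 60, 30]}
my_data = pd.DataFrame(data=my_dict)

# verify_integrity=True raises ValueError when the new index has duplicates
error_raised = False
try:
    my_data.set_index('NAME', verify_integrity=True)
except ValueError as err:
    error_raised = True
    print('Duplicate index values found:', err)

# Without the check the index is created and simply holds the duplicate
my_data_mod = my_data.set_index('NAME')
print(my_data_mod.index.is_unique)  # False
print(my_data_mod.loc['Ron'])       # returns both rows labelled Ron
```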

append

The default value is False. First we will check with append=True, which keeps the existing index and adds NAME to it as an additional level.

my_data_mod=my_data.set_index('NAME',append=True)
print(my_data_mod)

Output

        ID  MATH  ENGLISH
  NAME                   
0 Ravi   1    80       80
1 Raju   2    40       70
2 Alex   3    70       40
3 Ron    4    70       50
4 King   5    70       60
5 Jack   6    30       30

Now let us set append=False, which replaces the existing index with NAME.

my_data_mod=my_data.set_index('NAME',append=False)
print(my_data_mod)

Output

      ID  MATH  ENGLISH
NAME                   
Ravi   1    80       80
Raju   2    40       70
Alex   3    70       40
Ron    4    70       50
King   5    70       60
Jack   6    30       30
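
One way to see the effect of append without reading the printed layout is to count the index levels. This is a minimal sketch on a shortened, hypothetical three-row DataFrame; the variable names appended and replaced are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'NAME': ['Ravi', 'Raju', 'Alex'],
                   'ID': [1, 2, 3],
                   'MATH': [80, 40, 70]})

# append=True keeps the existing integer index and adds NAME as a second level
appended = df.set_index('NAME', append=True)
print(appended.index.nlevels)  # 2

# append=False (the default) replaces the existing index with NAME
replaced = df.set_index('NAME', append=False)
print(replaced.index.nlevels)  # 1

# With the two-level index a row is selected by (position label, NAME)
print(appended.loc[(1, 'Raju')])
```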

Using set_index() with datetime columns

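We can get all records from March 2020 like this. Note that st_date is our datetime column. Selecting rows by a partial date string should go through .loc[]; a single string passed directly in square brackets, as in ['2020-03'], is looked up as a column name in recent pandas versions.

print(my_data.set_index('st_date').loc['2020-03'])

Similarly, we can get all records between two periods like this.

print(my_data.set_index('st_date').loc['2019-03':'2019-04'])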
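
The DataFrame behind these two lines is not shown, so the following is a self-contained sketch under stated assumptions: only the column name st_date comes from the text, while the dates, the sales column and the variable names are hypothetical sample values.

```python
import pandas as pd

# Hypothetical sample data; only the column name st_date is from the text
my_data = pd.DataFrame({
    'st_date': pd.to_datetime(['2019-03-15', '2019-04-20', '2019-06-01',
                               '2020-03-05', '2020-03-28', '2020-05-10']),
    'sales': [10, 20, 30, 40, 50, 60]})

# Sorting the new DatetimeIndex keeps date-range slicing well defined
by_date = my_data.set_index('st_date').sort_index()

# All rows from March 2020 (partial string match on year and month)
march_2020 = by_date.loc['2020-03']
print(march_2020)

# All rows from the start of March 2019 to the end of April 2019
mar_apr_2019 = by_date.loc['2019-03':'2019-04']
print(mar_apr_2019)
```

Both ends of the slice are inclusive, so the whole of April 2019 is part of the second result.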

You can find more examples of using a date column at Exercise3.
