get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
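Convert categorical variables into dummy / indicator variables by using get_dummies().
data | array-like, Series, or DataFrame whose values are to be encoded. |
prefix | str, list of str, or dict (optional), default is None. String to prepend to the new column names. |
prefix_sep | str (optional), default is '_'. Separator placed between the prefix and the category value in column names. |
dummy_na | bool (optional), default is False. If True, a column is added to indicate NaN values. |
columns | list (optional), default is None. Names of the columns to be encoded. |
sparse | bool (optional), default is False. Whether the dummy columns should be sparse or not. |
drop_first | bool (optional), default is False. If True, the first level of each categorical variable is removed. |
dtype | dtype (optional), default is None. Data type of the new columns. |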
import pandas as pd
my_dict={'ID':[1,2,3,4,5,6],
'MATH':[80,40,70,70,70,30],
'CLASS1':['Four','Three','Three','Four','Five','Three']}
my_data = pd.DataFrame(data=my_dict)
print(pd.get_dummies(my_data))
Output
   ID  MATH  CLASS1_Five  CLASS1_Four  CLASS1_Three
0   1    80            0            1             0
1   2    40            0            0             1
2   3    70            0            0             1
3   4    70            0            1             0
4   5    70            1            0             0
5   6    30            0            0             1
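In recent pandas releases (2.0 and later) the indicator columns are boolean by default, so the same call prints True / False instead of 1 / 0. The sketch below reuses the sample data and passes dtype=int to request integer indicators like the ones shown in the output above. The variable name dummies is only illustrative.

```python
import pandas as pd

my_dict = {'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'CLASS1': ['Four', 'Three', 'Three', 'Four', 'Five', 'Three']}
my_data = pd.DataFrame(data=my_dict)

# dtype=int asks for integer 0/1 indicator columns,
# whatever the default dtype of the installed pandas version is
dummies = pd.get_dummies(my_data, dtype=int)
print(dummies)
```

The numeric columns ID and MATH are left untouched; only the string column CLASS1 is expanded.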
prefix
String placed before each category value in the new column names, replacing the original column name.
print(pd.get_dummies(my_data,prefix='my'))
Output
   ID  MATH  my_Five  my_Four  my_Three
0   1    80        0        1         0
1   2    40        0        0         1
2   3    70        0        0         1
3   4    70        0        1         0
4   5    70        1        0         0
5   6    30        0        0         1
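Besides a single string, prefix can also be given as a dictionary that maps a column name to its prefix. The following is a minimal sketch on the same sample data. The prefix 'cls' and the variable name named are chosen only for illustration, and dtype=int is used so the result holds 0 / 1 in every pandas version.

```python
import pandas as pd

my_dict = {'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'CLASS1': ['Four', 'Three', 'Three', 'Four', 'Five', 'Three']}
my_data = pd.DataFrame(data=my_dict)

# Map the column to be encoded (CLASS1) to its own prefix
named = pd.get_dummies(my_data, prefix={'CLASS1': 'cls'}, dtype=int)
print(named.columns.tolist())
```

A dictionary keeps the link between column and prefix explicit, which can help when more than one column is encoded.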
prefix_sep
Separator to be used between the prefix and the category value in the column name. Default value is '_'.
print(pd.get_dummies(my_data,prefix='my',prefix_sep='-'))
Output
   ID  MATH  my-Five  my-Four  my-Three
0   1    80        0        1         0
1   2    40        0        0         1
2   3    70        0        0         1
3   4    70        0        1         0
4   5    70        1        0         0
5   6    30        0        0         1
columns
List of column names on which get_dummies() will be applied. By default, all columns with object, string, or category dtype are encoded.
print(pd.get_dummies(my_data,prefix='my',columns=['CLASS1','MATH']))
Output
   ID  my_Five  my_Four  my_Three  my_30  my_40  my_70  my_80
0   1        0        1         0      0      0      0      1
1   2        0        0         1      0      1      0      0
2   3        0        0         1      0      0      1      0
3   4        0        1         0      0      0      1      0
4   5        1        0         0      0      0      1      0
5   6        0        0         1      1      0      0      0
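With a single prefix, as in the output above, the new columns from CLASS1 and MATH share the same 'my' prefix and are hard to tell apart. One possible variation is to pass a list of prefixes, one per entry in columns. The prefixes 'cls' and 'math' and the variable name encoded are illustrative only.

```python
import pandas as pd

my_dict = {'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'CLASS1': ['Four', 'Three', 'Three', 'Four', 'Five', 'Three']}
my_data = pd.DataFrame(data=my_dict)

# One prefix per encoded column; the list length must match len(columns)
encoded = pd.get_dummies(my_data,
                         columns=['CLASS1', 'MATH'],
                         prefix=['cls', 'math'],
                         dtype=int)
print(encoded.columns.tolist())
```

The column that is not encoded (ID) comes first, followed by the dummy columns in the order given in columns.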
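sparse
Boolean, default value is False. If it is set to True, the dummy columns are stored as sparse arrays (SparseArray) instead of regular NumPy arrays.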
print(pd.get_dummies(my_data,prefix='my',sparse=True))
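The printed values look the same with sparse=True; the difference is in how the dummy columns are stored. A small sketch for checking this on the sample data follows. The variable names sparse_df and is_sparse are illustrative.

```python
import pandas as pd

my_dict = {'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'CLASS1': ['Four', 'Three', 'Three', 'Four', 'Five', 'Three']}
my_data = pd.DataFrame(data=my_dict)

sparse_df = pd.get_dummies(my_data, prefix='my', sparse=True)

# The dtypes listing shows which columns are backed by a sparse array
print(sparse_df.dtypes)

# Check one dummy column explicitly
is_sparse = isinstance(sparse_df['my_Five'].dtype, pd.SparseDtype)
print(is_sparse)

# Convert a sparse column back to a regular (dense) one when needed
dense_five = sparse_df['my_Five'].sparse.to_dense()
```

Sparse storage is mainly of interest when a column has many distinct values, because most entries of each dummy column are then zero.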
drop_first
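Boolean, default value is False. If it is set to True, the first level of each categorical variable is removed, so k levels produce k-1 dummy columns. In the output below the level Five is dropped, and a row with 0 in all remaining dummy columns belongs to that level.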
print(pd.get_dummies(my_data,prefix='my',drop_first=True))
Output
   ID  MATH  my_Four  my_Three
0   1    80        1         0
1   2    40        0         1
2   3    70        0         1
3   4    70        1         0
4   5    70        0         0
5   6    30        0         1