Python Pandas DataFrame get_dummies() to convert categorical variable into dummy/indicator variables

get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

Convert categorical variables into dummy / indicator variables by using get_dummies().

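`data`	array-like, Series, or DataFrame. The data to encode.
`prefix`	str, list of str, or dict ( Optional ), default is None. String to prepend to the dummy column names.
`prefix_sep`	str ( Optional ), default is '_'. Separator placed between the prefix and the category value in column names.
`dummy_na`	Bool ( Optional ), default is False. If True, adds a column to indicate NaN values.
`columns`	list ( Optional ), default is None. Columns to be encoded.
`sparse`	Bool ( Optional ), default is False. Whether the dummy columns are stored as sparse arrays.
`drop_first`	Bool ( Optional ), default is False. If True, removes the first level of each categorical variable.
`dtype`	dtype ( Optional ), default is None. Data type of the new dummy columns.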

import pandas as pd 
my_dict={'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,70,30],
         'CLASS1':['Four','Three','Three','Four','Five','Three']}
my_data = pd.DataFrame(data=my_dict)
print(pd.get_dummies(my_data))

Output

   ID  MATH  CLASS1_Five  CLASS1_Four  CLASS1_Three
0   1    80            0            1             0
1   2    40            0            0             1
2   3    70            0            0             1
3   4    70            0            1             0
4   5    70            1            0             0
5   6    30            0            0             1
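
The 0 / 1 values shown above depend on the pandas version: since pandas 2.0 the dummy columns default to bool and print as True / False. A minimal sketch, reusing the same sample data, that passes dtype=int to request integer indicators explicitly (the name dummies_int is only for illustration):

```python
import pandas as pd

my_dict = {'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'CLASS1': ['Four', 'Three', 'Three', 'Four', 'Five', 'Three']}
my_data = pd.DataFrame(data=my_dict)

# dtype=int asks for integer 0/1 indicators instead of the version default
dummies_int = pd.get_dummies(my_data, dtype=int)
print(dummies_int)
```

Only the new dummy columns are affected by dtype; ID and MATH keep their original type.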

prefix

String to be placed before the category value in each dummy column name. It replaces the original column name, which is used by default.

print(pd.get_dummies(my_data,prefix='my'))

Output

   ID  MATH  my_Five  my_Four  my_Three
0   1    80        0        1         0
1   2    40        0        0         1
2   3    70        0        0         1
3   4    70        0        1         0
4   5    70        1        0         0
5   6    30        0        0         1

prefix_sep

Separator to be used between the prefix and the category value in the column name. Default value is '_'.

print(pd.get_dummies(my_data,prefix='my',prefix_sep='-'))

Output

   ID  MATH  my-Five  my-Four  my-Three
0   1    80        0        1         0
1   2    40        0        0         1
2   3    70        0        0         1
3   4    70        0        1         0
4   5    70        1        0         0
5   6    30        0        0         1

columns

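List of column names on which get_dummies() will be applied. By default, all columns with object, string, or category dtype are encoded. A numeric column such as MATH is encoded only when it is listed here.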

print(pd.get_dummies(my_data,prefix='my',columns=['CLASS1','MATH']))

Output

   ID  my_Five  my_Four  my_Three  my_30  my_40  my_70  my_80
0   1        0        1         0      0      0      0      1
1   2        0        0         1      0      1      0      0
2   3        0        0         1      0      0      1      0
3   4        0        1         0      0      0      1      0
4   5        1        0         0      0      0      1      0
5   6        0        0         1      1      0      0      0

sparse

Boolean, default value is False. If it is set to True, the dummy columns are backed by a SparseArray instead of a regular NumPy array.

print(pd.get_dummies(my_data,prefix='my',sparse=True))
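
The printed values look the same with or without sparse=True, so the difference is easier to see in the column dtypes. A small sketch, assuming the same sample data (the name sparse_dummies is only for illustration):

```python
import pandas as pd

my_dict = {'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'CLASS1': ['Four', 'Three', 'Three', 'Four', 'Five', 'Three']}
my_data = pd.DataFrame(data=my_dict)

sparse_dummies = pd.get_dummies(my_data, prefix='my', sparse=True)

# The dummy columns report a Sparse[...] dtype
print(sparse_dummies.dtypes)

# A sparse column can be converted back to a regular dense column
dense_five = sparse_dummies['my_Five'].sparse.to_dense()
print(dense_five.tolist())
```

Sparse storage mainly pays off when a column has many categories, because most indicator values are then zero.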

drop_first

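Boolean, default value is False. If it is set to True, the first level of each categorical variable is removed, so k levels produce k-1 dummy columns. In this example the Five column is dropped.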

print(pd.get_dummies(my_data,prefix='my',drop_first=True))

Output

   ID  MATH  my_Four  my_Three
0   1    80        1         0
1   2    40        0         1
2   3    70        0         1
3   4    70        1         0
4   5    70        0         0
5   6    30        0         1
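
The dummy_na parameter from the list at the top has no example of its own. A minimal sketch, assuming a small hypothetical DataFrame with one missing CLASS1 value, comparing the default behaviour with dummy_na=True:

```python
import pandas as pd

# Hypothetical sample with one missing category value
na_data = pd.DataFrame({'CLASS1': ['Four', None, 'Three']})

# Default: the row with the missing value gets no indicator set
without_na = pd.get_dummies(na_data, dtype=int)
print(without_na)

# dummy_na=True adds one extra column that flags the missing value
with_na = pd.get_dummies(na_data, dummy_na=True, dtype=int)
print(with_na)
```

The extra column is added as the last dummy column for CLASS1.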
