Pandas DataFrame get_dummies()

get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
Convert categorical variables to dummy / indicator variables by using get_dummies().

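data : array-like, Series, or DataFrame. The data to encode.
prefix : str, list, or dict (optional), default is None. String placed before the category value in the new column names.
prefix_sep : str (optional), default is '_'. Separator placed between the prefix and the category value.
dummy_na : bool (optional), default is False. If True, a column is added to indicate NaN values.
columns : list (optional), default is None. Names of the columns to be encoded.
sparse : bool (optional), default is False. Whether the dummy columns are stored as sparse arrays.
drop_first : bool (optional), default is False. If True, the first level of each categorical variable is removed.
dtype : dtype (optional), default is None. Data type of the new dummy columns.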
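Most of these parameters are demonstrated in the sections further down; dummy_na is not, so here is a minimal sketch of it. The small DataFrame with one missing value is made up for illustration and is not part of the data used on this page.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in the categorical column
na_data = pd.DataFrame({'CLASS1': ['Four', 'Three', np.nan, 'Four']})

# Default (dummy_na=False): the row with NaN gets 0 in every dummy column
without_na = pd.get_dummies(na_data, dtype=int)

# dummy_na=True: one extra column marks the rows where the value is missing
with_na = pd.get_dummies(na_data, dummy_na=True, dtype=int)

print(without_na)
print(with_na)
```

With dummy_na=True every row has exactly one indicator set, so missing values can be told apart from the listed categories.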
import pandas as pd 
my_dict={'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,70,30],
         'CLASS1':['Four','Three','Three','Four','Five','Three']}
my_data = pd.DataFrame(data=my_dict)
print(pd.get_dummies(my_data))
Output
   ID  MATH  CLASS1_Five  CLASS1_Four  CLASS1_Three
0   1    80            0            1             0
1   2    40            0            0             1
2   3    70            0            0             1
3   4    70            0            1             0
4   5    70            1            0             0
5   6    30            0            0             1
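The 0 / 1 values in the output above come from older pandas releases. From pandas 2.0 onwards the dummy columns default to bool and print as True / False. A minimal sketch, assuming integer output is wanted, is to pass the dtype parameter from the signature:

```python
import pandas as pd

my_dict = {'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'CLASS1': ['Four', 'Three', 'Three', 'Four', 'Five', 'Three']}
my_data = pd.DataFrame(data=my_dict)

# dtype=int asks for integer 0/1 indicators instead of the library default
dummies = pd.get_dummies(my_data, dtype=int)
print(dummies)
```

Only the new dummy columns are affected by dtype; ID and MATH keep their original types.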

prefix

String placed before each category value in the new column names, in place of the original column name.
print(pd.get_dummies(my_data,prefix='my'))
Output
   ID  MATH  my_Five  my_Four  my_Three
0   1    80        0        1         0
1   2    40        0        0         1
2   3    70        0        0         1
3   4    70        0        1         0
4   5    70        1        0         0
5   6    30        0        0         1
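Besides a single string, prefix can also be given as a dictionary that maps a column name to its prefix. A short sketch follows; the prefix 'cls' is an arbitrary name chosen for illustration.

```python
import pandas as pd

my_dict = {'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'CLASS1': ['Four', 'Three', 'Three', 'Four', 'Five', 'Three']}
my_data = pd.DataFrame(data=my_dict)

# The dictionary key is the column to encode, the value is the prefix to use
result = pd.get_dummies(my_data, prefix={'CLASS1': 'cls'}, dtype=int)
print(result.columns.tolist())
```

The dictionary form is convenient when several columns are encoded and each should keep a recognisable prefix.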

prefix_sep

Separator to be used between the prefix and the category value in the new column names. Default value is '_'
print(pd.get_dummies(my_data,prefix='my',prefix_sep='-'))
Output
   ID  MATH  my-Five  my-Four  my-Three
0   1    80        0        1         0
1   2    40        0        0         1
2   3    70        0        0         1
3   4    70        0        1         0
4   5    70        1        0         0
5   6    30        0        0         1

columns

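List of column names on which get_dummies() will be applied. By default ( columns=None ) all columns with object, string or category dtype are converted. Numeric columns such as MATH are encoded only when they are listed here.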
print(pd.get_dummies(my_data,prefix='my',columns=['CLASS1','MATH']))
Output
   ID  my_Five  my_Four  my_Three  my_30  my_40  my_70  my_80
0   1        0        1         0      0      0      0      1
1   2        0        0         1      0      1      0      0
2   3        0        0         1      0      0      1      0
3   4        0        1         0      0      0      1      0
4   5        1        0         0      0      0      1      0
5   6        0        0         1      1      0      0      0

sparse

Boolean, default value is False. If it is set to True then the dummy columns are stored as sparse arrays instead of regular arrays.
print(pd.get_dummies(my_data,prefix='my',sparse=True))
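The printed values look the same with or without sparse=True; the difference is in how the dummy columns are stored. A minimal sketch that inspects the column types and checks that the values match the dense result:

```python
import pandas as pd

my_dict = {'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'CLASS1': ['Four', 'Three', 'Three', 'Four', 'Five', 'Three']}
my_data = pd.DataFrame(data=my_dict)

sparse_result = pd.get_dummies(my_data, prefix='my', sparse=True)
dense_result = pd.get_dummies(my_data, prefix='my')

# The dummy columns report a Sparse[...] dtype
print(sparse_result.dtypes)

# Converting a sparse column back to dense gives the same values
restored = sparse_result['my_Five'].sparse.to_dense()
print(restored.tolist() == dense_result['my_Five'].tolist())
```

Sparse storage is mainly of interest when a column has many categories, since most entries of each dummy column are then zero.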

drop_first

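Boolean, default value is False. If it is set to True then the first level of each encoded column is removed, so a column with k levels gives k-1 dummy columns. Here the first level is Five.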
print(pd.get_dummies(my_data,prefix='my',drop_first=True))
Output
   ID  MATH  my_Four  my_Three
0   1    80        1         0
1   2    40        0         1
2   3    70        0         1
3   4    70        1         0
4   5    70        0         0
5   6    30        0         1
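No information is lost by dropping the first level: a row where all remaining dummy columns are 0 must belong to the dropped level. A small sketch that recovers those rows from the reduced output:

```python
import pandas as pd

my_dict = {'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'CLASS1': ['Four', 'Three', 'Three', 'Four', 'Five', 'Three']}
my_data = pd.DataFrame(data=my_dict)

reduced = pd.get_dummies(my_data, prefix='my', drop_first=True, dtype=int)

# Rows with 0 in every remaining dummy column belong to the dropped level
is_dropped_level = reduced[['my_Four', 'my_Three']].sum(axis=1) == 0
print(my_data.loc[is_dropped_level, 'CLASS1'].tolist())  # ['Five']
```

This is why drop_first=True is often used before fitting linear models, where the full set of dummy columns would be redundant.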


plus2net.com


