describe(): Details of DataFrame

Pandas

We can get descriptive statistics of a pandas DataFrame or Series by using describe().

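percentiles : Default is 25%, 50% and 75%. We can pass our own list of values between 0 and 1, for example [.45,.68,.89].
include : 'all', a list of datatypes, or None (default). Datatypes to be included in the output. With None, only the numeric columns of a mixed DataFrame are described.
exclude : A list of datatypes to be excluded from the output, or None (default).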

Examples

We will use the options and check the output.
import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,60,30],
         'ENGLISH':[80,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
print(my_data['MATH'].describe())
Output
count     6.000000
mean     58.333333
std      19.407902
min      30.000000
25%      45.000000
50%      65.000000
75%      70.000000
max      80.000000
We can get the same statistics for the full DataFrame.
print(my_data.describe())
Output
             ID       MATH    ENGLISH
count  6.000000   6.000000   6.000000
mean   3.500000  58.333333  55.000000
std    1.870829  19.407902  18.708287
min    1.000000  30.000000  30.000000
25%    2.250000  45.000000  42.500000
50%    3.500000  65.000000  55.000000
75%    4.750000  70.000000  67.500000
max    6.000000  80.000000  80.000000
You can see that only the numeric columns are included; the one object column, NAME, is left out.
count : Number of non-null values in the column
mean : Mean of the values in the (numeric) column
std : Standard deviation of the values in the column
min : Minimum value appearing in the column
25% : 25th percentile of the values in the column
50% : 50th percentile (median) of the values in the column
75% : 75th percentile of the values in the column
max : Maximum value appearing in the column
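Each of these rows can be cross-checked against the matching Series method. The following is a minimal sketch of that check using the MATH marks from the example above; the names math, summary and manual are only illustrative.

```python
import pandas as pd

my_data = pd.DataFrame({'MATH': [80, 40, 70, 70, 60, 30]})
math = my_data['MATH']
summary = math.describe()

# Rebuild the same rows one by one with the dedicated Series methods.
manual = pd.Series({
    'count': math.count(),        # non-null values only
    'mean': math.mean(),
    'std': math.std(),            # sample standard deviation (ddof=1)
    'min': math.min(),
    '25%': math.quantile(0.25),
    '50%': math.quantile(0.50),   # same as math.median()
    '75%': math.quantile(0.75),
    'max': math.max(),
})
print(manual)
```

The printed values should agree with the describe() output for MATH shown earlier.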

percentiles

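By default we get values for the 25%, 50% and 75% percentiles. We can ask for our own percentiles by passing a list of fractions between 0 and 1, like percentiles=[.45,.68,.89]. The 50% row (the median) also appears in the output below.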
print(my_data['MATH'].describe(percentiles=[.45,.68,.89]))
Output
count     6.000000
mean     58.333333
std      19.407902
min      30.000000
45%      62.500000
50%      65.000000
68%      70.000000
89%      74.500000
max      80.000000
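The percentile rows can also be computed directly with Series.quantile(), which by default interpolates linearly between neighbouring sorted values. This is a small sketch on the same MATH marks; the variable names are only illustrative.

```python
import pandas as pd

math = pd.Series([80, 40, 70, 70, 60, 30], name='MATH')

# quantile() with a list returns one value per requested fraction,
# indexed by that fraction (0.45, 0.68, 0.89).
custom = math.quantile([.45, .68, .89])
print(custom)
```

The three values should match the 45%, 68% and 89% rows in the output above.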

include

We can apply describe() to object type columns also. With include='all' the output includes columns of every type.
print(my_data.describe(include='all'))
Output ( watch the rows unique, top and freq )
        NAME        ID       MATH    ENGLISH
count      6  6.000000   6.000000   6.000000
unique     6       NaN        NaN        NaN
top     King       NaN        NaN        NaN
freq       1       NaN        NaN        NaN
mean     NaN  3.500000  58.333333  55.000000
std      NaN  1.870829  19.407902  18.708287
min      NaN  1.000000  30.000000  30.000000
25%      NaN  2.250000  45.000000  42.500000
50%      NaN  3.500000  65.000000  55.000000
75%      NaN  4.750000  70.000000  67.500000
max      NaN  6.000000  80.000000  80.000000
In our example above we don't have any category dtype column so it is not included here. You can see the output with one category column at the end of this page.
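unique : Number of distinct values in the column
top : Most frequently occurring value in the column
freq : Number of times the top value appears in the column
When several values share the highest frequency (here every name appears once), the value shown as top is chosen arbitrarily from among them.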
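The same three figures can be obtained with nunique() and value_counts(). Here is a minimal sketch on a small hypothetical Series with repeated values (not part of the example data above), so that top and freq are not the result of a tie.

```python
import pandas as pd

# Hypothetical sample: 'b' appears three times, 'a' twice, 'c' once.
letters = pd.Series(['b', 'a', 'b', 'c', 'b', 'a'], dtype=object)

counts = letters.value_counts()   # sorted by frequency, highest first
summary = {
    'count': letters.count(),     # non-null values
    'unique': letters.nunique(),  # distinct values
    'top': counts.index[0],       # most frequent value
    'freq': counts.iloc[0],       # how many times it occurs
}
print(summary)
```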

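include=[object]

Show only the object type columns. Use Python's built-in object here; the np.object alias was removed in NumPy 1.24.
print(my_data.describe(include=[object]))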
Output
        NAME
count      6
unique     6
top     King
freq       1

include=[np.number]

Show only the numeric columns (count, mean, std, min, 25%, 50%, 75%, max). This needs NumPy imported as np.
import numpy as np
print(my_data.describe(include=[np.number]))
Output
             ID       MATH    ENGLISH
count  6.000000   6.000000   6.000000
mean   3.500000  58.333333  55.000000
std    1.870829  19.407902  18.708287
min    1.000000  30.000000  30.000000
25%    2.250000  45.000000  42.500000
50%    3.500000  65.000000  55.000000
75%    4.750000  70.000000  67.500000
max    6.000000  80.000000  80.000000

exclude

print(my_data.describe(exclude=['category']))
Output
        NAME        ID       MATH    ENGLISH
count      6  6.000000   6.000000   6.000000
unique     6       NaN        NaN        NaN
top     King       NaN        NaN        NaN
freq       1       NaN        NaN        NaN
mean     NaN  3.500000  58.333333  55.000000
std      NaN  1.870829  19.407902  18.708287
min      NaN  1.000000  30.000000  30.000000
25%      NaN  2.250000  45.000000  42.500000
50%      NaN  3.500000  65.000000  55.000000
75%      NaN  4.750000  70.000000  67.500000
max      NaN  6.000000  80.000000  80.000000
There is no category dtype in our example above. Read more with one category dtype at the end of this tutorial.

exclude=[np.number]

import numpy as np
print(my_data.describe(exclude=[np.number]))
Output
        NAME
count      6
unique     6
top     King
freq       1

exclude=[object]

Exclude the object type data. Use Python's built-in object here; the np.object alias was removed in NumPy 1.24.
print(my_data.describe(exclude=[object]))
Output
             ID       MATH    ENGLISH
count  6.000000   6.000000   6.000000
mean   3.500000  58.333333  55.000000
std    1.870829  19.407902  18.708287
min    1.000000  30.000000  30.000000
25%    2.250000  45.000000  42.500000
50%    3.500000  65.000000  55.000000
75%    4.750000  70.000000  67.500000
max    6.000000  80.000000  80.000000
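The include and exclude options pick columns by dtype in the same way as DataFrame.select_dtypes(). The following sketch uses a shortened version of the example data and assumes the NAME column is stored with object dtype (set explicitly here, since the default string dtype can differ between pandas versions).

```python
import numpy as np
import pandas as pd

my_data = pd.DataFrame({
    # dtype=object is set explicitly; this is an assumption of the sketch
    'NAME': pd.Series(['Ravi', 'Raju', 'Alex'], dtype=object),
    'ID': [1, 2, 3],
    'MATH': [80, 40, 70],
})

numeric_only = my_data.describe(include=[np.number])
text_only = my_data.describe(exclude=[np.number])

# select_dtypes() makes the same column choice explicitly.
numeric_cols = my_data.select_dtypes(include=[np.number]).columns

print(list(numeric_only.columns))
print(list(text_only.columns))
print(list(numeric_cols))
```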

Using category data type

Here is sample data with one category dtype column (grade).
import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,60,30],
         'ENGLISH':[80,70,40,50,60,30],
         'grade':['a', 'c', 'b', 'b','b','c']}
my_data = pd.DataFrame(data=my_dict)
my_data['grade']=my_data['grade'].astype('category')
my_data.describe(include='all')
Output
        NAME        ID       MATH    ENGLISH  grade
count      6  6.000000   6.000000   6.000000      6
unique     6       NaN        NaN        NaN      3
top     Alex       NaN        NaN        NaN      b
freq       1       NaN        NaN        NaN      3
mean     NaN  3.500000  58.333333  55.000000    NaN
std      NaN  1.870829  19.407902  18.708287    NaN
min      NaN  1.000000  30.000000  30.000000    NaN
25%      NaN  2.250000  45.000000  42.500000    NaN
50%      NaN  3.500000  65.000000  55.000000    NaN
75%      NaN  4.750000  70.000000  67.500000    NaN
max      NaN  6.000000  80.000000  80.000000    NaN
my_data.describe(include='category')
Output ( only grade column included here )
       grade
count      6
unique     3
top        b
freq       3
We can exclude the grade column (category dtype) and describe the other columns.
my_data.describe(exclude='category')
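To check which columns remain after excluding the category dtype, we can look at the columns of the result. This is a sketch built on the sample data above, again assuming NAME is held as object dtype (set explicitly here); without_grade and only_grade are illustrative names.

```python
import pandas as pd

my_data = pd.DataFrame({
    # dtype=object is set explicitly; this is an assumption of the sketch
    'NAME': pd.Series(['Ravi', 'Raju', 'Alex', 'Ron', 'King', 'Jack'],
                      dtype=object),
    'ID': [1, 2, 3, 4, 5, 6],
    'MATH': [80, 40, 70, 70, 60, 30],
    'ENGLISH': [80, 70, 40, 50, 60, 30],
    'grade': ['a', 'c', 'b', 'b', 'b', 'c'],
})
my_data['grade'] = my_data['grade'].astype('category')

without_grade = my_data.describe(exclude='category')
only_grade = my_data.describe(include='category')

print(list(without_grade.columns))   # every column except grade
print(only_grade)
```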


