We can get descriptive statistics of a DataFrame or a Series by using describe().
percentiles : Default is 25%, 50% and 75%. We can pass our own list, for example [.45,.68,.89].
include : 'all', a list of data types, or None (default). Data types to be included in the output.
exclude : A list of data types to be excluded from the output, or None (default).
Output for the MATH column with percentiles=[.45,.68,.89]
count 6.000000
mean 58.333333
std 19.407902
min 30.000000
45% 62.500000
50% 65.000000
68% 70.000000
89% 74.500000
max 80.000000
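The call that produces a summary like the one above is not shown, so here is a minimal sketch of it. The marks in math_marks are hypothetical values, chosen only because they reproduce the figures printed above; the tutorial's actual data may differ.

```python
import pandas as pd

# Hypothetical marks (an assumption), picked to match the statistics shown above
math_marks = pd.Series([30, 40, 60, 70, 70, 80], name="MATH")

# Ask for the 45th, 68th and 89th percentiles instead of the default 25/50/75
summary = math_marks.describe(percentiles=[.45, .68, .89])
print(summary)
```

Each percentile is found by linear interpolation between neighbouring sorted values, which is why 45% falls between 60 and 70.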
include
We can apply describe() to object type columns as well. With include='all' the output covers columns of every type.
Let us try include='all'
print(my_data.describe(include='all'))
Output (watch the rows unique, top and freq)
NAME ID MATH ENGLISH
count 6 6.000000 6.000000 6.000000
unique 6 NaN NaN NaN
top King NaN NaN NaN
freq 1 NaN NaN NaN
mean NaN 3.500000 58.333333 55.000000
std NaN 1.870829 19.407902 18.708287
min NaN 1.000000 30.000000 30.000000
25% NaN 2.250000 45.000000 42.500000
50% NaN 3.500000 65.000000 55.000000
75% NaN 4.750000 70.000000 67.500000
max NaN 6.000000 80.000000 80.000000
In our example above we don't have any category dtype column so it is not included here. You can see the output with one category column at the end of this page.
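To compare the default output with include='all', the sketch below builds a small DataFrame and describes it both ways. Only the column names come from this page; the rows, including most of the names, are hypothetical.

```python
import pandas as pd

# Hypothetical rows (an assumption); only the column names come from the tutorial
my_data = pd.DataFrame({
    "NAME": pd.Series(["Alex", "King", "Ravi", "John", "Ronn", "Tina"], dtype=object),
    "ID": [1, 2, 3, 4, 5, 6],
    "MATH": [30, 40, 60, 70, 70, 80],
    "ENGLISH": [30, 40, 50, 60, 70, 80],
})

default_stats = my_data.describe()           # numeric columns only
all_stats = my_data.describe(include="all")  # the object column NAME is added

print(default_stats.columns.tolist())
print(all_stats)
```

Rows that do not apply to a column, such as mean for NAME or unique for ID, are filled with NaN.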
unique
Number of distinct objects in the column
top
Most frequently occurring object in the column
freq
Number of times the top object appears in the column
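The three rows can also be computed by hand, which may help in reading them. This is a small sketch with a hypothetical grade Series; nunique() and value_counts() are used here only to mirror the definitions above.

```python
import pandas as pd

# Hypothetical grades (an assumption): 'b' appears three times
grade = pd.Series(["a", "b", "b", "c", "b", "a"], dtype=object)

counts = grade.value_counts()  # sorted with the most frequent value first

unique = grade.nunique()       # number of distinct objects
top = counts.index[0]          # most frequently occurring object
freq = counts.iloc[0]          # number of times the top object appears

print(unique, top, freq)
print(grade.describe())        # reports the same unique, top and freq
```

When several values share the highest count, as in the NAME column where every name appears once, top is just one of them.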
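include=[object]
Show only the object type columns. The alias np.object was deprecated in NumPy 1.20 and removed in NumPy 1.24, so we pass Python's built-in object instead.
print(my_data.describe(include=[object]))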
Output
NAME
count 6
unique 6
top King
freq 1
include=[np.number]
Show only the numeric columns (count, mean, std, min, 25%, 50%, 75%, max).
print(my_data.describe(include=[np.number]))
Output
ID MATH ENGLISH
count 6.000000 6.000000 6.000000
mean 3.500000 58.333333 55.000000
std 1.870829 19.407902 18.708287
min 1.000000 30.000000 30.000000
25% 2.250000 45.000000 42.500000
50% 3.500000 65.000000 55.000000
75% 4.750000 70.000000 67.500000
max 6.000000 80.000000 80.000000
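The two include lists can be tried side by side. The sketch below assumes hypothetical rows for my_data and passes the built-in object rather than np.object, since recent NumPy releases no longer provide that alias.

```python
import numpy as np
import pandas as pd

# Hypothetical rows (an assumption); only the column names come from the tutorial
my_data = pd.DataFrame({
    "NAME": pd.Series(["Alex", "King", "Ravi", "John", "Ronn", "Tina"], dtype=object),
    "ID": [1, 2, 3, 4, 5, 6],
    "MATH": [30, 40, 60, 70, 70, 80],
    "ENGLISH": [30, 40, 50, 60, 70, 80],
})

text_stats = my_data.describe(include=[object])        # NAME only
numeric_stats = my_data.describe(include=[np.number])  # ID, MATH, ENGLISH

print(text_stats)
print(numeric_stats)
```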
exclude
print(my_data.describe(exclude=['category']))
Output
NAME ID MATH ENGLISH
count 6 6.000000 6.000000 6.000000
unique 6 NaN NaN NaN
top King NaN NaN NaN
freq 1 NaN NaN NaN
mean NaN 3.500000 58.333333 55.000000
std NaN 1.870829 19.407902 18.708287
min NaN 1.000000 30.000000 30.000000
25% NaN 2.250000 45.000000 42.500000
50% NaN 3.500000 65.000000 55.000000
75% NaN 4.750000 70.000000 67.500000
max NaN 6.000000 80.000000 80.000000
There is no category dtype column in our example, so nothing is removed and the output is the same as with include='all'. Read more with one category dtype at the end of this tutorial.
exclude=[np.number]
print(my_data.describe(exclude=[np.number]))
Output
NAME
count 6
unique 6
top King
freq 1
exclude=[object]
Exclude the object type data. We pass the built-in object because np.object was removed in NumPy 1.24.
print(my_data.describe(exclude=[object]))
Output
ID MATH ENGLISH
count 6.000000 6.000000 6.000000
mean 3.500000 58.333333 55.000000
std 1.870829 19.407902 18.708287
min 1.000000 30.000000 30.000000
25% 2.250000 45.000000 42.500000
50% 3.500000 65.000000 55.000000
75% 4.750000 70.000000 67.500000
max 6.000000 80.000000 80.000000
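As a quick check of how exclude mirrors include, this sketch runs both exclude calls on hypothetical data. The rows are an assumption, and the built-in object stands in for np.object, which current NumPy no longer ships.

```python
import numpy as np
import pandas as pd

# Hypothetical rows (an assumption); only the column names come from the tutorial
my_data = pd.DataFrame({
    "NAME": pd.Series(["Alex", "King", "Ravi", "John", "Ronn", "Tina"], dtype=object),
    "ID": [1, 2, 3, 4, 5, 6],
    "MATH": [30, 40, 60, 70, 70, 80],
    "ENGLISH": [30, 40, 50, 60, 70, 80],
})

without_numbers = my_data.describe(exclude=[np.number])  # leaves NAME
without_objects = my_data.describe(exclude=[object])     # leaves ID, MATH, ENGLISH

print(without_numbers)
print(without_objects)
```

For this DataFrame, exclude=[np.number] selects the same columns as include=[object], and exclude=[object] the same as include=[np.number].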
Using category data type
Here is sample data with one category dtype column (grade), described with include='all'
NAME ID MATH ENGLISH grade
count 6 6.000000 6.000000 6.000000 6
unique 6 NaN NaN NaN 3
top Alex NaN NaN NaN b
freq 1 NaN NaN NaN 3
mean NaN 3.500000 58.333333 55.000000 NaN
std NaN 1.870829 19.407902 18.708287 NaN
min NaN 1.000000 30.000000 30.000000 NaN
25% NaN 2.250000 45.000000 42.500000 NaN
50% NaN 3.500000 65.000000 55.000000 NaN
75% NaN 4.750000 70.000000 67.500000 NaN
max NaN 6.000000 80.000000 80.000000 NaN
my_data.describe(include='category')
Output ( only grade column included here )
grade
count 6
unique 3
top b
freq 3
We can remove the grade ( category dtype) and display other columns.
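A minimal sketch of both category cases follows. The rows and the grade values are hypothetical, chosen so that grade has 3 distinct values with b appearing 3 times as in the output above; pd.Categorical is one way to get a category dtype column.

```python
import pandas as pd

# Hypothetical rows (an assumption); grade is stored as category dtype
my_data = pd.DataFrame({
    "NAME": pd.Series(["Alex", "King", "Ravi", "John", "Ronn", "Tina"], dtype=object),
    "ID": [1, 2, 3, 4, 5, 6],
    "MATH": [30, 40, 60, 70, 70, 80],
    "ENGLISH": [30, 40, 50, 60, 70, 80],
    "grade": pd.Categorical(["a", "b", "b", "c", "b", "a"]),
})

only_category = my_data.describe(include="category")       # grade only
without_category = my_data.describe(exclude=["category"])  # every other column

print(only_category)
print(without_category)
```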