Pandas DataFrame mean()




DataFrame.mean(self, axis=None, skipna=None, level=None, numeric_only=None)
self : the DataFrame whose values are averaged.
axis : int (optional). 0 returns the mean of each column (the default behaviour), 1 returns the mean of each row.
skipna : bool (optional), default is True. Exclude NA values from the calculation.
level : int or level name (optional), default is None. For a MultiIndex axis, calculate the mean along the given level.
numeric_only : bool (optional), default is None. Include only int, float and boolean columns.
The signature above is from older pandas releases. From pandas 2.0 the level argument is removed (group by the index level instead) and numeric_only defaults to False.
We can get the mean value of rows or columns by using mean().

import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,70,30],
         'ENGLISH':[80,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
# numeric_only=True skips the string column NAME (needed from pandas 2.0)
print(my_data.mean(numeric_only=True))
Output
ID          3.5
MATH       60.0
ENGLISH    55.0
dtype: float64
What is the mean mark in MATH?
print(my_data['MATH'].mean()) # 60.0 
We can get the rows (the full records) of the students who scored less than or equal to the mean mark in MATH.
print(my_data[my_data['MATH']<=my_data['MATH'].mean()])
Output is here
   NAME  ID  MATH  ENGLISH
1  Raju   2    40       70
5  Jack   6    30       30
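The same comparison can also be stored as a column, so that each record is labelled relative to the mean. This is a minimal sketch using the sample data from this page; the column name ABOVE_MEAN and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', 'Ron', 'King', 'Jack'],
           'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'ENGLISH': [80, 70, 40, 50, 60, 30]}
my_data = pd.DataFrame(data=my_dict)

math_mean = my_data['MATH'].mean()  # mean mark in MATH

# ABOVE_MEAN is a hypothetical column name: True where MATH is above the mean
my_data['ABOVE_MEAN'] = my_data['MATH'] > math_mean

# Names of the students who scored above the mean in MATH
above = my_data.loc[my_data['ABOVE_MEAN'], 'NAME'].tolist()
print(above)
```

Keeping the flag as a column makes it easy to reuse the result for further filtering or counting.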

Using axis

We will add the option axis=1 to the above code to get the mean of each row across the numeric columns.

(Only the last line is changed.)
print(my_data.mean(axis=1, numeric_only=True))
Output is here
0    53.666667
1    37.333333
2    37.666667
3    41.333333
4    45.000000
5    22.000000
dtype: float64
axis=0 (the default) gives the same column-wise output shown at the start of this page.
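The row-wise mean above includes the ID column, which is an identifier and not a mark. One way to average only the subject columns is to select them before calling mean(). This is a sketch under that assumption; the variable name subject_mean is chosen for illustration.

```python
import pandas as pd

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', 'Ron', 'King', 'Jack'],
           'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'ENGLISH': [80, 70, 40, 50, 60, 30]}
my_data = pd.DataFrame(data=my_dict)

# Select only the subject columns, then take the mean across each row
subject_mean = my_data[['MATH', 'ENGLISH']].mean(axis=1)
print(subject_mean)
```

For the first row this gives (80 + 80) / 2 = 80.0 in place of the 53.666667 obtained when ID is included.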

level option

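For a MultiIndex (hierarchical) axis we can calculate the mean for each value of one level. Older pandas versions accepted mean(level='math'); that argument was removed in pandas 2.0, so we group by the level instead.
import pandas as pd 
my_index=pd.MultiIndex.from_arrays(
         [[1,2,3,4,5,6],
         [80,40,70,70,70,30],
         [80,70,40,50,60,30]],
names=['id','math','eng'])
my_data = pd.Series([4, 2, 0, 8,3,4], name='marks', index=my_index)
# sort=False keeps the level values in their order of first appearance
print(my_data.groupby(level='math', sort=False).mean())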
Output
math
80    4.000000
40    2.000000
70    3.666667
30    4.000000
Name: marks, dtype: float64
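To see where a value such as 3.666667 comes from, the mean for one level value can be worked out by hand and compared with a grouped mean. This is a minimal sketch on the same Series; the names my_index, marks_70 and manual_mean are chosen for illustration, and groupby(level=...) is used here because it works on current pandas versions.

```python
import pandas as pd

my_index = pd.MultiIndex.from_arrays(
    [[1, 2, 3, 4, 5, 6],
     [80, 40, 70, 70, 70, 30],
     [80, 70, 40, 50, 60, 30]],
    names=['id', 'math', 'eng'])
my_data = pd.Series([4, 2, 0, 8, 3, 4], name='marks', index=my_index)

# Rows whose 'math' level equals 70 hold the marks 0, 8 and 3
marks_70 = my_data.xs(70, level='math')
manual_mean = marks_70.sum() / len(marks_70)  # (0 + 8 + 3) / 3

# groupby sorts the level values by default: 30, 40, 70, 80
by_math = my_data.groupby(level='math').mean()
print(by_math)
print(manual_mean)
```

Without sort=False the groups are listed in sorted order of the level values, so the rows appear in a different order from the output above while the means stay the same.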

Handling NA data using skipna option

The skipna option controls whether null or NA data is ignored. Let us check what happens when it is set to True ( skipna=True ), which is the default.
import numpy as np
import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,70,30],
         'ENGLISH':[80,70,np.nan,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
# numeric_only=True skips the string column NAME (needed from pandas 2.0)
print(my_data.mean(skipna=True, numeric_only=True))
Output
ID          3.5
MATH       60.0
ENGLISH    58.0
dtype: float64
Now we will use skipna=False. A column that contains NA data returns NaN.
print(my_data.mean(skipna=False, numeric_only=True))
Output
ID          3.5
MATH       60.0
ENGLISH     NaN
dtype: float64
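The ENGLISH value of 58.0 comes from dividing the sum of the five available marks by five, not by six. The sketch below works this out by hand and contrasts it with treating the missing mark as 0; the variable names are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

english = pd.Series([80, 70, np.nan, 50, 60, 30], name='ENGLISH')

# skipna=True divides the sum by the number of non-NA values
skip_mean = english.sum() / english.count()  # 290 / 5

# Replacing the missing mark with 0 divides by all six rows instead
zero_mean = english.fillna(0).mean()  # 290 / 6

print(skip_mean, english.mean(), zero_mean)
print(english.mean(skipna=False))  # NaN because one value is missing
```

Whether a missing mark should be skipped or counted as 0 depends on what the data represents, so the choice is left to the reader.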

numeric_only

In the signature above the default value is None (from pandas 2.0 the default is False). We can set it to True ( numeric_only=True ) to include only float, int and boolean columns. We can include all columns by setting it to False ( numeric_only=False ). Let us see the output.
print(my_data.mean(numeric_only=False))
This will generate an error because the NAME column holds string objects. The exact message depends on the pandas version.

TypeError: could not convert string to float: 'RaviRajuAlexRonKingJack'
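The sketch below compares numeric_only=True with selecting the numeric columns first, and catches the error raised when string columns are included. It assumes the sample data from the start of this page; the variable names are chosen for illustration.

```python
import pandas as pd

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', 'Ron', 'King', 'Jack'],
           'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'ENGLISH': [80, 70, 40, 50, 60, 30]}
my_data = pd.DataFrame(data=my_dict)

# Keep only the numeric columns while calculating the mean
only_numeric = my_data.mean(numeric_only=True)

# Selecting the numeric columns first gives the same result for this data
same_result = my_data.select_dtypes(include='number').mean()
print(only_numeric)

# Including the string column NAME raises a TypeError
try:
    my_data.mean(numeric_only=False)
except TypeError as err:
    print('TypeError:', err)
```

Using select_dtypes() can be handy when the same numeric subset is needed for several calculations.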
