Python Pandas DataFrame Exercise on basic data handling with solutions


We used a DataFrame of 35 rows, created by reading an Excel file.
Download student.xlsx file OR Copy Sample Student DataFrame

Using the above Excel file (or DataFrame), you can try these questions on the basics of data handling.

Read the Excel file, or copy the DataFrame.
How many rows and columns are there?
Read the first 4 rows
How many rows are there?
Display the columns of the DataFrame
Display the first 5 records of NAME column of the DataFrame
Display the 5 highest records based on the MARK column.
Display all classes in the CLASS column (unique classes)
How to check the dimensions (rows and columns) of a DataFrame?
How to get a concise summary of the DataFrame, including column data types and missing values?
How to get the column names of a DataFrame?
How to get descriptive statistics of numeric columns in the DataFrame?
How to access the first few rows of a DataFrame?
How to filter rows based on a condition?
For example, filter rows where the "mark" column is greater than 60.
How to select specific columns from a DataFrame?
For example, select only the "name" and "gender" columns.
How to sort the DataFrame based on a column?
How to add a new column to a DataFrame?
How to remove a column from a DataFrame?
How to group the DataFrame by a column and calculate summary statistics?
How to handle missing values in a DataFrame?
To check if there are any missing values:
To drop rows with missing values:
To fill missing values with a specific value:
How to access a specific row in a DataFrame based on its index?
How to filter rows based on multiple conditions?
For example, filter rows where the "class" column is "Three" and the "mark" column is greater than 50.
How to calculate the total number of missing values in each column?
How to drop rows with missing values for specific columns? (say, the mark column)
How to apply a function to a column in a DataFrame? (add 5 marks for all)
How to merge/join two DataFrames based on a common column?
How to group the DataFrame by multiple columns and calculate summary statistics?
How to create a pivot table from a DataFrame?
How to rename columns in a DataFrame?
How to save a DataFrame to a CSV file?
How to calculate the maximum value in a specific column?
How to calculate the average value in a specific column?
How to calculate the total sum of a specific column?
How to count the number of unique values in a column?
How to apply a function to multiple columns and create a new column based on the result? ( find out class rank based on mark )
How to drop duplicate rows from a DataFrame?
How to convert a column to a different data type?
How to apply a filter to a DataFrame based on a list of values?
How to reset the index of a DataFrame?
How to randomly select 7 rows (sample) from a DataFrame?
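
Several of the questions above (dimensions, summary, filtering, column selection, sorting, adding and removing columns) can be tried on a small stand-in DataFrame before moving to the real file. The sketch below is one possible approach, not the page's official solution; the rows and the names `df`, `above_60`, and `mark_plus_5` are made up for illustration, and only the column names match student.xlsx.

```python
import pandas as pd

# Hypothetical stand-in for student.xlsx (the real file has 35 rows).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "name": ["Asha", "Ben", "Chen", "Dina", "Eli", "Fay"],
    "class": ["Three", "Three", "Four", "Four", "Five", "Three"],
    "mark": [75, 45, 60, 88, 55, 92],
    "gender": ["female", "male", "male", "female", "male", "female"],
})

print(df.shape)        # (rows, columns)
df.info()              # column dtypes and non-null counts
print(df.describe())   # descriptive statistics for numeric columns

# Filter rows on one condition, then on two conditions combined with &.
above_60 = df[df["mark"] > 60]
three_above_50 = df[(df["class"] == "Three") & (df["mark"] > 50)]

# Select specific columns.
name_gender = df[["name", "gender"]]

# Sort by mark, highest first.
by_mark = df.sort_values("mark", ascending=False)

# Add a column, then remove it again (both calls return a new DataFrame).
with_bonus = df.assign(mark_plus_5=df["mark"] + 5)
without_bonus = with_bonus.drop(columns=["mark_plus_5"])

# Filter rows using a list of allowed values.
in_classes = df[df["class"].isin(["Four", "Five"])]
```

Each condition inside a combined filter needs its own parentheses, because `&` binds more tightly than the comparison operators.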

DataFrame Basic Questions

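Read the Excel file to create the DataFrame.

import pandas as pd
# The raw string (r'...') keeps the backslash in the Windows path literal.
# 'id' is left as a regular column so the outputs below match.
my_data = pd.read_excel(r'D:\student.xlsx')
print(my_data)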

How many rows and columns are there?

print(my_data.shape) # (35, 5)

Read the first 4 rows
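
No solution is listed for this question, so here is a minimal sketch of one way to do it with `head()`. The DataFrame below is a two-column stand-in for `my_data`, built from the names shown in the output further down this page; the `id` values are assumed for illustration.

```python
import pandas as pd

# Stand-in for my_data; the real DataFrame comes from student.xlsx.
my_data = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["John Deo", "Max Ruin", "Arnold", "Krish Star", "John Mike"],
})

first_four = my_data.head(4)  # same rows as my_data[:4]
print(first_four)
```

Calling `head()` with no argument returns the first 5 rows.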

How many rows are there?

print(len(my_data)) #  35

Display the columns of the DataFrame

print(my_data.columns)

Output

Index(['id', 'name', 'class', 'mark', 'gender'], dtype='object')

Display the first 5 records of NAME column of the DataFrame

print(my_data['name'][:5]) # Five rows of name column

Output

0      John Deo
1      Max Ruin
2        Arnold
3    Krish Star
4     John Mike

Display the 5 highest records based on the MARK column.

my_dt=my_data.sort_values(['mark'],ascending=False)
print(my_dt[:5])

Output

    id       name  class  mark     gender
32  33  Kenn Rein    Six    96  female
11  12      Recky    Six    94  female
31  32  Binn Rott  Seven    90  female
10  11     Ronald    Six    89  female
24  25   Giff Tow    Six    88    male

Display all classes in the CLASS column (unique classes)

print(my_data['class'].unique())

Output

['Four' 'Three' 'Five' 'Six' 'Seven' 'Nine' 'Eight']

To get the number of unique classes

print(len(my_data['class'].unique())) # 7
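
As an alternative to taking `len()` of the `unique()` result, Pandas also provides `nunique()`, and `value_counts()` shows how many rows fall in each class. The short sketch below uses a made-up Series named `classes` rather than the real CLASS column.

```python
import pandas as pd

# Hypothetical class values, standing in for my_data['class'].
classes = pd.Series(["Four", "Three", "Three", "Six", "Four", "Seven"], name="class")

print(classes.nunique())       # number of distinct values
print(classes.value_counts())  # number of rows per class
```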

Exercise
Here is a list of queries you can answer using what you learned above; display the outcome of each.

List of Queries

Don’t run these queries directly against a MySQL database; instead, create a DataFrame from the Excel file and then apply Pandas methods to get the answers.
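
Query-style questions such as totals per group, rankings, and lookups against a second table usually map to `groupby`, `rank`, `pivot_table`, and `merge`. The following is a rough sketch under stated assumptions: the rows, the `teachers` table, and all variable names are hypothetical, and one mark is left missing on purpose to show the missing-value steps.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for student.xlsx; one mark is deliberately missing.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "name": ["Asha", "Ben", "Chen", "Dina", "Eli", "Fay"],
    "class": ["Three", "Three", "Four", "Four", "Five", "Three"],
    "mark": [75, 45, np.nan, 88, 55, 92],
    "gender": ["female", "male", "male", "female", "male", "female"],
})

# Missing values: count per column, drop rows lacking a mark, or fill with 0.
missing_per_column = df.isnull().sum()
clean = df.dropna(subset=["mark"])
filled = df.fillna({"mark": 0})

# Group by one column and summarise the marks.
class_stats = clean.groupby("class")["mark"].agg(["mean", "max", "count"])

# Rank inside each class: the highest mark gets rank 1.
ranked = clean.assign(
    class_rank=clean.groupby("class")["mark"].rank(ascending=False, method="min")
)

# Pivot table: average mark per class and gender.
pivot = pd.pivot_table(clean, values="mark", index="class",
                       columns="gender", aggfunc="mean")

# Merge with a second (hypothetical) table on the common "class" column.
teachers = pd.DataFrame({"class": ["Three", "Four", "Five"],
                         "teacher": ["Ms. Rao", "Mr. Lee", "Ms. Kim"]})
merged = clean.merge(teachers, on="class", how="left")

# Random sample of rows; random_state makes the pick repeatable.
sample_rows = df.sample(n=3, random_state=1)

print(missing_per_column)
print(class_stats)
print(ranked[["name", "class", "mark", "class_rank"]])
```

The sample size is 3 here only because the stand-in has 6 rows; with the 35-row file, `sample(n=7)` works the same way.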
