Python Pandas
We used a DataFrame of 35 rows, created by reading an Excel file.
Download student.xlsx file ⇓ OR
Copy Sample Student DataFrame
Using the above Excel file (or DataFrame), you can try these questions on the basics of data handling.
Read the Excel file, or copy the DataFrame.
How many rows and columns are there?
Read the first 4 rows
How many rows are there?
Display the columns of the DataFrame
Display the first 5 records of NAME column of the DataFrame
Display the 5 highest records based on the MARK column.
Display all classes in the CLASS column (unique classes)
How to check the dimensions (rows and columns) of a DataFrame?
How to get a concise summary of the DataFrame, including column data types and missing values?
How to get the column names of a DataFrame?
How to get descriptive statistics of numeric columns in the DataFrame?
How to access the first few rows of a DataFrame?
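A minimal sketch of the five inspection questions above, assuming a small hand-built DataFrame (four rows copied from the sample output) in place of student.xlsx. The name `df` is illustrative only.

```python
import pandas as pd

# Four rows copied from the sample output; a stand-in for student.xlsx
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["John Deo", "Max Ruin", "Arnold", "Krish Star"],
    "class": ["Four", "Three", "Three", "Four"],
    "mark": [75, 85, 55, 60],
    "gender": ["female", "male", "male", "female"],
})

print(df.shape)             # dimensions as (rows, columns)
df.info()                   # column data types and non-null counts
print(df.columns.tolist())  # column names as a plain list
print(df.describe())        # statistics for the numeric columns (id, mark)
print(df.head(2))           # first two rows
```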
How to filter rows based on a condition?
For example, filter rows where the "mark" column is greater than 60.
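One possible way to do this is with a boolean mask. The sketch below assumes a small hand-built DataFrame (rows copied from the sample output) instead of the Excel file.

```python
import pandas as pd

# Stand-in rows copied from the sample output
df = pd.DataFrame({
    "name": ["John Deo", "Max Ruin", "Arnold", "Krish Star"],
    "mark": [75, 85, 55, 60],
})

# The comparison builds a True/False mask; indexing with it keeps the True rows
high = df[df["mark"] > 60]
print(high)
```

Note that a mark of exactly 60 is excluded because the comparison is strict.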
How to select specific columns from a DataFrame?
For example, select only the "name" and "gender" columns.
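A short sketch, assuming a hand-built DataFrame with rows copied from the sample output. Passing a list of column names inside the brackets returns a new DataFrame with only those columns.

```python
import pandas as pd

# Stand-in rows copied from the sample output
df = pd.DataFrame({
    "name": ["John Deo", "Max Ruin", "Arnold"],
    "mark": [75, 85, 55],
    "gender": ["female", "male", "male"],
})

# Note the double brackets: the inner list names the columns to keep
subset = df[["name", "gender"]]
print(subset)
```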
How to sort the DataFrame based on a column?
How to add a new column to a DataFrame?
How to remove a column from a DataFrame?
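Adding and removing a column could look like the sketch below. The `passed` column and its threshold of 60 are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Stand-in rows copied from the sample output
df = pd.DataFrame({
    "name": ["John Deo", "Max Ruin", "Arnold"],
    "mark": [75, 85, 55],
    "gender": ["female", "male", "male"],
})

# Add a column by assigning to a new name (hypothetical pass mark of 60)
df["passed"] = df["mark"] >= 60

# Remove a column; drop() returns a new DataFrame and leaves df unchanged
without_gender = df.drop(columns=["gender"])
print(without_gender)
```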
How to group the DataFrame by a column and calculate summary statistics?
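A minimal sketch of grouping by one column, assuming hand-built rows copied from the sample output. The choice of mean, max and count as the summary statistics is an assumption.

```python
import pandas as pd

# Stand-in rows copied from the sample output
df = pd.DataFrame({
    "class": ["Four", "Three", "Three", "Four"],
    "mark": [75, 85, 55, 60],
})

# One row per class, one column per statistic
summary = df.groupby("class")["mark"].agg(["mean", "max", "count"])
print(summary)
```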
How to handle missing values in a DataFrame?
To check if there are any missing values:
To drop rows with missing values:
To fill missing values with a specific value:
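The three steps above can be sketched as follows. The missing mark is inserted by hand for illustration, and 0 as the fill value is an assumption.

```python
import numpy as np
import pandas as pd

# Stand-in data with one mark deliberately set to missing
df = pd.DataFrame({
    "name": ["John Deo", "Max Ruin", "Arnold", "Krish Star"],
    "mark": [75, np.nan, 55, 60],
})

has_missing = df.isnull().values.any()  # True if any cell is missing
dropped = df.dropna()                   # rows without any missing value
filled = df.fillna({"mark": 0})         # replace missing marks with 0

print(has_missing)
print(dropped)
print(filled)
```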
How to access a specific row in a DataFrame based on its index?
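A short sketch, assuming the default integer index: `loc` selects by index label and `iloc` by integer position. The rows are copied from the sample output.

```python
import pandas as pd

# Stand-in rows copied from the sample output, with the default 0..n-1 index
df = pd.DataFrame({
    "name": ["John Deo", "Max Ruin", "Arnold"],
    "mark": [75, 85, 55],
})

row_by_label = df.loc[2]      # row whose index label is 2
row_by_position = df.iloc[0]  # first row, whatever its label
print(row_by_label)
print(row_by_position)
```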
How to filter rows based on multiple conditions?
For example, filter rows where the "class" column is Three and the "mark" column is more than 50.
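One way to combine conditions is the `&` operator, with each condition wrapped in parentheses. The sketch assumes hand-built rows copied from the sample output.

```python
import pandas as pd

# Stand-in rows copied from the sample output
df = pd.DataFrame({
    "name": ["John Deo", "Max Ruin", "Arnold", "Krish Star"],
    "class": ["Four", "Three", "Three", "Four"],
    "mark": [75, 85, 55, 60],
})

# & means "and"; use | for "or". The parentheses are required.
result = df[(df["class"] == "Three") & (df["mark"] > 50)]
print(result)
```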
How to calculate the total number of missing values in each column?
How to drop rows with missing values in specific columns (say, the mark column)?
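The two questions above could be answered as in this sketch. The missing values are inserted by hand for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in data: one missing mark and one missing gender
df = pd.DataFrame({
    "name": ["John Deo", "Max Ruin", "Arnold"],
    "mark": [75, np.nan, 55],
    "gender": ["female", "male", None],
})

missing_per_column = df.isnull().sum()  # count of missing values per column
cleaned = df.dropna(subset=["mark"])    # drop rows only where mark is missing
print(missing_per_column)
print(cleaned)
```

The row with a missing gender is kept because only the mark column is checked.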
How to apply a function to a column in a DataFrame (add 5 marks for all)?
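A minimal sketch using `apply()` on the mark column, assuming rows copied from the sample output. For simple arithmetic like this, `df["mark"] + 5` gives the same result.

```python
import pandas as pd

# Stand-in rows copied from the sample output
df = pd.DataFrame({
    "name": ["John Deo", "Max Ruin", "Arnold", "Krish Star"],
    "mark": [75, 85, 55, 60],
})

# apply() calls the function once for each value in the column
df["mark"] = df["mark"].apply(lambda m: m + 5)
print(df)
```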
How to merge/join two DataFrames based on a common column?
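A sketch of an inner join on a common `id` column. The `fees` DataFrame and its values are hypothetical and are not part of the student file.

```python
import pandas as pd

# Stand-in student rows copied from the sample output
students = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["John Deo", "Max Ruin", "Arnold"],
})

# Hypothetical second table sharing the id column
fees = pd.DataFrame({
    "id": [1, 2, 4],
    "fee": [100, 150, 120],
})

# how="inner" keeps only the ids present in both tables
merged = pd.merge(students, fees, on="id", how="inner")
print(merged)
```

Other values of `how`, such as "left", "right" and "outer", keep unmatched rows from one or both sides.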
How to group the DataFrame by multiple columns and calculate summary statistics?
How to create a pivot table from a DataFrame?
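Grouping by two columns and building a pivot table could look like this sketch, assuming rows copied from the sample output and the mean as the statistic.

```python
import pandas as pd

# Stand-in rows copied from the sample output
df = pd.DataFrame({
    "class": ["Four", "Three", "Three", "Four"],
    "mark": [75, 85, 55, 60],
    "gender": ["female", "male", "male", "female"],
})

# Average mark for every (class, gender) pair that occurs in the data
by_class_gender = df.groupby(["class", "gender"])["mark"].mean()
print(by_class_gender)

# The same averages laid out with classes as rows and genders as columns
pivot = df.pivot_table(values="mark", index="class",
                       columns="gender", aggfunc="mean")
print(pivot)
```

Combinations that do not occur in the data appear as NaN in the pivot table.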
How to rename columns in a DataFrame?
How to save a DataFrame to a CSV file?
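A short sketch of renaming a column and saving to CSV. The new column name `score` and the output file name are hypothetical. The file is written to the system temporary folder.

```python
import os
import tempfile

import pandas as pd

# Stand-in rows copied from the sample output
df = pd.DataFrame({
    "name": ["John Deo", "Max Ruin"],
    "mark": [75, 85],
})

# Map old column names to new ones; unlisted columns are unchanged
renamed = df.rename(columns={"mark": "score"})

# index=False leaves the row index out of the file
out_path = os.path.join(tempfile.gettempdir(), "student_out.csv")
renamed.to_csv(out_path, index=False)
```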
How to calculate the maximum value in a specific column?
How to calculate the average value in a specific column?
How to calculate the total sum of a specific column?
How to count the number of unique values in a column?
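The column-level calculations above can be sketched as follows, assuming four rows copied from the sample output.

```python
import pandas as pd

# Stand-in rows copied from the sample output
df = pd.DataFrame({
    "class": ["Four", "Three", "Three", "Four"],
    "mark": [75, 85, 55, 60],
})

highest = df["mark"].max()              # maximum value
average = df["mark"].mean()             # average value
total = df["mark"].sum()                # total sum
class_count = df["class"].nunique()     # number of unique values

print(highest, average, total, class_count)
```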
How to apply a function to multiple columns and create a new column based on the result (find the class rank based on mark)?
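One possible approach to the class rank is to group by class and rank the marks within each group. The column name `class_rank` and the tie-breaking rule `method="min"` are assumptions.

```python
import pandas as pd

# Stand-in rows copied from the sample output
df = pd.DataFrame({
    "name": ["John Deo", "Max Ruin", "Arnold", "Krish Star"],
    "class": ["Four", "Three", "Three", "Four"],
    "mark": [75, 85, 55, 60],
})

# Rank 1 is the highest mark within each class
df["class_rank"] = (
    df.groupby("class")["mark"]
      .rank(ascending=False, method="min")
      .astype(int)
)
print(df)
```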
How to drop duplicate rows from a DataFrame?
How to convert a column to a different data type?
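A sketch of both steps above. The repeated row and the marks stored as text are set up by hand for illustration.

```python
import pandas as pd

# Stand-in data: one repeated row, marks stored as text
df = pd.DataFrame({
    "name": ["Arnold", "Arnold", "Max Ruin"],
    "mark": ["55", "55", "85"],
})

unique_rows = df.drop_duplicates()                 # keep the first of each repeated row
unique_rows = unique_rows.astype({"mark": int})    # convert the text marks to integers
print(unique_rows)
print(unique_rows.dtypes)
```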
How to apply a filter to a DataFrame based on a list of values?
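Filtering by a list of values can be done with `isin()`. In this sketch the list `wanted` is hypothetical and the rows are copied from the sample output.

```python
import pandas as pd

# Stand-in rows copied from the sample output
df = pd.DataFrame({
    "name": ["John Deo", "Max Ruin", "Arnold", "Krish Star"],
    "class": ["Four", "Three", "Three", "Four"],
})

wanted = ["Four", "Six"]  # hypothetical list of classes to keep
result = df[df["class"].isin(wanted)]
print(result)
```

Values in the list that do not occur in the column, such as "Six" here, are simply ignored.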
How to reset the index of a DataFrame?
How to randomly select 7 rows (sample) from a DataFrame?
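A minimal sketch of sampling and resetting the index, assuming a hypothetical ten-row DataFrame that holds only an id column. `random_state` is optional and only makes the selection repeatable.

```python
import pandas as pd

# Hypothetical ten-row stand-in for the student data
df = pd.DataFrame({"id": range(1, 11)})

# Pick 7 rows at random; the original index labels are kept
picked = df.sample(n=7, random_state=1)

# Replace the shuffled labels with a fresh 0..6 index
picked = picked.reset_index(drop=True)
print(picked)
```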
DataFrame Basic Questions
More Questions at the end of this Page.
Read the excel file, to create the DataFrame.
Read more on how to read excel file here
import pandas as pd
my_data = pd.read_excel(r'D:\student.xlsx') # raw string keeps the backslash literal; id stays a regular column
print(my_data)
How many rows and columns are there?
print(my_data.shape) # (35, 5)
Read the first 4 rows
Read more on DataFrame.head()
print(my_data.head(4))
Output
   id        name  class  mark  gender
0   1    John Deo   Four    75  female
1   2    Max Ruin  Three    85    male
2   3      Arnold  Three    55    male
3   4  Krish Star   Four    60  female
How many rows are there?
print(len(my_data)) # 35
Display the columns of the DataFrame
print(my_data.columns)
Output
Index(['id', 'name', 'class', 'mark', 'gender'], dtype='object')
Display the first 5 records of the NAME column of the DataFrame
print(my_data['name'][:5]) # Five rows of name column
Output
0 John Deo
1 Max Ruin
2 Arnold
3 Krish Star
4 John Mike
Display the 5 highest records based on the MARK column.
Read more about sort_values() to arrange rows in increasing or decreasing order
my_dt=my_data.sort_values(['mark'],ascending=False)
print(my_dt[:5])
Output
    id       name  class  mark  gender
32  33  Kenn Rein    Six    96  female
11  12      Recky    Six    94  female
31  32  Binn Rott  Seven    90  female
10  11     Ronald    Six    89  female
24  25   Giff Tow    Six    88    male
Display all classes in CLASS column ( Unique class )
print(my_data['class'].unique())
Output
['Four' 'Three' 'Five' 'Six' 'Seven' 'Nine' 'Eight']
To get the number of unique classes
print(len(my_data['class'].unique())) # 7
Exercise
Here is a list of queries you can work through using what you learned above, displaying the outcome of each.
List of Queries
Do not run the queries directly against a MySQL database. Instead, create a DataFrame from the Excel file and apply Pandas methods to get the answers.