Pandas DataFrame Exercise 1


We used a DataFrame of 35 rows, read from an Excel file.
Download student.xlsx file OR Copy Sample Student DataFrame

Using the above Excel file (or DataFrame), you can try these questions on the basics of data handling.
  1. Read the excel file, or copy the DataFrame.
  2. How many rows and columns are there?
  3. Read the first 4 rows
  4. How many rows are there?
  5. Display the columns of the DataFrame
  6. Display the first 5 records of NAME column of the DataFrame
  7. Display highest 5 records based on the MARK column.
  8. Display all classes in CLASS column ( Unique class )
  9. How to check the dimensions (rows and columns) of a DataFrame?
  10. How to get a concise summary of the DataFrame, including column data types and missing values?
  11. How to get the column names of a DataFrame?
  12. How to get descriptive statistics of numeric columns in the DataFrame?
  13. How to access the first few rows of a DataFrame?
  14. How to filter rows based on a condition?
  15. For example, filter rows where the "mark" column is greater than 60.
  16. How to select specific columns from a DataFrame?
  17. For example, select only the "name" and "gender" columns.
  18. How to sort the DataFrame based on a column?
  19. How to add a new column to a DataFrame?
  20. How to remove a column from a DataFrame?
  21. How to group the DataFrame by a column and calculate summary statistics?
  22. How to handle missing values in a DataFrame?
  23. To check if there are any missing values:
  24. To drop rows with missing values:
  25. To fill missing values with a specific value:
  26. How to access a specific row in a DataFrame based on its index?
  27. How to filter rows based on multiple conditions?
  28. For example, filter rows where the "class" column is Three and the "mark" column is more than 50.
  29. How to calculate the total number of missing values in each column?
  30. How to drop rows with missing values for specific columns? ( say Mark column )
  31. How to apply a function to a column in a DataFrame? ( Add 5 mark for all )
  32. How to merge/join two DataFrames based on a common column?
  33. How to group the DataFrame by multiple columns and calculate summary statistics?
  34. How to create a pivot table from a DataFrame?
  35. How to rename columns in a DataFrame?
  36. How to save a DataFrame to a CSV file?
  37. How to calculate the maximum value in a specific column?
  38. How to calculate the average value in a specific column?
  39. How to calculate the total sum of a specific column?
  40. How to count the number of unique values in a column?
  41. How to apply a function to multiple columns and create a new column based on the result? ( find out class rank based on mark )
  42. How to drop duplicate rows from a DataFrame?
  43. How to convert a column to a different data type?
  44. How to apply a filter to a DataFrame based on a list of values?
  45. How to reset the index of a DataFrame?
  46. How to randomly select 7 rows (sample) from a DataFrame?
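
A few of the questions above (filtering on one condition, selecting columns, filtering on multiple conditions, and grouping) can be sketched as follows. This is only one possible approach. The DataFrame here is a small stand-in built from rows that appear in the outputs on this page, not the full 35-row student.xlsx, and the result variable names are chosen for illustration.

```python
import pandas as pd

# Stand-in data: a few rows copied from the outputs on this page,
# not the full 35-row student.xlsx file.
my_data = pd.DataFrame({
    "id": [1, 2, 3, 4, 33],
    "name": ["John Deo", "Max Ruin", "Arnold", "Krish Star", "Kenn Rein"],
    "class": ["Four", "Three", "Three", "Four", "Six"],
    "mark": [75, 85, 55, 60, 96],
    "gender": ["female", "male", "male", "female", "female"],
})

# Filter rows on a single condition: mark greater than 60
high_marks = my_data[my_data["mark"] > 60]
print(high_marks)

# Select specific columns by passing a list of column names
names_gender = my_data[["name", "gender"]]
print(names_gender)

# Filter on multiple conditions: wrap each condition in parentheses
# and combine them with & (and) or | (or)
three_above_50 = my_data[(my_data["class"] == "Three") & (my_data["mark"] > 50)]
print(three_above_50)

# Group by a column and calculate a summary statistic
avg_by_class = my_data.groupby("class")["mark"].mean()
print(avg_by_class)
```

The parentheses around each condition matter because & binds more tightly than comparison operators such as == and >.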



Read the Excel file to create the DataFrame.


Read more on how to read excel file here
import pandas as pd
# Raw string, so the backslash in the Windows path is not read as an escape.
# id is kept as a regular column, which gives the 5 columns shown below.
my_data = pd.read_excel(r'D:\student.xlsx')
print(my_data)

How many rows and columns are there?

print(my_data.shape) # (35, 5)

Read the first 4 rows

Read more on DataFrame.head()
print(my_data.head(4))
Output
   id        name  class  mark  gender
0   1    John Deo   Four    75  female
1   2    Max Ruin  Three    85    male
2   3      Arnold  Three    55    male
3   4  Krish Star   Four    60  female

How many rows are there?

print(len(my_data)) #  35
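
The row and column counts can also be taken from shape separately. The sketch below assumes a small stand-in DataFrame of four rows copied from this page, so the counts differ from the 35 rows of the real file.

```python
import pandas as pd

# Stand-in data: four rows taken from the head(4) output on this page.
my_data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["John Deo", "Max Ruin", "Arnold", "Krish Star"],
    "class": ["Four", "Three", "Three", "Four"],
    "mark": [75, 85, 55, 60],
    "gender": ["female", "male", "male", "female"],
})

# shape is a (rows, columns) tuple, so it can be unpacked
n_rows, n_cols = my_data.shape
print(n_rows, n_cols)  # 4 5

# len() of the DataFrame is the row count, the same as shape[0]
print(len(my_data) == my_data.shape[0])  # True

# len() of the columns gives the column count
print(len(my_data.columns))  # 5
```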

Display the columns of the DataFrame

print(my_data.columns)
Output
Index(['id', 'name', 'class', 'mark', 'gender'], dtype='object')
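
If a plain Python list of column names is more convenient than an Index object, it can be converted. The sketch below uses a two-row stand-in DataFrame with the same column names as the page's data.

```python
import pandas as pd

# Stand-in data with the same columns as the student DataFrame.
my_data = pd.DataFrame({
    "id": [1, 2],
    "name": ["John Deo", "Max Ruin"],
    "class": ["Four", "Three"],
    "mark": [75, 85],
    "gender": ["female", "male"],
})

# Convert the Index of column labels to a plain list
column_names = my_data.columns.tolist()
print(column_names)  # ['id', 'name', 'class', 'mark', 'gender']

# Check whether a column exists before using it
has_mark = "mark" in my_data.columns
print(has_mark)  # True
```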

Display the first 5 records of NAME column of the DataFrame

print(my_data['name'][:5]) # Five rows of name column
Output
0      John Deo
1      Max Ruin
2        Arnold
3    Krish Star
4     John Mike
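
Slicing with [:5] is one way to do this. As a sketch of some equivalent alternatives, head(), iloc and loc give the same five rows when the DataFrame has the default integer index. The names below are taken from the outputs on this page, and the sixth row is added only to show that the rest is cut off. It is not the sixth row of the real file.

```python
import pandas as pd

# Stand-in data: names copied from this page; the row order after the
# first five is not the order in student.xlsx.
my_data = pd.DataFrame({
    "name": ["John Deo", "Max Ruin", "Arnold", "Krish Star",
             "John Mike", "Kenn Rein"],
})

# head(5) returns the first five rows of the column
first_five = my_data["name"].head(5)
print(first_five)

# iloc selects by integer position; the end of the slice is excluded
by_position = my_data["name"].iloc[:5]

# loc selects by index label; the end of the slice is included,
# so labels 0 to 4 are returned
by_label = my_data.loc[:4, "name"]
```

Note the difference between iloc[:5] and loc[:4]: position slices exclude the end, label slices include it.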

Display highest 5 records based on the MARK column.

Read more about sort_values() to arrange rows in increasing or decreasing order
my_dt=my_data.sort_values(['mark'],ascending=False)
print(my_dt[:5])
Output
    id       name  class  mark  gender
32  33  Kenn Rein    Six    96  female
11  12      Recky    Six    94  female
31  32  Binn Rott  Seven    90  female
10  11     Ronald    Six    89  female
24  25   Giff Tow    Six    88    male
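
As an alternative sketch, nlargest() returns the top rows for a column in one step, without sorting the whole DataFrame first. The stand-in data below is built from rows shown in the outputs on this page, not the full file.

```python
import pandas as pd

# Stand-in data: rows copied from the outputs on this page.
my_data = pd.DataFrame({
    "id": [1, 2, 3, 4, 11, 12, 25, 32, 33],
    "name": ["John Deo", "Max Ruin", "Arnold", "Krish Star", "Ronald",
             "Recky", "Giff Tow", "Binn Rott", "Kenn Rein"],
    "class": ["Four", "Three", "Three", "Four", "Six",
              "Six", "Six", "Seven", "Six"],
    "mark": [75, 85, 55, 60, 89, 94, 88, 90, 96],
    "gender": ["female", "male", "male", "female", "female",
               "female", "male", "female", "female"],
})

# Five rows with the highest mark, in descending order of mark
top_five = my_data.nlargest(5, "mark")
print(top_five)
```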

Display all classes in CLASS column ( Unique class )

print(my_data['class'].unique())
Output
['Four' 'Three' 'Five' 'Six' 'Seven' 'Nine' 'Eight']
To get the number of unique classes
print(len(my_data['class'].unique())) # 7
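
The same count is available directly from nunique(), and value_counts() shows how many rows belong to each class. This is a sketch on a stand-in column built from rows shown on this page, so it contains fewer classes than the real file.

```python
import pandas as pd

# Stand-in data: class values of rows that appear on this page.
my_data = pd.DataFrame({
    "class": ["Four", "Three", "Three", "Four", "Six",
              "Six", "Six", "Seven", "Six"],
})

# Number of distinct values in the column
n_classes = my_data["class"].nunique()
print(n_classes)  # 4

# Number of rows for each distinct value, largest count first
class_counts = my_data["class"].value_counts()
print(class_counts)
```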