In the above code, the header row (the first, or 0th, row) is treated as data, not as column headers.
The file has one header row at the top, but we want to read only the data and skip the header. Note that using header=None on its own keeps the header line as the first row of data. To remove the header, we also use skiprows=1.
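The difference between header=None alone and header=None with skiprows=1 can be checked with a small sketch. It assumes a hypothetical in-memory CSV (read through io.StringIO) in place of a real file, so it runs without anything on disk.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file with one header row
csv_text = """id,name,mark
1,John Deo,75
2,Max Ruin,85
"""

# header=None alone: the header line is read as the first data row
with_header_row = pd.read_csv(io.StringIO(csv_text), header=None)

# header=None with skiprows=1: the header line is skipped, only data is read
data_only = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=1)

print(with_header_row)
print(data_only)
```

In both cases the columns are labelled 0, 1, 2 because no header is used; only the second call leaves out the id,name,mark line.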
With the converters option we can pass a conversion function for a column, so its values are converted to the desired type while the file is being read.
Here the file student-percentage.csv has a column percentage that holds marks as percentages (for example 55.00%). Using the function to_float(), we convert this column to float values while reading the CSV file.
import pandas as pd

def to_float(x):
    # '55.00%' -> 0.55 : remove the % sign and convert to a fraction
    return float(x.strip('%'))/100
    #return int(float(x.strip('%'))) # as integer

df=pd.read_csv("D:\\my_data\\student-percentage.csv",
    converters={'percentage':to_float})
print(df)
Output
   id        name  class  mark  gender  percentage
0   1    John Deo   Four    75  female        0.75
1   2    Max Ruin  Three    85    male        0.85
2   3      Arnold  Three    55    male        0.55
3   4  Krish Star   Four    60  female        0.60
4   5   John Mike   Four    60  female        0.60
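To try the converters option without the D:\my_data file, here is a sketch using a hypothetical in-memory sample shaped like student-percentage.csv. It also applies the integer alternative from the commented line above through a second function, to_int(), which is our own illustrative name.

```python
import io

import pandas as pd


def to_float(x):
    # '75.00%' -> 0.75 : strip the percent sign, then scale to a fraction
    return float(x.strip('%')) / 100


def to_int(x):
    # '75.00%' -> 75 : hypothetical helper for the integer alternative
    return int(float(x.strip('%')))


# Hypothetical in-memory sample shaped like student-percentage.csv
csv_text = """id,name,percentage
1,John Deo,75.00%
2,Max Ruin,85.00%
3,Arnold,55.00%
"""

as_float = pd.read_csv(io.StringIO(csv_text), converters={'percentage': to_float})
as_int = pd.read_csv(io.StringIO(csv_text), converters={'percentage': to_int})
print(as_float)
print(as_int)
```

The converter receives each raw cell as a string, which is why strip('%') can be called on it before the numeric conversion.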
For files with a large number of rows, we can read the data in chunks. As an example, the same CSV file is opened here with chunksize=2, so each iteration returns a part (or chunk) of the rows, at most two at a time.
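import pandas as pd
# With chunksize, read_csv() returns an iterator of DataFrames, not one DataFrame
reader=pd.read_csv("D:\\my_data\\student-percentage.csv",chunksize=2)
for chunk in reader:
    print(chunk)  # each chunk holds at most 2 rows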
Output
   id      name  class  mark  gender  percentage
0   1  John Deo   Four    75  female      75.00%
1   2  Max Ruin  Three    85    male      85.00%
   id        name  class  mark  gender  percentage
2   3      Arnold  Three    55    male      55.00%
3   4  Krish Star   Four    60  female      60.00%
   id       name  class  mark  gender  percentage
4   5  John Mike   Four    60  female      60.00%
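Reading in chunks is mostly useful when each chunk is processed and then discarded, so the whole file never sits in memory at once. The sketch below keeps a running total of the mark column across chunks. The in-memory sample data is hypothetical, and using the reader in a with block assumes pandas 1.2 or later.

```python
import io

import pandas as pd

# Hypothetical in-memory sample; in practice this would be a large CSV file
csv_text = """id,name,mark
1,John Deo,75
2,Max Ruin,85
3,Arnold,55
4,Krish Star,60
5,John Mike,60
"""

total_marks = 0
row_count = 0
chunk_sizes = []
# Using the reader as a context manager closes the file handle when done
with pd.read_csv(io.StringIO(csv_text), chunksize=2) as reader:
    for chunk in reader:
        # Each chunk is an ordinary DataFrame with at most 2 rows
        total_marks += chunk['mark'].sum()
        row_count += len(chunk)
        chunk_sizes.append(len(chunk))

print(row_count, total_marks / row_count)
```

Only the running totals survive between iterations, so memory use depends on the chunk size rather than on the file size.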
A common requirement is to read data from a MySQL database and save it to a CSV file.
We will then extend the script to read the CSV file back and store the data in a MySQL database.
We use SQLAlchemy for the MySQL database connection.
First we connect to the MySQL database using our userid, password and database name (db_name). Then we use read_sql() to run a query that gets the data from the student table.
The data is written to a CSV file using to_csv().
In the second part of the script, we read the data back from the CSV file using read_csv(), which returns a DataFrame. Then we use to_sql() to create the table (or append to it if it already exists). Here is the complete code.
import pandas as pd
from sqlalchemy import create_engine

# Replace userid, password and db_name with your own connection details
engine = create_engine("mysql+mysqldb://userid:password@localhost/db_name")

### Reading data from MySQL and storing it in a CSV file ###
sql="SELECT * FROM student"
my_data = pd.read_sql(sql,engine)
my_data.to_csv(r'D:\my_file.csv',index=False)  # raw string keeps the backslash literal
### End of storing data to CSV file ###

### Reading data from CSV file and creating table in MySQL ####
my_data=pd.read_csv(r'D:\my_file.csv')  # read_csv() already returns a DataFrame
print(my_data)
### Creating new table student3 or appending to existing table
# index=False stops the DataFrame index from being stored as an extra column
my_data.to_sql(con=engine,name='student3',if_exists='append',index=False)
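The same round trip can be tried without a MySQL server. In the sketch below an in-memory SQLite database (Python's built-in sqlite3 module, which pandas accepts as a connection) is swapped in for MySQL, and an io.StringIO buffer replaces the CSV file on disk. The sample student rows are hypothetical.

```python
import io
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for MySQL so no server is needed
con = sqlite3.connect(":memory:")

# Hypothetical sample rows for a 'student' table
pd.DataFrame(
    {"id": [1, 2, 3], "name": ["John Deo", "Max Ruin", "Arnold"], "mark": [75, 85, 55]}
).to_sql("student", con, index=False)

# Part 1: query the table and write the result as CSV (into memory here)
my_data = pd.read_sql("SELECT * FROM student", con)
buffer = io.StringIO()
my_data.to_csv(buffer, index=False)

# Part 2: read the CSV back and append it to the table student3
buffer.seek(0)
student3 = pd.read_csv(buffer)
student3.to_sql("student3", con, if_exists="append", index=False)

print(pd.read_sql("SELECT * FROM student3", con))
```

Running the last two steps again would append a second copy of the rows, which is what if_exists='append' means; if_exists='replace' would drop and recreate the table instead.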