« Pandas
Read and display data from sales1.csv file.
Download CSV file sales1.csv ⇓
import pandas as pd
sales=pd.read_csv("sales1.csv")
print(sales)
Output is here
sale_id c_id p_id product qty store
0 1 2 3 Monitor 2 ABC
1 2 2 4 CPU 1 DEF
2 3 1 3 Monitor 3 ABC
Place the sales1.csv file in the same folder and then run the above code.
Options :
header
The first line in our example csv file is the column headers, this is same as header=0. The first row or 0th row will be treated as column headers.
If we want to treat the first row as data and not as header then here is the code.
sales=pd.read_csv("sales1.csv",header=None)
Output is here
0 1 2 3 4 5
0 sale_id c_id p_id product qty store
1 1 2 3 Monitor 2 ABC
2 2 2 4 CPU 1 DEF
3 3 1 3 Monitor 3 ABC
In the above code the header row ( first row or 0th row ) is treated as data ( not as column headers ) .
The file have one header row at top but we want to read only data ( not the headers ) or skip the header. Note that by using header=None we will include header as first row data. To remove the header will use skiprows=1
sales=pd.read_csv("sales1.csv",header=None,skiprows=1)
Output
0 1 2 3 4 5
0 1 2 3 Monitor 2 ABC
1 2 2 4 CPU 1 DEF
2 3 1 3 Monitor 3 ABC
In above code display only 3rd and 4th columns (first column is 0th column )
sales=pd.read_csv("sales1.csv",header=None,skiprows=1, usecols=[3,4])
Output
3 4
0 Monitor 2
1 CPU 1
2 Monitor 3
na_values
Using the option na_values we can mark data as NaN and indicate them by using isnull()
blank_values = ["n/a", "na", "--"]
my_data=pd.read_csv("D:\\test-na_values.csv",na_values=blank_values)
my_data['status']=my_data['name'].isnull() # new column added
print(my_data)
Download test-na_values.csv file ⇓
All options
pd.read_csv(filepath_or_buffer,
sep=’, ‘, delimiter=None, header=’infer’,
names=None, index_col=None, usecols=None,
squeeze=False, prefix=None, mangle_dupe_cols=True,
dtype=None, engine=None, converters=None,
true_values=None, false_values=None,
skipinitialspace=False, skiprows=None,
nrows=None, na_values=None, keep_default_na=True,
na_filter=True, verbose=False,
skip_blank_lines=True, parse_dates=False,
infer_datetime_format=False, keep_date_col=False,
date_parser=None, dayfirst=False, iterator=False,
chunksize=None, compression=’infer’, thousands=None,
decimal=b’.’, lineterminator=None, quotechar='”‘,
quoting=0, escapechar=None, comment=None, encoding=None,
dialect=None, tupleize_cols=None, error_bad_lines=True,
warn_bad_lines=True, skipfooter=0, doublequote=True,
delim_whitespace=False, low_memory=True,
memory_map=False, float_precision=None)
From MySQL to CSV
This is a common requirement as we read data from MySQL database and then save the data in CSV file.
We will further extend this script to read from CSV file and store data in MySQL database.
We are going to use sqlalchemy for our MySQL database connection.
We are first connecting to MySQL database by using our connection userid, password and database name ( db_name ). Then using read_sql() to run the query to get data from student table.
We are writing the data to CSV file by using to_csv().
In the 2nd part of the script we are reading the data from CSV file by using read_csv() and creating a DataFrame. Then we are creating the table by using to_sql(). Here is the complete code.
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("mysql+mysqldb://userid:password@localhost/db_name")
sql="SELECT * FROM student "
my_data = pd.read_sql(sql,engine )
my_data.to_csv('D:\my_file.csv',index=False)
### End of storing data to CSV file ###
### Reading data from CSV file and creating table in MySQL ####
student3=pd.read_csv("D:\my_file.csv")
my_data = pd.DataFrame(data=student3)
print(my_data)
### Creating new table student3 or appending existing table
my_data.to_sql(con=engine,name='student3',if_exists='append')
« Data input and output from Pandas DataFrame
« Pandas
read_excel()
to_csv()
to_excel()