read_csv()

Pandas Pandas input output

CSV file to DataFrame by read_csv() Read and display data from CSV (comman separated value ) file, sales1.csv file.


Reading data from CSV file and creating Pandas DataFrame using read_csv() in Python with options

Download CSV file sales1.csv
import pandas as pd 
sales=pd.read_csv("sales1.csv") 
print(sales)
Output is here
	sale_id  c_id  p_id  product  qty store
0        1     2     3  Monitor    2   ABC
1        2     2     4      CPU    1   DEF
2        3     1     3  Monitor    3   ABC

Place the sales1.csv file in the same folder and then run the above code.

Options :

index_col

By default the DataFrame will add one index column, if we don't want it to add and use one of the column as index column then we can add like this.
sales=pd.read_csv("sales1.csv",index_col='sale_id') 

header

The first line in our example csv file is the column headers, this is same as header=0. The first row or 0th row will be treated as column headers.

If we want to treat the first row as data and not as header then here is the code.
sales=pd.read_csv("sales1.csv",header=None)
Output is here
	0     1     2        3    4      5
0  sale_id  c_id  p_id  product  qty  store
1        1     2     3  Monitor    2    ABC
2        2     2     4      CPU    1    DEF
3        3     1     3  Monitor    3    ABC
In the above code the header row ( first row or 0th row ) is treated as data ( not as column headers ) .

The file have one header row at top but we want to read only data ( not the headers ) or skip the header. Note that by using header=None we will include header as first row data. To remove the header will use skiprows=1
sales=pd.read_csv("sales1.csv",header=None,skiprows=1) 
Output
   0  1  2        3  4    5
0  1  2  3  Monitor  2  ABC
1  2  2  4      CPU  1  DEF
2  3  1  3  Monitor  3  ABC
In above code display only 3rd and 4th columns (first column is 0th column )
sales=pd.read_csv("sales1.csv",header=None,skiprows=1, usecols=[3,4])
Output
         3  4
0  Monitor  2
1      CPU  1
2  Monitor  3

na_values

Using the option na_values we can mark data as NaN and indicate them by using isnull()
blank_values = ["n/a", "na", "--"]
my_data=pd.read_csv("D:\\test-na_values.csv",na_values=blank_values)
my_data['status']=my_data['name'].isnull() # new column added 
print(my_data)
Download test-na_values.csv file

converters

By using converters option we can parse our input data to convert it to a desired dtype using a conversion function.
Here we have one column ( student-percentage.csv ) showing marks in percentage ( 55% ) , by using one function to_float() we have converted the column data to float value while reading the csv file.
import pandas as pd 
def to_float(x):
    return float(x.strip('%'))/100
    #return int(float(x.strip('%'))) # as integer 
df=pd.read_csv("D:\\my_data\\student-percentage.csv",
               converters={'percentage':to_float})
print(df)
Output is here
   id        name  class  mark  gender  percentage
0   1    John Deo   Four    75  female        0.75
1   2    Max Ruin  Three    85    male        0.85
2   3      Arnold  Three    55    male        0.55
3   4  Krish Star   Four    60  female        0.60
4   5   John Mike   Four    60  female        0.60
Download student-percentage.csv file

chunksize

For a large number of rows we can break in chunks while reading the file, here as an example the above csv file is opened with a chunksize=2. We can read part ( or chunk ) of the total rows by this.
import pandas as pd 
df=pd.read_csv("D:\\my_data\\student-percentage.csv",chunksize=2)
for chunk in df:
    print(chunk)
Output
   id      name  class  mark  gender percentage
0   1  John Deo   Four    75  female     75.00%
1   2  Max Ruin  Three    85    male     85.00%
   id        name  class  mark  gender percentage
2   3      Arnold  Three    55    male     55.00%
3   4  Krish Star   Four    60  female     60.00%
   id       name class  mark  gender percentage
4   5  John Mike  Four    60  female     60.00%

All options

pd.read_csv(filepath_or_buffer, 
sep=’, ‘, delimiter=None, header=’infer’, 
names=None, index_col=None, usecols=None, 
squeeze=False, prefix=None, mangle_dupe_cols=True,
 dtype=None, engine=None, converters=None, 
 true_values=None, false_values=None, 
 skipinitialspace=False, skiprows=None,
 nrows=None, na_values=None, keep_default_na=True,
 na_filter=True, verbose=False,
 skip_blank_lines=True, parse_dates=False, 
 infer_datetime_format=False, keep_date_col=False,
 date_parser=None, dayfirst=False, iterator=False,
 chunksize=None, compression=’infer’, thousands=None,
 decimal=b’.’, lineterminator=None, quotechar='”‘, 
 quoting=0, escapechar=None, comment=None, encoding=None,
 dialect=None, tupleize_cols=None, error_bad_lines=True,
 warn_bad_lines=True, skipfooter=0, doublequote=True, 
 delim_whitespace=False, low_memory=True,
 memory_map=False, float_precision=None)

From MySQL to CSV

This is a common requirement as we read data from MySQL database and then save the data in CSV file.
We will further extend this script to read from CSV file and store data in MySQL database.

We are going to use sqlalchemy for our MySQL database connection.

We are first connecting to MySQL database by using our connection userid, password and database name ( db_name ). Then using read_sql() to run the query to get data from student table.

We are writing the data to CSV file by using to_csv().

In the 2nd part of the script we are reading the data from CSV file by using read_csv() and creating a DataFrame. Then we are creating the table by using to_sql(). Here is the complete code.
import pandas as pd 
from sqlalchemy import create_engine
engine = create_engine("mysql+mysqldb://userid:password@localhost/db_name")

sql="SELECT * FROM student "
my_data = pd.read_sql(sql,engine )
my_data.to_csv('D:\my_file.csv',index=False)
### End of storing data to CSV file ###

### Reading data from CSV file and creating table in MySQL ####
student3=pd.read_csv("D:\my_file.csv")
my_data = pd.DataFrame(data=student3)
print(my_data)
### Creating new table student3 or appending existing table 
my_data.to_sql(con=engine,name='student3',if_exists='append')

Data input and output from Pandas DataFrame
Pandas read_excel() to_csv() to_excel() to_string()


plus2net.com



Post your comments , suggestion , error , requirements etc here





Python Video Tutorials
Python SQLite Video Tutorials
Python MySQL Video Tutorials
Python Tkinter Video Tutorials
We use cookies to improve your browsing experience. . Learn more
HTML MySQL PHP JavaScript ASP Photoshop Articles FORUM . Contact us
©2000-2021 plus2net.com All rights reserved worldwide Privacy Policy Disclaimer