read_csv()

CSV file to DataFrame by read_csv() Read and display data from CSV (comman separated value ) file, student.csv file.


read_csv(): Create Pandas DataFrame using data from CSV file #B03

Download sample CSV file ( student )

import pandas as pd
df=pd.read_csv('C:\\data\\student.csv') # use your system path
print(df.head())
Output is here
	   id        name  class  mark  gender
0   1    John Deo   Four    75  female
1   2    Max Ruin  Three    85    male
2   3      Arnold  Three    55    male
3   4  Krish Star   Four    60  female
4   5   John Mike   Four    60  female
Options :

index_col

By default the DataFrame will add one index column, if we don't want it to add and use one of the column as index column then we can add like this.
df=pd.read_csv('C:\\data\\student.csv',index_col='id')

header

The first line in our example csv file is the column headers, this is same as header=0. The first row or 0th row will be treated as column headers.

If we want to treat the first row as data and not as header then here is the code.
df=pd.read_csv('C:\\data\\student.csv',header=None)
Output is here
    0           1      2     3       4
0  id        name  class  mark  gender
1   1    John Deo   Four    75  female
2   2    Max Ruin  Three    85    male
3   3      Arnold  Three    55    male
4   4  Krish Star   Four    60  female
In the above code the header row ( first row or 0th row ) is treated as data ( not as column headers ) .

The file have one header row at top but we want to read only data ( not the headers ) or skip the header. Note that by using header=None we will include header as first row data. To remove the header will use skiprows=1
df=pd.read_csv('C:\\data\\student.csv',header=None,skiprows=1)
Output
    0           1      2   3       4
0  1    John Deo   Four  75  female
1  2    Max Ruin  Three  85    male
2  3      Arnold  Three  55    male
3  4  Krish Star   Four  60  female
4  5   John Mike   Four  60  female
In above code display only 3rd and 4th columns (first column is 0th column )
df=pd.read_csv('C:\\data\\student.csv',header=None,skiprows=1,usecols=[3,4])
Output
    3       4
0  75  female
1  85    male
2  55    male
3  60  female
4  60  female

names

We can use our own headers while creating the DataFrame. Here is a list of headers and same is used by using names option.
my_names=['my_id','my_name','my_class','my_mark','my_gender']
df=pd.read_csv('student.csv',names=my_names)
Output
  my_id     my_name my_class my_mark my_gender
0    id        name    class    mark    gender
1     1    John Deo     Four      75    female
2     2    Max Ruin    Three      85      male
3     3      Arnold    Three      55      male
4     4  Krish Star     Four      60    female
To remove the original headers, use header=0
df=pd.read_csv('student.csv',names=my_names,header=0)

na_values

Using the option na_values we can mark data as NaN and indicate them by using isnull()
blank_values = ["n/a", "na", "--"]
my_data=pd.read_csv("D:\\test-na_values.csv",na_values=blank_values)
my_data['status']=my_data['name'].isnull() # new column added 
print(my_data)
Download test-na_values.csv file

skiprows

Skip the first 6 rows and then read.
df=pd.read_csv(path,skiprows=6)
Output
    6    Alex John   Four  55    male
0   7  My John Rob  Fifth  78    male
1   8       Asruid   Five  85    male
2   9      Tes Qry    Six  78    male
3  10     Big John   Four  55  female
4  11       Ronald    Six  89  female

converters

By using converters option we can parse our input data to convert it to a desired dtype using a conversion function.
Here by using one function to_float() we have converted the mark column data to float value while reading the csv file.
import pandas as pd
def to_float(x):
    return float(x.strip('%'))/100
    #return int(float(x.strip('%'))) # as integer 
df=pd.read_csv('C:\\data\\student.csv',converters={'mark':to_float})
print(df.head())
Output is here
   id        name  class  mark  gender
0   1    John Deo   Four  0.75  female
1   2    Max Ruin  Three  0.85    male
2   3      Arnold  Three  0.55    male
3   4  Krish Star   Four  0.60  female
4   5   John Mike   Four  0.60  female

chunksize

For a large number of rows we can break in chunks while reading the file, here as an example the above csv file is opened with a chunksize=2. We can read part ( or chunk ) of the total rows by this.
import pandas as pd 
df=pd.read_csv('C:\\data\\student.csv',chunksize=3)
for chunk in df:
    print(chunk)
Output
  id      name  class  mark  gender
0   1  John Deo   Four    75  female
1   2  Max Ruin  Three    85    male
2   3    Arnold  Three    55    male
   id        name class  mark  gender
3   4  Krish Star  Four    60  female
4   5   John Mike  Four    60  female
5   6   Alex John  Four    55    male

All options

pd.read_csv(filepath_or_buffer, 
sep=’, ‘, delimiter=None, header=’infer’, 
names=None, index_col=None, usecols=None, 
squeeze=False, prefix=None, mangle_dupe_cols=True,
 dtype=None, engine=None, converters=None, 
 true_values=None, false_values=None, 
 skipinitialspace=False, skiprows=None,
 nrows=None, na_values=None, keep_default_na=True,
 na_filter=True, verbose=False,
 skip_blank_lines=True, parse_dates=False, 
 infer_datetime_format=False, keep_date_col=False,
 date_parser=None, dayfirst=False, iterator=False,
 chunksize=None, compression=’infer’, thousands=None,
 decimal=b’.’, lineterminator=None, quotechar='”‘, 
 quoting=0, escapechar=None, comment=None, encoding=None,
 dialect=None, tupleize_cols=None, error_bad_lines=True,
 warn_bad_lines=True, skipfooter=0, doublequote=True, 
 delim_whitespace=False, low_memory=True,
 memory_map=False, float_precision=None)

From MySQL to CSV

This is a common requirement as we read data from MySQL database and then save the data in CSV file.
We will further extend this script to read from CSV file and store data in MySQL database.

We are going to use sqlalchemy for our MySQL database connection.

We are first connecting to MySQL database by using our connection userid, password and database name ( db_name ). Then using read_sql() to run the query to get data from student table.

We are writing the data to CSV file by using to_csv().

In the 2nd part of the script we are reading the data from CSV file by using read_csv() and creating a DataFrame. Then we are creating the table by using to_sql(). Here is the complete code.
import pandas as pd 
from sqlalchemy import create_engine
engine = create_engine("mysql+mysqldb://userid:password@localhost/db_name")

sql="SELECT * FROM student "
my_data = pd.read_sql(sql,engine )
my_data.to_csv('D:\my_file.csv',index=False)
### End of storing data to CSV file ###

### Reading data from CSV file and creating table in MySQL ####
student3=pd.read_csv("D:\my_file.csv")
my_data = pd.DataFrame(data=student3)
print(my_data)
### Creating new table student3 or appending existing table 
my_data.to_sql(con=engine,name='student3',if_exists='append')

Questions

How read_csv() is used to get data from Google GA-4 report
Data input and output from Pandas DataFrame
Pandas read_excel() to_csv() to_excel() to_string()
Sample student DataFrame
Subscribe to our YouTube Channel here


Subscribe

* indicates required
Subscribe to plus2net

    plus2net.com



    Post your comments , suggestion , error , requirements etc here





    Python Video Tutorials
    Python SQLite Video Tutorials
    Python MySQL Video Tutorials
    Python Tkinter Video Tutorials
    We use cookies to improve your browsing experience. . Learn more
    HTML MySQL PHP JavaScript ASP Photoshop Articles FORUM . Contact us
    ©2000-2024 plus2net.com All rights reserved worldwide Privacy Policy Disclaimer