read and write to MySQL from Pandas DataFrame to_sql() by using sqlalchemy create

DataFrame and MySQL table using to_sql()

We will use sqlalchemy and its create_engine to manage our database connection from Python to MySQL. For this we need to install mysqlclient. All details of installation are given at our MySQL installation page.

Inserting DataFrame to MySQL database table by using to_sql() from Excel or CSV sources

Installing mysqlclient with connection string

Use the SQL dump to create student table with sample data.

from sqlalchemy import create_engine
my_conn = create_engine("mysql+mysqldb://userid:password@localhost/my_database") #fill details
my_conn=my_conn.connect() # add this line if error

Above code we will use in our examples to manage our connection to MySQL. Note the connection variable my_conn

Sample code to create table

Using this DataFrame we will create a new table in our MySQL database. Here df is our DataFrame.

df.to_sql(con=my_conn,name='student2',if_exists='append')

to_sql() is used to Create & insert Data to MySQL database table

name

Name of the Database table.

con

Connection to Database.

if_exists

If the table is already available then we can use if_exists to tell how to handle. It can take three values, fail( default), replace, append.

fail: (default) Generate a value error
replace: Drop the table and then insert new values
append : Add new data to the table.

index

You can see in above display , the first column is our DataFrame index. This also by default goes to our table. We can stop this by setting index=False

index:(optional) Boolean , if True ( default ) index will be added to the table, in case of False no index will transfer along with records.

df.to_sql(con=my_conn,name='student2',if_exists='append', index=False)

If the table already have index column and we set index=False then NULL data will be inserted in index column. If there is no index column and we set if_exists='append' and index=True then we will get error message.

OperationalError: (MySQLdb._exceptions.OperationalError) (1054, "Unknown column 'index' in 'field list'")

index_label

Column Label we can use for index column.

chunksize

Number of rows to be written at a time. By default all rows are written at one go. Used for large number of rows.

dtypedict or scalar, optional

Specifying the datatype for columns. If a dictionary is used, the keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode. If a scalar is provided, it will be applied to all columns.

from sqlalchemy import create_engine,Float

Here we are specifying that the columns Sp_Mobile and Sp_Desktop should be SQLAlchemy Float type.

df.to_sql(con=my_conn,name=d,if_exists='replace',
	index=False,dtype={"Sp_Mobile": Float,"Sp_Desktop":Float})

Handling Date & time column

If you have Date field then you can change the column to Y-m-d format to store in mysql table.

my_data['c_date']=pd.to_datetime(my_data['c_date'],format='%d.%m.%Y')

The format option given in above code as format='%d.%m.%Y' is your available input format of original DataFrame.
Read more on how to convert to datetime field by using to_datetime()

When to use ignore and coerce in to_datetime().

If our column has blank data ( specially when reading from Excel file ) and we want to continue with rest of the data then better to use errors='coerce'. By using ignore we may not able to convert to datatime format for all the rows.

Excel to MySQL

import pandas as pd 
df = pd.read_excel('D:\emp.xlsx')
# reading data from root of D drive. 
from sqlalchemy import create_engine
my_conn = create_engine("mysql+mysqldb://userid:pw@localhost/db_name")

### Creating new table emp or appending existing table 
df.to_sql(con=my_conn,name='emp',if_exists='append')

CSV file to MySQL

import pandas as pd
from sqlalchemy import create_engine

my_conn = create_engine("mysql+mysqldb://userid:pw@localhost/db_name")
df = pd.read_csv("F:\\data\\student.csv")  # Path of the csv file, create dataframe
df.to_sql(con=my_conn, name="student5", if_exists="append")

Questions

How do I use the to_sql() function in Pandas to save a DataFrame to a SQL database?
What are the required parameters for the to_sql() function in Pandas?
Can I specify a table name while using the to_sql() function?
What is the default behavior of the to_sql() function when creating a table in the database?
How can I specify the data types for the columns when saving a DataFrame to a database using to_sql()?
Is it possible to append data to an existing table in the database using the to_sql() function?
How do I handle primary keys or unique constraints when saving a DataFrame to a database using to_sql()?
What happens if there are duplicate column names in the DataFrame when using to_sql()?
Can I control the chunk size or batch size for inserting data when using to_sql()?
How can I handle errors or exceptions when saving a DataFrame to a database using to_sql()?

DataFrame to SQlite table at Colab platform using to_sql()
Data input and output from Pandas DataFrame Download sample Excel or CSV file or create DataFrame
Pandas DataFrame read_sql read_sql_table insert data

Pandas DataFrame Python & MySQL Python- Tutorials

Subscribe to our YouTube Channel here

to_sql to Pass data from DataFrame to MySQL