We will use sqlalchemy and its create_engine to manage our database connection from Python to MySQL. For this we need to install mysqlclient. All details of installation are given at our MySQL installation page.
Inserting DataFrame to MySQL database table by using to_sql() from Excel or CSV sources
from sqlalchemy import create_engine
my_conn = create_engine("mysql+mysqldb://userid:password@localhost/my_database") #fill details
my_conn=my_conn.connect() # add this line if error
Above code we will use in our examples to manage our connection to MySQL. Note the connection variable my_conn
Sample code to create table
Using this DataFrame we will create a new table in our MySQL database. Here df is our DataFrame.
If the table is already available then we can use if_exists to tell how to handle. It can take three values, fail( default), replace, append.
fail: (default) Generate a value error replace: Drop the table and then insert new values append : Add new data to the table.
index
You can see in above display , the first column is our DataFrame index. This also by default goes to our table. We can stop this by setting index=False
index:(optional) Boolean , if True ( default ) index will be added to the table, in case of False no index will transfer along with records.
If the table already have index column and we set index=False then NULL data will be inserted in index column. If there is no index column and we set if_exists='append' and index=True then we will get error message.
OperationalError: (MySQLdb._exceptions.OperationalError) (1054, "Unknown column 'index' in 'field list'")
index_label
Column Label we can use for index column.
chunksize
Number of rows to be written at a time. By default all rows are written at one go. Used for large number of rows.
dtypedict or scalar, optional
Specifying the datatype for columns. If a dictionary is used, the keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode. If a scalar is provided, it will be applied to all columns.
from sqlalchemy import create_engine,Float
Here we are specifying that the columns Sp_Mobile and Sp_Desktop should be SQLAlchemy Float type.
The format option given in above code as format='%d.%m.%Y' is your available input format of original DataFrame.
Read more on how to convert to datetime field by using to_datetime()
When to use ignore and coerce in to_datetime().
If our column has blank data ( specially when reading from Excel file ) and we want to continue with rest of the data then better to use errors='coerce'. By using ignore we may not able to convert to datatime format for all the rows.
Excel to MySQL
import pandas as pd
df = pd.read_excel('D:\emp.xlsx')
# reading data from root of D drive.
from sqlalchemy import create_engine
my_conn = create_engine("mysql+mysqldb://userid:pw@localhost/db_name")
### Creating new table emp or appending existing table
df.to_sql(con=my_conn,name='emp',if_exists='append')
CSV file to MySQL
import pandas as pd
from sqlalchemy import create_engine
my_conn = create_engine("mysql+mysqldb://userid:pw@localhost/db_name")
df = pd.read_csv("F:\\data\\student.csv") # Path of the csv file, create dataframe
df.to_sql(con=my_conn, name="student5", if_exists="append")
Questions
How do I use the to_sql() function in Pandas to save a DataFrame to a SQL database?
What are the required parameters for the to_sql() function in Pandas?
Can I specify a table name while using the to_sql() function?
What is the default behavior of the to_sql() function when creating a table in the database?
How can I specify the data types for the columns when saving a DataFrame to a database using to_sql()?
Is it possible to append data to an existing table in the database using the to_sql() function?
How do I handle primary keys or unique constraints when saving a DataFrame to a database using to_sql()?
What happens if there are duplicate column names in the DataFrame when using to_sql()?
Can I control the chunk size or batch size for inserting data when using to_sql()?
How can I handle errors or exceptions when saving a DataFrame to a database using to_sql()?