Load Large CSV Files into SQLite with Real-Time Progress Updates


Progress bar showing Data export from CSV file to SQLite database

Select any CSV file by using the file browser on click of a button. Use the read_csv() method to create a Pandas Dataframe. While reading the csv file chunksize attribute of read_csv() is used to copy data in steps and feedback given to Progress bar to reflect the same. Finally the data is stored in a selected SQLite database file.

1. Importing Required Libraries


    import pandas as pd  # For data manipulation
    import sqlite3  # For SQLite database interaction
    import tkinter as tk  # For GUI development
    from tkinter import ttk, filedialog  # For file dialog and progress bar
	
  • Pandas: A library( read_csv()) for reading CSV files and inserting data into SQLite.
  • SQLite: Provides lightweight, serverless database functionality.
  • Tkinter: Used for creating the GUI components.

SQLite3 as Python database connector

2. Function: Load CSV to SQLite with Progress Tracking

def load_csv_to_sqlite():
    # Prompt user to select a CSV file
    file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv"), ("All files", "*.*")])
    if not file_path:
        # Update status label if no file is selected
        status_label.config(text="No file selected", fg="red")
        return

    # Prompt user to select or create an SQLite database
    db_path = filedialog.asksaveasfilename(defaultextension=".db", filetypes=[("SQLite files", "*.db")])
    if not db_path:
        # Update status label if no database is selected
        status_label.config(text="No database selected", fg="red")
        return

    # Connect to the SQLite database
    my_conn = sqlite3.connect(db_path)

    # Infer table name for storing CSV data
    table_name = "data_table"

    # Load CSV in chunks
    chunksize = 1000  # Process 1000 rows at a time
    total_rows = sum(1 for _ in open(file_path)) - 1
    processed_rows = 0  # Initialize processed rows counter

    # Configure the progress bar
    progress_bar['maximum'] = total_rows

    # Read and insert data chunk by chunk
    try:
        for chunk in pd.read_csv(file_path, chunksize=chunksize):
            chunk.to_sql(table_name, my_conn, if_exists='append', index=False)
            processed_rows += len(chunk)  # Update processed rows count
            progress_bar['value'] = processed_rows
            progress_bar.update_idletasks()  # Update GUI
            status_label.config(text=f"Loaded {processed_rows}/{total_rows} rows...")
    except Exception as e:
        status_label.config(text=f"Error: {e}", fg="red")
    else:
        status_label.config(text="Data loaded successfully!", fg="green")
    finally:
        my_conn.close()
  • File Dialog: Allows the user to select the CSV and SQLite files interactively.
  • Chunked Data Loading: Processes 1000 rows at a time for efficiency and progress tracking.
  • Progress Bar: Dynamically updates to show the loading progress.

3. Creating the Tkinter GUI

my_w = tk.Tk()
my_w.title("CSV to SQLite Loader")
my_w.geometry("500x200")

# Create a frame to organize the GUI components
frame = tk.Frame(my_w, padx=20, pady=20)
frame.pack(expand=True, fill="both")

# Title label
title_label = tk.Label(frame, text="CSV to SQLite Loader with Progress Tracking", font=("Arial", 14))
title_label.pack(pady=10)

# Button to start the loading process
load_button = tk.Button(frame, text="Load CSV to SQLite", command=load_csv_to_sqlite)
load_button.pack(pady=5)

# Progress bar
progress_bar = ttk.Progressbar(frame, orient="horizontal", length=400, mode="determinate")
progress_bar.pack(pady=10)

# Status label for displaying progress and error messages
status_label = tk.Label(frame, text="", font=("Arial", 10), fg="blue")
status_label.pack(pady=5)

# Start the Tkinter main loop
my_w.mainloop()
  • Tkinter Layout: Creates and arranges GUI components like buttons, progress bar, and labels.
  • Main Loop: Keeps the application running and responsive.
---

Key Features:

  • Interactive GUI: User-friendly interface for loading CSV files into SQLite.
  • Real-Time Progress Tracking: Progress bar dynamically updates during the loading process.
  • Chunked Data Handling: Efficiently processes large files in manageable chunks.

Excel to SQLite Loader with Progress Bar using #Tkinter & #Pandas


import pandas as pd
import sqlite3
import tkinter as tk
from tkinter import ttk, filedialog

# Function to load CSV and track progress
def load_csv_to_sqlite():
    # Ask user to select a CSV file
    file_path = filedialog.askopenfilename(filetypes=
            [("CSV files", "*.csv"), ("All files", "*.*")])
    if not file_path:
        # Update status label if no file is selected
        status_label.config(text="No file selected", fg="red")
        return

    # Ask user to select or create an SQLite database
    db_path = filedialog.asksaveasfilename(defaultextension=".db", 
        filetypes=[("SQLite files", "*.db")])
    if not db_path:
        # Update status label if no database file is selected
        status_label.config(text="No database selected", fg="red")
        return

    # Connect to the SQLite database
    my_conn = sqlite3.connect(db_path)  # Create connection object

    # Infer table name for storing CSV data in the database
    table_name = "data_table"

    # Load CSV in chunks to handle large files
    chunksize = 1000  # Process 1000 rows at a time
    # Calculate total rows in the CSV (excluding header)
    total_rows = sum(1 for _ in open(file_path)) - 1 
    processed_rows = 0  # Initialize counter for processed rows

    # Configure progress bar maximum value
    progress_bar['maximum'] = total_rows

    # Read and insert data chunk by chunk
    try:
        for chunk in pd.read_csv(file_path, chunksize=chunksize):
            # Append chunk data into the SQLite table
            chunk.to_sql(table_name, my_conn, if_exists='append', index=False)
            processed_rows += len(chunk)  # Update processed rows count

            # Update progress bar
            progress_bar['value'] = processed_rows
            progress_bar.update_idletasks()  # Refresh the GUI to show progress

            # Update status label with progress
            status_label.config(text=f"Loaded {processed_rows}/{total_rows} rows...")
    except Exception as e:
        # Handle exceptions and update status label
        status_label.config(text=f"Error: {e}", fg="red")
    else:
        # Update status label when data is loaded successfully
        status_label.config(text="Data loaded successfully!", fg="green")
    finally:
        # Close the database connection
        my_conn.close()

# Create the Tkinter GUI
my_w = tk.Tk()  # Updated root to my_w
my_w.title("CSV to SQLite Loader")  # Set the title of the GUI window
my_w.geometry("500x200")  # Set the window size

# Add a frame to hold the GUI components
frame = tk.Frame(my_w, padx=20, pady=20)
frame.pack(expand=True, fill="both")

# Add a title label
title_label = tk.Label(frame, 
    text="CSV to SQLite Loader with Progress Tracking", font=("Arial", 14))
title_label.pack(pady=10)

# Add a button to load CSV and start the process
load_button = tk.Button(frame, text="Load CSV to SQLite", command=load_csv_to_sqlite)
load_button.pack(pady=5)

# Add a progress bar to show the loading progress
progress_bar = ttk.Progressbar(frame, orient="horizontal", length=400, mode="determinate")
progress_bar.pack(pady=10)

# Add a status label to display messages
status_label = tk.Label(frame, text="", font=("Arial", 10), fg="blue")
status_label.pack(pady=5)

# Run the Tkinter main loop
my_w.mainloop()

Creating extended CSV or Excel file of 10000 rows of data

Download our sample student.csv or student.excel file with 35 rows of data. You can create your own csv file with few lines of code and use the same.

import pandas as pd  # For data manipulation
import random  # For generating random values

# File paths for input and output
input_file = 'c:\\data\\student.csv'  # Input CSV file path
# input_file = 'c:\\data\\student.xlsx'  # Input Excel file path

output_file = 'c:\\data\\student_extended.csv'  # Output CSV file path
# output_file = 'c:\\data\\student_extended.xlsx'  # Output Excel file path

# Read the file into a DataFrame
df = pd.read_csv(input_file)  # For CSV files
# df = pd.read_excel(input_file)  # For Excel files

# Extend the data to 10,000 records
extended_df = pd.concat([df] * (10000 // len(df) + 1), ignore_index=True)

# Optional: Update the 'mark' column with random values
if 'mark' in extended_df.columns:  
    extended_df['mark'] = extended_df['mark'].apply(lambda x: random.randint(10, 100))

# Slice to exactly 10,000 records
extended_df = extended_df[:10000]

# Save the extended dataset
extended_df.to_csv(output_file, index=False)  # For CSV files
# extended_df.to_excel(output_file, index=False, engine='openpyxl')  # For Excel files

print(f"Extended file created: {output_file}")  # Confirm success

Explanation

  • Imports:
    • pandas: Used for handling and manipulating CSV or Excel data.
    • random: Generates random values for specific columns (e.g., "mark").
  • File Paths:
    • input_file: Specifies the path of the source file (CSV or Excel).
    • output_file: Defines where the extended dataset will be saved (CSV or Excel).
    • Uncomment the appropriate lines based on whether you're using CSV or Excel files.
  • Reading the File:
    • pd.read_csv: Reads data from a CSV file into a Pandas DataFrame.
    • pd.read_excel: Reads data from an Excel file if uncommented.
  • Extending Data:
    • pd.concat: Duplicates the data multiple times to approximate 10,000 rows.
    • Checks for the "mark" column and populates it with random integers between 10 and 100 for added variation.
  • Saving the File:
    • to_csv: Saves the extended dataset as a CSV file.
    • to_excel: Saves the dataset as an Excel file if uncommented, using the openpyxl engine for compatibility.
  • Output Confirmation:
    • Prints a success message with the location of the generated file.

Using large Excel file and showing the progress.


Progress bar showing Data export from Excel file to SQLite database


import pandas as pd
import sqlite3
import tkinter as tk
from tkinter import ttk, filedialog
from openpyxl import load_workbook

# Function to load Excel file in chunks and update progress bar
def load_excel_to_sqlite():
    # Ask user to select an Excel file
    file_path = filedialog.askopenfilename(filetypes=[("Excel files", "*.xlsx"), ("All files", "*.*")])
    if not file_path:
        status_label.config(text="No file selected", fg="red")
        return

    # Ask user to select or create an SQLite database
    db_path = filedialog.asksaveasfilename(defaultextension=".db", filetypes=[("SQLite files", "*.db")])
    if not db_path:
        status_label.config(text="No database selected", fg="red")
        return

    # Connect to the SQLite database
    conn = sqlite3.connect(db_path)
    table_name = "data_table"  # Default table name

    try:
        # Load the workbook in read-only mode
        workbook = load_workbook(file_path, read_only=True)
        sheet = workbook.active  # Get the active worksheet

        # Extract headers from the first row
        rows = sheet.iter_rows(values_only=True)
        headers = next(rows)

        # Initialize variables for progress tracking
        chunk = []  # Temporary storage for a chunk
        row_count = 0
        chunk_size = 1000  # Number of rows per chunk
        total_rows = sheet.max_row - 1  # Exclude header row

        # Configure progress bar
        progress_bar['maximum'] = total_rows

        # Process rows in chunks
        for row in rows:
            chunk.append(row)
            row_count += 1

            # When chunk size is reached, write to SQLite
            if len(chunk) == chunk_size:
                df_chunk = pd.DataFrame(chunk, columns=headers)
                df_chunk.to_sql(table_name, conn, if_exists='append', index=False)
                chunk = []  # Clear the chunk

                # Update progress bar
                progress_bar['value'] = row_count
                progress_bar.update_idletasks()
                status_label.config(text=f"Processed {row_count}/{total_rows} rows...")

        # Process remaining rows
        if chunk:
            df_chunk = pd.DataFrame(chunk, columns=headers)
            df_chunk.to_sql(table_name, conn, if_exists='append', index=False)
            progress_bar['value'] = total_rows
            progress_bar.update_idletasks()

        # Final status
        status_label.config(text="Data loaded successfully!", fg="green")
    except Exception as e:
        status_label.config(text=f"Error: {e}", fg="red")
    finally:
        conn.close()

# Create the Tkinter GUI
my_w = tk.Tk()
my_w.title("Excel to SQLite Loader with Progress Bar")
my_w.geometry("500x200")

# Frame for GUI components
frame = tk.Frame(my_w, padx=20, pady=20)
frame.pack(expand=True, fill="both")

# Title label
title_label = tk.Label(frame, text="Excel to SQLite Loader with Progress Bar", font=("Arial", 14))
title_label.pack(pady=10)

# Button to start the process
load_button = tk.Button(frame, text="Load Excel to SQLite", command=load_excel_to_sqlite)
load_button.pack(pady=5)

# Progress bar
progress_bar = ttk.Progressbar(frame, orient="horizontal", length=400, mode="determinate")
progress_bar.pack(pady=10)

# Status label
status_label = tk.Label(frame, text="", font=("Arial", 10), fg="blue")
status_label.pack(pady=5)

# Run the Tkinter main loop
my_w.mainloop()

Key Changes for Excel File Processing

  • Use of openpyxl:
    • Replaced pd.read_csv with openpyxl for handling Excel files.
    • Used load_workbook to load the Excel file in read-only mode, ensuring memory efficiency for large files.
  • Iterating Rows:
    • Implemented sheet.iter_rows(values_only=True) to iterate through Excel rows directly.
    • Extracted headers from the first row for proper DataFrame creation.
  • Chunked Processing:
    • Manually processed rows in chunks using a temporary list (chunk).
    • Data is written to SQLite after processing each chunk, ensuring step-by-step updates.
  • Progress Bar Updates:
    • Dynamically updates progress for each processed chunk.
    • Uses progress_bar['value'] and update_idletasks() to refresh the UI.
  • Final Enhancements:
    • Handles remaining rows after the last chunk to ensure all data is processed.
    • Displays detailed status messages for user feedback.
Export SQLite database table data to CSV file by using Dataframe
Tkinter Projects Projects Tkinter using Pandas DataFrame

Subscribe to our YouTube Channel here


Subscribe

* indicates required
Subscribe to plus2net

    plus2net.com







    Python Video Tutorials
    Python SQLite Video Tutorials
    Python MySQL Video Tutorials
    Python Tkinter Video Tutorials
    We use cookies to improve your browsing experience. . Learn more
    HTML MySQL PHP JavaScript ASP Photoshop Articles FORUM . Contact us
    ©2000-2024 plus2net.com All rights reserved worldwide Privacy Policy Disclaimer