Import Kaggle Datasets into Google Colab


Introduction

Google Colab offers a powerful Python environment with free access to GPUs, ideal for machine learning and data analysis tasks. If you're working with datasets from Kaggle, you can easily connect the two platforms using the Kaggle API.

In this guide, we'll show you how to import Kaggle datasets into Google Colab in five simple steps.


Step 1: Get Your Kaggle API Key

  1. Log in to your Kaggle account at kaggle.com
  2. Click your profile picture (top right), open the Settings tab, and scroll to the API section
  3. Click on "Create New API Token"
  4. This downloads a file named kaggle.json to your computer

Step 2: Upload kaggle.json to Colab

Run the code below in your Colab notebook to upload the kaggle.json file.

from google.colab import files
files.upload()  # Choose kaggle.json when prompted
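
Optionally, confirm that the file landed in the Colab working directory before moving on; a minimal check is shown below.

import os

# Optional: confirm kaggle.json is present in the current working directory
print("kaggle.json uploaded" if os.path.exists("kaggle.json") else "kaggle.json missing - re-run the upload cell")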

Step 3: Configure API Access

Move the file to the correct location and set permissions:

# Create the directory if it doesn't exist
!mkdir -p ~/.kaggle

# Move kaggle.json to the correct directory. Assumes kaggle.json is in the current working directory.
# If your kaggle.json is in a different location, please update the path below.
!mv kaggle.json ~/.kaggle/

# Set permissions for the kaggle.json file
!chmod 600 ~/.kaggle/kaggle.json
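
To verify that the credentials are picked up, you can run a quick search with the Kaggle CLI; this is an optional sanity check.

# Optional: list a few public datasets matching "titanic" to confirm authentication works
!kaggle datasets list -s titanic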

Step 4: Download a Dataset


Go to the Kaggle dataset page and copy the dataset identifier (owner/dataset-name). For example, the download snippet Kaggle shows for the Titanic dataset is:

kagglehub.dataset_download("yasserh/titanic-dataset")

so the dataset path is: yasserh/titanic-dataset

Use the command below to download:

!kaggle datasets download -d yasserh/titanic-dataset
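
Alternatively, the kagglehub snippet shown on the dataset page can be used instead of the CLI. A minimal sketch, assuming the kagglehub package is available in your Colab runtime (install it with pip if not):

import kagglehub

# Download the dataset (or reuse a cached copy) and get the local folder path
path = kagglehub.dataset_download("yasserh/titanic-dataset")
print("Dataset files are in:", path)

Note that kagglehub delivers the files already extracted, so the unzip step in Step 5 applies only to the CLI download.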

Step 5: Unzip and Load the Dataset

# Unzip the downloaded dataset
!unzip -q titanic-dataset.zip

Remove the zip and other leftover files

# Remove the zip file after extraction
!rm titanic-dataset.zip

# Remove other metadata files if they exist and are not needed
!rm -f titanic-dataset.zip.json

Load the dataset with pandas

import pandas as pd
df = pd.read_csv("Titanic-Dataset.csv")
df.head()

How many rows and columns does the dataset have?

# Get the total number of rows and columns
num_rows, num_cols = df.shape

print(f"Total number of rows: {num_rows}")
print(f"Total number of columns: {num_cols}")

Part II: Convert the Data to Different Formats

Using the CSV (comma-separated values) file above, we will convert the same data to other formats: an SQLite database, JSON, and XML.

To an SQLite database

Using the pandas to_sql() method, we convert the CSV data into an SQLite database table.

import pandas as pd  
import sqlite3

# Name of the CSV file to convert  
csv_file = 'Titanic-Dataset.csv'  

# Name of the SQLite database file to create  
db_file = 'titanic.db'  

# Name of the table within the SQLite database  
table_name = 'titanic_data'

# Read the CSV file into a pandas DataFrame  
df = pd.read_csv(csv_file)

# Create an SQLite database connection  
conn = sqlite3.connect(db_file)

# Write the DataFrame to an SQLite table  
df.to_sql(table_name, conn, if_exists='replace', index=False)

# Close the connection  
conn.close()

print(f"Successfully converted '{csv_file}' to SQLite database '{db_file}' with table '{table_name}'.")

Checking the output from the SQLite database

import sqlite3  
import pandas as pd

# Connect to the SQLite database  
conn = sqlite3.connect('titanic.db')

# Read the data from the table into a pandas DataFrame  
db_df = pd.read_sql_query("SELECT * FROM titanic_data LIMIT 5;", conn)

# Display the DataFrame  
display(db_df)

# Close the connection  
conn.close()
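
Once the data is in SQLite, you can also run filtered queries instead of reading the whole table. The parameterized query below is a sketch; it assumes the standard Titanic columns Survived and Pclass are present in the CSV.

import sqlite3
import pandas as pd

conn = sqlite3.connect('titanic.db')

# Count surviving passengers per class (assumes Survived and Pclass columns)
query = "SELECT Pclass, COUNT(*) AS survivors FROM titanic_data WHERE Survived = ? GROUP BY Pclass;"
summary_df = pd.read_sql_query(query, conn, params=(1,))
print(summary_df)

conn.close()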

JSON output

More on JSON format

import pandas as pd

# Name of the CSV file to convert  
csv_file = 'Titanic-Dataset.csv'

# Name of the JSON file to create  
json_file = 'titanic.json'

# Read the CSV file into a pandas DataFrame  
df = pd.read_csv(csv_file)

# Convert the DataFrame to a JSON file  
df.to_json(json_file, orient='records', indent=4)

print(f"Successfully converted '{csv_file}' to JSON file '{json_file}'.")

Checking the output


# Read the first few lines of the JSON file  
with open('titanic.json', 'r') as f:  
    for i, line in enumerate(f):  
        if i < 15:  # Displaying first 15 lines for brevity  
            print(line.strip())  
        else:  
            break
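
To confirm the round trip, pandas can load the records-oriented JSON file straight back into a DataFrame:

import pandas as pd

# Read the records-oriented JSON back into a DataFrame
json_df = pd.read_json('titanic.json', orient='records')
print(json_df.shape)
json_df.head()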

XML format

More on XML data format

import pandas as pd  
import xml.etree.ElementTree as ET

# Name of the CSV file to convert  
csv_file = 'Titanic-Dataset.csv'  

# Name of the XML file to create  
xml_file = 'titanic.xml'

# Read the CSV file into a pandas DataFrame  
df = pd.read_csv(csv_file)

# Create the root element for the XML  
root = ET.Element('TitanicData')

# Iterate over DataFrame rows and add them to the XML structure  
for index, row in df.iterrows():  
    record = ET.SubElement(root, 'Record')  
    for col_name, value in row.items():  
        child = ET.SubElement(record, col_name)  
        # Convert NaN to empty string for XML representation  
        child.text = str(value) if pd.notna(value) else ''

# Create an ElementTree object  
tree = ET.ElementTree(root)

# Pretty-print the XML with minidom for better readability  
from xml.dom.minidom import parseString  
xml_string = parseString(ET.tostring(root)).toprettyxml(indent="    ")

with open(xml_file, "w", encoding="utf-8") as f:  
    f.write(xml_string)

print(f"Successfully converted '{csv_file}' to XML file '{xml_file}'.")

Checking the output

import os

# Read the first few lines of the XML file  
# We'll read more lines than JSON due to XML's verbose structure  
num_lines_to_display = 30

if os.path.exists('titanic.xml'):  
    with open('titanic.xml', 'r') as f:  
        for i, line in enumerate(f):  
            if i < num_lines_to_display:  
                print(line.strip())  
            else:  
                break  
else:  
    print("Error: 'titanic.xml' not found.")

Author: Subhendu Mohapatra, plus2net.com