Import Kaggle Datasets into Google Colab


Introduction

Google Colab offers a powerful Python environment with free access to GPUs, ideal for machine learning and data analysis tasks. If you're working with datasets from Kaggle, you can easily connect the two platforms using the Kaggle API.

In this guide, we'll show you how to import Kaggle datasets into Google Colab in five simple steps.


Step 1: Get Your Kaggle API Key

  1. Log in to your Kaggle account at kaggle.com
  2. Click your profile picture (top right), open the Settings tab, and scroll to the API section
  3. Click on "Create New API Token"
  4. This downloads a file named kaggle.json to your computer

Step 2: Upload kaggle.json to Colab

Run the code below in your Colab notebook to upload the kaggle.json file.

from google.colab import files
files.upload()  # Choose kaggle.json when prompted
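
Optionally, confirm that the file landed in the Colab working directory before moving on; a minimal check is shown below.

import os

# Optional: confirm kaggle.json is present in the current working directory
print("kaggle.json uploaded" if os.path.exists("kaggle.json") else "kaggle.json missing - re-run the upload cell")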

Step 3: Configure API Access

Move the file to the correct location and set permissions:

# Create the directory if it doesn't exist
!mkdir -p ~/.kaggle

# Move kaggle.json to the correct directory. Assumes kaggle.json is in the current working directory.
# If your kaggle.json is in a different location, please update the path below.
!mv kaggle.json ~/.kaggle/

# Set permissions for the kaggle.json file
!chmod 600 ~/.kaggle/kaggle.json
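
To verify that the credentials are picked up, you can run a quick search with the Kaggle CLI; this is an optional sanity check.

# Optional: list a few public datasets matching "titanic" to confirm authentication works
!kaggle datasets list -s titanic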

Step 4: Download a Dataset


Go to the Kaggle dataset page and copy the dataset identifier (owner/dataset-name). For example, the download snippet Kaggle shows for the Titanic dataset is:

kagglehub.dataset_download("yasserh/titanic-dataset")

so the dataset path is: yasserh/titanic-dataset

Use the command below to download:

!kaggle datasets download -d yasserh/titanic-dataset
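
Alternatively, the kagglehub snippet shown on the dataset page can be used instead of the CLI. A minimal sketch, assuming the kagglehub package is available in your Colab runtime (install it with pip if not):

import kagglehub

# Download the dataset (or reuse a cached copy) and get the local folder path
path = kagglehub.dataset_download("yasserh/titanic-dataset")
print("Dataset files are in:", path)

Note that kagglehub delivers the files already extracted, so the unzip step in Step 5 applies only to the CLI download.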

Step 5: Unzip and Load the Dataset

# Unzip the downloaded dataset
!unzip -q titanic-dataset.zip

Remove the zip and other leftover files

# Remove the zip file after extraction
!rm titanic-dataset.zip

# Remove other metadata files if they exist and are not needed
!rm -f titanic-dataset.zip.json

Load the dataset with pandas

import pandas as pd
df = pd.read_csv("Titanic-Dataset.csv")
df.head()

How many rows and columns does the dataset have?

# Get the total number of rows and columns
num_rows, num_cols = df.shape

print(f"Total number of rows: {num_rows}")
print(f"Total number of columns: {num_cols}")

Part II: Convert the Data to Different Formats

Using the CSV (comma-separated values) file above, we will convert the same data to other formats: an SQLite database, JSON, and XML.

To an SQLite database

Using the pandas to_sql() method, we convert the CSV data into an SQLite database table.

import pandas as pd  
import sqlite3

# Name of the CSV file to convert  
csv_file = 'Titanic-Dataset.csv'  

# Name of the SQLite database file to create  
db_file = 'titanic.db'  

# Name of the table within the SQLite database  
table_name = 'titanic_data'

# Read the CSV file into a pandas DataFrame  
df = pd.read_csv(csv_file)

# Create an SQLite database connection  
conn = sqlite3.connect(db_file)

# Write the DataFrame to an SQLite table  
df.to_sql(table_name, conn, if_exists='replace', index=False)

# Close the connection  
conn.close()

print(f"Successfully converted '{csv_file}' to SQLite database '{db_file}' with table '{table_name}'.")

Checking the output from the SQLite database

import sqlite3  
import pandas as pd

# Connect to the SQLite database  
conn = sqlite3.connect('titanic.db')

# Read the data from the table into a pandas DataFrame  
db_df = pd.read_sql_query("SELECT * FROM titanic_data LIMIT 5;", conn)

# Display the DataFrame  
display(db_df)

# Close the connection  
conn.close()
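
Once the data is in SQLite, you can also run filtered queries instead of reading the whole table. The parameterized query below is a sketch; it assumes the standard Titanic columns Survived and Pclass are present in the CSV.

import sqlite3
import pandas as pd

conn = sqlite3.connect('titanic.db')

# Count surviving passengers per class (assumes Survived and Pclass columns)
query = "SELECT Pclass, COUNT(*) AS survivors FROM titanic_data WHERE Survived = ? GROUP BY Pclass;"
summary_df = pd.read_sql_query(query, conn, params=(1,))
print(summary_df)

conn.close()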

JSON output

More on JSON format

import pandas as pd

# Name of the CSV file to convert  
csv_file = 'Titanic-Dataset.csv'

# Name of the JSON file to create  
json_file = 'titanic.json'

# Read the CSV file into a pandas DataFrame  
df = pd.read_csv(csv_file)

# Convert the DataFrame to a JSON file  
df.to_json(json_file, orient='records', indent=4)

print(f"Successfully converted '{csv_file}' to JSON file '{json_file}'.")

Checking the output


# Read the first few lines of the JSON file  
with open('titanic.json', 'r') as f:  
    for i, line in enumerate(f):  
        if i < 15:  # Displaying first 15 lines for brevity  
            print(line.strip())  
        else:  
            break
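
To confirm the round trip, pandas can load the records-oriented JSON file straight back into a DataFrame:

import pandas as pd

# Read the records-oriented JSON back into a DataFrame
json_df = pd.read_json('titanic.json', orient='records')
print(json_df.shape)
json_df.head()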

XML format

More on XML data format

import pandas as pd  
import xml.etree.ElementTree as ET

# Name of the CSV file to convert  
csv_file = 'Titanic-Dataset.csv'  

# Name of the XML file to create  
xml_file = 'titanic.xml'

# Read the CSV file into a pandas DataFrame  
df = pd.read_csv(csv_file)

# Create the root element for the XML  
root = ET.Element('TitanicData')

# Iterate over DataFrame rows and add them to the XML structure  
for index, row in df.iterrows():  
    record = ET.SubElement(root, 'Record')  
    for col_name, value in row.items():  
        child = ET.SubElement(record, col_name)  
        # Convert NaN to empty string for XML representation  
        child.text = str(value) if pd.notna(value) else ''

# Create an ElementTree object  
tree = ET.ElementTree(root)

# Pretty-print the XML with minidom for better readability  
from xml.dom.minidom import parseString  
xml_string = parseString(ET.tostring(root)).toprettyxml(indent="    ")

with open(xml_file, "w", encoding="utf-8") as f:  
    f.write(xml_string)

print(f"Successfully converted '{csv_file}' to XML file '{xml_file}'.")

Checking the output

import os

# Read the first few lines of the XML file  
# We'll read more lines than JSON due to XML's verbose structure  
num_lines_to_display = 30

if os.path.exists('titanic.xml'):  
    with open('titanic.xml', 'r') as f:  
        for i, line in enumerate(f):  
            if i < num_lines_to_display:  
                print(line.strip())  
            else:  
                break  
else:  
    print("Error: 'titanic.xml' not found.")

Author: Subhendu Mohapatra, plus2net.com