Querying XML with XPath in Python: A Practical Guide

XPath is a powerful query language for extracting specific data from XML documents. Python's lxml library offers comprehensive support for XPath, making it ideal for working with deeply nested or complex XML files. This tutorial explains how to use XPath in Python with practical examples.

Code Breakdown:


from lxml import etree

Step 1: Import the etree module from the lxml library, which provides support for XPath queries.


# Load the XML data
xml_data = """
<students>
    <student>
        <id>1</id>
        <name>John Deo</name>
        <mark>75</mark>
    </student>
    <student>
        <id>2</id>
        <name>Jane Smith</name>
        <mark>85</mark>
    </student>
</students>
"""
root = etree.fromstring(xml_data)

Step 2: Define your XML data as a string and parse it using etree.fromstring() to create an XML tree.

Querying with XPath


# Extract all student names
names = root.xpath("//student/name/text()")
print(names)

Explanation: The XPath expression //student/name/text() retrieves the text content of all <name> elements within <student> nodes. The output will be a list of names.

Filtering with Attributes


# Extract student with specific id
student = root.xpath("//student[id='2']/name/text()")
print(student)

Explanation: This query selects the <name> of the student whose <id> is "2". The result will be a list containing "Jane Smith".

Advanced Query Example


# Get marks greater than 80
high_scorers = root.xpath("//student[mark > 80]/name/text()")
print(high_scorers)

Explanation: This query filters students with marks greater than 80 and retrieves their names. The result will include students who meet the condition.

Sample Output XML:


<students>
    <student>
        <id>1</id>
        <name>John Deo</name>
        <mark>75</mark>
    </student>
    <student>
        <id>2</id>
        <name>Jane Smith</name>
        <mark>85</mark>
    </student>
</students>

Use Cases:

Extracting specific data points from large XML documents.
Filtering XML data based on conditions.
Navigating complex or deeply nested XML structures.

Conclusion:

XPath queries enable precise data extraction from XML documents, making it a vital tool for working with structured data. Python's lxml library simplifies these queries, allowing developers to handle XML efficiently.

XML Overview Create XML Modify XML Save XML Python Tutorials

Subhendu Mohapatra

Author

🎥 Join me live on YouTube

Passionate about coding and teaching, I publish practical tutorials on PHP, Python, JavaScript, SQL, and web development. My goal is to make learning simple, engaging, and project‑oriented with real examples and source code.

Subscribe to our YouTube Channel here

How to Use XPath for Querying XML in Python