XPath is a powerful query language for extracting specific data from XML documents. Python's lxml library offers comprehensive support for XPath, making it ideal for working with deeply nested or complex XML files. This tutorial explains how to use XPath in Python with practical examples.
from lxml import etree
Step 1: Import the etree module from the lxml library, which provides support for XPath queries.
# Load the XML data
xml_data = """
<students>
<student>
<id>1</id>
<name>John Deo</name>
<mark>75</mark>
</student>
<student>
<id>2</id>
<name>Jane Smith</name>
<mark>85</mark>
</student>
</students>
"""
root = etree.fromstring(xml_data)
Step 2: Define your XML data as a string and parse it with etree.fromstring(), which returns the root element of the parsed tree (here, <students>).
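To check that parsing worked before running any queries, it can help to inspect the root element. The following is a minimal sketch that reuses the tutorial's sample data; the variable names root_tag, child_tags, and student_count are illustrative and not part of lxml.

```python
from lxml import etree

xml_data = """
<students>
  <student><id>1</id><name>John Deo</name><mark>75</mark></student>
  <student><id>2</id><name>Jane Smith</name><mark>85</mark></student>
</students>
"""

# fromstring() returns the root element of the parsed document
root = etree.fromstring(xml_data)

root_tag = root.tag                          # tag name of the root element
child_tags = [child.tag for child in root]   # tags of the direct children
student_count = len(root)                    # number of direct children

print(root_tag, child_tags, student_count)
```

If the root tag or child count is not what you expect, the XPath expressions in the later steps will likely return empty lists, so this is a cheap first check.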
# Extract all student names
names = root.xpath("//student/name/text()")
print(names)
Explanation: The XPath expression //student/name/text() retrieves the text content of every <name> element that is a direct child of a <student> element. For the sample data, the output is the list ['John Deo', 'Jane Smith'].
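The same names can also be reached by selecting the <name> elements themselves and reading their .text attribute in Python, which is useful when you need the element as well as its text. A short sketch of both approaches, assuming the sample data from this tutorial (the variable names are illustrative):

```python
from lxml import etree

xml_data = """
<students>
  <student><id>1</id><name>John Deo</name><mark>75</mark></student>
  <student><id>2</id><name>Jane Smith</name><mark>85</mark></student>
</students>
"""
root = etree.fromstring(xml_data)

# Without text(), xpath() returns a list of element objects
name_elements = root.xpath("//student/name")
names_from_elements = [el.text for el in name_elements]

# With text(), xpath() returns the text nodes directly as strings
names_from_text = root.xpath("//student/name/text()")

print(names_from_elements)
print(names_from_text)
```

Both lists hold the same two names; the difference is only whether the extraction of text happens in the XPath expression or in Python.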
# Extract student with specific id
student = root.xpath("//student[id='2']/name/text()")
print(student)
Explanation: This query selects the <name> of the student whose <id> is "2". The result will be a list containing "Jane Smith".
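When the id comes from user input or another variable, one option is to pass it as an XPath variable rather than building the expression with string formatting. lxml's xpath() method accepts keyword arguments that are referenced as $name inside the expression. The helper find_name_by_id below is a hypothetical function written for this sketch, not part of lxml.

```python
from lxml import etree

xml_data = """
<students>
  <student><id>1</id><name>John Deo</name><mark>75</mark></student>
  <student><id>2</id><name>Jane Smith</name><mark>85</mark></student>
</students>
"""
root = etree.fromstring(xml_data)


def find_name_by_id(root, student_id):
    """Return the name of the student with the given id, or None if absent.

    Hypothetical helper for illustration only.
    """
    # $student_id is bound to the keyword argument passed to xpath()
    matches = root.xpath(
        "//student[id=$student_id]/name/text()", student_id=student_id
    )
    # xpath() returns a list; an empty list means no student matched
    return matches[0] if matches else None


print(find_name_by_id(root, "2"))
print(find_name_by_id(root, "99"))
```

Because xpath() always returns a list for node selections, checking for an empty result avoids an IndexError when no student has the requested id.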
# Get marks greater than 80
high_scorers = root.xpath("//student[mark > 80]/name/text()")
print(high_scorers)
Explanation: This query keeps only the <student> elements whose <mark> value, compared as a number, is greater than 80, and retrieves their names. For the sample data, the result is ['Jane Smith'], since a mark of 75 does not meet the condition.
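Numeric work does not have to stop at filtering. As a rough sketch on the same sample data, the XPath functions count() and sum() return numbers (lxml gives them back as Python floats), and the same filter can alternatively be written in Python by converting each mark yourself. The variable names here are illustrative.

```python
from lxml import etree

xml_data = """
<students>
  <student><id>1</id><name>John Deo</name><mark>75</mark></student>
  <student><id>2</id><name>Jane Smith</name><mark>85</mark></student>
</students>
"""
root = etree.fromstring(xml_data)

# XPath numeric functions return a float rather than a list
high_count = root.xpath("count(//student[mark > 80])")
total_marks = root.xpath("sum(//student/mark)")

# Equivalent filter done in Python: select elements, then compare marks
high_scorers = [
    student.findtext("name")
    for student in root.xpath("//student")
    if float(student.findtext("mark")) > 80
]

print(high_count, total_marks, high_scorers)
```

Filtering in Python is more verbose, but it makes it easier to handle missing or non-numeric marks explicitly if your real data is less tidy than this sample.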
XPath queries enable precise data extraction from XML documents, which makes XPath a vital tool for working with structured data. Python's lxml library simplifies these queries, allowing developers to handle XML efficiently.