import pandas as pd
df = pd.read_xml('E:\\testing\\data\\student.xml', parser='etree')
print(df.head())
### Output:
id name class mark gender
0 1 John Deo Four 75 female
1 2 Max Ruin Three 85 male
2 3 Arnold Three 55 male
3 4 Krish Star Four 60 female
4 5 John Mike Four 60 female
---
Attribute | Default Value | Description |
---|---|---|
path_or_buffer | (required) | The file path, URL, or file-like object containing the XML data. |
xpath | './*' | An XPath expression that selects the nodes to parse into rows. |
parser | 'lxml' | The XML parser to use: 'lxml' or 'etree'. |
attrs_only | False | Parse only the attributes at the specified xpath. |
namespaces | None | A dictionary mapping namespace prefixes to the URIs used in the XML. |
df = pd.read_xml('student.xml', xpath='.//Row')
print(df)
### Output:
id name class mark gender
0 1 John Deo Four 75 female
1 2 Max Ruin Three 85 male
2 3 Arnold Three 55 male
3 4 Krish Star Four 60 female
4 5 John Mike Four 60 female
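The contents of student.xml are not shown here, so the following is a minimal, self-contained sketch. It assumes each record is a `<Row>` element with `id`, `name`, `class`, `mark`, and `gender` children (an assumption based on the output above). It wraps the XML in `StringIO` and uses `parser='etree'`, so only pandas and the standard library are needed. The second call adds a simple predicate to the XPath to keep only the rows whose `<class>` is `Three`.

```python
from io import StringIO
import pandas as pd

# Hypothetical stand-in for student.xml; the real file's layout is assumed
xml_data = """<Students>
  <Row><id>1</id><name>John Deo</name><class>Four</class><mark>75</mark><gender>female</gender></Row>
  <Row><id>2</id><name>Max Ruin</name><class>Three</class><mark>85</mark><gender>male</gender></Row>
  <Row><id>3</id><name>Arnold</name><class>Three</class><mark>55</mark><gender>male</gender></Row>
</Students>"""

# Select every <Row> element, wherever it sits in the tree
df = pd.read_xml(StringIO(xml_data), xpath=".//Row", parser="etree")
print(df)

# Keep only the <Row> elements whose <class> child has the text 'Three'
df_three = pd.read_xml(
    StringIO(xml_data), xpath=".//Row[class='Three']", parser="etree"
)
print(df_three)
```

The standard-library parser supports only a subset of XPath, so more elaborate expressions may need `parser='lxml'`.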
---
from lxml import etree
import pandas as pd
# Parse the XML file
tree = etree.parse('student.xml')
# Extract data from <Row> elements
students = []
for row in tree.xpath('.//Row'):  # Select all <Row> nodes
    students.append({
        'id': row.findtext('id'),       # Extract the text of <id>
        'class': row.findtext('class')  # Extract the text of <class>
    })
# Convert the extracted data to a DataFrame
df = pd.DataFrame(students)
print(df)
### Output:
id class
0 1 Four
1 2 Three
2 3 Three
3 4 Four
4 5 Four
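If lxml is not installed, the same extraction can be sketched with the standard library's `xml.etree.ElementTree`, which offers the same `findtext()` method. The XML string below is a hypothetical stand-in for student.xml, with an assumed `<Row>` layout. Because `findtext()` returns strings, the `id` column is converted to integers explicitly.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical stand-in for student.xml; the real file's layout is assumed
xml_data = """<Students>
  <Row><id>1</id><name>John Deo</name><class>Four</class></Row>
  <Row><id>2</id><name>Max Ruin</name><class>Three</class></Row>
  <Row><id>3</id><name>Arnold</name><class>Three</class></Row>
</Students>"""

root = ET.fromstring(xml_data)

# Build one dictionary per <Row>, keeping only <id> and <class>
students = [
    {"id": row.findtext("id"), "class": row.findtext("class")}
    for row in root.iter("Row")
]

df = pd.DataFrame(students)
df["id"] = df["id"].astype(int)  # findtext() returns text, not numbers
print(df)
```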
---
The `attrs_only` parameter in `read_xml()` extracts only the attributes of XML elements, ignoring their text content. This is especially useful when XML nodes contain both attributes and text. Here's an example:
from io import StringIO
import pandas as pd
# Example XML with attributes and text content
xml_data = '''
<Students>
<Student id="1" name="John Deo" class="Four">Passed</Student>
<Student id="2" name="Max Ruin" class="Three">Failed</Student>
<Student id="3" name="Arnold" class="Three">Passed</Student>
</Students>
'''
# Wrap the string in StringIO: passing literal XML to read_xml() is deprecated since pandas 2.1
# Reading XML with attrs_only=True
df_attrs_only = pd.read_xml(StringIO(xml_data), parser='etree', attrs_only=True)
print("With attrs_only=True:")
print(df_attrs_only)
# Reading XML without attrs_only (default behavior)
df_default = pd.read_xml(StringIO(xml_data), parser='etree')
print("\nWithout attrs_only (default):")
print(df_default)
### Output:
With attrs_only=True:
id name class
0 1 John Deo Four
1 2 Max Ruin Three
2 3 Arnold Three
Without attrs_only (default):
id name class Student
0 1 John Deo Four Passed
1 2 Max Ruin Three Failed
2 3 Arnold Three Passed
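As a variation on the example above, the sketch below swaps the text content for a child `<mark>` element (a hypothetical change to the XML, made only for illustration). It suggests that `attrs_only=True` leaves out child elements as well as text, while the default call keeps both attributes and children.

```python
from io import StringIO
import pandas as pd

# Hypothetical variation of the XML above: each <Student> holds a child <mark>
xml_data = """<Students>
  <Student id="1" name="John Deo" class="Four"><mark>75</mark></Student>
  <Student id="2" name="Max Ruin" class="Three"><mark>85</mark></Student>
  <Student id="3" name="Arnold" class="Three"><mark>55</mark></Student>
</Students>"""

# Attributes only: the child <mark> element is not read
df_attrs = pd.read_xml(StringIO(xml_data), parser="etree", attrs_only=True)

# Default: attributes and child elements both become columns
df_default = pd.read_xml(StringIO(xml_data), parser="etree")

print(list(df_attrs.columns))
print(list(df_default.columns))
```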
---
- Each `<Student>` element contains attributes (`id`, `name`, `class`) and text content (`Passed` or `Failed`).
- With `attrs_only=True`: only the attributes (`id`, `name`, `class`) are included in the DataFrame; the text content (`Passed` or `Failed`) is ignored.
- Without `attrs_only`: the text content is also included, in a column named after the element (`Student`).
- Use `attrs_only=True` when the text content of XML nodes is not relevant, and only the attributes are needed.

The `namespaces` parameter in `read_xml()` allows you to parse XML files that use namespaces. Here's an example:
from io import StringIO
import pandas as pd
# Example XML with namespaces
xml_data = '''
<ns:Students xmlns:ns="http://example.com/ns">
<ns:Student id="1" name="John Deo" class="Four">Passed</ns:Student>
<ns:Student id="2" name="Max Ruin" class="Three">Failed</ns:Student>
<ns:Student id="3" name="Arnold" class="Three">Passed</ns:Student>
</ns:Students>
'''
# Define the namespace
namespaces = {"ns": "http://example.com/ns"}
# Reading XML with the namespace (StringIO avoids the deprecated literal-string input)
df = pd.read_xml(StringIO(xml_data), xpath=".//ns:Student", namespaces=namespaces)
print(df)
### Output:
id name class Student
0 1 John Deo Four Passed
1 2 Max Ruin Three Failed
2 3 Arnold Three Passed
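The call above relies on the default lxml parser. As a sketch under the assumption that only the standard-library parser is available, the same namespaced XML can be read with `parser='etree'`, whose limited XPath support still covers prefixed paths such as `.//ns:Student`. The column names come back without the namespace prefix.

```python
from io import StringIO
import pandas as pd

# Same namespaced XML as in the example above
xml_data = """<ns:Students xmlns:ns="http://example.com/ns">
  <ns:Student id="1" name="John Deo" class="Four">Passed</ns:Student>
  <ns:Student id="2" name="Max Ruin" class="Three">Failed</ns:Student>
  <ns:Student id="3" name="Arnold" class="Three">Passed</ns:Student>
</ns:Students>"""

# Map the prefix used in the XPath to the namespace URI
namespaces = {"ns": "http://example.com/ns"}

df = pd.read_xml(
    StringIO(xml_data),
    xpath=".//ns:Student",
    namespaces=namespaces,
    parser="etree",  # standard-library parser, no lxml required
)
print(df)
```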
- The XML declares the prefix `ns` with the URI `http://example.com/ns`.
- Both `<ns:Student>` and `<ns:Students>` are namespaced.
- The `namespaces` parameter is a dictionary where keys are prefixes (e.g., `ns`) and values are their respective URIs (e.g., `http://example.com/ns`).
- Use the `xpath` parameter with the namespace prefix (e.g., `.//ns:Student`) to target specific elements.
- The resulting DataFrame includes the attributes (`id`, `name`, `class`) and the text content (`Passed` or `Failed`).

import pandas as pd
df = pd.read_xml('nested_student.xml')
print(df)
### Output:
id name marks.score1 marks.score2
0 1 John Deo 75.0 NaN
1 2 Max Ruin 85.0 80.0
2 3 Arnold 55.0 90.0
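The structure of nested_student.xml is not shown, so the following is only a sketch under an assumed layout: each `<Student>` has `<id>`, `<name>`, and a nested `<marks>` element holding `<score1>` and, optionally, `<score2>`. The pandas documentation describes `read_xml()` as best suited to shallow documents, so when dotted column names like those above are needed, one option is to flatten the tree by hand. The `flatten` helper is hypothetical and written here for illustration.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical nested layout; the real nested_student.xml is not shown
xml_data = """<Students>
  <Student><id>1</id><name>John Deo</name><marks><score1>75</score1></marks></Student>
  <Student><id>2</id><name>Max Ruin</name><marks><score1>85</score1><score2>80</score2></marks></Student>
  <Student><id>3</id><name>Arnold</name><marks><score1>55</score1><score2>90</score2></marks></Student>
</Students>"""


def flatten(element, prefix=""):
    """Return {dotted.path: text} for every leaf element below `element`."""
    flat = {}
    for child in element:
        key = prefix + child.tag
        if len(child) > 0:  # the child has children of its own: recurse
            flat.update(flatten(child, prefix=key + "."))
        else:
            flat[key] = child.text
    return flat


root = ET.fromstring(xml_data)
rows = [flatten(student) for student in root.findall("Student")]
df = pd.DataFrame(rows)  # a missing <score2> becomes a missing value

# Leaf text is read as strings, so convert the numeric columns explicitly
numeric_cols = ["id"] + [c for c in df.columns if c.startswith("marks.")]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)
print(df)
```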
---
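- What is the purpose of the `read_xml()` function in Pandas?
- … the `read_xml()` function?
- What is the purpose of the `attrs_only` attribute in `read_xml()`?
- … `attrs_only=True`?
- … `attrs_only` is used?
- How does the `attrs_only` attribute help in simplifying XML data extraction?
- What is the purpose of the `namespaces` parameter in `read_xml()`?
- … `read_xml()`.
- How does the `xpath` parameter work in conjunction with `namespaces`?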