Python XML Processing

What is XML?

XML stands for eXtensible Markup Language. It is a flexible, text-based standard for defining information formats and sharing structured data electronically, both over the public Internet and within corporate networks.

XML Parser

The Document Object Model (DOM) defines a standard way of accessing and manipulating documents. The XML DOM applies that standard to XML documents and presents each document as a tree structure of nodes.

XML parsing in Python

Python can parse XML documents in several ways. Its standard library includes the traditional DOM and SAX parsers. This chapter focuses on using Python's built-in xml package to parse XML.
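For contrast with the tree-based approaches covered in the rest of this chapter, here is a minimal sketch of the event-driven SAX style using xml.sax from the standard library. The inline SAMPLE document and the ItemNameHandler class are illustrative assumptions, not part of the original tutorial.

```python
import xml.sax

# Hypothetical inline document so the sketch runs without any file on disk.
SAMPLE = b'<data><items><item name="product1"/><item name="product2"/></items></data>'


class ItemNameHandler(xml.sax.ContentHandler):
    """Collects the name attribute of every <item> start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, tag, attrs):
        # Called once per opening tag; attrs behaves like a read-only mapping.
        if tag == "item":
            self.names.append(attrs.get("name"))


handler = ItemNameHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.names)
```

A SAX parser never builds the whole tree in memory; it only reports events as it reads, which is why the handler has to collect what it needs itself.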

Sample XML document

    <data>
      <items>
        <item name="product1"></item>
        <item name="product2"></item>
        <item name="product3"></item>
        <item name="product4"></item>
        <item name="product5"></item>
      </items>
    </data>

Copy the XML above into a text file and save it as "data.xml" in the working directory.

ElementTree XML API

The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data. Its Element type is a flexible container object designed to store hierarchical data structures in memory.

Example:

    import xml.etree.ElementTree as ET

    # Parse the file and get the root <data> element.
    doc = ET.parse('data.xml').getroot()

    # Find every <item> under <items> and print its name attribute.
    for elem in doc.findall('items/item'):
        print(elem.get('name'))

Output:

    product1
    product2
    product3
    product4
    product5
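The module description also mentions creating XML data. As a minimal sketch of that side of the API (the three-item document built here is an illustrative assumption modelled on the sample file), elements can be assembled in memory and then serialized:

```python
import xml.etree.ElementTree as ET

# Build <data><items><item name="..."/>...</items></data> in memory.
root = ET.Element("data")
items = ET.SubElement(root, "items")
for i in range(1, 4):
    # Keyword arguments to SubElement become XML attributes.
    ET.SubElement(items, "item", name=f"product{i}")

# Serialize to text; encoding="unicode" returns a str instead of bytes.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

To write the result to disk instead, wrapping the root in ET.ElementTree(root) and calling its write() method with a file name of your choice should work.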

Minimal DOM implementation (xml.dom.minidom)

The xml.dom.minidom module is a minimal implementation of the Document Object Model interface, with an API similar to that in other languages. It is intended to be simpler than the full DOM and also significantly smaller. Programmers who are not already proficient with the DOM should consider using the xml.etree.ElementTree module for their XML processing instead.

Example:

    from xml.dom import minidom

    # Parse the file into a DOM document.
    xmldoc = minidom.parse('data.xml')

    # Collect every <item> element, at any depth.
    product_list = xmldoc.getElementsByTagName('item')
    print("No of Items :", len(product_list))

    # Print the value of each item's name attribute.
    for product in product_list:
        print(product.attributes['name'].value)

Output:

    No of Items : 5
    product1
    product2
    product3
    product4
    product5
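Because the DOM presents a document as a tree, the same data can also be reached by walking from parent to child rather than searching by tag name. The following is a minimal sketch under stated assumptions: SAMPLE is a shortened inline copy of the sample document, and element_children is a hypothetical helper, not part of minidom.

```python
from xml.dom import minidom

# Shortened inline copy of the sample document (an assumption, so the
# sketch runs without data.xml on disk).
SAMPLE = '<data><items><item name="product1"/><item name="product2"/></items></data>'

doc = minidom.parseString(SAMPLE)
root = doc.documentElement  # the top-level <data> element


def element_children(node):
    """Return only element children, skipping text and comment nodes."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


# Step down the tree: <data> -> <items> -> each <item>.
items = element_children(root)[0]
names = [item.getAttribute("name") for item in element_children(items)]
print(root.tagName, items.tagName, names)
```

Filtering on nodeType matters when the file is pretty-printed, since the whitespace between tags shows up in childNodes as text nodes.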