Validating XML with Python: A Step-by-Step Guide

Murunga Kibaara on 2023-05-16

XML nodes are the building blocks of XML documents. They can be elements, attributes, or text nodes. Validating XML nodes is important to ensure that XML documents are well-formed and can be processed correctly.

There are two main ways to validate XML nodes using Python:

The xml.etree.ElementTree module provides a number of functions for validating XML nodes. The most commonly used function is xml.etree.ElementTree.parse(). This function takes an XML document as input and returns an ElementTree object. The ElementTree object has a number of methods for validating nodes, such as .find(), .findall(), and .get().

To validate an XML node using the xml.etree.ElementTree module, you can use the following steps:

  1. Import the xml.etree.ElementTree module.
  2. Parse the XML document into an ElementTree object.
  3. Use the .find() or .findall() method to find the node you want to validate.
  4. Use the .get() method to get the value of the node.
  5. Validate the value of the node.

For example, the following code validates the ItemCode node in the following XML document:

XML

<X110>
  <Item>
    <ItemCode>ABWS</ItemCode>
  </Item>
</X110>

Python

import xml.etree.ElementTree as ET

# Parse the XML document
tree = ET.parse('X110.xml')
# Find the ItemCode node
item_code_node = tree.find('Item/ItemCode')
# Get the value of the ItemCode node
item_code = item_code_node.text
# Validate the value of the ItemCode node
if item_code != 'ABWS':
  raise ValueError('The ItemCode value is invalid.')

The second way to validate XML nodes using Python is to use an external XML schema. An XML schema is a document that defines the structure and content of an XML document. You can use an XML schema to validate XML documents by using the xml.etree.ElementTree.XMLSchema class.

To validate an XML document using an XML schema, you can use the following steps:

  1. Import the xml.etree.ElementTree and xml.etree.ElementTree.XMLSchema modules.
  2. Load the XML schema into a XMLSchema object.
  3. Parse the XML document into an ElementTree object.
  4. Validate the ElementTree object using the .validate() method of the XMLSchema object.

For example, the following code validates the XML document from the previous example using the following XML schema:

XML

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="X110">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Item">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ItemCode" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Python

import xml.etree.ElementTree as ET
from xml.etree.ElementTree import XMLSchema

# Load the XML schema
schema = XMLSchema('X110.xsd')
# Parse the XML document
tree = ET.parse('X110.xml')
# Validate the XML document
if not tree.validate(schema):
  raise ValueError('The XML document is invalid.')

Validating XML nodes is an important part of processing XML documents. By validating nodes, you can ensure that XML documents are well-formed and can be processed correctly.