Insecure XML Parsing (XXE Vulnerability)

Severity: MEDIUM

xml.etree.ElementTree is not secure against maliciously constructed XML, including XML External Entity (XXE) class attacks and entity-expansion denial of service. Use defusedxml when parsing untrusted XML.

Rule Information

Language: Python

To run this rule against a project:

pathfinder scan --ruleset python/PYTHON-LANG-SEC-090 --project .

About This Rule

Understanding the vulnerability and how it is detected

Python's standard library XML modules (xml.etree.ElementTree, xml.dom.minidom, xml.sax, xml.parsers.expat) are not secure against maliciously constructed input. XML External Entity (XXE) injection occurs when an XML parser resolves external entity references in untrusted XML content. How much of this attack class each module blocks by default depends on the module, the Python version, and the linked Expat version, so the defaults should not be relied on for untrusted data.

A malicious XML document can define external entities that reference local files (/etc/passwd, /etc/shadow, SSH private keys) or internal network resources. When a parser resolves these entities, the file contents become part of the parsed document and reach the attacker if the application returns any parsed data. Alternatively, the parser issues HTTP/HTTPS requests to internal network addresses, which amounts to server-side request forgery (SSRF).

The defusedxml library provides drop-in replacements for the standard library XML parsers. By default they reject entity declarations and external references, and they can optionally forbid DTDs altogether (forbid_dtd=True).

Security Implications

Potential attack scenarios if this vulnerability is exploited

Local File Disclosure via XXE

A payload that declares <!ENTITY xxe SYSTEM "file:///etc/passwd"> and references &xxe; in the document body instructs an entity-resolving parser to read the file and embed its content in the parsed XML. Attackers can read application configuration files, SSH private keys, database credentials, and any other file readable by the web server process.
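As a minimal sketch of what such a payload looks like and how the standard library parser reacts to it, the snippet below feeds a file-disclosure payload to xml.etree.ElementTree. The helper name try_parse is made up for illustration. On current CPython builds, ElementTree does not expand the external entity and raises ParseError on the reference, which is the partial protection the FAQ describes; other parsers or configurations may behave differently.

```python
import xml.etree.ElementTree as ET

# Classic file-disclosure payload: declare an external entity, then reference it.
XXE_PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>"""


def try_parse(xml_text):
    """Parse xml_text and report whether the parser accepted or rejected it."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # ElementTree treats the unresolved external entity as an error.
        return f"rejected: {exc}"
    return f"parsed: {root.text!r}"


if __name__ == "__main__":
    print(try_parse(XXE_PAYLOAD))
    print(try_parse("<data>ok</data>"))
```

A rejection here covers only external entity expansion. It says nothing about entity-expansion denial of service, which is a separate concern.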

Server-Side Request Forgery (SSRF)

External entity references can point to internal network URLs (http://169.254.169.254/ for AWS metadata, http://10.0.0.1/admin). A parser that resolves them makes HTTP requests to these addresses from inside the network, enabling internal network discovery and access to cloud metadata credentials.

Denial of Service via Billion Laughs

The "Billion Laughs" attack uses nested entity expansion to cause exponential memory consumption and CPU usage. A small XML document of a few kilobytes can exhaust all available memory, crashing the parser and the application.
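To make the exponential growth concrete, here is a small sketch that builds a nested-entity document and computes how large it would become if every entity were fully expanded. The function names and the default of nine levels with a fan-out of ten are illustrative choices, not taken from the rule. The code only builds the string and does the arithmetic; it deliberately never parses the payload.

```python
def build_entity_bomb(levels=9, fanout=10, seed="lol"):
    """Return a nested-entity XML document as a string (never parsed here)."""
    decls = [f'<!ENTITY e0 "{seed}">']
    for i in range(1, levels + 1):
        # Each level references the previous level `fanout` times.
        refs = f"&e{i - 1};" * fanout
        decls.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "\n  ".join(decls)
    return (
        '<?xml version="1.0"?>\n'
        f"<!DOCTYPE bomb [\n  {dtd}\n]>\n"
        f"<bomb>&e{levels};</bomb>"
    )


def expanded_size(levels=9, fanout=10, seed="lol"):
    """Characters produced if the top-level entity were fully expanded."""
    return len(seed) * fanout ** levels


if __name__ == "__main__":
    doc = build_entity_bomb()
    print(f"document size: {len(doc)} characters")
    print(f"expanded size: {expanded_size():,} characters")
```

With these defaults the document stays under one kilobyte while the full expansion would be three billion characters, which is why an input size limit alone does not stop this attack.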

Blind XXE via Out-of-Band Channels

When the application returns neither parsed content nor error details in its response, attackers use out-of-band XXE: an entity references a URL they control (for example xxe.attacker.com), and file contents are exfiltrated through the DNS lookups or HTTP requests that reach their server, even without direct output.

How to Fix

Recommended remediation steps

1. Replace xml.etree.ElementTree with defusedxml.ElementTree for all XML parsing of untrusted input.
2. Install defusedxml: pip install defusedxml. It provides drop-in replacements for the standard library XML parsers.
3. For configuration files or SOAP/REST XML responses, validate the parsed document against an XML Schema (for example with lxml.etree.XMLSchema or the xmlschema package).
4. Limit the size of XML input accepted from external sources to reduce exposure to denial-of-service attacks.
5. Use JSON instead of XML for new API designs to avoid XML security complexity entirely.
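The first, second, and fourth steps above can be sketched together as follows. The names parse_untrusted, UntrustedXMLError, and MAX_XML_BYTES are hypothetical, and the one-megabyte limit is an arbitrary placeholder. The sketch assumes defusedxml is installed and fails closed if it is not, rather than silently falling back to the standard library parser.

```python
MAX_XML_BYTES = 1_000_000  # hypothetical limit; tune for your application


class UntrustedXMLError(ValueError):
    """Raised when untrusted XML is too large or uses a forbidden construct."""


def parse_untrusted(data: bytes):
    """Parse untrusted XML bytes with defusedxml and return the root element."""
    # Step 4: bound the input size before any parsing happens.
    if len(data) > MAX_XML_BYTES:
        raise UntrustedXMLError("XML input exceeds the size limit")

    # Steps 1-2: defusedxml is a third-party package (pip install defusedxml).
    # Fail closed instead of falling back to xml.etree.ElementTree.
    try:
        from defusedxml import DefusedXmlException
        from defusedxml.ElementTree import fromstring
    except ImportError as exc:
        raise RuntimeError("defusedxml is required: pip install defusedxml") from exc

    try:
        return fromstring(data)
    except DefusedXmlException as exc:
        # EntitiesForbidden, ExternalReferenceForbidden, and DTDForbidden
        # all derive from DefusedXmlException.
        raise UntrustedXMLError(f"rejected: {type(exc).__name__}") from exc
```

Malformed XML still surfaces as the parser's ParseError, so callers can tell a broken document apart from a hostile one.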

Detection Scope

How Code Pathfinder analyzes your code for this vulnerability

This rule detects calls to xml.etree.ElementTree.parse(), xml.etree.ElementTree.fromstring(), xml.etree.ElementTree.fromstringlist(), and xml.etree.ElementTree.iterparse() in Python source code. These functions use the vulnerable default parser unless explicitly replaced.
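For orientation, the four call shapes named above look like this in application code. This is only an illustration of the listed call sites, run against a trusted literal. How the rule treats import aliases or from-imports is not stated here and is not assumed.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"<items><item>1</item></items>"

# Each of the following calls matches a function listed in the detection scope.
root_from_parse = ET.parse(io.BytesIO(DOC)).getroot()
root_from_string = ET.fromstring(DOC)
root_from_list = ET.fromstringlist([b"<items>", b"<item>1</item>", b"</items>"])
end_events = [(event, elem.tag) for event, elem in ET.iterparse(io.BytesIO(DOC))]

if __name__ == "__main__":
    print(root_from_parse.tag, root_from_string.tag, root_from_list.tag)
    print(end_events)
```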

Compliance & Standards

Industry frameworks and regulations that require detection of this vulnerability

OWASP Top 10

A05:2021 - Security Misconfiguration (XXE covered in A05 since OWASP 2021)

CWE Top 25

CWE-611 - Improper Restriction of XML External Entity Reference

PCI DSS v4.0

Requirement 6.2.4 - Protect against injection attacks including XXE

NIST SP 800-53

SI-10: Information Input Validation

References

External resources and documentation

CWE-611: Improper Restriction of XML External Entity Reference

Python docs: xml.etree.ElementTree security note

defusedxml documentation

OWASP XXE Prevention Cheat Sheet

OWASP XML External Entities (XXE)

Similar Rules

Explore related security rules for Python

Severity: MEDIUM

Insecure xml.dom.minidom Usage (XXE)

xml.dom.minidom is vulnerable to XML External Entity (XXE) attacks. Use defusedxml.minidom for safe XML parsing.

Severity: MEDIUM

Insecure xmlrpc Usage (XXE Risk)

xmlrpc.client.ServerProxy and xmlrpc.server modules are vulnerable to XXE attacks via malicious XML-RPC payloads. Use defusedxml.xmlrpc for protection.

Frequently Asked Questions

Common questions about Insecure XML Parsing (XXE Vulnerability)

Is xml.etree.ElementTree always vulnerable to XXE?

Not always. Python's xml.etree.ElementTree parses with Expat and does not expand external entities; a reference to one raises ParseError, which provides partial protection. However, entity-expansion attacks (Billion Laughs, quadratic blowup) can still apply, depending on the Expat version linked into Python. For protection against all of these XML attack vectors, use defusedxml, which explicitly blocks the dangerous features.

What does defusedxml protect against?

defusedxml protects against: XXE (external entity injection), DTD retrieval, Billion Laughs (entity expansion DoS), quadratic blowup, and malicious namespace expansion. When it detects one of these, it raises an exception such as defusedxml.EntitiesForbidden, defusedxml.ExternalReferenceForbidden, or, when forbid_dtd=True is passed, defusedxml.DTDForbidden.

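Is lxml safe from XXE?

Not necessarily. Older lxml releases resolve entities, including external ones, by default; lxml 5.0 changed the default to resolve internal entities only. To be explicit regardless of version, create a custom parser: parser = lxml.etree.XMLParser(resolve_entities=False, no_network=True, load_dtd=False) and pass it to etree.parse(). defusedxml.lxml applies comparable restrictions, but that module is deprecated in recent defusedxml releases.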
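The parser configuration quoted in this answer can be wrapped in a small helper, sketched below. The function names are hypothetical, and lxml is a third-party package that must be installed separately. The import is done lazily so that the module still loads where lxml is absent.

```python
def hardened_lxml_parser():
    """Build an lxml parser with entity resolution, network access, and DTD loading off."""
    from lxml import etree  # third-party; imported lazily on purpose

    return etree.XMLParser(
        resolve_entities=False,  # keep entity references unexpanded
        no_network=True,         # never fetch remote resources
        load_dtd=False,          # do not load DTDs
    )


def parse_with_lxml(data: bytes):
    """Parse XML bytes with the hardened parser and return the root element."""
    from lxml import etree

    return etree.fromstring(data, parser=hardened_lxml_parser())
```

Creating a fresh parser per call keeps the sketch simple. Whether to reuse one parser across calls is a separate design decision that this example does not address.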

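Can I check if my ElementTree usage is safe without defusedxml?

Only partly. Python's xml.etree.ElementTree with Expat has some built-in protection, but not complete protection, and the exact coverage depends on the Python and Expat versions in use. The authoritative list of protections is in Python's xml module documentation. For production code that processes untrusted XML, defusedxml is the recommended solution because it provides a maintained defense that does not depend on those defaults.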
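One concrete check is the version of Expat that the running interpreter is linked against, which pyexpat exposes. The sketch below assumes a threshold of Expat 2.4.1 for the Billion Laughs mitigation, as recalled from the Python xml documentation's security notes; verify that threshold against the documentation for your Python version. The function name is hypothetical.

```python
import pyexpat

# Assumed threshold for the Billion Laughs mitigation; confirm it in the
# Python xml module documentation before relying on it.
MIN_EXPAT_FOR_ENTITY_LIMITS = (2, 4, 1)


def expat_mitigates_entity_expansion(version=None):
    """Return True if the given (or linked) Expat version meets the threshold."""
    if version is None:
        version = pyexpat.version_info  # (major, minor, micro) of the linked Expat
    return tuple(version) >= MIN_EXPAT_FOR_ENTITY_LIMITS


if __name__ == "__main__":
    print("linked Expat:", pyexpat.EXPAT_VERSION)
    print("entity-expansion limits present:", expat_mitigates_entity_expansion())
```

A passing check narrows the exposure but does not replace defusedxml, since it covers only the entity-expansion attacks that newer Expat releases limit.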

Does defusedxml have any performance impact?

defusedxml's overhead is minimal. It adds checks before and during parsing to detect dangerous constructs and raise exceptions. For well-formed, legitimate XML, the overhead is negligible. For XML that triggers a protection, it raises an exception before parsing completes, which may actually be faster than processing the malicious payload.

What about XML parsing in XML-based APIs like SOAP or RSS feeds?

SOAP APIs and RSS feeds often carry XML from partially trusted sources. Even if you trust the source, a compromised upstream feed or a man-in-the-middle attacker can inject XXE payloads. Use defusedxml for all XML parsing regardless of how trustworthy the source appears, following the principle of defense in depth.


