Deserialization

PyLxml

lxml is a C-backed XML and HTML parser built on libxml2. Calling etree.parse or etree.fromstring with a custom XMLParser(resolve_entities=True) is an XXE sink. Since lxml 5.0 the default is resolve_entities='internal', which expands only internally defined entities, but the API still allows unsafe configurations.

Taint flow: 0 sources, 2 sinks
Sinks (dangerous calls):
.parse()
.fromstring()

Sinks

.parse() (Sink)
Signature
lxml.etree.parse(source, parser=None, base_url=None) -> ElementTree

Parses XML from a file name, file-like object, or URL. XXE sink when the parser passed in has resolve_entities=True.

tracks:0
.fromstring() (Sink)
Signature
lxml.etree.fromstring(text, parser=None, base_url=None) -> Element

Parses XML from a string or bytes object. XXE sink when the parser passed in is configured unsafely (for example, resolve_entities=True).

tracks:0
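
As a minimal sketch of the safe side of these two sinks, the snippet below passes an explicitly hardened parser to etree.fromstring. The helper name parse_untrusted and the sample payload are hypothetical, chosen only for illustration. The payload uses an internal entity so the example stays deterministic; with resolve_entities=False the reference should be left unexpanded.

```python
from lxml import etree

# Hardened parser: entity references are not expanded and
# network access during parsing stays disabled.
SAFE_PARSER = etree.XMLParser(resolve_entities=False, no_network=True)


def parse_untrusted(xml_bytes):
    """Hypothetical helper: parse untrusted XML with the hardened parser."""
    return etree.fromstring(xml_bytes, parser=SAFE_PARSER)


# Sample document that declares an entity and references it.
PAYLOAD = b'<!DOCTYPE r [<!ENTITY e "expanded">]><r>&e;</r>'

root = parse_untrusted(PAYLOAD)
serialized = etree.tostring(root)
print(serialized)
```

The same parser object can be passed to etree.parse through its parser argument, so both sinks can share one configuration.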

Other Methods

.XMLParser() (Neutral)
Signature
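lxml.etree.XMLParser(resolve_entities='internal', no_network=True, ...) -> XMLParser

Creates an XML parser. Not a sink itself, but it produces a finding when constructed with resolve_entities=True or no_network=False. The defaults shown are those of lxml 5.0 and later; earlier releases default to resolve_entities=True.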
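
The finding condition above can be approximated with the standard-library ast module. This is only an illustrative sketch, not how Code Pathfinder implements the rule; the function name risky_xmlparser_calls and the sample source are hypothetical. It matches any call whose callee is named XMLParser and reports literal resolve_entities=True or no_network=False keywords.

```python
import ast


def risky_xmlparser_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, reason) pairs for XMLParser(...) calls with unsafe keywords."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both `etree.XMLParser(...)` and a bare `XMLParser(...)`.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name != "XMLParser":
            continue
        for kw in node.keywords:
            # Only literal keyword values are inspected in this sketch.
            if not isinstance(kw.value, ast.Constant):
                continue
            if kw.arg == "resolve_entities" and kw.value.value is True:
                findings.append((node.lineno, "resolve_entities=True"))
            elif kw.arg == "no_network" and kw.value.value is False:
                findings.append((node.lineno, "no_network=False"))
    return sorted(findings)


# The sample is analysed as text only; it is never executed.
SAMPLE = '''
from lxml import etree
unsafe = etree.XMLParser(resolve_entities=True)
fetching = etree.XMLParser(no_network=False)
safe = etree.XMLParser(resolve_entities=False, no_network=True)
'''

findings = risky_xmlparser_calls(SAMPLE)
print(findings)
```

A purely syntactic check like this misses values passed through variables, which is one reason a real rule relies on data-flow analysis instead.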

.HTMLParser() (Neutral)
Signature
lxml.etree.HTMLParser(recover=True, ...) -> HTMLParser

HTML parser variant. It carries less XXE risk than the XML parser, but it still processes entities.

Fully-Qualified Names

FQN          Field
lxml         fqns[0]
lxml.etree   fqns[1]

A wrong FQN produces 0 findings. To verify that matching really depends on the FQNs, change fqns to a garbage value; the rule must then produce 0 results.
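
To make the FQN check concrete, here is a small sketch of how import aliases might be resolved to fully-qualified names and compared against fqns. It uses only the standard-library ast module and is an assumption for illustration, not Code Pathfinder's actual resolver; resolve and count_sink_calls are hypothetical names.

```python
import ast


def resolve(expr, aliases):
    """Turn a receiver such as `etree` or `lxml.etree` into a dotted FQN, or None."""
    parts = []
    while isinstance(expr, ast.Attribute):
        parts.append(expr.attr)
        expr = expr.value
    if not isinstance(expr, ast.Name) or expr.id not in aliases:
        return None
    return ".".join([aliases[expr.id], *reversed(parts)])


def count_sink_calls(source, fqns, methods=("parse", "fromstring")):
    """Count calls to `methods` whose receiver resolves to one of `fqns`."""
    tree = ast.parse(source)
    aliases = {}  # local name -> fully-qualified name
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for a in node.names:
                if a.asname:
                    aliases[a.asname] = a.name  # import lxml.etree as ET
                else:
                    top = a.name.split(".")[0]
                    aliases[top] = top  # import lxml.etree binds `lxml`
        elif isinstance(node, ast.ImportFrom) and node.module:
            for a in node.names:
                aliases[a.asname or a.name] = f"{node.module}.{a.name}"
    count = 0
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in methods
                and resolve(node.func.value, aliases) in fqns):
            count += 1
    return count


# The sample is analysed as text only; it is never executed.
SAMPLE = '''
from lxml import etree
tree = etree.parse("in.xml")
root = etree.fromstring(b"<r/>")
'''

matched = count_sink_calls(SAMPLE, ["lxml", "lxml.etree"])
garbage = count_sink_calls(SAMPLE, ["not.a.real.module"])
print(matched, garbage)
```

With the real FQNs both calls in the sample match; with a garbage FQN nothing matches, which mirrors the verification step described above.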

Import

rule.py
from codepathfinder.go_rule import PyLxml