PyYAML Unsafe Load Function

HIGH

yaml.load() with an unsafe loader and yaml.unsafe_load() can instantiate arbitrary Python objects and call arbitrary callables during YAML parsing. Use yaml.safe_load() instead.

Rule Information

Language

Python

Run this rule against a project:

pathfinder scan --ruleset python/PYTHON-LANG-SEC-041 --project .

About This Rule

Understanding the vulnerability and how it is detected

PyYAML's yaml.load() function, when called without an explicit Loader argument or with Loader=yaml.Loader/yaml.UnsafeLoader, can instantiate arbitrary Python objects during parsing using YAML's !!python/object and !!python/object/apply tags. This enables remote code execution when processing YAML from untrusted sources.

The vulnerability is triggered by YAML content such as !!python/object/apply:os.system ["id"], or by more sophisticated payloads that use subprocess or socket. PyYAML versions before 5.1 used the unsafe loader by default. Since 5.1, calling yaml.load() without an explicit Loader issues a YAMLLoadWarning, and since 6.0 the Loader argument is required.

The safe alternative is yaml.safe_load() or yaml.load(data, Loader=yaml.SafeLoader), which only processes YAML scalars, sequences, and mappings without instantiating Python objects.
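The difference between the two loaders can be seen with a harmless stand-in payload. The sketch below is illustrative only: it swaps os.system for builtins.len so that nothing dangerous runs, and the names PAYLOAD, load_unsafely, and load_safely are hypothetical helpers, not part of PyYAML.

```python
import yaml
from yaml.constructor import ConstructorError

# Harmless stand-in for an attack payload: it names builtins.len, not os.system.
PAYLOAD = "!!python/object/apply:builtins.len [[1, 2, 3]]"


def load_unsafely(text):
    # The unsafe loader resolves the tag and calls the named callable while parsing.
    return yaml.unsafe_load(text)


def load_safely(text):
    # SafeLoader has no constructor for !!python tags, so parsing fails instead.
    try:
        return yaml.safe_load(text)
    except ConstructorError as exc:
        return f"rejected: {exc.problem}"


if __name__ == "__main__":
    print(load_unsafely(PAYLOAD))  # the callable ran during parsing
    print(load_safely(PAYLOAD))    # the tag was refused
    print(load_safely("name: demo\nreplicas: 2"))
```

With the unsafe loader the parser itself performs the call. With yaml.safe_load() the same document never gets past construction, while plain mappings still load normally.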

Security Implications

Potential attack scenarios if this vulnerability is exploited

Remote Code Execution via !!python Tags

The !!python/object/apply tag in YAML invokes arbitrary Python callables. An attacker who can control YAML input can execute os.system(), subprocess.Popen(), or any other callable, achieving full RCE with a single YAML document.

Configuration File Injection

Applications that load YAML configuration files and process them with yaml.load() are vulnerable if an attacker can modify the configuration file, inject content through environment variable expansion, or write to the configuration directory.

API and Webhook Payload Injection

REST APIs, CI/CD pipelines, and infrastructure-as-code tools that accept YAML input from users and parse it with yaml.load() are directly exploitable. This is a common vector in DevOps tooling.

Kubernetes and Helm Chart Injection

Tools that process Kubernetes manifests or Helm chart values using PyYAML's unsafe loader can be exploited through crafted chart values or manifest files submitted by unprivileged users.

How to Fix

Recommended remediation steps

1. Replace all yaml.load() calls with yaml.safe_load() or yaml.load(data, Loader=yaml.SafeLoader).
2. Never use yaml.unsafe_load() or yaml.load() with yaml.Loader/yaml.UnsafeLoader on external input.
3. If custom Python objects must be serialized to YAML, use explicit schema validation rather than relying on YAML's !!python tags.
4. Audit all YAML parsing in CI/CD pipelines, configuration loaders, and API endpoints that accept YAML input.
5. Consider restricting YAML features to a safe subset (scalars, sequences, mappings) by using yaml.safe_load() universally.
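One way to approach step 3 is to load plain data with yaml.safe_load() and build the application object explicitly. The following is a minimal sketch under stated assumptions: ServerConfig, load_server_config, and the host/port fields are hypothetical names chosen for illustration, and a real project might prefer a dedicated validation library.

```python
from dataclasses import dataclass

import yaml


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical application type built from validated YAML data."""
    host: str
    port: int


def load_server_config(text: str) -> ServerConfig:
    # safe_load only yields plain Python data (dict, list, str, int, ...).
    data = yaml.safe_load(text)
    if not isinstance(data, dict):
        raise ValueError("expected a YAML mapping at the top level")

    unknown = set(data) - {"host", "port"}
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(map(str, unknown))}")

    host, port = data.get("host"), data.get("port")
    if not isinstance(host, str):
        raise ValueError("host must be a string")
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(port, bool) or not isinstance(port, int) or not 0 < port < 65536:
        raise ValueError("port must be an integer between 1 and 65535")

    return ServerConfig(host=host, port=port)
```

The object is constructed by application code from checked fields, so the YAML document never gets to choose which Python class or callable is used.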

Detection Scope

How Code Pathfinder analyzes your code for this vulnerability

This rule detects calls to yaml.load() without an explicit Loader argument or with unsafe Loader values, and calls to yaml.unsafe_load(). The rule flags these patterns as they enable Python object instantiation during YAML parsing.
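To make the detection pattern concrete, here is a rough approximation written with Python's standard ast module. It is not Code Pathfinder's implementation. The function name find_unsafe_yaml_calls and the SAFE_LOADERS allow-list are assumptions for this sketch, which treats any loader outside that list as unsafe and only matches the literal spelling yaml.load(...) without resolving import aliases.

```python
import ast

# Assumption for this sketch: these loader names are treated as safe.
SAFE_LOADERS = {"SafeLoader", "CSafeLoader", "BaseLoader", "CBaseLoader"}


def find_unsafe_yaml_calls(source: str) -> list[int]:
    """Return line numbers of yaml.load()/yaml.unsafe_load() calls that look unsafe."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match only `yaml.<name>(...)`; aliased imports are not resolved here.
        if not (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "yaml"
        ):
            continue
        if func.attr == "unsafe_load":
            findings.append(node.lineno)
        elif func.attr == "load":
            # The loader may be passed as Loader=... or as the second positional argument.
            loader = next((kw.value for kw in node.keywords if kw.arg == "Loader"), None)
            if loader is None and len(node.args) >= 2:
                loader = node.args[1]
            name = getattr(loader, "attr", None) or getattr(loader, "id", None)
            if name not in SAFE_LOADERS:
                findings.append(node.lineno)
    return sorted(findings)
```

A purely syntactic check like this misses cases such as `from yaml import load` or a loader held in a variable, which is one reason a dedicated analyzer is preferable to an ad hoc script.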

Compliance & Standards

Industry frameworks and regulations that require detection of this vulnerability

CWE Top 25

CWE-502 - Deserialization of Untrusted Data

OWASP Top 10

A08:2021 - Software and Data Integrity Failures

NIST SP 800-53

SI-10: Information Input Validation

PCI DSS v4.0

Requirement 6.2.4 - Protect against deserialization attacks

References

External resources and documentation

CWE-502: Deserialization of Untrusted Data

PyYAML documentation: yaml.safe_load()

PyYAML CVE-2017-18342

OWASP Deserialization Cheat Sheet

Exploiting Python YAML deserialization

Similar Rules

Explore related security rules for Python

HIGH

Pickle Deserialization of Untrusted Data

pickle.loads() and pickle.load() execute arbitrary Python code during deserialization. Never unpickle data from untrusted sources.

HIGH

ruamel.yaml Unsafe Loader Configuration

ruamel.yaml configured with typ='unsafe' can instantiate arbitrary Python objects during YAML parsing. Use typ='safe' or the default round-trip loader.

HIGH

jsonpickle Deserialization Detected

jsonpickle.decode() can execute arbitrary Python code during deserialization. Use the standard json module for untrusted data.

Frequently Asked Questions

Common questions about PyYAML Unsafe Load Function

What is the difference between yaml.load() and yaml.safe_load()?

yaml.safe_load() uses SafeLoader, which supports only standard YAML tags and maps them to Python built-in types (str, int, float, list, dict, None, bool, datetime). yaml.load() with yaml.Loader uses a loader that also supports the !!python/object and !!python/object/apply tags, allowing arbitrary Python objects to be instantiated during parsing.

Is yaml.load(data, Loader=yaml.BaseLoader) safe?

yaml.BaseLoader loads all values as strings without interpreting any YAML tags. It is safe but does not perform type coercion (numbers remain strings). yaml.SafeLoader is usually the right choice, as it handles type coercion for standard YAML types while blocking Python-specific tags.
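The type-coercion difference is easy to see on a small document. In this sketch the sample document DOC and the variable names are illustrative only.

```python
import yaml

DOC = "port: 8080\ndebug: true\nratio: 0.5\n"

# BaseLoader leaves every scalar as a string.
as_strings = yaml.load(DOC, Loader=yaml.BaseLoader)

# SafeLoader converts standard YAML scalars to int, bool, and float.
as_typed = yaml.load(DOC, Loader=yaml.SafeLoader)

if __name__ == "__main__":
    print(as_strings)
    print(as_typed)
```

Code that switches from one loader to the other needs to account for this, for example by converting "8080" to an integer itself when using BaseLoader.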

What does a YAML deserialization attack payload look like?

A simple payload: !!python/object/apply:os.system ["id"]. More sophisticated payloads use subprocess.Popen with encoded commands, or construct chains through __reduce__ methods. The !! prefix denotes a YAML tag, and python/object/apply invokes the specified callable with the given arguments during parsing.

Does PyYAML's yaml.load() warn about unsafe usage?

Since PyYAML 5.1, calling yaml.load() without an explicit Loader argument issues a YAMLLoadWarning, and since 6.0 the Loader argument is required. However, the warning is often ignored or suppressed in practice. The rule flags the call regardless of whether the warning is shown, since the absence of an explicit safe Loader indicates potential risk.

What about ruamel.yaml? Is it affected?

ruamel.yaml has its own unsafe loading behavior when configured with typ='unsafe'. See PYTHON-LANG-SEC-043 for ruamel.yaml-specific guidance. When using ruamel.yaml, always use the default or safe round-trip loader.

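Should I use yaml.CSafeLoader for performance?

yaml.CSafeLoader is the C-accelerated version of SafeLoader and is both safe and faster. It is available only when PyYAML is built with the libyaml C extension. Use yaml.load(data, Loader=yaml.CSafeLoader) where parsing performance matters. Note that yaml.safe_load() always uses the pure-Python SafeLoader, so it does not pick up the C implementation automatically.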
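Because CSafeLoader is not present in every installation, a common pattern is to fall back to SafeLoader when it is missing. The sketch below assumes that approach; FastSafeLoader and parse are hypothetical names introduced here.

```python
import yaml

# CSafeLoader exists only when PyYAML was built against libyaml,
# so fall back to the pure-Python SafeLoader otherwise.
FastSafeLoader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)


def parse(text):
    # Either loader refuses !!python tags; only the parsing speed differs.
    return yaml.load(text, Loader=FastSafeLoader)


if __name__ == "__main__":
    print(FastSafeLoader.__name__)
    print(parse("services: [web, db]"))
```

Both branches of the fallback are safe loaders, so the choice affects speed only, not which tags are accepted.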
