# PYTHON-LANG-SEC-041: PyYAML Unsafe Load Function

> **Severity:** HIGH | **CWE:** CWE-502 | **OWASP:** A08:2021

- **Language:** Python
- **Category:** Python Core
- **URL:** https://codepathfinder.dev/registry/python/lang/PYTHON-LANG-SEC-041
- **Detection:** `pathfinder scan --ruleset python/PYTHON-LANG-SEC-041 --project .`

## Description

PyYAML's yaml.load() function, when called with Loader=yaml.Loader or
Loader=yaml.UnsafeLoader (or, in versions before 5.1, with no Loader argument at all),
can instantiate arbitrary Python objects during parsing using YAML's !!python/object
and !!python/object/apply tags. yaml.unsafe_load() is shorthand for the UnsafeLoader
form. This enables remote code execution when processing YAML from untrusted sources.

The vulnerability is triggered by YAML content such as:
  !!python/object/apply:os.system ["id"]
or more sophisticated payloads using subprocess or socket. PyYAML versions before 5.1
used the unsafe loader by default. Versions 5.1 through 5.4 issue a YAMLLoadWarning and
fall back to yaml.FullLoader when no Loader is given. Since 6.0 the Loader argument is
required, so omitting it raises a TypeError.

The safe alternative is yaml.safe_load() or yaml.load(data, Loader=yaml.SafeLoader),
which only processes YAML scalars, sequences, and mappings without instantiating Python
objects.
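
As a minimal sketch of the difference, the snippet below feeds the example payload to yaml.safe_load() and checks that it is rejected rather than executed. The helper name is_rejected_by_safe_load is hypothetical and exists only for this illustration; it assumes PyYAML is installed.

```python
import yaml

# The example payload from the description. It is only passed to safe_load(),
# which refuses to construct it, so os.system is never called.
PAYLOAD = '!!python/object/apply:os.system ["id"]'


def is_rejected_by_safe_load(document: str) -> bool:
    """Return True if SafeLoader refuses to construct the document."""
    try:
        yaml.safe_load(document)
    except yaml.constructor.ConstructorError:
        # SafeLoader has no constructor registered for !!python/... tags.
        return True
    return False
```

With this helper, the payload is reported as rejected, while an ordinary mapping such as `name: demo` loads normally.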


## Vulnerable Code

```python
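import yaml

# SEC-041: yaml unsafe load
with open("config.yml") as f:
    text = f.read()  # read once; a consumed file stream cannot be parsed again

# Flagged: FullLoader is not SafeLoader and still resolves some !!python tags
config = yaml.load(text, Loader=yaml.FullLoader)
# Flagged: unsafe_load() can instantiate arbitrary Python objects
unsafe = yaml.unsafe_load(text)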
```

## Secure Code

```python
import yaml

# INSECURE: yaml.load() without SafeLoader
# data = yaml.load(user_input)
# data = yaml.load(user_input, Loader=yaml.Loader)
# data = yaml.unsafe_load(user_input)

# SECURE: Always use yaml.safe_load() for untrusted input
def parse_config(yaml_content: str) -> dict:
    data = yaml.safe_load(yaml_content)
    if not isinstance(data, dict):
        raise ValueError("Expected YAML mapping at top level")
    return data

# SECURE: Explicit SafeLoader for clarity
def load_yaml_document(content: str):
    return yaml.load(content, Loader=yaml.SafeLoader)

# SECURE: For trusted internal configuration, still use safe_load
# Unsafe loading is almost never needed for legitimate application config
def load_app_config(config_path: str) -> dict:
    with open(config_path) as f:
        return yaml.safe_load(f)
```

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, QueryType

class YamlModule(QueryType):
    fqns = ["yaml"]


@python_rule(
    id="PYTHON-LANG-SEC-041",
    name="PyYAML Unsafe Load",
    severity="HIGH",
    category="lang",
    cwe="CWE-502",
    tags="python,yaml,deserialization,rce,OWASP-A08,CWE-502",
    message="yaml.load() or yaml.unsafe_load() detected. Use yaml.safe_load() instead.",
    owasp="A08:2021",
)
def detect_yaml_load():
    """Detects yaml.load() and yaml.unsafe_load() calls."""
    return YamlModule.method("load", "unsafe_load")
```

## How to Fix

- Replace all yaml.load() calls with yaml.safe_load() or yaml.load(data, Loader=yaml.SafeLoader).
- Never use yaml.unsafe_load() or yaml.load() with yaml.Loader/yaml.UnsafeLoader on external input.
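- If custom Python objects must be serialized to YAML, store them as plain mappings and rebuild the objects in application code after validating the loaded data against an explicit schema, rather than relying on YAML's !!python tags.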
- Audit all YAML parsing in CI/CD pipelines, configuration loaders, and API endpoints that accept YAML input.
- Consider restricting YAML features to a safe subset (scalars, sequences, mappings) by using yaml.safe_load() universally.
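
One way to follow the schema-validation advice above is sketched below: load plain data with yaml.safe_load(), validate it by hand, and only then build the application object. ServerConfig, load_server_config, and the host/port fields are hypothetical names chosen for illustration, not part of PyYAML or of this rule.

```python
from dataclasses import dataclass

import yaml


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical application object rebuilt from plain YAML data."""
    host: str
    port: int


def load_server_config(text: str) -> ServerConfig:
    """Parse with SafeLoader, validate the shape, then construct the object."""
    raw = yaml.safe_load(text)
    if not isinstance(raw, dict):
        raise ValueError("Expected YAML mapping at top level")

    unknown = set(raw) - {"host", "port"}
    if unknown:
        raise ValueError(f"Unknown keys: {unknown!r}")

    host = raw.get("host")
    port = raw.get("port")
    if not isinstance(host, str) or not host:
        raise ValueError("host must be a non-empty string")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(port, bool) or not isinstance(port, int) or not 0 < port < 65536:
        raise ValueError("port must be an integer between 1 and 65535")

    return ServerConfig(host=host, port=port)
```

Because the object is built in application code, the YAML document never needs a !!python tag, and a document that contains one fails in safe_load() before validation starts.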

## Security Implications

- **Remote Code Execution via !!python Tags:** The !!python/object/apply tag in YAML invokes arbitrary Python callables. An attacker
who can control YAML input can execute os.system(), subprocess.Popen(), or any other
callable, achieving full RCE with a single YAML document.

- **Configuration File Injection:** Applications that load YAML configuration files and process them with yaml.load() are
vulnerable if an attacker can modify the configuration file, inject content through
environment variable expansion, or write to the configuration directory.

- **API and Webhook Payload Injection:** REST APIs, CI/CD pipelines, and infrastructure-as-code tools that accept YAML input
from users and parse it with yaml.load() are directly exploitable. This is a common
vector in DevOps tooling.

- **Kubernetes and Helm Chart Injection:** Tools that process Kubernetes manifests or Helm chart values using PyYAML's unsafe
loader can be exploited through crafted chart values or manifest files submitted by
unprivileged users.


## FAQ

**Q: What is the difference between yaml.load() and yaml.safe_load()?**

yaml.safe_load() uses SafeLoader, which only supports standard YAML tags and maps them
to Python built-in types (str, int, float, bool, None, list, dict, set, bytes, and
datetime date/datetime values). yaml.load() with yaml.Loader or yaml.UnsafeLoader
additionally supports Python-specific tags such as !!python/object and
!!python/object/apply, allowing arbitrary Python objects to be instantiated during
parsing.
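
To illustrate the kinds of values SafeLoader produces, the sketch below loads a small document and keeps the result in a variable. The document content and key names are made up for this example; it assumes PyYAML is installed.

```python
import datetime

import yaml

# Only standard YAML tags are used here, so SafeLoader can construct everything.
DOCUMENT = """
name: demo
retries: 3
released: 2024-01-15
blob: !!binary aGVsbG8=
"""

data = yaml.safe_load(DOCUMENT)

# Each value is an ordinary built-in or standard-library type.
value_types = {key: type(value) for key, value in data.items()}
```

Here `released` becomes a datetime.date and `blob` becomes bytes, while `name` and `retries` are a plain str and int.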


**Q: Is yaml.load(data, Loader=yaml.BaseLoader) safe?**

yaml.BaseLoader loads all values as strings without interpreting any YAML tags. It is
safe but does not perform type coercion (numbers remain strings). yaml.SafeLoader is
usually the right choice as it handles type coercion for standard YAML types while
blocking Python-specific tags.
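
The difference in type coercion can be seen in a short comparison. This is a sketch with a made-up two-key document, assuming PyYAML is installed.

```python
import yaml

DOCUMENT = "port: 8080\ndebug: true\n"

# BaseLoader ignores implicit typing: every scalar stays a string.
base = yaml.load(DOCUMENT, Loader=yaml.BaseLoader)

# SafeLoader resolves standard YAML scalars to int and bool.
safe = yaml.load(DOCUMENT, Loader=yaml.SafeLoader)
```

After this runs, `base` holds the strings "8080" and "true", whereas `safe` holds the integer 8080 and the boolean True.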


**Q: What does a YAML deserialization attack payload look like?**

A simple payload: !!python/object/apply:os.system ["id"]. More sophisticated payloads
use subprocess.Popen with encoded commands, or construct chains through __reduce__
methods. The !! prefix is YAML shorthand for the tag:yaml.org,2002: tag namespace, and
python/object/apply invokes the specified callable with the given arguments during
parsing.


**Q: Does PyYAML's yaml.load() warn about unsafe usage?**

In PyYAML 5.1 through 5.4, calling yaml.load() without an explicit Loader argument
issues a YAMLLoadWarning and falls back to FullLoader. Since PyYAML 6.0 the Loader
argument is required, and omitting it raises a TypeError. The warning is often ignored
or suppressed in practice. The rule flags the call regardless of whether a Loader is
explicitly specified, since the absence of an explicit SafeLoader indicates potential
risk.


**Q: What about ruamel.yaml — is it affected?**

ruamel.yaml has its own unsafe loading behavior when configured with typ='unsafe'.
See PYTHON-LANG-SEC-043 for ruamel.yaml-specific guidance. When using ruamel.yaml,
use the default round-trip loader (typ='rt') or the safe loader (typ='safe').


**Q: Should I use yaml.CSafeLoader for performance?**

yaml.CSafeLoader is the C-accelerated version of SafeLoader and is both safe and
faster. It is only defined when PyYAML was built with the libyaml C extension.
yaml.safe_load() always uses the pure-Python SafeLoader, so to get the speedup pass the
loader explicitly: yaml.load(data, Loader=yaml.CSafeLoader).
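
A common pattern, sketched here, is to prefer CSafeLoader and fall back to SafeLoader when the C extension is missing. The names PreferredSafeLoader and fast_safe_load are hypothetical and used only for this illustration.

```python
import yaml

try:
    # Available only when PyYAML was built against libyaml.
    from yaml import CSafeLoader as PreferredSafeLoader
except ImportError:
    # Pure-Python fallback with the same safety properties.
    from yaml import SafeLoader as PreferredSafeLoader


def fast_safe_load(stream):
    """Load YAML with the fastest safe loader available in this install."""
    return yaml.load(stream, Loader=PreferredSafeLoader)
```

Either loader rejects !!python tags, so the fallback changes only speed, not the security posture.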


## References

- [CWE-502: Deserialization of Untrusted Data](https://cwe.mitre.org/data/definitions/502.html)
- [PyYAML documentation: yaml.safe_load()](https://pyyaml.org/wiki/PyYAMLDocumentation)
- [PyYAML CVE-2017-18342](https://nvd.nist.gov/vuln/detail/CVE-2017-18342)
- [OWASP Deserialization Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Deserialization_Cheat_Sheet.html)
- [Exploiting Python YAML deserialization](https://www.exploit-db.com/docs/english/47655-yaml-deserialization-attack-in-python.pdf)

---

Source: https://codepathfinder.dev/registry/python/lang/PYTHON-LANG-SEC-041
Code Pathfinder — Open source, type-aware SAST with cross-file dataflow analysis
