# PYTHON-LANG-SEC-044: marshal Deserialization Detected

> **Severity:** MEDIUM | **CWE:** CWE-502 | **OWASP:** A08:2021

- **Language:** Python
- **Category:** Python Core
- **URL:** https://codepathfinder.dev/registry/python/lang/PYTHON-LANG-SEC-044
- **Detection:** `pathfinder scan --ruleset python/PYTHON-LANG-SEC-044 --project .`

## Description

Python's marshal module serializes Python objects in a binary format that the interpreter
uses internally for .pyc bytecode files and the import system. Unlike pickle, marshal cannot
serialize arbitrary class instances, but it can serialize code objects (compiled Python
bytecode), and a deserialized code object runs arbitrary Python as soon as it is passed to
exec() or eval().
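
The round trip can be illustrated with a minimal sketch. The `source` string below is a harmless stand-in for attacker-authored code, and all variable names are illustrative only.

```python
import marshal

# Harmless stand-in for attacker-authored source; a real payload could do anything.
source = "result = 6 * 7"
payload = marshal.dumps(compile(source, "<payload>", "exec"))

# Loading yields a code object but does not run it...
code_obj = marshal.loads(payload)

# ...the bytecode only executes once something hands it to exec() or eval().
namespace = {}
exec(code_obj, namespace)
```

After the `exec()` call, `namespace["result"]` holds the value computed by the deserialized bytecode, which is why executing unmarshaled code objects from an untrusted source amounts to running the sender's code.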

Python's documentation explicitly states: "The marshal module is not intended to be secure
against erroneous or maliciously constructed data. Never unmarshal data received from an
untrusted or unauthenticated source."

Applications that call marshal.loads() or marshal.load() on data from external sources may
be vulnerable to code execution if the input contains serialized code objects that the
application later executes, and to interpreter crashes if the input is malformed.


## Vulnerable Code

```python
import marshal
import sys

# SEC-044: marshal.loads() on bytes read from an external source (stdin here)
untrusted = sys.stdin.buffer.read()
obj = marshal.loads(untrusted)
```

## Secure Code

```python
import json

# INSECURE: marshal.loads() on external data
# import marshal
# obj = marshal.loads(external_data)

# SECURE: Use JSON for external data interchange
def load_cached_data(cache_bytes: bytes) -> dict:
    return json.loads(cache_bytes.decode("utf-8"))

# SECURE: Use pickle with HMAC for signed internal caching of Python objects
import pickle
import hmac
import hashlib
import os

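# Fail closed: an unset or empty key would make every signature forgeable.
SECRET_KEY = os.environ["CACHE_SECRET_KEY"].encode()
if not SECRET_KEY:
    raise RuntimeError("CACHE_SECRET_KEY must be set to a non-empty value")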

def load_signed_cache(data: bytes) -> object:
    if len(data) < 32:
        raise ValueError("Invalid cache data")
    signature = data[:32]
    payload = data[32:]
    expected_sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected_sig):
        raise ValueError("Invalid cache signature")
    return pickle.loads(payload)

```
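
The loader above expects a 32-byte HMAC-SHA256 signature followed by the pickled payload. A matching writer might look like the sketch below; the function name `dump_signed_cache` and its explicit `key` parameter are assumptions for illustration, not part of the original example.

```python
import hashlib
import hmac
import pickle

def dump_signed_cache(obj: object, key: bytes) -> bytes:
    """Return HMAC-SHA256(payload) followed by the pickled payload."""
    if not key:
        raise ValueError("A non-empty signing key is required")
    payload = pickle.dumps(obj)
    # SHA-256 digests are 32 bytes, matching the slice used by the loader.
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return signature + payload
```

Signing only provides integrity: anyone who holds the key can still produce a pickle that executes code on load, so the key should be restricted to the processes that write the cache.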

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, QueryType

class MarshalModule(QueryType):
    fqns = ["marshal"]


@python_rule(
    id="PYTHON-LANG-SEC-044",
    name="marshal Usage Detected",
    severity="MEDIUM",
    category="lang",
    cwe="CWE-502",
    tags="python,marshal,deserialization,CWE-502",
    message="marshal.loads/load detected. Marshal is not secure against erroneous or malicious data.",
    owasp="A08:2021",
)
def detect_marshal():
    """Detects marshal.loads() and marshal.load() calls."""
    return MarshalModule.method("loads", "load")
```

## How to Fix

- Never use marshal.loads() or marshal.load() for data received from external sources, networks, or file uploads.
- Use JSON or MessagePack for external data interchange instead of marshal.
- For internal Python object caching, use pickle with HMAC signing rather than marshal.
- If marshal is used for .pyc caching, ensure the cache directory is not writable by untrusted users and has proper filesystem permissions.
- Audit all uses of the marshal module to confirm they only process data from trusted, internal Python processes.

## Security Implications

- **Code Object Deserialization:** marshal can serialize and deserialize Python code objects. An attacker who can control
marshal input can provide a crafted code object that, when executed via exec() or eval()
on the deserialized object, runs arbitrary Python code with the process's privileges.

- **Bytecode Execution via Code Objects:** marshal.loads() on attacker-controlled data can produce a code object containing
malicious bytecode. If the application subsequently executes this object (e.g., in a
dynamic import or eval context), the attacker achieves code execution.

- **Process Crash via Malformed Data:** The Python documentation warns that marshal is not safe against erroneous data. Malformed
marshal streams can crash the Python interpreter with segmentation faults or cause memory
corruption in the CPython implementation.

- **Bytecode Cache Poisoning:** Applications that cache compiled code objects using marshal in shared storage (Redis,
memcached, filesystem) are vulnerable if an attacker can write to that storage, replacing
legitimate bytecode with malicious code objects.


## FAQ

**Q: Is marshal less dangerous than pickle?**

marshal cannot serialize arbitrary class instances as pickle can, which limits some
attack vectors. However, marshal CAN serialize code objects (Python bytecode), which
can be executed to run arbitrary Python. For external data, both are dangerous and
should be replaced with JSON or similar safe formats.
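
The difference is easy to observe. The sketch below uses `fractions.Fraction` purely as a convenient example of a class instance; any ordinary object behaves the same way.

```python
import marshal
import pickle
from fractions import Fraction

value = Fraction(1, 3)

# pickle round-trips an ordinary class instance.
restored = pickle.loads(pickle.dumps(value))

# marshal rejects the same instance with ValueError.
try:
    marshal.dumps(value)
    marshal_error = None
except ValueError as exc:
    marshal_error = exc

# marshal does accept a code object, which is the risky part.
code_bytes = marshal.dumps(compile("1 + 1", "<demo>", "eval"))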


**Q: What does Python use marshal for internally?**

Python uses marshal internally to read and write .pyc bytecode cache files. This is
a trusted, internal use where the .pyc files are generated by the Python compiler
from trusted source code. The risk arises only when marshal.loads() is used on
externally sourced data.
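
As an illustration of that internal use, the sketch below compiles a throwaway module and reads the code object back out of the resulting .pyc file. It assumes the CPython 3.7+ layout, in which a 16-byte header precedes the marshaled code object; the helper name `read_pyc_code` and the module name `demo_mod` are hypothetical.

```python
import importlib.util
import marshal
import py_compile
import tempfile
import types
from pathlib import Path

def read_pyc_code(pyc_path: Path) -> types.CodeType:
    """Return the code object stored in a .pyc file (CPython 3.7+ layout)."""
    raw = pyc_path.read_bytes()
    if raw[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode was written by a different Python version")
    # 16-byte header: magic number, flags, then mtime+size or a source hash.
    return marshal.loads(raw[16:])

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "demo_mod.py"
    src.write_text("ANSWER = 42\n")
    pyc = Path(py_compile.compile(str(src), cfile=str(Path(tmp) / "demo_mod.pyc")))
    code = read_pyc_code(pyc)

# Executing the code object is what an import does with the cached bytecode.
namespace = {}
exec(code, namespace)
```

This is safe here only because the .pyc was produced locally from known source; the same `marshal.loads()` call on a file of unknown origin is exactly what this rule flags.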


**Q: Can I use marshal for IPC between trusted Python processes?**

Marshal between trusted processes in the same deployment is lower risk but is still
not recommended due to the code object serialization risk and lack of versioning.
For trusted IPC, pickle with HMAC signing provides similar Python object support
with additional integrity protection. JSON is preferred for all new IPC designs.


**Q: What Python types can marshal serialize?**

marshal supports: None, booleans, integers, floating point, complex numbers, bytes,
strings, tuples, lists, dicts, sets, frozensets, and code objects. It does not support
class instances or other arbitrary objects, and recursive containers can only be written
with marshal format version 3 or later. The code object type is the primary security concern.
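
Where marshal must stay in use on a trusted channel, one defensive option is to reject any payload that contains a code object. The sketch below is an assumption-laden illustration: `contains_code` and `loads_data_only` are hypothetical helper names, and the check runs after `marshal.loads()`, so it does nothing against malformed streams that crash the interpreter during loading. CPython 3.13 and later are documented to accept an `allow_code=False` keyword on `marshal.loads()` that serves a similar purpose.

```python
import marshal
import types

def contains_code(obj: object) -> bool:
    """Return True if obj is, or contains, a code object."""
    stack, seen = [obj], set()
    while stack:
        item = stack.pop()
        if isinstance(item, types.CodeType):
            return True
        if isinstance(item, (tuple, list, set, frozenset)):
            children = list(item)
        elif isinstance(item, dict):
            children = list(item.keys()) + list(item.values())
        else:
            continue
        if id(item) in seen:  # recursive containers: visit each one once
            continue
        seen.add(id(item))
        stack.extend(children)
    return False

def loads_data_only(data: bytes) -> object:
    """Unmarshal data from a trusted channel, rejecting embedded code objects."""
    obj = marshal.loads(data)
    if contains_code(obj):
        raise ValueError("marshal payload contains a code object")
    return obj
```

This narrows the code-execution risk but is not a substitute for JSON or another data-only format when the input is untrusted.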


**Q: How does marshal compare to pickle for performance?**

marshal is generally faster than pickle for the types it supports and is used in
Python's performance-critical import system. However, for application-level data
interchange, the performance difference is rarely significant enough to justify using
marshal over safer alternatives.


**Q: Are .pyc files a security risk if writable?**

Yes. If an attacker can write to .pyc cache files, they can inject malicious bytecode
that executes when Python imports the corresponding module. .pyc directories should
be writable only by the application deployment process, not by the application runtime
or external users.
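
A quick audit for loose permissions can be sketched as follows. The helper names are hypothetical, the check relies on POSIX mode bits only (it ignores ACLs and ownership), and treating group-writable paths as loose is a policy assumption that may be too strict for some deployments.

```python
import os
import stat
from pathlib import Path

def writable_by_group_or_other(path: Path) -> bool:
    """True if the POSIX mode grants write access to group or other."""
    mode = path.stat().st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))

def find_loose_pycache(root: Path) -> list[Path]:
    """List __pycache__ directories and .pyc files under root with loose modes."""
    loose = []
    for dirpath, _dirnames, filenames in os.walk(root):
        here = Path(dirpath)
        if here.name == "__pycache__":
            candidates = [here] + [here / f for f in filenames if f.endswith(".pyc")]
            loose.extend(p for p in candidates if writable_by_group_or_other(p))
    return loose
```

Running `find_loose_pycache(Path("."))` from a deployment root returns the paths that deserve a closer look; an empty list means no group- or world-writable bytecode caches were found under that root.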


## References

- [CWE-502: Deserialization of Untrusted Data](https://cwe.mitre.org/data/definitions/502.html)
- [Python docs: marshal module warning](https://docs.python.org/3/library/marshal.html)
- [OWASP Deserialization Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Deserialization_Cheat_Sheet.html)
- [OWASP Top 10 A08:2021 Software and Data Integrity Failures](https://owasp.org/Top10/A08_2021-Software_and_Data_Integrity_Failures/)
- [Python bytecode security research](https://nedbatchelder.com/blog/201803/is_python_a_scripting_language.html)

