marshal Deserialization Detected

MEDIUM

marshal.loads() and marshal.load() are not secure against erroneous or malicious data and should not be used to deserialize untrusted input.

Rule Information

Language

Python

To run this rule against a project:

pathfinder scan --ruleset python/PYTHON-LANG-SEC-044 --project .

About This Rule

Understanding the vulnerability and how it is detected

Python's marshal module serializes Python objects in a binary format that the interpreter uses internally for .pyc bytecode files and the import system. Unlike pickle, marshal cannot serialize arbitrary class instances, but it can serialize code objects (compiled Python bytecode). A deserialized code object runs arbitrary Python as soon as it is passed to exec() or eval().

Python's documentation explicitly states: "The marshal module is not intended to be secure against erroneous or maliciously constructed data. Never unmarshal data received from an untrusted or unauthenticated source."

Because the format exists to store bytecode, it is built around code objects. An application that calls marshal.loads() or marshal.load() on data from an external source may be vulnerable to code execution if the input contains a serialized code object and the application later executes it.
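The round trip described above can be sketched as follows. This is a minimal illustration, not an exploit: the source string is a harmless stand-in for attacker-controlled content, and the names `source`, `blob`, and `namespace` are chosen here for the example.

```python
import marshal
import types

# A harmless stand-in for attacker-supplied code; a real payload could do anything.
source = "result = 6 * 7"
code_obj = compile(source, "<demo>", "exec")

# marshal turns the code object into plain bytes...
blob = marshal.dumps(code_obj)

# ...and loads() rebuilds a code object from those bytes.
restored = marshal.loads(blob)
assert isinstance(restored, types.CodeType)

# Nothing has run yet. Execution happens only when the application
# hands the deserialized object to exec() or eval().
namespace = {}
exec(restored, namespace)
print(namespace["result"])  # prints 42
```

The bytes in `blob` look like any other binary payload, which is why the origin of the data matters more than its appearance.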

Security Implications

Potential attack scenarios if this vulnerability is exploited

Code Object Deserialization

marshal can serialize and deserialize Python code objects. An attacker who can control marshal input can provide a crafted code object that, when executed via exec() or eval() on the deserialized object, runs arbitrary Python code with the process's privileges.

Bytecode Execution via Code Objects

marshal.loads() on attacker-controlled data can produce a code object containing malicious bytecode. If the application subsequently executes this object (e.g., in a dynamic import or eval context), the attacker achieves code execution.

Process Crash via Malformed Data

The Python documentation warns that marshal is not safe against erroneous data. Malformed marshal streams can crash the Python interpreter with segmentation faults or cause memory corruption in the CPython implementation.

Bytecode Cache Poisoning

Applications that cache compiled code objects using marshal in shared storage (Redis, memcached, filesystem) are vulnerable if an attacker can write to that storage, replacing legitimate bytecode with malicious code objects.
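One common way to limit the cache poisoning scenario is to authenticate each cache entry before it reaches any deserializer. The sketch below assumes a shared secret that the attacker cannot read. `SECRET_KEY`, `seal`, and `unseal` are hypothetical names for this example, and it stores JSON rather than code objects.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration; load the real one from a secret store.
SECRET_KEY = b"replace-with-a-real-secret"
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(payload: dict) -> bytes:
    """Serialize payload and prepend an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    tag = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    return tag + body


def unseal(blob: bytes) -> dict:
    """Verify the tag first; deserialize only if it matches."""
    tag, body = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cache entry failed integrity check")
    return json.loads(body)
```

An entry rewritten by someone without the key fails the comparison and is never parsed. This does not help if the attacker can also read the key, so the key should live outside the shared storage.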

How to Fix

Recommended remediation steps

1. Never use marshal.loads() or marshal.load() for data received from external sources, networks, or file uploads.
2. Use JSON or MessagePack for external data interchange instead of marshal.
3. For internal Python object caching, use pickle with HMAC signing rather than marshal.
4. If marshal is used for .pyc caching, ensure the cache directory is not writable by untrusted users and has proper filesystem permissions.
5. Audit all uses of the marshal module to confirm they only process data from trusted, internal Python processes.
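As a rough illustration of step 2, an interchange path that previously used marshal could be rewritten around the standard json module. The function names `encode_message` and `decode_message` are hypothetical, and the assumption that messages are JSON objects is made only for this sketch.

```python
import json


def encode_message(message: dict) -> bytes:
    """Serialize a message for the wire as UTF-8 JSON."""
    return json.dumps(message).encode("utf-8")


def decode_message(raw: bytes) -> dict:
    """Parse untrusted bytes; json.loads can only yield plain data types."""
    data = json.loads(raw.decode("utf-8"))
    # Assumption for this sketch: every valid message is a JSON object.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data
```

json.loads produces only dicts, lists, strings, numbers, booleans, and None, so there is no code object for a later exec() or eval() to run. Validation of the decoded fields is still the application's responsibility.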

Detection Scope

How Code Pathfinder analyzes your code for this vulnerability

This rule detects calls to marshal.loads() and marshal.load() from the Python marshal module. All call sites are flagged since marshal is explicitly documented as unsafe for untrusted data and its use for external data is almost always a security concern.
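Going by the description above, call sites like the two below would be reported even though the data here is produced locally. This is a sketch of the matched patterns, not the rule's own test file. Whether aliased imports such as `from marshal import loads` are also matched is not stated in this document.

```python
import io
import marshal

# Locally produced, trusted data used only to make the example runnable.
trusted_blob = marshal.dumps({"answer": 42})

# Expected to be flagged: direct call to marshal.loads()
from_bytes = marshal.loads(trusted_blob)

# Expected to be flagged: marshal.load() reading from a binary file-like object
from_file = marshal.load(io.BytesIO(trusted_blob))
```

Because every call site is flagged, findings on trusted internal data need a manual review to confirm where the bytes come from.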

Compliance & Standards

Industry frameworks and regulations that require detection of this vulnerability

CWE Top 25

CWE-502 - Deserialization of Untrusted Data

OWASP Top 10

A08:2021 - Software and Data Integrity Failures

NIST SP 800-53

SI-10: Information Input Validation

PCI DSS v4.0

Requirement 6.2.4 - Prevent or mitigate common software attacks, which includes deserialization attacks

References

External resources and documentation

CWE-502: Deserialization of Untrusted Data
Python docs: marshal module warning
OWASP Deserialization Cheat Sheet
OWASP Top 10 A08:2021 Software and Data Integrity Failures
Python bytecode security research

Similar Rules

Explore related security rules for Python

HIGH

Pickle Deserialization of Untrusted Data

pickle.loads() and pickle.load() execute arbitrary Python code during deserialization. Never unpickle data from untrusted sources.

HIGH

PyYAML Unsafe Load Function

yaml.load() and yaml.unsafe_load() can execute arbitrary Python objects during YAML parsing. Use yaml.safe_load() instead.

MEDIUM

shelve Module Usage Detected

shelve.open() uses pickle internally for value serialization and is not safe for storing or retrieving data from untrusted sources.

Frequently Asked Questions

Common questions about marshal Deserialization Detected

Is marshal safer than pickle?

marshal cannot serialize arbitrary class instances the way pickle can, which closes off some attack vectors. However, marshal can serialize code objects (Python bytecode), and those run arbitrary Python when executed. For external data, both are dangerous and should be replaced with JSON or a similarly safe format.

Why does Python itself use marshal?

Python uses marshal internally to read and write .pyc bytecode cache files. This is a trusted, internal use: the .pyc files are generated by the Python compiler from trusted source code. The risk arises when marshal.loads() or marshal.load() is used on externally sourced data.

Is it safe to use marshal between trusted processes?

Using marshal between trusted processes in the same deployment is lower risk, but it is still not recommended: code objects can travel in the stream, and the format is not guaranteed to stay compatible across Python versions. For trusted IPC, pickle with HMAC signing provides similar Python object support with additional integrity protection. JSON is preferred for all new IPC designs.

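Which types can marshal serialize?

marshal supports None, booleans, integers, floating-point numbers, complex numbers, bytes, strings, tuples, lists, dicts, sets, frozensets, and code objects. Containers are supported only when everything inside them is supported. It does not support class instances or other arbitrary objects, and recursive containers cannot be written in format versions below 3. The code object type is the primary security concern.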
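The type boundary can be checked directly. In this small sketch, `Point` is a hypothetical class defined only for the example; built-in values round-trip, while a class instance is rejected with ValueError.

```python
import marshal


class Point:
    """Hypothetical user-defined class, used only for this example."""

    def __init__(self, x, y):
        self.x, self.y = x, y


# Built-in scalars and containers round-trip unchanged.
supported = {"n": 1, "pi": 3.14, "raw": b"\x00", "items": [1, (2, 3)], "tags": {"a"}}
assert marshal.loads(marshal.dumps(supported)) == supported

# Class instances are rejected at serialization time.
try:
    marshal.dumps(Point(1, 2))
    instance_ok = True
except ValueError:
    instance_ok = False

print(instance_ok)  # prints False
```

The narrower type set rules out pickle-style object construction, but it does nothing about code objects, which marshal accepts by design.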

Is marshal faster than pickle?

marshal is generally faster than pickle for the types it supports, which is why Python's performance-critical import system uses it. For application-level data interchange, however, the difference is rarely large enough to justify choosing marshal over safer alternatives.

Can an attacker exploit .pyc cache files?

Yes. If an attacker can write to .pyc cache files, they can inject malicious bytecode that executes when Python imports the corresponding module. .pyc directories should be writable only by the application deployment process, not by the application runtime or external users.
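A deployment check along these lines can surface cache directories with loose permissions. It is a minimal sketch that assumes POSIX mode bits and ignores ACLs and directory ownership. The helper names `writable_by_others` and `find_loose_pycache_dirs` are hypothetical.

```python
import os
import stat


def writable_by_others(path: str) -> bool:
    """True if the group or world write bit is set on path (POSIX modes only)."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def find_loose_pycache_dirs(root: str) -> list[str]:
    """Walk root and collect __pycache__ directories that others can write to."""
    loose = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            if name == "__pycache__":
                full = os.path.join(dirpath, name)
                if writable_by_others(full):
                    loose.append(full)
    return loose
```

Calling `find_loose_pycache_dirs(".")` from a project root returns the directories worth tightening. An empty list only means the mode bits look reasonable; it is not proof that the cache is safe.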
