Pickle Deserialization of Untrusted Data

HIGH

pickle.loads() and pickle.load() can execute arbitrary Python code during deserialization. Never unpickle data from untrusted sources.

Rule Information

Language
Python
Category
Python Core
Author
Shivasurya
Last Updated
2026-03-22
Tags
python, pickle, deserialization, arbitrary-code-execution, CWE-502, OWASP-A08
CWE References
CWE-502

Interactive Playground

Experiment with the vulnerable code and security rule below. Edit the code to see how the rule detects different vulnerability patterns.

pathfinder scan --ruleset python/PYTHON-LANG-SEC-040 --project .

About This Rule

Understanding the vulnerability and how it is detected

Python's pickle module serializes and deserializes Python objects by encoding them as a stream of opcodes that are executed by a virtual stack machine during unpickling. The __reduce__() and __reduce_ex__() methods on objects can define arbitrary Python code to be executed when the object is deserialized.

This means that deserializing a pickle stream from an untrusted source is equivalent to executing arbitrary Python code. An attacker who can control the pickled data can achieve full remote code execution, read files, spawn processes, and exfiltrate data. There is no safe subset of pickle operations — the entire pickle format is a code execution vector.
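The mechanism can be shown with a minimal, harmless sketch: a class whose __reduce__ names a callable to be invoked at load time. A real payload would name os.system or subprocess.Popen; print is used here so the demo is safe to run.

```python
import pickle

class Exploit:
    """Any class can define __reduce__ to name a callable invoked at load time."""
    def __reduce__(self):
        # A real payload would return something like (os.system, ("<command>",)).
        # print is used here so the demo is harmless.
        return (print, ("code ran during unpickling!",))

payload = pickle.dumps(Exploit())

# Merely deserializing the bytes invokes the callable:
result = pickle.loads(payload)
# The "object" we get back is just print's return value, None.
```

Note that the victim never has to call any method on the deserialized object; the call to pickle.loads() alone triggers execution.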

The Python documentation explicitly warns: "The pickle module is not secure. Only unpickle data you trust." Use JSON, MessagePack, or Protocol Buffers for data interchange with untrusted parties.

Security Implications

Potential attack scenarios if this vulnerability is exploited

1. Arbitrary Code Execution via __reduce__

The pickle __reduce__ protocol allows any pickleable object to specify a callable and arguments to be invoked during deserialization. An attacker crafts a pickle stream that calls os.system(), subprocess.Popen(), or exec() with malicious arguments, achieving RCE simply by having the pickle stream deserialized.

2. No Sanitization Is Possible

Unlike SQL injection or XSS where sanitization can be effective, there is no way to safely sanitize or validate a pickle stream before deserializing it. Parsing the pickle stream to check for dangerous opcodes requires implementing a pickle interpreter, which can itself be bypassed by encoding techniques.

3. Session and Cache Poisoning

Applications that store pickled objects in Redis, Memcached, or cookies for session management are vulnerable if an attacker can write to those stores. Session poisoning via pickle injection in shared cache stores has been used in real attacks.

4. File Upload and Deserialization Chain

Applications that accept file uploads and deserialize them with pickle (e.g., ML model files, scientific data, serialized objects) are vulnerable to malicious uploads that execute code on the server when the file is loaded.

How to Fix

Recommended remediation steps

1. Replace pickle with JSON, MessagePack, or Protocol Buffers for data received from any external source.
2. For ML model serialization, use safe, format-specific options: ONNX, TensorFlow SavedModel, TorchScript, or weights_only=True for PyTorch.
3. If pickle must be used for internal IPC, sign all pickle payloads with HMAC using a secret key and verify the signature before deserializing.
4. Never accept pickle data in file uploads, API endpoints, message queues, or any interface reachable by external parties.
5. For scientific data (NumPy, pandas), use safe alternatives: np.load() with allow_pickle=False, df.to_parquet()/pd.read_parquet(), or HDF5.
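The HMAC-signing mitigation from step 3 can be sketched with the standard library alone. The key name and helper functions below are illustrative, not a prescribed API; in practice the key would come from a secret store and be rotated if compromised.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"rotate-me-regularly"  # hypothetical key; load from a secret store in practice

def sign_pickle(obj) -> bytes:
    """Serialize obj and prepend an HMAC-SHA256 tag over the payload."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload

def verify_and_load(blob: bytes):
    """Verify the tag BEFORE calling pickle.loads(); reject on mismatch."""
    tag, payload = blob[:32], blob[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # compare_digest avoids timing side channels on the comparison
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC verification failed; refusing to unpickle")
    return pickle.loads(payload)

blob = sign_pickle({"user": "alice"})
restored = verify_and_load(blob)
```

The critical ordering is that verification happens before any pickle opcode runs: a tampered payload is rejected without ever reaching the unpickler. This only helps when both endpoints share the secret; it does nothing for data from genuinely untrusted parties.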

Detection Scope

How Code Pathfinder analyzes your code for this vulnerability

This rule detects calls to pickle.loads(), pickle.load(), pickle.Unpickler(), and equivalent methods from the pickle module (including cPickle). All call sites are flagged since the safety depends entirely on the trust level of the data source, which requires human review.

Compliance & Standards

Industry frameworks and regulations that require detection of this vulnerability

CWE Top 25
CWE-502 - Deserialization of Untrusted Data in the MITRE CWE Top 25
OWASP Top 10
A08:2021 - Software and Data Integrity Failures
NIST SP 800-53
SI-10: Information Input Validation
PCI DSS v4.0
Requirement 6.2.4 - Protect against injection attacks including deserialization

Frequently Asked Questions

Common questions about Pickle Deserialization of Untrusted Data

Can pickle be made safe by restricting find_class()?

Python's documentation suggests subclassing pickle.Unpickler and overriding find_class() to restrict which classes can be deserialized. This provides some protection but is difficult to implement correctly and has been bypassed in practice using creative opcode sequences. For truly untrusted data, use a different serialization format entirely.
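The allow-list pattern from the Python documentation looks roughly like the sketch below; the SAFE_BUILTINS set shown is an example, not an endorsement that these names suffice for your application.

```python
import builtins
import io
import pickle

# Example allow-list; tailor to what your data actually needs.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    """Reject every global lookup except an explicit allow-list of builtins."""
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain data structures still load:
roundtrip = restricted_loads(pickle.dumps({"a": [1, 2]}))

# A payload referencing os.system is rejected before it can run:
import os
try:
    restricted_loads(pickle.dumps(os.system))
except pickle.UnpicklingError:
    pass  # forbidden global blocked
```

Even so, this hardens rather than secures: the stack machine still runs, and researchers have found bypasses against allow-list unpicklers, which is why the answer above recommends a different format for untrusted input.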
Is pickle safe for data stored by my own application?

Pickle is safer when stored data is written only by your own code and the storage system is protected from external writes. However, if an attacker can inject data into your database (via SQL injection, for example), they can plant malicious pickle payloads. Defense in depth suggests using JSON even for internal storage.
Does HMAC signing make pickle safe to use?

Signing pickled data with HMAC and verifying the signature before deserializing is a valid mitigation for trusted-sender scenarios (e.g., signed cookies in web frameworks; Django's signed-cookie sessions take this approach). The key must be kept secret and rotated if compromised. This is not a substitute for avoiding pickle with truly untrusted data.
Is joblib a safe alternative for saving ML models?

No. joblib uses pickle internally and carries the same code execution risk. For scikit-learn models, use ONNX export via skl2onnx for deployment. For NumPy arrays, use np.save()/np.load() with allow_pickle=False. For pandas DataFrames, use Parquet or CSV with explicit dtypes.
Are PyTorch .pt/.pth model files safe to load?

PyTorch .pt/.pth files are pickle-based and can execute code on load. In recent PyTorch versions, use torch.load(f, weights_only=True) to load only tensor data without executing arbitrary pickle opcodes. For third-party model files, use ONNX Runtime or format-specific safe loaders.
What serialization formats should I use instead of pickle?

For structured data interchange: JSON (universal), MessagePack (binary, compact), or Protocol Buffers (schema-validated, efficient). For Python-specific types with schema validation: Pydantic models with JSON. For scientific data: Arrow/Parquet, HDF5 with h5py, or NumPy's npz format with allow_pickle=False. Choose based on your type requirements and performance needs.
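The key property of these alternatives is that loading yields only data, never code. A quick illustration with JSON from the standard library:

```python
import json

record = {"user": "alice", "roles": ["reader"], "active": True}

# JSON parsing produces only data: dicts, lists, strings, numbers, bools, None.
# The format has no mechanism for a document to trigger code execution on load.
blob = json.dumps(record)
restored = json.loads(blob)

# Malformed or hostile input fails with a parse error instead of running code:
try:
    json.loads("not valid json")
except json.JSONDecodeError:
    pass  # rejected safely
```

The same guarantee holds for MessagePack, Protocol Buffers, and Parquet: a hostile payload can at worst produce a parse error or unexpected data, which your schema validation layer can then reject.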
