dill Deserialization Detected

HIGH

dill.loads() and dill.load() extend pickle with broader serialization capabilities and can execute arbitrary code when deserializing untrusted data.

Rule Information

Language

Python

pathfinder scan --ruleset python/PYTHON-LANG-SEC-046 --project .

About This Rule

Understanding the vulnerability and how it is detected

dill is a Python package that extends the standard pickle module with broader serialization capabilities, supporting lambda functions, generators, closures, nested functions, and other Python objects that pickle cannot serialize. dill builds on pickle's serialization mechanism and therefore inherits all of pickle's security issues.

Like pickle, dill can execute arbitrary Python code during deserialization. dill also widens the attack surface compared to standard pickle because it can serialize and deserialize additional constructs, including lambda functions and closures that carry executable code.
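A minimal, harmless sketch of the load-time execution described above. The Payload class is hypothetical, and the payload only evaluates the expression "40 + 2"; the import falls back to the standard pickle module as a stand-in so the sketch runs even where dill is not installed.

```python
try:
    import dill as serializer  # dill honors __reduce__ the same way pickle does
except ImportError:
    import pickle as serializer  # stand-in so the sketch runs without dill


class Payload:
    """Hypothetical attacker-controlled object."""

    def __reduce__(self):
        # The loader calls eval("40 + 2") while rebuilding the object.
        # A real attacker would name a far more dangerous callable here.
        return (eval, ("40 + 2",))


blob = serializer.dumps(Payload())
result = serializer.loads(blob)  # eval runs here, inside loads()
print(result)
```

The value returned by loads() is whatever the attacker-chosen callable returned, not a Payload instance. No method on the loaded object has to be called for the code to run.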

dill is commonly used in scientific computing (multiprocessing with lambdas, distributed computing) and machine learning (serializing model training functions). These use cases typically involve trusted internal data, but calling dill.loads() on external data is just as dangerous as calling pickle.loads().

Security Implications

Potential attack scenarios if this vulnerability is exploited

Extended Code Execution via Closures and Lambdas

dill can serialize lambda functions, closures, and generators, including their bytecode. An attacker crafting a dill payload can therefore ship attacker-written functions that run as soon as the application calls them, in addition to all of pickle's existing attack vectors, which execute code during the load itself.

Distributed Computing Attack Surface

Function serialization is central to multiprocessing and distributed frameworks: pathos and multiprocess rely on dill, while Ray, Dask, and PySpark rely on the closely related cloudpickle, to ship functions to workers. If an attacker can inject serialized payloads into the task queue, they can execute code on every worker node that loads them.

ML Model Poisoning

Machine learning pipelines that serialize model training functions, preprocessing steps, or custom loss functions using dill are vulnerable to model poisoning if the serialized files can be replaced. Loading a malicious dill file from an untrusted model repository triggers code execution.

Lambda-based Payload Evasion

dill's ability to serialize lambdas and closures enables more sophisticated attack payloads that may evade signature-based detection designed to look for common pickle gadget chains, since the malicious code is embedded in function bytecode rather than class instantiation sequences.

How to Fix

Recommended remediation steps

1. Never use dill.loads() or dill.load() on data from external sources, including network payloads, file uploads, or user-provided files.
2. For distributed computing task serialization, restrict task definitions to developer-controlled function references rather than serialized closures from user input.
3. For ML model portability, use format-specific safe serialization (ONNX, TorchScript, SavedModel) instead of dill-serialized Python function objects.
4. If dill must be used for internal distributed computing, ensure task payloads are signed with HMAC and only processed within a trusted network boundary.
5. Audit all dill usage in data science and ML pipelines to confirm no external data flows through dill.loads().
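The second step above, developer-controlled function references, might look like the following sketch. The TASKS registry, the task decorator, run_task, and the JSON message shape are all hypothetical names chosen for illustration; they are not part of dill or of any particular framework.

```python
import json

# Hypothetical registry: every runnable task is defined in source code.
TASKS = {}


def task(name):
    """Register a function under a fixed, developer-chosen name."""
    def register(func):
        TASKS[name] = func
        return func
    return register


@task("add")
def add(a, b):
    return a + b


def run_task(message: str):
    """Run a task described by JSON, e.g. {"task": "add", "args": [1, 2]}."""
    request = json.loads(message)  # plain data only, no code objects
    func = TASKS.get(request["task"])
    if func is None:
        raise ValueError(f"unknown task: {request['task']!r}")
    return func(*request.get("args", []))


print(run_task('{"task": "add", "args": [1, 2]}'))
```

Because the message carries only a task name and arguments, a sender can choose among existing functions but cannot introduce new code.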

Detection Scope

How Code Pathfinder analyzes your code for this vulnerability

This rule detects calls to dill.loads() and dill.load() from the dill package. All call sites are flagged since dill has the same code execution risk as pickle with an extended attack surface from its additional serialization capabilities.
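For a rough idea of what flagging these call sites involves, here is a small sketch built on the standard ast module. It is an illustration only and makes no claim about how Code Pathfinder is implemented; find_dill_loads is a hypothetical helper, and it handles only plain imports, aliased imports, and from-imports.

```python
import ast

DANGEROUS = {"load", "loads"}


def find_dill_loads(source: str, filename: str = "<string>"):
    """Return sorted (line, call) pairs for dill.load / dill.loads call sites."""
    tree = ast.parse(source, filename=filename)
    aliases = set()  # names bound to the dill module
    direct = {}      # names bound to dill.load / dill.loads via from-import
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "dill":
                    aliases.add(alias.asname or "dill")
        elif isinstance(node, ast.ImportFrom) and node.module == "dill":
            for alias in node.names:
                if alias.name in DANGEROUS:
                    direct[alias.asname or alias.name] = alias.name
    findings = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr in DANGEROUS
                and isinstance(func.value, ast.Name)
                and func.value.id in aliases):
            findings.append((node.lineno, f"dill.{func.attr}"))
        elif isinstance(func, ast.Name) and func.id in direct:
            findings.append((node.lineno, f"dill.{direct[func.id]}"))
    return sorted(findings)


SAMPLE = (
    "import dill\n"
    "import dill as d\n"
    "from dill import loads as ld\n"
    "x = dill.loads(b)\n"
    "y = d.load(f)\n"
    "z = ld(b)\n"
    "w = json.loads(s)\n"
)
print(find_dill_loads(SAMPLE))
```

The sample source is only parsed, never executed, so the undefined names in it do not matter. The json.loads() call is ignored because json is not bound to the dill module.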

Compliance & Standards

Industry frameworks and regulations that require detection of this vulnerability

CWE Top 25

CWE-502 - Deserialization of Untrusted Data

OWASP Top 10

A08:2021 - Software and Data Integrity Failures

NIST SP 800-53

SI-10: Information Input Validation

PCI DSS v4.0

Requirement 6.2.4 - Protect against deserialization attacks

References

External resources and documentation

CWE-502: Deserialization of Untrusted Data
dill documentation
Python docs: pickle security warning
OWASP Deserialization Cheat Sheet
OWASP Top 10 A08:2021 Software and Data Integrity Failures

Similar Rules

Explore related security rules for Python

HIGH

Pickle Deserialization of Untrusted Data

pickle.loads() and pickle.load() execute arbitrary Python code during deserialization. Never unpickle data from untrusted sources.

MEDIUM

marshal Deserialization Detected

marshal.loads() and marshal.load() are not secure against erroneous or malicious data and should not be used to deserialize untrusted input.

MEDIUM

shelve Module Usage Detected

shelve.open() uses pickle internally for value serialization and is not safe for storing or retrieving data from untrusted sources.

Frequently Asked Questions

Common questions about dill Deserialization Detected

Standard pickle cannot serialize lambda functions, closures, generators, or nested functions. dill can serialize all of these, expanding the set of possible attack payloads. In addition to pickle's load-time execution, an attacker can embed malicious code in a lambda or closure that runs when the application later calls the deserialized function, potentially evading defenses designed for standard pickle gadget chains.

Yes. dill is widely used for multiprocessing (to serialize lambda functions for worker processes), distributed computing (to ship function closures to remote workers), and ML/scientific computing (to checkpoint model training functions). These are all legitimate use cases involving trusted internal data, not external input.

pathos.multiprocessing uses dill for function serialization, while cloudpickle (used by Apache Spark and Ray) is a separate library that extends pickle in a similar way. All share the same security risk: loading a serialized object from an untrusted source can execute arbitrary code. The same guidance applies: never deserialize externally sourced data with these libraries.

Signing the payload with HMAC and a secret key before transmission, then verifying the signature before deserialization, is a reasonable mitigation when the sender is trusted. If the secret key is properly protected, the risk is reduced to that of a key compromise. However, this requires careful key management, and the check is easy to get wrong, for example by deserializing before verifying.
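A minimal sketch of verify-then-load using the standard hmac and hashlib modules. The function names sign and verify_and_load and the tag-then-payload framing are assumptions made for illustration; the import falls back to pickle as a stand-in where dill is not installed.

```python
import hashlib
import hmac

try:
    import dill as serializer
except ImportError:
    import pickle as serializer  # stand-in so the sketch runs without dill

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign(payload: bytes, key: bytes) -> bytes:
    """Prefix the payload with its HMAC-SHA256 tag."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify_and_load(message: bytes, key: bytes):
    """Check the tag first; deserialize only if it matches."""
    tag, payload = message[:DIGEST_SIZE], message[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("signature mismatch; refusing to deserialize")
    return serializer.loads(payload)


key = b"k" * 32  # placeholder; a real key comes from a secret store
message = sign(serializer.dumps({"a": 1}), key)
print(verify_and_load(message, key))
```

The ordering is the point of the sketch: loads() is reached only after the tag has been verified, so a tampered payload is rejected before any of it is interpreted.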

For model weights: save with PyTorch's torch.save() and load with torch.load(..., weights_only=True), or use ONNX for cross-framework portability. For preprocessing pipelines: serialize parameters (not functions) as JSON and reconstruct the pipeline deterministically. For custom layers: keep model architecture code in version-controlled source files, not in serialized function objects.
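The "serialize parameters, not functions" idea for preprocessing pipelines can be sketched as follows. The build_pipeline helper and the parameter names mean and std are hypothetical; a real pipeline would have its own schema.

```python
import json


def build_pipeline(params: dict):
    """Rebuild a normalization step from plain parameters."""
    mean, std = params["mean"], params["std"]

    def normalize(values):
        return [(v - mean) / std for v in values]

    return normalize


# Only numbers and strings are stored; the code lives in source control.
saved = json.dumps({"step": "normalize", "mean": 10.0, "std": 2.0})

pipeline = build_pipeline(json.loads(saved))
print(pipeline([10.0, 14.0]))  # [0.0, 2.0]
```

A tampered JSON file can change the numbers but cannot introduce executable code, which is the property dill-serialized functions lack.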

dill does not provide a safe subset or restricted loader like PyYAML's SafeLoader. The only safe approach with dill is to ensure the data being deserialized comes from a trusted source with cryptographic integrity protection. For untrusted data, there is no safe way to use dill.loads().
