pathfinder scan --ruleset python/PYTHON-LANG-SEC-046 --project .
About This Rule
Understanding the vulnerability and how it is detected
dill is a Python package that extends the standard pickle module with broader serialization capabilities, supporting lambda functions, generators, closures, nested functions, and other Python objects that pickle cannot serialize. dill builds on pickle's serialization mechanism and therefore inherits all of pickle's security issues.
Like pickle, dill can execute arbitrary Python code during deserialization. dill actually extends the attack surface compared to standard pickle because it can serialize and deserialize additional constructs including lambda functions and closures containing executable code.
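The mechanism can be illustrated with a minimal, harmless sketch. It uses the standard-library pickle module as a stand-in so that it runs without dill installed; dill.loads() builds on the same unpickling machinery and honors the same __reduce__ hook. The Payload class and the expression it evaluates are hypothetical, chosen only for illustration.

```python
import pickle


class Payload:
    """Hypothetical object whose serialized form names a callable to run on load."""

    def __reduce__(self):
        # The unpickler calls eval("6 * 7") while loading. A real attacker
        # would name something like os.system with a shell command instead.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload instance. It runs the callable stored
# in the byte stream and returns whatever that callable returns.
result = pickle.loads(blob)
print(result)  # 42
```

The receiving side never has to import or define Payload: the byte stream alone decides which callable is invoked, which is why loading untrusted bytes cannot be made safe by inspecting the result afterwards.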
dill is commonly used in scientific computing (multiprocessing with lambdas, distributed computing) and machine learning (serializing model training functions). These use cases typically involve trusted internal data, but calling dill.loads() on external data is just as dangerous as calling pickle.loads().
Security Implications
Potential attack scenarios if this vulnerability is exploited
Extended Code Execution via Closures and Lambdas
dill can serialize lambda functions, closures, and generators that contain executable code. An attacker crafting a dill payload can embed malicious lambda functions or closures that execute arbitrary code when deserialized, in addition to all of pickle's existing attack vectors.
Distributed Computing Attack Surface
dill is commonly used with multiprocessing and distributed frameworks (Ray, Dask, Apache Spark's Python serialization) to serialize functions for distribution across workers. If an attacker can inject dill-serialized payloads into the task queue, they can execute code on all worker nodes.
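One way to shrink this attack surface is to keep serialized callables off the queue entirely and send only a task name plus plain-data arguments. The sketch below is a hypothetical illustration, not the API of any particular framework: the TASKS registry, the task decorator, and the JSON message shape are all assumptions.

```python
import json

# Hypothetical registry of developer-controlled task functions.
TASKS = {}


def task(func):
    """Register a function so workers can look it up by name."""
    TASKS[func.__name__] = func
    return func


@task
def add(a, b):
    return a + b


def encode_task(name, *args):
    """Producer side: send a task name and JSON arguments, not a callable."""
    return json.dumps({"task": name, "args": list(args)}).encode("utf-8")


def run_task(message):
    """Worker side: only functions already present in TASKS can be invoked."""
    request = json.loads(message.decode("utf-8"))
    func = TASKS.get(request["task"])
    if func is None:
        raise ValueError(f"unknown task: {request['task']!r}")
    return func(*request["args"])


print(run_task(encode_task("add", 2, 3)))  # 5
```

With this shape, an attacker who can write to the queue can at most trigger functions the developers already registered, with JSON-representable arguments.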
ML Model Poisoning
Machine learning pipelines that serialize model training functions, preprocessing steps, or custom loss functions using dill are vulnerable to model poisoning if the serialized files can be replaced. Loading a malicious dill file from an untrusted model repository triggers code execution.
Lambda-based Payload Evasion
dill's ability to serialize lambdas and closures enables more sophisticated attack payloads that may evade signature-based detection designed to look for common pickle gadget chains, since the malicious code is embedded in function bytecode rather than class instantiation sequences.
How to Fix
Recommended remediation steps
1. Never use dill.loads() or dill.load() on data from external sources, including network payloads, file uploads, or user-provided files.
2. For distributed computing task serialization, restrict task definitions to developer-controlled function references rather than serialized closures from user input.
3. For ML model portability, use format-specific safe serialization (ONNX, TorchScript, SavedModel) instead of dill-serialized Python function objects.
4. If dill must be used for internal distributed computing, ensure task payloads are signed with HMAC and only processed within a trusted network boundary.
5. Audit all dill usage in data science and ML pipelines to confirm no external data flows through dill.loads().
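The HMAC signing step can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the sign and verify helpers, the tag-then-payload layout, and the hard-coded key are hypothetical, and a real deployment would load the key from a secret store. The payload is treated as opaque bytes, so the point at which dill.loads() would be called is marked only by a comment.

```python
import hashlib
import hmac

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign(payload: bytes, key: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag computed over it."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify(blob: bytes, key: bytes) -> bytes:
    """Return the payload only if its tag matches; raise ValueError otherwise."""
    if len(blob) < DIGEST_SIZE:
        raise ValueError("blob too short to contain a signature")
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest runs in constant time to avoid leaking tag bytes.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch")
    return payload


key = b"example-key-load-from-a-secret-store"  # placeholder, not a real secret
blob = sign(b"serialized task bytes", key)

payload = verify(blob, key)
# Only after verify() succeeds would the payload be handed to dill.loads().
print(payload)  # b'serialized task bytes'
```

Verification must happen before deserialization, never after: once the bytes reach dill.loads(), any embedded code has already run. Signing proves only that the payload came from a key holder, so it complements the trusted network boundary and does not replace it.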
Detection Scope
How Code Pathfinder analyzes your code for this vulnerability
This rule detects calls to dill.loads() and dill.load() from the dill package. All call sites are flagged since dill has the same code execution risk as pickle with an extended attack surface from its additional serialization capabilities.
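As a rough illustration of what flagging every call site means, the sketch below uses Python's ast module to find dill.load() and dill.loads() calls in a source string. It is not Code Pathfinder's implementation, which the text above does not describe; the find_dill_calls helper and its alias handling are assumptions made for the example.

```python
import ast

TARGETS = {"load", "loads"}


def find_dill_calls(source: str):
    """Return sorted (line, name) pairs for dill.load / dill.loads call sites."""
    tree = ast.parse(source)

    # Names bound to the dill module ("import dill as d") and names bound
    # directly to the target functions ("from dill import loads as x").
    module_aliases = set()
    func_aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "dill":
                    module_aliases.add(alias.asname or "dill")
        elif isinstance(node, ast.ImportFrom) and node.module == "dill":
            for alias in node.names:
                if alias.name in TARGETS:
                    func_aliases[alias.asname or alias.name] = alias.name

    findings = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr in TARGETS
                and isinstance(func.value, ast.Name)
                and func.value.id in module_aliases):
            findings.append((node.lineno, f"dill.{func.attr}"))
        elif isinstance(func, ast.Name) and func.id in func_aliases:
            findings.append((node.lineno, f"dill.{func_aliases[func.id]}"))
    return sorted(findings)


sample = """import dill
from dill import loads as unsafe_loads

obj = dill.load(open("model.pkl", "rb"))
other = unsafe_loads(data)
safe = dill.dumps(obj)
"""

print(find_dill_calls(sample))  # [(4, 'dill.load'), (5, 'dill.loads')]
```

The sample is only parsed, never executed, so dill does not need to be installed. Like the rule, the sketch makes no attempt to decide whether the input is trusted; it reports every call site and leaves that judgment to review.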
Similar Rules
Explore related security rules for Python
Pickle Deserialization of Untrusted Data
pickle.loads() and pickle.load() execute arbitrary Python code during deserialization. Never unpickle data from untrusted sources.
marshal Deserialization Detected
marshal.loads() and marshal.load() are not secure against erroneous or malicious data and should not be used to deserialize untrusted input.
shelve Module Usage Detected
shelve.open() uses pickle internally for value serialization and is not safe for storing or retrieving data from untrusted sources.