# PYTHON-DJANGO-SEC-072: Django Insecure Deserialization of Request Data

> **Severity:** CRITICAL | **CWE:** CWE-502 | **OWASP:** A08:2021

- **Language:** Python
- **Category:** Django
- **URL:** https://codepathfinder.dev/registry/python/django/PYTHON-DJANGO-SEC-072
- **Detection:** `pathfinder scan --ruleset python/PYTHON-DJANGO-SEC-072 --project .`

## Description

This rule detects insecure deserialization vulnerabilities in Django applications
where untrusted user input from HTTP request parameters flows into unsafe
deserialization functions: pickle.loads(), yaml.load(), dill.loads(), shelve.open()
with user-controlled keys, or similar functions that execute code during deserialization.

Python's pickle module, by design, lets the serialized stream name arbitrary callables
that the unpickler then invokes. PyYAML's yaml.load() with yaml.Loader or
yaml.UnsafeLoader constructs arbitrary Python objects from tags such as !!python/object
(yaml.Loader was the default before PyYAML 5.1; since PyYAML 6.0 the Loader argument
is required). These are not bugs -- they are features for legitimate internal use --
but they become critical vulnerabilities when applied to user-controlled data.

An attacker who can supply a crafted pickle or YAML payload can execute arbitrary
Python code on the server during the deserialization call, with the same privileges
as the Django process. This is equivalent in severity to eval() or exec() injection.


## Vulnerable Code

```python
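import base64
import pickle

import yaml
from django.http import JsonResponse


# SEC-072: insecure deserialization
def vulnerable_pickle(request):
    data = request.POST.get('data', '')
    # VULNERABLE: attacker-controlled bytes are unpickled, which can run arbitrary code.
    # pickle.loads() needs bytes, so the form field is base64-decoded first.
    obj = pickle.loads(base64.b64decode(data))
    return JsonResponse({'result': repr(obj)})


def vulnerable_yaml(request):
    content = request.POST.get('config', '')
    # VULNERABLE: yaml.Loader constructs arbitrary Python objects from !!python/ tags.
    obj = yaml.load(content, Loader=yaml.Loader)
    return JsonResponse({'result': repr(obj)})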
```

## Secure Code

```python
from django.http import JsonResponse
import json
import yaml

def process_config(request):
    data_str = request.body.decode('utf-8', errors='replace')
    # SECURE: Use json.loads() which only parses data structures, no code execution
    try:
        config = json.loads(data_str)
    except json.JSONDecodeError:
        return JsonResponse({'error': 'Invalid JSON'}, status=400)
    # Validate structure
    if not isinstance(config, dict):
        return JsonResponse({'error': 'Config must be an object'}, status=400)
    return JsonResponse({'status': 'processed', 'keys': list(config.keys())})

def process_yaml_config(request):
    yaml_str = request.body.decode('utf-8', errors='replace')
    # SECURE: yaml.safe_load() only parses standard YAML types, no Python constructors
    try:
        config = yaml.safe_load(yaml_str)
    except yaml.YAMLError:
        return JsonResponse({'error': 'Invalid YAML'}, status=400)
    if not isinstance(config, dict):
        return JsonResponse({'error': 'Config must be a mapping'}, status=400)
    return JsonResponse({'config': config})

def load_preferences(request):
    # SECURE: Store preferences as JSON in the database, not as serialized objects
    from myapp.models import UserPreference
    prefs = UserPreference.objects.filter(user=request.user).values('key', 'value')
    return JsonResponse({'preferences': list(prefs)})
```

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, flows, QueryType
from codepathfinder.presets import PropagationPresets

class PickleModule(QueryType):
    fqns = ["pickle", "_pickle", "cPickle"]

class YamlModule(QueryType):
    fqns = ["yaml"]

class DillModule(QueryType):
    fqns = ["dill"]

_DJANGO_SOURCES = [
    calls("request.GET.get"),
    calls("request.POST.get"),
    calls("request.GET"),
    calls("request.POST"),
    calls("request.COOKIES.get"),
    calls("request.FILES.get"),
    calls("*.GET.get"),
    calls("*.POST.get"),
]


@python_rule(
    id="PYTHON-DJANGO-SEC-072",
    name="Django Insecure Deserialization of Request Data",
    severity="CRITICAL",
    category="django",
    cwe="CWE-502",
    tags="python,django,deserialization,pickle,yaml,OWASP-A08,CWE-502",
    message="Request data flows to unsafe deserialization. Use JSON instead of pickle/yaml.",
    owasp="A08:2021",
)
def detect_django_insecure_deserialization():
    """Detects request data flowing to pickle/yaml/dill/shelve deserialization."""
    return flows(
        from_sources=_DJANGO_SOURCES,
        to_sinks=[
            PickleModule.method("loads", "load"),
            YamlModule.method("load", "unsafe_load"),
            DillModule.method("loads", "load"),
            calls("shelve.open"),
        ],
        sanitized_by=[],
        propagates_through=PropagationPresets.standard(),
        scope="global",
    )
```

## How to Fix

- Replace pickle.loads(), dill.loads(), and similar calls with json.loads() for data exchange between systems.
- When YAML is required, always use yaml.safe_load() which restricts parsing to standard YAML types and does not execute Python constructors.
- Never deserialize data from HTTP requests using pickle, dill, or yaml.load() with any Loader that supports Python object construction.
- Store application state and user data in the database using Django's ORM, not as serialized Python objects.
- If pickle must be used for caching or internal IPC, sign the serialized data with HMAC and verify the signature before deserializing to prevent tampering.
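
The last point can be sketched as follows. This is a minimal illustration using only the standard library, not a vetted implementation: the names `SECRET_KEY`, `sign_blob`, and `load_signed_blob` are hypothetical, and the key shown is a placeholder that would have to come from protected configuration. The signature is checked before `pickle.loads()` is ever reached.

```python
import hashlib
import hmac
import pickle

# Hypothetical placeholder; a real key must be random and kept out of source control.
SECRET_KEY = b"replace-with-a-real-secret"
TAG_LENGTH = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_blob(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 tag over the pickled bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def load_signed_blob(blob, key=SECRET_KEY):
    """Verify the tag first; unpickle only if it matches."""
    tag, payload = blob[:TAG_LENGTH], blob[TAG_LENGTH:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)
```

This only protects internal data whose key never leaves the trust boundary. It does not make it acceptable to unpickle data that originates from an HTTP request.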

## Security Implications

- **Direct Remote Code Execution via Pickle:** A pickle stream can instruct the unpickler to call any importable callable
with attacker-chosen arguments. Attackers usually build such a stream by pickling
an object whose __reduce__ method returns os.system, subprocess.Popen, or any
other function; that call then runs at deserialization time. This is
one of the most well-known RCE vectors in Python web applications.

- **YAML Deserialization RCE via Python Tags:** yaml.load() with yaml.Loader or yaml.UnsafeLoader executes Python constructors
using YAML tags like !!python/object/apply:os.system ['whoami']. An attacker
who controls YAML input can execute arbitrary OS commands at load time.
FullLoader was exploitable through similar tags in PyYAML releases before 5.4
(CVE-2020-1747, CVE-2020-14343), so it should not be relied on for untrusted input.
yaml.safe_load() restricts to standard YAML types only and is not vulnerable.

- **dill and shelve Deserialization:** The dill library extends pickle with even more serialization
capabilities, making dill.loads() with user input equally dangerous.
shelve.open() uses pickle internally to store values, so opening a shelf whose
file path or contents an attacker controls, and then reading entries from it,
creates the same risk.

- **Persistent Backdoor Installation:** Deserialization RCE can be used to install persistent backdoors: writing
malicious code to application directories, modifying the codebase, adding
admin accounts, or establishing reverse shell connections that persist
after the initial attack.
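
To make the pickle mechanism concrete, here is a deliberately harmless sketch. The class name `Payload` is hypothetical, and `int` stands in for whatever callable an attacker would choose (for example `os.system`). The point is that the unpickler calls the function named in the stream and returns its result, and the original class is never involved on the loading side.

```python
import pickle


class Payload:
    """Illustrative stand-in for an attacker-crafted object."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call int('42')".
        # An attacker would return something like (os.system, ("<command>",)).
        return (int, ("42",))


blob = pickle.dumps(Payload())

# The callable stored in the stream runs here, during loads().
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The loaded value is whatever the named callable returned, not a `Payload` instance, which is why validating the object after `pickle.loads()` comes too late.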


## FAQ

**Q: Why is pickle.loads() with user input always critical regardless of context?**

Python's pickle documentation explicitly states: "The pickle module is not
secure. Only unpickle data you trust." During deserialization, pickle.loads()
calls whatever callables the stream names, with the arguments the stream
supplies. A payload built from an object with a custom __reduce__ can therefore
invoke os.system(), exec(), or any other function. There is no safe way
to call pickle.loads() on data from untrusted sources -- the only fix is
to use a different serialization format.
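
When pickle was being used to move a typed object between client and server, one possible replacement is to send JSON and rebuild the object explicitly after validation. The sketch below assumes a hypothetical `Preference` type with two string fields; the field names and the `parse_preference` helper are illustrative and not part of any library.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Preference:
    """Hypothetical value object that was previously pickled."""
    key: str
    value: str


def parse_preference(raw):
    """Parse untrusted JSON text into a Preference, rejecting anything unexpected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("invalid JSON") from exc
    # json.loads() only yields dict, list, str, int, float, bool, or None,
    # so the structure can be checked exhaustively before constructing the object.
    if not isinstance(data, dict) or set(data) != {"key", "value"}:
        raise ValueError("expected an object with exactly 'key' and 'value'")
    if not all(isinstance(data[name], str) for name in ("key", "value")):
        raise ValueError("'key' and 'value' must be strings")
    return Preference(key=data["key"], value=data["value"])
```

For example, `parse_preference('{"key": "theme", "value": "dark"}')` returns a `Preference`, while input with extra fields or non-string values raises `ValueError`. The application code, not the payload, decides which class is instantiated.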


**Q: What is the difference between yaml.load() and yaml.safe_load()?**

yaml.load() with yaml.Loader or yaml.UnsafeLoader processes Python-specific YAML
tags like !!python/object and !!python/object/apply that instantiate Python objects
and call functions during loading. FullLoader was meant to restrict these, but it
was bypassed in PyYAML releases before 5.4, so it is not a safe choice for untrusted
input either. yaml.safe_load() restricts parsing to standard YAML types: mappings,
sequences, strings, numbers, booleans, null. It raises a ConstructorError for any
!!python/ tags, making it safe for user-controlled input.
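
The behaviour described above can be checked with a short sketch. It assumes PyYAML is installed; the helper name `parse_untrusted_yaml` is hypothetical. `ConstructorError` is a subclass of `yaml.YAMLError`, so catching the base class covers both malformed YAML and rejected tags.

```python
import yaml


def parse_untrusted_yaml(text):
    """Parse YAML from an untrusted source using the safe loader only."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # Covers syntax errors and ConstructorError for !!python/ tags.
        raise ValueError("rejected YAML input") from exc


# Plain data parses into ordinary Python containers.
config = parse_untrusted_yaml("a: 1\nb: [x, y]")

# A Python-specific tag is refused instead of being executed.
try:
    parse_untrusted_yaml("!!python/object/apply:os.system ['echo pwned']")
    tag_rejected = False
except ValueError:
    tag_rejected = True
```

After parsing, the result still needs structural validation (for example, checking that it is a mapping with the expected keys), since safe loading only guarantees that no code ran.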


**Q: Are there cases where pickle is safe to use in Django applications?**

Pickle is safe for internal use: caching Python objects in Redis or Memcached where
the data never leaves the application's trust boundary, deserializing objects that
the application itself serialized in the same request or trusted internal pipeline,
and scientific computing workflows where data provenance is controlled. The key
requirement is that the serialized data must never be influenced by user input.


**Q: Can signing pickle data with HMAC make it safe to deserialize from users?**

HMAC signing provides integrity protection: it ensures the data was not modified
after signing. If users can receive a signed pickle blob and send it back, HMAC
prevents them from modifying the blob but does not prevent them from replaying
it. More importantly, if users ever obtain the signing key or a signing oracle,
the protection is lost. For external data exchange, JSON is always safer.


**Q: Is Django's built-in session framework vulnerable to this?**

Django's session framework serializes session data as JSON by default (the default
since Django 1.6), regardless of the session backend, which is safe. Older Django
configurations may use pickle for session serialization (configured via
SESSION_SERIALIZER); PickleSerializer was deprecated in Django 4.1 and removed in
Django 5.0. If your settings include
SESSION_SERIALIZER = 'django.contrib.sessions.serializers.PickleSerializer',
change it to the JSON serializer immediately.
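
For reference, the corresponding settings fragment looks like this. The dotted path is Django's built-in JSON session serializer; the file name in the comment is just the conventional location and may differ in your project.

```python
# settings.py (fragment)
# Explicitly pin the JSON serializer so session payloads are never unpickled.
SESSION_SERIALIZER = "django.contrib.sessions.serializers.JSONSerializer"
```

Session values that are not JSON-serializable (for example datetime objects) have to be converted to strings or numbers before being stored once this serializer is in use.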


**Q: How does this vulnerability compare to SQL injection in severity?**

Insecure deserialization via pickle is generally more severe than SQL injection.
SQL injection is limited to database operations (unless the database has OS-level
features). Pickle deserialization executes arbitrary Python code in the application
process directly, with access to all imports, environment variables, the filesystem,
and network. It is equivalent to eval() injection -- direct Remote Code Execution.


## References

- [CWE-502: Deserialization of Untrusted Data](https://cwe.mitre.org/data/definitions/502.html)
- [OWASP Deserialization Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Deserialization_Cheat_Sheet.html)
- [Python pickle security warning](https://docs.python.org/3/library/pickle.html#module-pickle)
- [PyYAML safe_load vs load](https://pyyaml.org/wiki/PyYAMLDocumentation)
- [OWASP Software and Data Integrity Failures](https://owasp.org/Top10/A08_2021-Software_and_Data_Integrity_Failures/)
- [Exploiting Python Deserialization Vulnerabilities](https://davidhamann.de/2020/04/05/exploiting-python-pickle/)

