# PYTHON-LANG-SEC-042: jsonpickle Deserialization Detected

> **Severity:** HIGH | **CWE:** CWE-502 | **OWASP:** A08:2021

- **Language:** Python
- **Category:** Python Core
- **URL:** https://codepathfinder.dev/registry/python/lang/PYTHON-LANG-SEC-042
- **Detection:** `pathfinder scan --ruleset python/PYTHON-LANG-SEC-042 --project .`

## Description

jsonpickle is a Python library that extends JSON serialization to arbitrary Python
objects by embedding type information and object state in the JSON output. During
deserialization, jsonpickle.decode() reads that embedded type information, imports the
named types, and rebuilds the objects, which can involve calling __new__() and
__setstate__(), or invoking the callable that __reduce__() recorded at encode time.

This means that jsonpickle.decode() can instantiate arbitrary Python classes and execute
their initialization code, making it equivalent to pickle deserialization in terms of
security risk. An attacker who can control the JSON input can craft a payload that calls
os.system() or other dangerous functions during the decode() call.
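
To make the mechanism concrete, here is a minimal sketch of a tag-driven decoder. It is a toy, not jsonpickle's implementation: the `toy/call` and `toy/args` keys and both function names are invented for illustration. It shows only the core problem, namely that the input chooses which Python callable gets imported and run.

```python
import importlib
import json


def resolve_dotted_name(dotted: str):
    """Import the module part of 'pkg.mod.name' and return the named attribute."""
    module_name, _, attr = dotted.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def toy_decode(text: str):
    """Toy decoder (hypothetical): calls whatever callable the input names."""
    data = json.loads(text)
    if isinstance(data, dict) and "toy/call" in data:
        func = resolve_dotted_name(data["toy/call"])
        return func(*data.get("toy/args", []))
    return data


# Harmless demonstration: the *input* decides that math.sqrt runs.
result = toy_decode('{"toy/call": "math.sqrt", "toy/args": [16]}')
print(result)  # 4.0
```

Nothing in `toy_decode` restricts the dotted name, so swapping `math.sqrt` for a dangerous function is entirely up to whoever supplies the JSON. Any decoder that resolves names taken from its input has the same property.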

For data interchange with untrusted parties, use standard json.loads() which only produces
Python built-in types (dict, list, str, int, float, bool, None).
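
As a quick check of that claim, the sketch below feeds the tagged string from this rule's example to json.loads(). The variable names are illustrative only.

```python
import json

# Same shape as the tagged payload in this rule's example. json.loads() treats
# "py/object" as an ordinary string key and never imports or calls anything.
suspicious = '{"py/object": "os.system"}'
parsed = json.loads(suspicious)

print(type(parsed).__name__)             # dict
print(type(parsed["py/object"]).__name__)  # str
```

The result is inert data. Any meaning attached to the `py/object` key has to be added by application code, which is where explicit validation belongs.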


## Vulnerable Code

```python
import jsonpickle

# SEC-042: jsonpickle.decode() on a string an attacker could control.
# The "py/object" tag tells jsonpickle which Python name to import and rebuild.
decoded = jsonpickle.decode('{"py/object": "os.system"}')
```

## Secure Code

```python
import json

# INSECURE: jsonpickle.decode() on untrusted input
# import jsonpickle
# obj = jsonpickle.decode(user_json)

# SECURE: Use standard json.loads() for untrusted data
def parse_api_request(json_str: str) -> dict:
    data = json.loads(json_str)
    if not isinstance(data, dict):
        raise ValueError("Expected JSON object")
    return data

# SECURE: Use pydantic for typed deserialization with validation
from pydantic import BaseModel

class RequestPayload(BaseModel):
    user_id: int
    action: str
    metadata: dict

def parse_typed_request(json_str: str) -> RequestPayload:
    return RequestPayload.model_validate_json(json_str)

# If jsonpickle is needed for internal serialization of trusted objects:
# Use it only between trusted components, never for external input
import jsonpickle

def serialize_for_internal_cache(obj) -> str:
    # ACCEPTABLE: Only used for trusted internal objects, never decoded from external input
    return jsonpickle.encode(obj)

```

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, QueryType

class JsonPickleModule(QueryType):
    fqns = ["jsonpickle"]


@python_rule(
    id="PYTHON-LANG-SEC-042",
    name="jsonpickle Usage Detected",
    severity="HIGH",
    category="lang",
    cwe="CWE-502",
    tags="python,jsonpickle,deserialization,rce,CWE-502",
    message="jsonpickle.decode() detected. jsonpickle can execute arbitrary code. Use json instead.",
    owasp="A08:2021",
)
def detect_jsonpickle():
    """Detects jsonpickle.decode/loads usage."""
    return JsonPickleModule.method("decode", "loads")
```

## How to Fix

- Replace jsonpickle.decode() with json.loads() for all data received from external sources, including API requests, file uploads, and message queue payloads.
- Use pydantic or marshmallow for typed deserialization with validation when structured Python objects are needed from external data.
- If jsonpickle is used for internal serialization between trusted components, ensure the serialized data never flows back from external sources.
- Audit all jsonpickle usage in API handlers, webhook processors, and data import functions.
- Consider migrating from jsonpickle to a schema-defined format (Protocol Buffers, Avro) for internal object serialization.

## Security Implications

- **Arbitrary Class Instantiation:** jsonpickle encodes Python type information in tagged structures such as
{"py/object": "module.ClassName"}. During decode(), the named types are imported and instantiated, so an
attacker can craft JSON that instantiates dangerous classes or invokes callables with malicious arguments.

- **JSON Format Disguise:** jsonpickle data is valid JSON, so it may pass security controls that check only the
content type or basic JSON validity. The malicious payload sits inside well-formed JSON
that looks benign on casual inspection.

- **API Endpoint Exploitation:** REST APIs that use jsonpickle to deserialize request bodies or parameters are directly
exploitable. Any endpoint accepting application/json content that is decoded with
jsonpickle is a remote code execution vector.

- **Cache and Storage Poisoning:** Applications storing jsonpickle-serialized objects in Redis, databases, or message
queues are vulnerable if an attacker can inject data into those stores, as the
deserialization will execute the embedded code when the value is read.
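
The "JSON Format Disguise" point can be illustrated with a small audit helper. This is a sketch under assumptions: the function names are hypothetical, and it relies only on the fact that jsonpickle's type tags are dict keys beginning with `py/`. It is meant for logging or flagging suspicious payloads at a trust boundary, not as a replacement for removing jsonpickle.decode() from untrusted paths.

```python
import json
from typing import Any, Iterator, List


def iter_jsonpickle_tags(node: Any, path: str = "$") -> Iterator[str]:
    """Yield the paths of dict keys starting with 'py/' anywhere in parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if isinstance(key, str) and key.startswith("py/"):
                yield child
            yield from iter_jsonpickle_tags(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_jsonpickle_tags(item, f"{path}[{index}]")


def find_jsonpickle_tags(text: str) -> List[str]:
    """Parse with the standard json module (inert) and list any 'py/' tags."""
    return list(iter_jsonpickle_tags(json.loads(text)))


body = '{"user": "alice", "prefs": {"py/object": "os.system"}}'
print(find_jsonpickle_tags(body))                  # ['$.prefs.py/object']
print(find_jsonpickle_tags('{"user": "alice"}'))   # []
```

Both inputs are well-formed JSON and would pass a content-type or syntax check. Only the key scan distinguishes them.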


## FAQ

**Q: Is jsonpickle really as dangerous as pickle?**

Yes, for untrusted input. jsonpickle honors the same __reduce__ protocol that pickle
uses to reconstruct objects, so a payload can name a callable and its arguments. The
JSON encoding gives an illusion of safety because JSON is usually considered inert, but
jsonpickle's type resolution during decode puts it in the same risk class as pickle.


**Q: Why would anyone use jsonpickle over standard json?**

jsonpickle handles Python types that JSON cannot represent: datetime, Decimal, sets,
custom classes, numpy arrays, and objects with circular references. It is useful for
debugging, logging complex Python objects, or serializing state for trusted internal
use. The problem arises when it is used to deserialize data from untrusted sources.


**Q: Can I use jsonpickle.decode() with keys=False for safety?**

No. The keys parameter controls how dictionary keys are handled, not which objects may
be instantiated. There is no jsonpickle parameter that disables arbitrary class
instantiation, and the jsonpickle documentation itself warns that the library should
only be used on trusted data. If you need jsonpickle for internal purposes, never
decode data that originated from external sources.


**Q: What if I'm using jsonpickle in a Django REST API?**

Any Django REST API endpoint that calls jsonpickle.decode() on request data is
critically vulnerable to remote code execution. Replace with DRF serializers,
pydantic models, or standard json.loads() with manual field validation immediately.


**Q: How do I migrate from jsonpickle to a safe alternative?**

For external data: use json.loads() plus pydantic or marshmallow for typed validation.
For internal object serialization: evaluate Protocol Buffers, Apache Avro, or custom
JSON serializers that explicitly handle each supported type. During the transition,
remove jsonpickle.decode() calls first; any remaining jsonpickle use should be limited
to encoding.
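
One way to read "custom JSON serializers that explicitly handle each supported type" is sketched below using only the standard library. The `__type__` key, the tag names, and the `safe_dumps`/`safe_loads` function names are assumptions made for this example. The important property is that decoding goes through a fixed allowlist and never imports a name taken from the input.

```python
import json
from datetime import datetime
from decimal import Decimal

TYPE_KEY = "__type__"  # hypothetical tag key chosen for this sketch


def _encode_extra(obj):
    """json.dumps() fallback: encode only the types listed here."""
    if isinstance(obj, datetime):
        return {TYPE_KEY: "datetime", "value": obj.isoformat()}
    if isinstance(obj, Decimal):
        return {TYPE_KEY: "decimal", "value": str(obj)}
    if isinstance(obj, set):
        return {TYPE_KEY: "set", "value": list(obj)}
    raise TypeError(f"Unsupported type: {type(obj).__name__}")


# Fixed allowlist: a tag maps to a known constructor, never to an imported name.
_DECODERS = {
    "datetime": datetime.fromisoformat,
    "decimal": Decimal,
    "set": set,
}


def _decode_extra(d: dict):
    """json.loads() object_hook: rebuild tagged values, reject unknown tags."""
    if TYPE_KEY not in d:
        return d
    tag = d[TYPE_KEY]
    if not isinstance(tag, str) or tag not in _DECODERS:
        raise ValueError(f"Unsupported type tag: {tag!r}")
    return _DECODERS[tag](d["value"])


def safe_dumps(obj) -> str:
    return json.dumps(obj, default=_encode_extra)


def safe_loads(text: str):
    return json.loads(text, object_hook=_decode_extra)


record = {"when": datetime(2024, 1, 2, 3, 4, 5), "price": Decimal("9.99"), "tags": {"a", "b"}}
assert safe_loads(safe_dumps(record)) == record
```

An input such as `{"__type__": "os.system", "value": "id"}` raises ValueError instead of resolving a name. Supporting a new type means adding one encoder branch and one allowlist entry, which keeps the decodable surface reviewable.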


**Q: Does this rule detect jsonpickle used through other libraries?**

The rule detects direct calls to jsonpickle.decode() and jsonpickle.loads(). Libraries
that wrap jsonpickle internally may not be detected. Audit your dependencies for
jsonpickle usage and review any library that accepts JSON or dict input and returns
typed Python objects.
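
For the dependency audit, a rough syntactic scan can be a useful first pass. The sketch below uses the standard ast module and is far simpler than the rule above: it has no type or dataflow awareness, and the function name is hypothetical. It finds calls to decode or loads made through a direct jsonpickle import.

```python
import ast
from typing import List, Tuple

DECODE_NAMES = {"decode", "loads"}


def find_jsonpickle_decodes(source: str, filename: str = "<string>") -> List[Tuple[int, str]]:
    """Return (line, call) pairs for jsonpickle decode/loads calls in source."""
    tree = ast.parse(source, filename=filename)
    module_aliases = set()  # names bound to the jsonpickle module
    func_aliases = set()    # names bound directly to decode/loads

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "jsonpickle":
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == "jsonpickle":
            for alias in node.names:
                if alias.name in DECODE_NAMES:
                    func_aliases.add(alias.asname or alias.name)

    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr in DECODE_NAMES
            and isinstance(func.value, ast.Name)
            and func.value.id in module_aliases
        ):
            hits.append((node.lineno, f"{func.value.id}.{func.attr}"))
        elif isinstance(func, ast.Name) and func.id in func_aliases:
            hits.append((node.lineno, func.id))
    return sorted(hits)


sample = (
    "import jsonpickle as jp\n"
    "from jsonpickle import decode\n"
    "a = jp.loads(data)\n"
    "b = decode(data)\n"
    "c = json.loads(data)\n"
)
print(find_jsonpickle_decodes(sample))  # [(3, 'jp.loads'), (4, 'decode')]
```

Running this over the Python files of installed packages gives a list of places to review by hand. It will miss indirect uses such as attribute reassignment or dynamic imports, so treat an empty result as a hint rather than proof.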


## References

- [CWE-502: Deserialization of Untrusted Data](https://cwe.mitre.org/data/definitions/502.html)
- [jsonpickle documentation](https://jsonpickle.github.io/)
- [OWASP Deserialization Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Deserialization_Cheat_Sheet.html)
- [OWASP Top 10 A08:2021 Software and Data Integrity Failures](https://owasp.org/Top10/A08_2021-Software_and_Data_Integrity_Failures/)
- [OWASP: Deserialization of untrusted data](https://owasp.org/www-community/vulnerabilities/Deserialization_of_untrusted_data)

---

Source: https://codepathfinder.dev/registry/python/lang/PYTHON-LANG-SEC-042
Code Pathfinder — Open source, type-aware SAST with cross-file dataflow analysis
