pathfinder scan --ruleset python/PYTHON-LAMBDA-SEC-023 --project .

About This Rule
Understanding the vulnerability and how it is detected
This rule detects unsafe deserialization vulnerabilities in AWS Lambda functions where attacker-controlled event data flows into pickle.loads(), pickle.load(), cPickle.loads(), or related pickle deserialization functions.
Python's pickle module serializes and deserializes arbitrary Python objects. The pickle format can encode instructions that invoke arbitrary callables (the hook pickle itself uses is an object's __reduce__() method), so a crafted payload can execute any Python code during deserialization. This is documented behavior: the pickle documentation explicitly warns, "The pickle module is not secure. Only unpickle data you trust."
Lambda functions that receive pickle-encoded data via the event dictionary are completely compromised if an attacker can control the bytes being deserialized. Event sources including API Gateway request bodies (event.get("body")), SQS message bodies (event["Records"][0]["body"]), S3 object contents fetched based on event metadata, and SNS message bodies are all attacker-controllable in public-facing deployments.
In the Lambda environment, pickle deserialization of attacker-controlled data immediately gives the attacker the full capabilities of the execution environment: the execution role's AWS credentials, boto3 SDK access, the /tmp filesystem, and outbound network access. The attack completes during the deserialization call, before any application logic runs.
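A minimal vulnerable handler illustrates the pattern this rule flags. The event shape below assumes an API Gateway proxy integration with a base64-encoded request body; the field names are illustrative:

```python
import base64
import pickle


def lambda_handler(event, context):
    # Attacker-controlled: the API Gateway request body is whatever bytes
    # the client sent, base64-decoded here.
    raw = base64.b64decode(event.get("body", ""))

    # VULNERABLE: pickle.loads() executes code embedded in the payload
    # during deserialization, before this function can validate anything.
    obj = pickle.loads(raw)
    return {"statusCode": 200, "body": str(obj)}
```

By the time `obj` is assigned, any code carried in the payload has already run with the execution role's credentials in scope.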
Security Implications
Potential attack scenarios if this vulnerability is exploited
Code Execution During Deserialization
Pickle deserialization executes Python code encoded in the payload during the loads() call itself. No application logic needs to run; the attacker's code executes before the deserialized object is even used. This makes pickle deserialization of event data an unconditional remote code execution primitive.
Immediate AWS Credential Exfiltration
A crafted pickle payload can import os, read the Lambda's environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN), and exfiltrate them via an outbound HTTP request — all within the deserialization call before lambda_handler() returns. The execution role's permissions are compromised immediately.
Persistent Compromise of Warm Execution Environments
Pickle deserialization can modify the Lambda process's global state, install hooks in imported modules, or write files to /tmp that persist across warm invocations. An attacker can install persistence mechanisms that affect all subsequent requests processed by the same execution environment until it is recycled.
SQS and SNS Trigger Exploitation
Lambda functions triggered by SQS queues or SNS topics may process base64-encoded message bodies. If the Lambda decodes and deserializes message bodies with pickle, an attacker who can publish to the SQS queue or SNS topic (which may be publicly accessible) can execute arbitrary code in the Lambda without ever sending an API Gateway request.
How to Fix
Recommended remediation steps
1. Replace all pickle deserialization of Lambda event data with json.loads() or another safe serialization format (msgpack, protobuf, Avro) that does not execute code during parsing.
2. Never use pickle to deserialize data received from any AWS event source (API Gateway, SQS, SNS, S3, DynamoDB Streams), regardless of whether the source appears trusted.
3. For internal Lambda-to-Lambda communication that currently uses pickle, switch to JSON serialization or AWS-native formats (EventBridge schema registry, SQS with JSON).
4. If pickle must be used for internal state (e.g., ML model objects in /tmp), ensure the pickle data is written by the Lambda itself and never derived from event input.
5. Apply least-privilege IAM policies to the Lambda execution role to limit the AWS APIs accessible if exploitation occurs.
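Applying the first step, a remediated handler parses the body with json.loads() instead. This is a sketch; the response shape and validation are illustrative:

```python
import json


def lambda_handler(event, context):
    # json.loads() only builds dicts, lists, strings, numbers, booleans,
    # and None; the JSON grammar has no way to invoke callables.
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": "invalid JSON"}

    # Validate the shape explicitly before use rather than trusting it.
    if not isinstance(payload, dict):
        return {"statusCode": 400, "body": "expected a JSON object"}
    return {"statusCode": 200, "body": json.dumps({"received": payload})}
```

Unlike pickle, a malformed or hostile body can at worst produce a parse error here, which the handler turns into a 400 response.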
Detection Scope
How Code Pathfinder analyzes your code for this vulnerability
This rule performs inter-procedural taint analysis with global scope. Sources are Lambda event dictionary access calls: calls("event.get"), calls("event.__getitem__"), including event.get("body"), event.get("queryStringParameters"), event.get("pathParameters"), and event["Records"]. Sinks are calls("pickle.loads"), calls("pickle.load"), calls("cPickle.loads"), and calls("cPickle.load") with tainted input tracked via .tracks(0). There are no recognized sanitizers for pickle deserialization — any Lambda event data reaching pickle.loads() is a confirmed critical vulnerability. The analysis follows taint through base64.b64decode(), bytes conversions, variable assignments, and module boundaries.
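For illustration, the analysis described above would flag a flow like the following, where taint from an event dictionary access survives a base64 decode and an intermediate assignment before reaching the sink. The SQS-style record shape is assumed:

```python
import base64
import pickle


def lambda_handler(event, context):
    body = event["Records"][0]["body"]  # source: event dictionary access
    decoded = base64.b64decode(body)    # taint propagates through b64decode
    data = decoded                      # ...and through plain assignment
    return pickle.loads(data)           # sink: flagged; no sanitizer exists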
Similar Rules
Explore related security rules for Python
Lambda Code Injection via eval() or exec()
Lambda event data flows to eval() or exec(), enabling arbitrary Python code execution with the full permissions of the Lambda execution environment.
Lambda XSS via Tainted HTML Response Body
Lambda event data is embedded directly in an HTML response body returned to API Gateway, enabling Cross-Site Scripting attacks against end users.