# PYTHON-JWT-SEC-005: JWT User Input in Payload

> **Severity:** LOW | **CWE:** CWE-522, CWE-200 | **OWASP:** A02:2021, A04:2021

- **Language:** Python
- **Category:** JWT
- **URL:** https://codepathfinder.dev/registry/python/jwt/PYTHON-JWT-SEC-005
- **Detection:** `pathfinder scan --ruleset python/PYTHON-JWT-SEC-005 --project .`

## Description

This rule uses taint analysis to trace user-controlled data from web framework request
objects into jwt.encode() calls. When user input flows directly into a JWT payload
without validation, two things can go wrong: the user might submit sensitive data
(passwords, personal information) that ends up base64-encoded in a readable token,
or an attacker might inject claims that affect authorization decisions.

Unlike the other JWT rules that use pattern matching, this rule tracks actual data
flow. It starts from Flask request methods (request.args.get, request.form.get) and
Django request methods (request.GET.get, request.POST.get), follows the data through
variable assignments, and flags when it reaches jwt.encode(). It uses standard
propagation presets to track how data moves through the code.

This is the most precise JWT rule in the set -- it only fires when there's a proven
data flow from user input to token creation. No data flow, no finding.


## Vulnerable Code

```python
from flask import Flask, request
import jwt

app = Flask(__name__)

@app.route('/token')
def create_token():
    # Vulnerable: user input flows directly into JWT payload
    user_data = request.args.get('user')
    return jwt.encode({"sub": user_data}, "key", algorithm="HS256")
```

## Secure Code

```python
from flask import Flask, request
import jwt
import os

app = Flask(__name__)
SECRET = os.environ["JWT_SECRET_KEY"]

@app.route('/token')
def create_token():
    username = request.args.get('username')

    # SECURE: Validate input, then use only known-safe identifiers
    user = db.get_user_by_username(username)
    if not user:
        return {"error": "User not found"}, 404

    # SECURE: Build payload from validated server-side data, not raw input
    token = jwt.encode(
        {"user_id": user.id, "role": user.role, "exp": expires_at},
        SECRET,
        algorithm="HS256"
    )
    return {"token": token}

```

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, flows, QueryType
from codepathfinder.presets import PropagationPresets

class JWTModule(QueryType):
    fqns = ["jwt"]


@python_rule(
    id="PYTHON-JWT-SEC-005",
    name="JWT Exposed Data (Audit)",
    severity="LOW",
    category="jwt",
    cwe="CWE-522",
    tags="python,jwt,data-exposure,audit,CWE-522",
    message="Data passed to jwt.encode(). JWT payloads are base64-encoded, not encrypted.",
    owasp="A02:2021",
)
def detect_jwt_exposed_data():
    """Audit: detects any data flowing into jwt.encode()."""
    return flows(
        from_sources=[
            calls("request.GET.get"),
            calls("request.POST.get"),
            calls("request.args.get"),
            calls("request.form.get"),
        ],
        to_sinks=[
            JWTModule.method("encode"),
        ],
        sanitized_by=[],
        propagates_through=PropagationPresets.standard(),
        scope="local",
    )
```

## How to Fix

- Never pass raw user input directly into jwt.encode() -- validate it against your database or business logic first
- Build JWT payloads from server-side data, not from request parameters -- look up the user by their input, then use the server's record of their ID and role
- Whitelist which fields can appear in the JWT payload and reject anything else
- Validate input types and lengths before including them in any token
- Consider whether the data needs to be in the token at all -- can the consuming service look it up by user_id instead?

## Security Implications

- **Sensitive Data Exposure via Token:** If a user submits their password, SSN, or credit card number through a form,
and that data flows into jwt.encode(), it ends up in a JWT that's visible to
anyone who intercepts it. JWTs are not encrypted -- the payload is just
base64-encoded JSON.

- **Claim Injection:** If user input becomes a JWT claim without validation, an attacker could inject
authorization-relevant claims. For example, if the "role" field comes from user
input, the attacker can set it to "admin". The token will be validly signed
because the server created it, but with attacker-controlled claims.

- **Token Size Attacks:** Unsanitized user input in JWT payloads can make tokens arbitrarily large.
Large tokens can cause issues with cookie size limits (4KB), header size limits
in proxies and load balancers, and increased bandwidth usage.


## FAQ

**Q: How does this rule differ from a simple grep for jwt.encode?**

This rule uses taint analysis -- it tracks where the data comes from, not just
where it ends up. It starts at Flask/Django request methods, follows the data
through variable assignments, and only flags when that user-controlled data
actually reaches jwt.encode(). If jwt.encode() uses server-side data that
doesn't come from user input, this rule won't flag it.


**Q: Does this rule work across multiple files?**

Currently it uses scope=local, which tracks data flow within a single function.
If the Flask route calls a helper function that does the jwt.encode(), the flow
won't be detected at local scope. Switch to scope=global for cross-function
tracking, but note this requires the inter-procedural analysis engine.


**Q: What request methods does this rule track as sources?**

Flask: request.args.get, request.form.get. Django: request.GET.get,
request.POST.get. These cover query parameters and form data. To add more
sources (request.json, request.cookies, request.headers), modify the
from_sources list in the rule file.


**Q: Is it always wrong to put user input in a JWT?**

Not always, but it requires care. A validated username that maps to a known user
is fine -- as long as you look up the user server-side and use your own data for
the payload (user.id, user.role). What's dangerous is taking raw input like
request.args.get("data") and putting it directly into the token without
validation.


**Q: How do I run this rule in CI/CD?**

Run: pathfinder ci --ruleset python/jwt --project .
It outputs SARIF, JSON, or CSV. On GitHub, it posts inline review comments directly
on pull requests pointing to the exact lines. No dashboard needed.


**Q: Can I add custom sources and sinks to this rule?**

Yes. The rule file is plain Python. Fork it and add your custom input methods to
from_sources (e.g., calls("request.json.get"), calls("request.cookies.get")),
your encoding functions to to_sinks, or your validation functions to sanitized_by.
You can version-control the modified rule alongside your app code.


**Q: Why is this rule LOW severity?**

The data exposure depends on what the user submits. If it's just a username,
the risk is low. If it's a password or credit card number, the risk is high.
Since the rule can't determine what data is in the input, it's classified as
LOW to avoid alert fatigue. The finding still deserves review to confirm what
data flows through.


## References

- [CWE-522: Insufficiently Protected Credentials](https://cwe.mitre.org/data/definitions/522.html)
- [CWE-200: Exposure of Sensitive Information](https://cwe.mitre.org/data/definitions/200.html)
- [RFC 8725: JSON Web Token Best Current Practices](https://tools.ietf.org/html/rfc8725)
- [OWASP Input Validation Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html)
- [Flask Request Object Documentation](https://flask.palletsprojects.com/en/latest/api/#flask.Request)
- [PyJWT Documentation](https://pyjwt.readthedocs.io/en/stable/)

---

Source: https://codepathfinder.dev/registry/python/jwt/PYTHON-JWT-SEC-005
Code Pathfinder — Open source, type-aware SAST with cross-file dataflow analysis
