# PYTHON-DJANGO-SEC-031: Django SSRF via urllib

> **Severity:** HIGH | **CWE:** CWE-918 | **OWASP:** A10:2021

- **Language:** Python
- **Category:** Django
- **URL:** https://codepathfinder.dev/registry/python/django/PYTHON-DJANGO-SEC-031
- **Detection:** `pathfinder scan --ruleset python/PYTHON-DJANGO-SEC-031 --project .`

## Description

This rule detects Server-Side Request Forgery (SSRF) vulnerabilities in Django
applications where untrusted user input from HTTP request parameters flows into
urllib.request.urlopen() or urllib.request.Request() from Python's standard library.

urllib is the standard Python HTTP client and is commonly used in Django applications
that do not have the third-party requests library as a dependency, or in older
codebases. It is subject to the same SSRF risks as any HTTP client: an attacker who
controls the URL can direct server-side requests to cloud metadata endpoints, internal
services, localhost, or file:// URLs.

urllib's urlopen() additionally supports several URL schemes beyond http/https,
including file:// and ftp://, which increases the attack surface compared to some
other HTTP clients. The file:// scheme in particular can be used to read arbitrary
files from the server's filesystem.
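
To make the file:// risk concrete, the sketch below uses only the standard library: it writes a throwaway file and reads it back through urlopen() with a file:// URL, with no network access involved. The helper name `read_via_urlopen`, the temporary file, and its contents are illustrative assumptions, not part of the rule.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def read_via_urlopen(url: str) -> bytes:
    """Fetch a URL the way the unvalidated view would, with no checks."""
    with urlopen(url) as resp:
        return resp.read()


# A throwaway file standing in for a sensitive file on the server.
with tempfile.TemporaryDirectory() as tmp:
    secret_path = Path(tmp) / "settings.txt"
    secret_path.write_text("SECRET_KEY=not-a-real-key")
    file_url = secret_path.as_uri()  # builds a file:// URL for the path
    leaked = read_via_urlopen(file_url)

print(leaked.decode())
```

Because the same call path serves both http:// and file:// URLs, a scheme check has to happen before the URL ever reaches urlopen().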


## Vulnerable Code

```python
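from django.http import HttpResponse
from urllib.request import urlopen

def fetch_url(request):
    # VULNERABLE: attacker-controlled URL is passed straight to urlopen()
    url = request.GET.get('url')
    resp = urlopen(url)
    return HttpResponse(resp.read())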
```

## Secure Code

```python
from django.http import JsonResponse
import urllib.request
from urllib.parse import urlparse
import ipaddress

ALLOWED_DOMAINS = frozenset({'api.partner.com', 'cdn.example.com'})
BLOCKED_NETWORKS = [
    ipaddress.ip_network('10.0.0.0/8'),
    ipaddress.ip_network('172.16.0.0/12'),
    ipaddress.ip_network('192.168.0.0/16'),
    ipaddress.ip_network('169.254.0.0/16'),  # Link-local / cloud metadata
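    ipaddress.ip_network('127.0.0.0/8'),     # Loopback
    ipaddress.ip_network('0.0.0.0/8'),       # "This host" addresses
    ipaddress.ip_network('::1/128'),         # IPv6 loopback
    ipaddress.ip_network('fc00::/7'),        # IPv6 unique local
    ipaddress.ip_network('fe80::/10'),       # IPv6 link-local
]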

def is_safe_url(url):
    parsed = urlparse(url)
    # Block non-http/https schemes including file://, ftp://, gopher://
    if parsed.scheme not in ('http', 'https'):
        return False
    if not parsed.hostname:
        return False
    if parsed.hostname in ALLOWED_DOMAINS:
        return True
    try:
        ip = ipaddress.ip_address(parsed.hostname)
        return not any(ip in net for net in BLOCKED_NETWORKS)
    except ValueError:
        return False  # Unknown hostname not in allowlist

def fetch_resource(request):
    url = request.GET.get('url', '')
    # SECURE: Validate URL before using urllib
    if not is_safe_url(url):
        return JsonResponse({'error': 'URL not permitted'}, status=400)
    with urllib.request.urlopen(url, timeout=10) as response:
        content = response.read(4096).decode('utf-8', errors='replace')
    return JsonResponse({'content': content})

```

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, flows, QueryType
from codepathfinder.presets import PropagationPresets

class UrllibModule(QueryType):
    fqns = ["urllib.request"]

_DJANGO_SOURCES = [
    calls("request.GET.get"),
    calls("request.POST.get"),
    calls("request.GET"),
    calls("request.POST"),
    calls("request.COOKIES.get"),
    calls("request.FILES.get"),
    calls("*.GET.get"),
    calls("*.POST.get"),
]


@python_rule(
    id="PYTHON-DJANGO-SEC-031",
    name="Django SSRF via urllib",
    severity="HIGH",
    category="django",
    cwe="CWE-918",
    tags="python,django,ssrf,urllib,OWASP-A10,CWE-918",
    message="User input flows to urllib.request.urlopen(). Validate and restrict URLs.",
    owasp="A10:2021",
)
def detect_django_ssrf_urllib():
    """Detects Django request data flowing to urllib calls."""
    return flows(
        from_sources=_DJANGO_SOURCES,
        to_sinks=[
            UrllibModule.method("urlopen", "Request").tracks(0),
        ],
        sanitized_by=[
            calls("urllib.parse.urlparse"),
            calls("validators.url"),
        ],
        propagates_through=PropagationPresets.standard(),
        scope="global",
    )
```

## How to Fix

- Validate user-supplied URLs against a strict allowlist of trusted domains before calling urlopen().
- Block the file://, gopher://, and ftp:// schemes -- accept only http and https.
- Block requests to private IP ranges, loopback addresses, and link-local networks (169.254.0.0/16).
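- Resolve hostnames to IPs at validation time and check the resolved addresses against the blocked ranges, so that a public-looking hostname cannot point at an internal address. To resist DNS rebinding, connect to the address that was validated rather than resolving the hostname a second time.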
- Consider replacing urllib with the requests library and its higher-level session management, which makes timeout and redirect controls more explicit.
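
On the redirect point: urllib's default opener follows HTTP redirects, so a URL that passes validation can still redirect the server to a different target. A minimal sketch of making that control explicit while staying on urllib is shown here. The names `NoRedirectHandler` and `no_redirect_opener` are hypothetical, and whether to refuse redirects outright or re-validate each hop is a design choice this sketch does not settle.

```python
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse to follow 3xx responses instead of silently re-requesting."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow the redirect; the 3xx
        # response then surfaces to the caller as urllib.error.HTTPError.
        return None


# Hypothetical module-level opener. Passing the subclass replaces the
# default HTTPRedirectHandler in the handler chain.
no_redirect_opener = urllib.request.build_opener(NoRedirectHandler)
```

A view would then call `no_redirect_opener.open(url, timeout=10)` in place of `urllib.request.urlopen(url, timeout=10)` and handle `urllib.error.HTTPError` for redirect responses.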

## Security Implications

- **Cloud Metadata Service Access via Standard Library:** urllib.request.urlopen('http://169.254.169.254/latest/meta-data/iam/...')
can fetch AWS IAM credentials from the instance metadata service. Unlike the
requests library, urllib ships with every Python installation, so this vector
is available in any Django deployment regardless of installed packages.

- **Local File Disclosure via file:// Scheme:** urllib.request.urlopen() supports the file:// URL scheme by default. An attacker
can supply a URL like file:///etc/passwd or file:///app/settings.py to read
arbitrary files from the server filesystem, potentially exposing Django's
SECRET_KEY, database credentials, and other configuration secrets.

- **Internal Service Access and Port Scanning:** The application server can reach internal network services that are not exposed
to the internet. An attacker can use urlopen() to scan internal ports (observing
connection errors vs. successful responses) and access internal HTTP services
without authentication.

- **Gopher Protocol Exploitation:** Only some older versions of urllib support the gopher:// scheme; Python 3's
default urllib.request opener does not register a handler for it. Where
gopher:// is reachable, it can be used for SSRF attacks against Redis,
Memcached, and other services that use text-based protocols, because attackers
can send arbitrary TCP payloads to internal services.
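
The address classes behind these scenarios can be recognized with the standard ipaddress module. The sketch below is illustrative only: the function name `classify` and the sample addresses are assumptions, and it handles IP literals, not hostnames.

```python
import ipaddress


def classify(host: str) -> str:
    """Label an IP literal by the kind of network it belongs to."""
    ip = ipaddress.ip_address(host)
    if ip.is_loopback:
        return "loopback"
    # Checked before is_private because link-local ranges also count as private.
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    return "public"


for sample in ("169.254.169.254", "127.0.0.1", "10.0.0.5", "::1", "8.8.8.8"):
    print(sample, classify(sample))
```

The metadata address 169.254.169.254 falls in the link-local range, which is why that range appears alongside the private and loopback ranges in the blocklist.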


## FAQ

**Q: Why does urllib support file:// and is that dangerous in SSRF context?**

Python's urllib.request.urlopen() was designed as a general URL opener supporting
multiple schemes including file://, ftp://, and http://. In an SSRF context,
file:// support is critical to block because it allows attackers to read arbitrary
files from the server using a URL like file:///etc/passwd or file:///app/settings.py.
Always restrict accepted schemes to http and https only.


**Q: How does this differ from PYTHON-DJANGO-SEC-030 targeting the requests library?**

SEC-031 targets Python's standard library urllib.request module, which is
available without any third-party dependencies. SEC-030 targets the requests
library (pip install requests). Both rules detect SSRF, but the sinks they track use different APIs.
Some Django projects use urllib for simple HTTP calls to avoid adding a
dependency; others use requests for more ergonomic HTTP handling. Both need
the same URL validation logic.


**Q: Is urllib used directly in modern Django applications or mainly in legacy code?**

Both. urllib is used directly in Django management commands, middleware, and
utility functions that perform simple HTTP calls without the complexity of
session management. It is also prevalent in older Django applications predating
widespread requests library adoption, and in code that avoids third-party
dependencies for security or licensing reasons.


**Q: Does using urllib.parse.urlparse() to check the scheme prevent SSRF?**

Partial protection only. Checking that the scheme is 'http' or 'https' blocks
file://, gopher://, and ftp:// attacks but does not prevent requests to internal
IP ranges, cloud metadata, or localhost. A complete SSRF defense requires both
scheme validation AND IP range blocking (or strict domain allowlisting with
hostname resolution at validation time).
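
Hostname resolution at validation time could look roughly like the following sketch, which uses socket.getaddrinfo() and rejects a hostname if any of its addresses is not publicly routable. The function name `resolve_public_addresses` and the default port are assumptions. The sketch only validates; it does not pin the later connection to the checked address, so on its own it does not close the DNS rebinding window.

```python
import ipaddress
import socket


def resolve_public_addresses(hostname: str, port: int = 443) -> list[str]:
    """Resolve hostname and return its addresses only if all are public.

    Raises ValueError if resolution fails or any address is non-global.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve {hostname!r}") from exc

    addresses = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        # sockaddr[0] is the textual address; drop any IPv6 zone suffix.
        ip = ipaddress.ip_address(sockaddr[0].split("%", 1)[0])
        if not ip.is_global:
            raise ValueError(f"{hostname!r} resolves to non-public address {ip}")
        addresses.append(str(ip))
    return addresses
```

Rejecting the hostname when any single address is non-public is deliberately strict: a name that resolves to both a public and an internal address is treated as unsafe.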


**Q: Can I use urllib.robotparser or other urllib submodules that internally call urlopen?**

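You can, but those internal calls carry the same SSRF risk whenever user input
influences the URL that is fetched. For example, RobotFileParser.read() fetches
its configured URL through urllib.request.urlopen(). Review all urllib submodule
usages where user-controlled data could influence the URL argument, not just
direct urlopen() calls.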
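
As one hedged illustration, a wrapper can validate the URL before RobotFileParser performs its fetch. The allowlist `ROBOTS_ALLOWED_HOSTS` and the function `load_robots` are hypothetical names introduced for this sketch.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical allowlist, for illustration only.
ROBOTS_ALLOWED_HOSTS = frozenset({"www.example.com"})


def load_robots(robots_url: str) -> RobotFileParser:
    """Validate robots_url before RobotFileParser.read() fetches it."""
    parsed = urlparse(robots_url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("only http and https robots.txt URLs are allowed")
    if parsed.hostname not in ROBOTS_ALLOWED_HOSTS:
        raise ValueError("host is not on the allowlist")
    parser = RobotFileParser(robots_url)
    parser.read()  # performs the fetch via urllib.request.urlopen()
    return parser
```

Both checks run before any network or filesystem access, so a file:// URL or a metadata-service address is rejected without ever being opened.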


**Q: What is the performance overhead of adding URL validation before every urlopen() call?**

URL parsing and IP range checking are microsecond-level operations. Resolving
the hostname to an IP address at validation time adds one DNS lookup per
request, which is typically 1-10ms. This is negligible compared to the network
round-trip of the HTTP request itself. The security benefit far outweighs the
performance cost.
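
The parsing and range-check cost can be measured locally with timeit. This is a rough sketch, not a benchmark: the function `parse_and_check`, the two blocked networks, and the sample URL are assumptions, it covers only IP-literal hosts, and absolute numbers will vary by machine.

```python
import ipaddress
import timeit
from urllib.parse import urlparse

BLOCKED = [ipaddress.ip_network("10.0.0.0/8"), ipaddress.ip_network("127.0.0.0/8")]


def parse_and_check(url: str) -> bool:
    """Return True if the URL's IP-literal host is outside the blocked ranges."""
    host = urlparse(url).hostname
    ip = ipaddress.ip_address(host)
    return not any(ip in net for net in BLOCKED)


runs = 10_000
total = timeit.timeit(lambda: parse_and_check("http://10.1.2.3/path"), number=runs)
per_call_us = total / runs * 1e6  # average cost per validation in microseconds
print(f"{per_call_us:.2f} microseconds per validation")
```

The DNS lookup is excluded here on purpose, since its latency depends on the resolver and caching rather than on the validation code.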


## References

- [CWE-918: Server-Side Request Forgery](https://cwe.mitre.org/data/definitions/918.html)
- [OWASP SSRF Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html)
- [OWASP Server-Side Request Forgery](https://owasp.org/www-community/attacks/Server_Side_Request_Forgery)
- [Python urllib.request documentation](https://docs.python.org/3/library/urllib.request.html)
- [Django Security](https://docs.djangoproject.com/en/stable/topics/security/)
- [PortSwigger SSRF Labs](https://portswigger.net/web-security/ssrf)

---

Source: https://codepathfinder.dev/registry/python/django/PYTHON-DJANGO-SEC-031
Code Pathfinder — Open source, type-aware SAST with cross-file dataflow analysis
