# PYTHON-LANG-SEC-061: Insecure urllib.request.urlopen() Usage

> **Severity:** MEDIUM | **CWE:** CWE-319 | **OWASP:** A02:2021

- **Language:** Python
- **Category:** Python Core
- **URL:** https://codepathfinder.dev/registry/python/lang/PYTHON-LANG-SEC-061
- **Detection:** `pathfinder scan --ruleset python/PYTHON-LANG-SEC-061 --project .`

## Description

urllib.request.urlopen() is Python's standard library function for making HTTP and HTTPS
requests. When called with an HTTP (non-HTTPS) URL, all data is transmitted in plaintext.
When called with an HTTPS URL but with an insecure SSL context (e.g., one created with
ssl._create_unverified_context()), certificate verification is bypassed.

Unlike the requests library, where disabling verification takes a conspicuous verify=False
argument, urlopen() takes its TLS settings from an ssl.SSLContext object passed through the
context parameter. Nothing at the call site shows whether that context verifies certificates,
so an insecure context created elsewhere in the code is easy to overlook.

This rule audits all urlopen() calls to ensure HTTPS URLs are used and that custom SSL
contexts passed to the function do not disable certificate verification.


## Vulnerable Code

```python
import urllib.request

# SEC-061: plaintext HTTP, so the request and response travel unencrypted
urllib.request.urlopen("http://example.com")
urllib.request.urlretrieve("http://example.com/file", "local.txt")
```

## Secure Code

```python
import ssl
import urllib.request

# INSECURE: HTTP URL or unverified context
# response = urllib.request.urlopen("http://api.example.com/data")
# response = urllib.request.urlopen(url, context=ssl._create_unverified_context())

# SECURE: HTTPS URL with default SSL context (certificate verification enabled)
def fetch_data(url: str) -> bytes:
    if not url.startswith("https://"):
        raise ValueError("Only HTTPS URLs are allowed")
    ctx = ssl.create_default_context()
    with urllib.request.urlopen(url, context=ctx, timeout=30) as response:
        return response.read()

# SECURE: With custom CA for corporate environments
def fetch_with_custom_ca(url: str, ca_cert: str) -> bytes:
    if not url.startswith("https://"):
        raise ValueError("Only HTTPS URLs are allowed")
    ctx = ssl.create_default_context(cafile=ca_cert)
    with urllib.request.urlopen(url, context=ctx, timeout=30) as response:
        return response.read()

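# PREFERRED: Use the requests library (third-party) for a simpler API;
# certificate verification is enabled by default (verify=True)
import requests

def fetch_resource(url: str) -> bytes:
    if not url.startswith("https://"):
        raise ValueError("Only HTTPS URLs are allowed")
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.content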
```

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, QueryType

class UrllibModule(QueryType):
    fqns = ["urllib.request"]


@python_rule(
    id="PYTHON-LANG-SEC-061",
    name="Insecure urllib.urlopen",
    severity="MEDIUM",
    category="lang",
    cwe="CWE-319",
    tags="python,urllib,http,insecure-transport,CWE-319",
    message="urllib.request.urlopen() detected. Ensure HTTPS URLs are used.",
    owasp="A02:2021",
)
def detect_urllib_insecure():
    """Detects urllib.request.urlopen and urlretrieve calls."""
    return UrllibModule.method("urlopen", "urlretrieve")
```

## How to Fix

- Ensure all urlopen() calls use https:// URLs; validate the URL scheme before calling urlopen().
- Always pass a timeout argument to urlopen() to prevent hanging connections from causing resource exhaustion.
- Pass an explicitly created ssl.create_default_context() to urlopen() for HTTPS requests to ensure certificate verification.
- Consider using the requests library instead, which has a cleaner API, better defaults, and easier certificate configuration.
- Never pass ssl._create_unverified_context(), or any context with verify_mode set to ssl.CERT_NONE or check_hostname set to False, to urlopen().
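
The scheme check in the first item can be made a little stricter than a string prefix test by parsing the URL first. The following is a minimal sketch, not part of the rule itself; the helper name require_https is hypothetical. It uses urllib.parse.urlsplit, which lowercases the scheme, so an uppercase HTTPS:// prefix is accepted while file://, ftp://, and http:// URLs are rejected.

```python
from urllib.parse import urlsplit


def require_https(url: str) -> str:
    """Return url unchanged if it is an absolute https:// URL, else raise ValueError.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # urlsplit lowercases the scheme, so "HTTPS://..." passes this comparison.
    if parts.scheme != "https":
        raise ValueError(f"Only HTTPS URLs are allowed, got scheme {parts.scheme!r}")
    # A URL such as "https:///path" has the right scheme but no host.
    if not parts.hostname:
        raise ValueError("URL has no host")
    return url
```

Calling require_https() immediately before urlopen() keeps the validation next to the network call, which makes it harder to bypass by accident.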

## Security Implications

- **Plaintext HTTP Transmission:** When urlopen() is called with http:// URLs, all data, including headers, credentials,
and request/response bodies, is transmitted without encryption, exposing it to network
observers and MITM attackers.

- **Insecure SSL Context Bypass:** Passing an SSL context created with ssl._create_unverified_context() to urlopen()
disables certificate verification. The connection is encrypted but the server is not
authenticated, enabling MITM attacks.

- **Redirect Following to HTTP:** urllib.request.urlopen() follows HTTP redirects by default. An HTTPS connection that
redirects to an HTTP URL may transmit the subsequent request (including cookies and
headers) in plaintext.

- **No Timeout Default:** urlopen() applies no timeout unless one is passed or a global default has been set with
socket.setdefaulttimeout(), so a call can hang indefinitely on a slow or unresponsive
server, potentially enabling denial-of-service through resource exhaustion.
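
One possible way to address the redirect issue described above is a custom redirect handler that refuses any redirect whose target is not HTTPS. This is a sketch under stated assumptions: the class name HTTPSOnlyRedirectHandler and the function build_https_only_opener are hypothetical, and the approach relies on the standard HTTPRedirectHandler.redirect_request() hook, which receives the already-resolved absolute target URL.

```python
import ssl
import urllib.error
import urllib.request


class HTTPSOnlyRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that rejects redirects to non-HTTPS URLs."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Abort instead of silently following a downgrade to plaintext.
        if not newurl.lower().startswith("https://"):
            raise urllib.error.HTTPError(
                newurl, code, "Refusing redirect to a non-HTTPS URL", headers, fp
            )
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def build_https_only_opener() -> urllib.request.OpenerDirector:
    """Build an opener with default certificate verification and strict redirects."""
    ctx = ssl.create_default_context()
    # Passing these instances replaces the default handlers of the same type.
    return urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=ctx),
        HTTPSOnlyRedirectHandler(),
    )
```

The opener is used as build_https_only_opener().open(url, timeout=30). It still contains the default plain HTTP handler, so the initial URL must be validated separately; the handler only guards against downgrades during redirects.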


## FAQ

**Q: Does urllib.request.urlopen() verify certificates by default for HTTPS URLs?**

Yes. Since Python 3.4.3, urllib.request.urlopen() with an HTTPS URL verifies
certificates using the system CA store by default. However, if a custom SSL context
is passed (especially one created with ssl._create_unverified_context()), verification
is whatever the context specifies. Always use ssl.create_default_context() for custom
contexts to maintain the secure default.
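
To make the difference between the two kinds of context concrete, the settings that matter can be inspected directly. The helper name verifies_certificates below is hypothetical; verify_mode and check_hostname are standard ssl.SSLContext attributes. A context is treated as safe here only if it both validates the certificate chain and checks the hostname.

```python
import ssl


def verifies_certificates(ctx: ssl.SSLContext) -> bool:
    """Return True if ctx validates the certificate chain and checks the hostname.

    Hypothetical helper for illustration only.
    """
    return ctx.verify_mode == ssl.CERT_REQUIRED and bool(ctx.check_hostname)


secure_ctx = ssl.create_default_context()
# Private API, created here only to illustrate what the rule warns about.
insecure_ctx = ssl._create_unverified_context()
```

With these two contexts, verifies_certificates(secure_ctx) is True and verifies_certificates(insecure_ctx) is False. A check like this could serve as a guard in code that accepts an SSL context from a caller.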


**Q: Should I use urllib.request or the requests library?**

For application code, the requests library provides a cleaner API with better defaults
for authentication, session management, retries, and certificate configuration. urllib.request
is more appropriate for scripts where minimizing dependencies is important or when
using urllib-specific features like custom handlers and openers.


**Q: How do I add authentication headers with urllib.request?**

Use urllib.request.Request() with a headers dict, for example `req = Request(url, headers={"Authorization": f"Bearer {token}"})`, then pass the request object to urlopen(). Alternatively, use urllib.request.HTTPBasicAuthHandler for basic auth, or the requests library, which makes header management much simpler.
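
A fuller sketch of the Request-based approach follows. The function names build_authorized_request and fetch_authorized are hypothetical, and the bearer token format is just an example. The scheme check matters more than usual here, because an Authorization header sent over plain HTTP exposes the credential itself.

```python
import ssl
import urllib.request


def build_authorized_request(url: str, token: str) -> urllib.request.Request:
    """Create a request carrying a bearer token; HTTPS only (hypothetical helper)."""
    if not url.startswith("https://"):
        raise ValueError("Only HTTPS URLs are allowed")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_authorized(url: str, token: str) -> bytes:
    """Send the authorized request with certificate verification and a timeout."""
    req = build_authorized_request(url, token)
    ctx = ssl.create_default_context()
    with urllib.request.urlopen(req, context=ctx, timeout=30) as response:
        return response.read()
```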


**Q: What timeout value should I use for urlopen()?**

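Use a timeout appropriate for the expected response time: 10-30 seconds for typical
web APIs, up to 60-120 seconds for large file downloads or slow services. Always
handle timeout exceptions: TimeoutError (socket.timeout is an alias for it since
Python 3.10), which can also arrive as the reason of a urllib.error.URLError when the
timeout occurs while connecting. Never omit the timeout parameter in production code.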
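
Because a timeout can surface either directly or wrapped in URLError, a small classifier keeps the handling in one place. This is a sketch; is_timeout and fetch_or_none are hypothetical names, and returning None on timeout is only one possible policy (retrying or re-raising may suit an application better).

```python
import socket
import ssl
import urllib.error
import urllib.request
from typing import Optional

# socket.timeout is a separate class before Python 3.10 and an alias afterwards.
_TIMEOUT_TYPES = (TimeoutError, socket.timeout)


def is_timeout(exc: BaseException) -> bool:
    """Return True if exc is a timeout, raised directly or wrapped in URLError."""
    if isinstance(exc, _TIMEOUT_TYPES):
        return True
    return isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, _TIMEOUT_TYPES)


def fetch_or_none(url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Fetch an HTTPS URL, returning None on timeout (hypothetical policy)."""
    if not url.startswith("https://"):
        raise ValueError("Only HTTPS URLs are allowed")
    ctx = ssl.create_default_context()
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=timeout) as response:
            return response.read()
    except OSError as exc:  # URLError and TimeoutError both derive from OSError
        if is_timeout(exc):
            return None
        raise
```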


**Q: Does this rule flag urlopen() for local file:// URLs?**

The detection rule matches every urlopen() and urlretrieve() call regardless of the URL
argument, so calls with file:// URLs are reported as well. Local file URLs do not involve
network transmission and have different security considerations (path traversal). This
rule focuses on HTTP/HTTPS security, but file:// URLs in urlopen() should also be reviewed
for path traversal risks.


**Q: How does urllib.request.urlretrieve() compare in security?**

urllib.request.urlretrieve() has the same HTTP/HTTPS considerations as urlopen().
It is also part of the legacy interface ported from Python 2, which the Python
documentation says might become deprecated in the future. Replace it with urlopen()
or the requests library. PYTHON-LANG-SEC-061 covers urlretrieve() as well.
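
A replacement for urlretrieve() built on urlopen() might look like the following sketch. The function name download_to_file is hypothetical. The response is streamed to disk with shutil.copyfileobj, so large files are not held in memory.

```python
import shutil
import ssl
import urllib.request


def download_to_file(url: str, dest_path: str, timeout: float = 60.0) -> None:
    """Stream an HTTPS resource to dest_path (hypothetical urlretrieve replacement)."""
    if not url.startswith("https://"):
        raise ValueError("Only HTTPS URLs are allowed")
    ctx = ssl.create_default_context()
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as response:
        with open(dest_path, "wb") as out:
            # Copy in chunks rather than reading the whole body at once.
            shutil.copyfileobj(response, out)
```

Note that the timeout applies to each blocking socket operation, not to the download as a whole, so a slow but steady transfer can take longer than the timeout value.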


## References

- [CWE-319: Cleartext Transmission of Sensitive Information](https://cwe.mitre.org/data/definitions/319.html)
- [Python docs: urllib.request.urlopen()](https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen)
- [OWASP TLS Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Transport_Layer_Security_Cheat_Sheet.html)
- [Python docs: ssl.create_default_context()](https://docs.python.org/3/library/ssl.html#ssl.create_default_context)
- [OWASP Top 10 A02:2021 Cryptographic Failures](https://owasp.org/Top10/A02_2021-Cryptographic_Failures/)

---

Source: https://codepathfinder.dev/registry/python/lang/PYTHON-LANG-SEC-061
Code Pathfinder — Open source, type-aware SAST with cross-file dataflow analysis
