# PYTHON-DJANGO-SEC-041: Django Path Traversal via os.path.join()

> **Severity:** HIGH | **CWE:** CWE-22 | **OWASP:** A01:2021

- **Language:** Python
- **Category:** Django
- **URL:** https://codepathfinder.dev/registry/python/django/PYTHON-DJANGO-SEC-041
- **Detection:** `pathfinder scan --ruleset python/PYTHON-DJANGO-SEC-041 --project .`

## Description

This rule detects path traversal vulnerabilities in Django applications where untrusted
user input from HTTP request parameters flows into os.path.join() before being used in
file system operations.

os.path.join() is commonly believed to be safe for constructing file paths because it
handles path separator differences. However, os.path.join() has a critical security
behavior: if any component is an absolute path (starts with /), it discards all
previous components and uses the absolute path as the new base. This means
os.path.join('/var/uploads', '/etc/passwd') returns '/etc/passwd'.

Additionally, os.path.join() does not strip or normalize '../' traversal sequences,
so os.path.join('/var/uploads', '../../../etc/passwd') returns
'/var/uploads/../../../etc/passwd', which resolves to /etc/passwd when opened.
Because of these behaviors, passing user input to os.path.join() is a path traversal
vulnerability unless the joined result is resolved with os.path.realpath() and checked
to still lie inside the intended base directory.
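Both behaviors can be reproduced directly. The minimal sketch below uses posixpath (the POSIX implementation that backs os.path on Linux and macOS), so the results are the same on any platform. The base directory is only an illustration.

```python
import posixpath

BASE = "/var/uploads"  # illustrative base directory

# 1. An absolute component discards everything that came before it.
absolute = posixpath.join(BASE, "/etc/passwd")

# 2. "../" sequences are kept verbatim; join() does not normalize them.
relative = posixpath.join(BASE, "../../../etc/passwd")

# Normalizing the joined path shows where it actually points.
resolved = posixpath.normpath(relative)

print(absolute)  # /etc/passwd
print(relative)  # /var/uploads/../../../etc/passwd
print(resolved)  # /etc/passwd
```

In both cases the upload directory offers no protection; only a check on the resolved result does.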


## Vulnerable Code

```python
import os

# SEC-040: path traversal via open
def vulnerable_open(request):
    filename = request.GET.get('file')
    with open(filename) as f:
        return f.read()


# SEC-041: path traversal via os.path.join
def vulnerable_path_join(request):
    user_path = request.GET.get('path')
    full_path = os.path.join('/uploads', user_path)
    with open(full_path) as f:
        return f.read()
```

## Secure Code

```python
from django.http import FileResponse, JsonResponse, Http404
import os

MEDIA_ROOT = '/var/www/app/media'

def serve_user_file(request):
    filename = request.GET.get('filename', '')
    # SECURE step 1: basename() drops directory components, including '../' and a leading '/'
    safe_filename = os.path.basename(filename)
    if not safe_filename:
        raise Http404("File not found")
    # SECURE step 2: Join with base directory
    candidate_path = os.path.join(MEDIA_ROOT, safe_filename)
    # SECURE step 3: Resolve realpath and verify it is within MEDIA_ROOT
    real_candidate = os.path.realpath(candidate_path)
    real_media_root = os.path.realpath(MEDIA_ROOT)
    if not real_candidate.startswith(real_media_root + os.sep):
        raise Http404("File not found")
    if not os.path.isfile(real_candidate):
        raise Http404("File not found")
    return FileResponse(open(real_candidate, 'rb'))

ALLOWED_EXPORTS = {'users', 'orders', 'products'}

def list_allowed_exports(request):
    export_name = request.GET.get('export', '')
    # SECURE ALTERNATIVE: an allowlist eliminates path construction from raw input
    if export_name not in ALLOWED_EXPORTS:
        return JsonResponse({'error': 'Export not found'}, status=404)
    filepath = os.path.join('/app/exports', f'{export_name}.csv')
    with open(filepath) as f:
        return JsonResponse({'data': f.read()})
```

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, flows, QueryType
from codepathfinder.presets import PropagationPresets

_DJANGO_SOURCES = [
    calls("request.GET.get"),
    calls("request.POST.get"),
    calls("request.GET"),
    calls("request.POST"),
    calls("request.COOKIES.get"),
    calls("request.FILES.get"),
    calls("*.GET.get"),
    calls("*.POST.get"),
]


@python_rule(
    id="PYTHON-DJANGO-SEC-041",
    name="Django Path Traversal via os.path.join()",
    severity="HIGH",
    category="django",
    cwe="CWE-22",
    tags="python,django,path-traversal,os-path,OWASP-A01,CWE-22",
    message="User input flows to os.path.join() then to file operations. Validate paths.",
    owasp="A01:2021",
)
def detect_django_path_traversal_join():
    """Detects Django request data in os.path.join() reaching file operations."""
    return flows(
        from_sources=_DJANGO_SOURCES,
        to_sinks=[
            calls("open"),
            calls("os.path.join"),
        ],
        sanitized_by=[
            calls("os.path.realpath"),
            calls("os.path.abspath"),
            calls("os.path.basename"),
        ],
        propagates_through=PropagationPresets.standard(),
        scope="global",
    )
```

## How to Fix

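- After os.path.join(), always call os.path.realpath() to resolve symlinks and normalize the path, then verify that the result starts with os.path.realpath() of the intended base directory plus os.sep. Without the trailing separator, a base of /uploads would also match a sibling such as /uploads-private.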
- Apply os.path.basename() to user-provided filename components before passing them to os.path.join() to strip absolute path and traversal sequences.
- Use an explicit allowlist of permitted filenames or file IDs stored in the database rather than constructing paths from user input at all.
- Avoid accepting file paths as request parameters; instead, accept file IDs that map to paths stored securely in the database.
- When serving stored media, use Django's storage API (for example default_storage.open() to read a file, or default_storage.url() to link to it) rather than constructing raw filesystem paths.
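The realpath-plus-containment step can be factored into one reusable helper. The sketch below is one possible shape, assuming Python 3.9+ for Path.is_relative_to(); resolve_within and PathEscapeError are hypothetical names, not part of Django or the standard library. A view would catch PathEscapeError and raise Http404.

```python
from pathlib import Path


class PathEscapeError(ValueError):
    """Raised when a user-supplied name resolves outside the base directory."""


def resolve_within(base_dir, user_name):
    """Return the resolved path for user_name, or raise if it escapes base_dir."""
    # Resolve the base itself so symlinks in the base path don't break the check.
    base = Path(base_dir).resolve()
    # Path's "/" operator has the same absolute-path override as os.path.join(),
    # which is exactly why the result must be checked after resolving.
    candidate = (base / user_name).resolve()
    # is_relative_to() compares path components, not raw string prefixes,
    # so a sibling like "/srv/media-evil" is not treated as inside "/srv/media".
    # Rejecting the base itself blocks inputs like "" or ".".
    if candidate == base or not candidate.is_relative_to(base):
        raise PathEscapeError(f"{user_name!r} escapes {base}")
    return candidate
```

Comparing path components instead of string prefixes removes the need to remember the trailing os.sep.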

## Security Implications

- **Absolute Path Override via os.path.join() Semantics:** os.path.join('/uploads', user_input) where user_input is '/etc/passwd' returns
'/etc/passwd' -- the upload directory prefix is silently discarded. This is a
common misunderstanding of os.path.join() safety. Attackers who know this
behavior can supply absolute paths to bypass any attempt to restrict access
to a subdirectory.

- **Relative Traversal via ../ Sequences:** os.path.join('/uploads', '../../../etc/shadow') returns a path that, when
passed to open(), resolves to /etc/shadow. Standard traversal payloads work
unchanged through os.path.join() because the function makes no attempt to
normalize or restrict component values.

- **Symlink-Based Escape Even After basename():** If symbolic links are present within the intended directory, a path constructed
with os.path.join() may resolve outside it even when basename() was applied
first. The realpath() check is essential to catch symlink-based traversal
that escapes the intended directory boundary.

- **Write Access to Critical Files:** If os.path.join() output feeds into open() in write mode, log rotation scripts,
or file deletion operations, path traversal enables overwriting or deleting configuration
files, log files, or application code. Overwriting .py files can lead to code execution,
for example when a server with auto-reload enabled picks up the modified module.


## FAQ

**Q: Why is os.path.join() not safe with user input if it handles separators correctly?**

os.path.join() handles path separators for portability, not for security. Its
documented behavior is that if an argument is an absolute path (starts with / on
Unix), all preceding components are discarded. It also does not strip ../
components. So os.path.join('/safe/dir', '/etc/passwd') returns '/etc/passwd',
and os.path.join('/safe/dir', '../../etc/passwd') returns a path outside /safe/dir.
Neither is safe with user input without realpath verification.


**Q: Does os.path.normpath() prevent path traversal instead of realpath()?**

os.path.normpath() collapses ../ sequences but does not resolve symbolic links.
An attacker can use a symlink within the allowed directory that points outside it.
os.path.realpath() resolves all symlinks to their true filesystem locations,
making the startswith() check reliable. Use realpath(), not normpath(), for
security-critical path validation.
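The difference is easy to demonstrate. The sketch below assumes a POSIX system where os.symlink() is permitted: it plants a symlink named report.txt inside a temporary media directory that points at a file outside it. Even after os.path.basename(), a normpath()-based prefix check accepts the name, while a realpath()-based check rejects it. normpath_check and realpath_check are hypothetical helper names used only for this comparison.

```python
import os
import tempfile


def normpath_check(base, name):
    # Lexical check only: collapses "../" but never consults the filesystem.
    candidate = os.path.normpath(os.path.join(base, name))
    return candidate.startswith(os.path.normpath(base) + os.sep)


def realpath_check(base, name):
    # Resolves symlinks first, then compares against the resolved base.
    real_base = os.path.realpath(base)
    candidate = os.path.realpath(os.path.join(base, name))
    return candidate.startswith(real_base + os.sep)


with tempfile.TemporaryDirectory() as tmp:
    media = os.path.join(tmp, "media")
    secret_dir = os.path.join(tmp, "secret")
    os.makedirs(media)
    os.makedirs(secret_dir)
    secret_file = os.path.join(secret_dir, "key.txt")
    with open(secret_file, "w") as f:
        f.write("s3cr3t")
    # A symlink inside the allowed directory that points outside it.
    os.symlink(secret_file, os.path.join(media, "report.txt"))

    name = os.path.basename("report.txt")  # basename() leaves it unchanged
    normpath_result = normpath_check(media, name)
    realpath_result = realpath_check(media, name)

print(normpath_result)  # True: the lexical check is fooled
print(realpath_result)  # False: the resolved path is outside media/
```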


**Q: Our application uses os.path.join() extensively for static file serving. Is all of that at risk?**

Only usages where user-controlled data is an argument to os.path.join() are at
risk. If the paths are constructed entirely from configuration constants and
hardcoded strings, they are not vulnerable. This rule only flags cases where
tainted request data (values from request.GET, request.POST, request.COOKIES,
or request.FILES) reaches os.path.join() arguments.


**Q: Can I use Django's django.utils.safestring or URL encoding to sanitize the filename?**

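No. django.utils.safestring only marks strings as safe for HTML output (for
example mark_safe()), and HTML or URL escaping functions exist to prevent XSS and
URL-encoding issues, not path traversal. None of them strips ../ or absolute path
prefixes from filenames, and Django decodes percent-encoded query and form values
before they reach request.GET and request.POST anyway. The correct sanitizers for
path traversal are os.path.basename() and os.path.realpath() with a directory
boundary check.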
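A quick check with Python's standard html.escape() and urllib.parse.quote(), used here as stand-ins for web-oriented escaping helpers, shows that typical traversal payloads pass through unchanged.

```python
import html
import urllib.parse

payloads = ["../../etc/passwd", "/etc/passwd"]

# HTML escaping only rewrites the characters & < > " and '.
escaped = [html.escape(p) for p in payloads]

# quote() leaves "/" (its default safe character) and "." unencoded.
quoted = [urllib.parse.quote(p) for p in payloads]

print(escaped)  # ['../../etc/passwd', '/etc/passwd']
print(quoted)   # ['../../etc/passwd', '/etc/passwd']
```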


**Q: What is the best approach for file download features in Django?**

Store files using Django's FileField with UUID-based names (use upload_to with
a callable that generates UUIDs). Serve downloads by looking up the FileField
value from the model (which contains a safe, validated storage path) using the
file ID from the URL, not a user-provided filename. This design eliminates the
need for any path construction from user input.
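A minimal sketch of the upload_to side, assuming the standard Django convention that an upload_to callable is invoked as upload_to(instance, filename) and returns the storage name. uuid_upload_to, the Download model, and download_id are hypothetical names; only the callable is plain Python, and the Django wiring is shown as comments for context.

```python
import os
import uuid


def uuid_upload_to(instance, filename):
    """Build a storage name from a random UUID, keeping only a safe extension.

    The client-supplied filename contributes at most its extension, so "../"
    sequences or absolute paths in the upload never reach the storage name.
    """
    ext = os.path.splitext(os.path.basename(filename))[1].lower()
    # Keep the extension only if it is purely alphanumeric (e.g. ".pdf").
    if not ext[1:].isalnum():
        ext = ""
    return f"downloads/{uuid.uuid4().hex}{ext}"


# Hypothetical wiring (requires Django, shown for context only):
#
#   class Download(models.Model):
#       file = models.FileField(upload_to=uuid_upload_to)
#
# In the view, look the record up by ID instead of by filename:
#
#   download = get_object_or_404(Download, pk=download_id)
#   return FileResponse(download.file.open("rb"))
```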


**Q: How do we handle os.path.join() in custom storage backends?**

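Custom storage backends that use os.path.join() for path construction should
apply the same validation: resolve the joined path with os.path.realpath() and
confirm with a startswith() check against the storage root plus os.sep before
touching the filesystem. Django's built-in FileSystemStorage performs a containment
check on every name it is given and raises SuspiciousFileOperation when a name
resolves outside its location; custom backends should follow the same pattern.
Review any custom storage backend implementations for path traversal
vulnerabilities during security audits.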


## References

- [CWE-22: Path Traversal](https://cwe.mitre.org/data/definitions/22.html)
- [OWASP Path Traversal](https://owasp.org/www-community/attacks/Path_Traversal)
- [Python os.path.join() documentation](https://docs.python.org/3/library/os.path.html#os.path.join)
- [Python os.path.realpath() documentation](https://docs.python.org/3/library/os.path.html#os.path.realpath)
- [Django File Storage API](https://docs.djangoproject.com/en/stable/topics/files/)
- [OWASP File Upload Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/File_Upload_Cheat_Sheet.html)

---

Source: https://codepathfinder.dev/registry/python/django/PYTHON-DJANGO-SEC-041
Code Pathfinder — Open source, type-aware SAST with cross-file dataflow analysis
