# PYTHON-LANG-SEC-103: Regex DoS Risk

> **Severity:** LOW | **CWE:** CWE-1333 | **OWASP:** A06:2021

- **Language:** Python
- **Category:** Python Core
- **URL:** https://codepathfinder.dev/registry/python/lang/PYTHON-LANG-SEC-103
- **Detection:** `pathfinder scan --ruleset python/PYTHON-LANG-SEC-103 --project .`

## Description

This rule flags calls to `re.compile()`, `re.match()`, `re.search()`, and `re.findall()`
as audit items. Regular expressions with nested quantifiers, overlapping
alternatives, or unbounded repetition can cause catastrophic backtracking, where
matching time grows exponentially with input length.

An attacker who controls the input string can craft a payload that causes a single
regex match to consume 100% CPU for minutes or hours, effectively creating a
denial-of-service condition. This is known as ReDoS (Regular Expression Denial
of Service).

The rule operates at audit level because not all regex patterns are vulnerable.
Review each flagged pattern for nested quantifiers like `(a+)+`, `(a|a)+`, or
`(a*)*`. Consider using the re2 library for untrusted input, which guarantees
linear-time matching.


## Vulnerable Code

```python
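import re

user_input = "a" * 30 + "!"  # stands in for attacker-controlled input
text = "1.2.3."              # stands in for untrusted text

# SEC-103: regex DoS
# All three calls are flagged for audit; review each pattern individually.
pattern = re.compile(r"(a+)+$")   # nested quantifier: exponential on "aaa...a!"
re.match(r"(a|b)*c", user_input)  # flagged for review; alternatives do not overlap
re.search(r"(\d+\.)+", text)      # flagged for review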
```

## Secure Code

```python
import re

# SAFE: Simple pattern without nested quantifiers
email_pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

# SAFER: Use re2 for untrusted input (linear-time guarantee)
# import re2
# pattern = re2.compile(r'complex_pattern')

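# SAFER (Python 3.11+): atomic groups and possessive quantifiers limit backtracking
# pattern = re.compile(r'(?>a+)+$')

# NOTE: the standard-library re functions do not accept a timeout argument.
# The third-party regex module does:
# import regex
# regex.match(pattern, user_input, timeout=1.0)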

```
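
As a rough sketch of the "no nested quantifiers" approach, a risky pattern can often be rewritten so that each character can be consumed in only one way. The validator below (words separated by single whitespace characters) and the names `RISKY` and `REWRITTEN` are illustrative assumptions, not part of the rule.

```python
import re

# Hypothetical validator: words separated by single whitespace characters.
# Nested quantifier: "ab" can be split as (ab) or (a)(b), so a failing
# match explores exponentially many splits.
RISKY = re.compile(r"^(\w+\s?)+$")

# Same accepted strings, but every word must be preceded by a separator,
# so there is only one way to divide the input into words.
REWRITTEN = re.compile(r"^\w+(?:\s\w+)*\s?$")

samples = ["hello world", "hello  world", "hello world ", " hello", "", "a b c"]
for s in samples:
    # Short inputs only: the risky pattern is still fast at this size.
    assert bool(RISKY.match(s)) == bool(REWRITTEN.match(s))

# A long near-miss returns promptly with the rewritten pattern.
print(REWRITTEN.match("a" * 5000 + "!"))  # prints None
```

Comparing both patterns on a handful of short samples is a cheap sanity check that a rewrite did not change what the validator accepts; it is not a proof of equivalence.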

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, QueryType

class ReModule(QueryType):
    fqns = ["re"]


@python_rule(
    id="PYTHON-LANG-SEC-103",
    name="Regex DoS Risk",
    severity="LOW",
    category="lang",
    cwe="CWE-1333",
    tags="python,regex,redos,denial-of-service,CWE-1333",
    message="re.compile/match/search/findall detected. Audit regex patterns for catastrophic backtracking.",
    owasp="A06:2021",
)
def detect_regex_dos():
    """Detects re.compile/match/search/findall calls; audit for regex DoS."""
    return ReModule.method("compile", "match", "search", "findall")
```

## How to Fix

- Audit regex patterns for nested quantifiers like `(a+)+`, `(a|a)+`, or `(.*a){10}`
- Use atomic groups or possessive quantifiers where supported (Python's `re` module accepts them from 3.11) to prevent backtracking
- Consider the re2 library (google-re2) for patterns that process untrusted input, as it guarantees linear-time matching
- Set input length limits before applying regex patterns to user-controlled strings
- For defense in depth, bound matching time: the standard-library `re` functions have no timeout parameter, so use the third-party `regex` module's `timeout` argument or run the match in a separate process that can be terminated
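
The length-limit step might look like the following minimal sketch. The limit of 256 characters, the username format, and the function name `is_valid_username` are assumptions chosen for illustration; pick a bound that fits the field being validated.

```python
import re

# Hypothetical limit; choose a value appropriate for the field.
MAX_INPUT_LENGTH = 256

# Dot-separated identifier segments. Each segment must follow a literal dot,
# so there is only one way to split the input into segments.
USERNAME = re.compile(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*")


def is_valid_username(value: str) -> bool:
    """Reject oversized input before the regex engine ever sees it."""
    if len(value) > MAX_INPUT_LENGTH:
        return False
    return USERNAME.fullmatch(value) is not None


print(is_valid_username("alice.smith"))  # prints True
print(is_valid_username("a" * 10_000))   # prints False
```

Checking the length first caps the work any pattern can do, which helps even when a vulnerable pattern slips through review.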

## Security Implications

- **Denial of Service via Catastrophic Backtracking:** A regex pattern like `^(a+)+$` takes exponential time on inputs like
`"aaaaaaaaaaaaaaaaab"`. Each additional `a` roughly doubles the matching time. An
attacker can send a short string that locks up a worker thread for minutes.

- **Application Unavailability:** In web applications, a ReDoS attack ties up request-handling threads. A
handful of crafted requests can exhaust the thread pool and make the
application unresponsive to legitimate users.

- **Amplification in Validation Logic:** Regex patterns in input validation (email, URL, phone number) are common
ReDoS targets because they process user-controlled input directly. A
vulnerability in a validation regex affects every request that triggers it.


## FAQ

**Q: Does every regex pattern have a ReDoS risk?**

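No. Patterns without nested quantifiers or overlapping alternatives do not
backtrack exponentially, although adjacent overlapping quantifiers such as
`\d+\d+$` can still cost polynomial time. The main risk comes from patterns
like `(a+)+`, `(a|a)+`, or `(.+)+`, where the engine can try exponentially
many matching paths.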


**Q: How do I test if my regex is vulnerable?**

Try matching your pattern against a string that almost matches but doesn't.
For example, if your pattern expects a valid email, try a long string of
valid-looking characters followed by an invalid character. If matching time
grows much faster than the input length (for example, roughly doubling with
each added character), the pattern is vulnerable.
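
A small timing probe along these lines can make the growth visible. The helper name `time_near_misses` and its parameters are hypothetical, and the sizes are kept deliberately small because larger values can hang the interpreter on a vulnerable pattern.

```python
import re
import time


def time_near_misses(pattern, unit="a", suffix="!", sizes=(10, 12, 14, 16, 18)):
    """Time `pattern` against inputs of the form unit * n + suffix.

    Returns a list of (n, seconds) pairs. Timings are noisy, so look at the
    trend across sizes rather than at any single value.
    """
    compiled = re.compile(pattern)
    timings = []
    for n in sizes:
        text = unit * n + suffix  # almost matches, then fails at the end
        start = time.perf_counter()
        compiled.match(text)
        timings.append((n, time.perf_counter() - start))
    return timings


# Probe a pattern with a nested quantifier.
for n, seconds in time_near_misses(r"(a+)+$"):
    print(f"n={n:2d}  {seconds:.6f}s")
```

If the time multiplies by a roughly constant factor each time `n` grows by a fixed step, the pattern is backtracking exponentially; a safe pattern should show times that stay close to flat at these sizes.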


**Q: Does Python's re module support timeouts?**

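No. The standard-library `re` functions do not accept a timeout parameter;
Python 3.11 added atomic groups and possessive quantifiers, not timeouts.
To bound matching time, use the third-party `regex` module, whose matching
functions accept a `timeout` argument, run the match in a separate process
that can be terminated, or use the google-re2 library. A thread-based
timeout does not help: a Python thread cannot be forcibly stopped, and in
standard CPython builds `re` holds the global interpreter lock while matching.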
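
One way to enforce a time budget using only the standard library is to run the match in a child process that is killed when the budget runs out. This is a minimal sketch under stated assumptions: the names `search_with_timeout` and `_CHILD_SCRIPT` and the exit code `3` are invented for illustration, and spawning a process per match is too slow for hot paths.

```python
import subprocess
import sys

# Script run in the child interpreter: exit 0 on match, 3 on no match.
# Any other exit code (for example an invalid pattern) is treated as an error.
_CHILD_SCRIPT = (
    "import re, sys\n"
    "pattern, text = sys.argv[1], sys.argv[2]\n"
    "sys.exit(0 if re.search(pattern, text) else 3)\n"
)


def search_with_timeout(pattern, text, timeout=2.0):
    """Return True or False for match or no match, None if time ran out."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD_SCRIPT, pattern, text],
            timeout=timeout,  # the child is killed when this expires
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode == 0:
        return True
    if result.returncode == 3:
        return False
    raise ValueError(f"regex worker failed with exit code {result.returncode}")
```

A caller would typically treat `None` as a rejection, for example `search_with_timeout(r"(a+)+$", "a" * 40 + "!", timeout=1.0)` gives up after about a second instead of pinning a CPU core.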


**Q: Why is this rule LOW severity?**

Not all regex usage is vulnerable. This is an audit rule that flags regex
calls for review. The actual severity depends on whether the pattern has
nested quantifiers and whether the input is user-controlled.


**Q: How do I run this rule in CI/CD?**

Run: `pathfinder ci --ruleset python/lang --project .`


## References

- [CWE-1333: Inefficient Regular Expression Complexity](https://cwe.mitre.org/data/definitions/1333.html)
- [OWASP ReDoS Prevention](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS)
- [Python re module documentation](https://docs.python.org/3/library/re.html)
- [Google RE2 library](https://github.com/google/re2)

---

Source: https://codepathfinder.dev/registry/python/lang/PYTHON-LANG-SEC-103
Code Pathfinder — Open source, type-aware SAST with cross-file dataflow analysis
