# PYTHON-LANG-SEC-032: Insecure Hash via hashlib.new()

> **Severity:** MEDIUM | **CWE:** CWE-327 | **OWASP:** A02:2021

- **Language:** Python
- **Category:** Python Core
- **URL:** https://codepathfinder.dev/registry/python/lang/PYTHON-LANG-SEC-032
- **Detection:** `pathfinder scan --ruleset python/PYTHON-LANG-SEC-032 --project .`

## Description

Python's hashlib.new() creates a hash object for the algorithm specified by a string name.
Unlike direct constructors such as hashlib.md5() or hashlib.sha256(), the algorithm name
in hashlib.new() is a runtime string that may be derived from configuration, user input,
or other external sources.

This creates two distinct risks. First, if the algorithm name is derived from untrusted
input, an attacker can select a weak or broken algorithm. Second, even when the name is
hardcoded, the algorithm is chosen by a string argument rather than by the function name,
so reviewers and tools that look only for direct constructors such as hashlib.md5() can miss it.

Replace broken algorithms (md5, sha1) and short-digest algorithms (sha224, sha3_224) with
sha256, sha3_256, sha512, or sha3_512. Never pass a name derived from user input to
hashlib.new() without first validating it against an allowlist of strong algorithms.


## Vulnerable Code

```python
import hashlib

# SEC-032: hashlib.new() with a broken algorithm selected by string name
h = hashlib.new("md5", b"data")
```

## Secure Code

```python
import hashlib

# INSECURE: hashlib.new() with insecure or dynamic algorithm
# h = hashlib.new("md5", data)
# h = hashlib.new(config["hash_algorithm"], data)  # dynamic!

# SECURE: Use direct constructors with strong algorithms
def hash_data(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# SECURE: If algorithm must be configurable, validate against an allowlist
ALLOWED_HASH_ALGORITHMS = {"sha256", "sha384", "sha512", "sha3_256", "sha3_512", "blake2b"}

def hash_with_config(data: bytes, algorithm: str) -> str:
    if algorithm not in ALLOWED_HASH_ALGORITHMS:
        raise ValueError(f"Hash algorithm not allowed: {algorithm}")
    return hashlib.new(algorithm, data).hexdigest()

# SECURE: Use direct constructors for clarity and static analysis
def compute_file_hash(data: bytes) -> str:
    return hashlib.sha3_256(data).hexdigest()

```

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, QueryType

class HashlibModule(QueryType):
    fqns = ["hashlib"]


@python_rule(
    id="PYTHON-LANG-SEC-032",
    name="Insecure Hash via hashlib.new()",
    severity="MEDIUM",
    category="lang",
    cwe="CWE-327",
    tags="python,weak-hash,hashlib-new,CWE-327",
    message="hashlib.new() with insecure algorithm. Use SHA-256 or SHA-3.",
    owasp="A02:2021",
)
def detect_hashlib_new_insecure():
    """Detects hashlib.new() which may use insecure algorithms."""
    return HashlibModule.method("new")
```

## How to Fix

- Replace hashlib.new("md5"), hashlib.new("sha1"), and hashlib.new("sha224") with hashlib.sha256() or hashlib.sha3_256().
- If the hash algorithm must be configurable, validate the algorithm name against an explicit allowlist of strong algorithms before calling hashlib.new().
- Never pass an algorithm name taken from user input, HTTP parameters, or database values to hashlib.new() without validating it against that allowlist.
- Prefer direct constructors (hashlib.sha256()) over hashlib.new("sha256") for clarity and to ensure static analysis tools can detect the algorithm.
- Document the cryptographic purpose of each hash operation to make algorithm selection auditable.
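
One way to combine the allowlist and direct-constructor recommendations above is to map each permitted name to its constructor, so that hashlib.new() is never called at all. The following is a minimal sketch; the names `_STRONG_CONSTRUCTORS` and `strong_hash` and the choice of permitted algorithms are assumptions made for illustration, not part of the rule.

```python
import hashlib

# Hypothetical allowlist: each permitted name maps to a direct constructor.
# The set of algorithms here is an assumption; adjust it to your own policy.
_STRONG_CONSTRUCTORS = {
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
    "sha3_256": hashlib.sha3_256,
    "sha3_512": hashlib.sha3_512,
    "blake2b": hashlib.blake2b,
}


def strong_hash(data: bytes, algorithm: str = "sha256") -> str:
    """Hash data with an allowlisted algorithm, without calling hashlib.new()."""
    try:
        constructor = _STRONG_CONSTRUCTORS[algorithm]
    except KeyError:
        # Anything outside the mapping is rejected, including md5 and sha1.
        raise ValueError(f"Hash algorithm not allowed: {algorithm!r}") from None
    return constructor(data).hexdigest()
```

Because the lookup table is the only place where algorithm names appear, auditing the permitted set means reading a single dictionary.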

## Security Implications

- **Dynamic Algorithm Selection Attack:** If the algorithm name passed to hashlib.new() is derived from user input or external
configuration, an attacker can select a broken algorithm (md5, sha1) to weaken
integrity checks and exploit the practical collision attacks known for both.

- **Weak Algorithm via Hardcoded Insecure Name:** Code calling hashlib.new("md5") or hashlib.new("sha1") has the same cryptographic
weaknesses as calling hashlib.md5() or hashlib.sha1() directly. The runtime string
form may escape static analysis tools that only check direct constructor calls.

- **Algorithm Downgrade in Protocol Negotiation:** In protocols that negotiate hash algorithms and use hashlib.new() to instantiate
the negotiated algorithm, an attacker performing a downgrade attack can negotiate
a weak algorithm, undermining the security of the entire protocol session.

- **Inconsistent Algorithm Enforcement:** Using hashlib.new() with algorithm names from configuration makes it difficult to
audit which algorithms are in use. Different deployments may use different algorithms,
creating inconsistent security guarantees across environments.
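
The downgrade scenario above can be mitigated by letting the server's own preference list, not the peer's offer, decide the outcome. The sketch below assumes a hypothetical negotiation step in which the peer sends a list of algorithm names; `SERVER_PREFERENCE` and `negotiate_hash` are illustrative names, and the preference order is an assumed policy.

```python
import hashlib
from typing import Iterable

# Hypothetical server-side policy: strong algorithms only, most preferred first.
SERVER_PREFERENCE = ("sha512", "sha3_256", "sha256")


def negotiate_hash(client_offered: Iterable[str]) -> str:
    """Return the server's most preferred algorithm that the client also offers."""
    offered = {name.strip().lower() for name in client_offered}
    for name in SERVER_PREFERENCE:
        if name in offered:
            return name
    # No fallback to a weak algorithm: fail the negotiation instead.
    raise ValueError("No mutually acceptable hash algorithm")


# The result always comes from SERVER_PREFERENCE, so passing it to
# hashlib.new() cannot select md5 or sha1 regardless of what the peer sent.
chosen = negotiate_hash(["md5", "sha1", "sha256"])
digest = hashlib.new(chosen, b"handshake transcript").hexdigest()
```

An offer that contains only weak algorithms raises ValueError, which ends the session rather than weakening it.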


## FAQ

**Q: Why flag hashlib.new() when the algorithm string might be sha256?**

hashlib.new() is flagged regardless of the algorithm string because: (1) the algorithm
may be a variable rather than a literal, requiring review to confirm it is always a
strong algorithm; (2) even with a strong algorithm, the dynamic form may be confused
with an insecure one in code review; (3) direct constructors are clearer and more
accessible to static analysis.


**Q: What algorithms are safe to use with hashlib.new()?**

Safe algorithms include sha256, sha384, sha512, sha3_256, sha3_384, sha3_512, blake2b,
and blake2s. Insecure algorithms include md4, md5, and sha1, which are broken for
collision resistance. sha224 and sha3_224 are not broken, but they provide only 112-bit
security against collision attacks, which is below modern recommendations.
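
The 112-bit figure follows from the birthday bound: generic collision resistance is at most half the digest length in bits. The sketch below computes that bound from digest_size; the helper name `collision_security_bits` is made up for this example. The bound is only an upper limit, so it does not reflect real attacks: md5 and sha1 are far weaker in practice than their digest lengths suggest.

```python
import hashlib


def collision_security_bits(algorithm: str) -> int:
    """Upper bound on collision resistance: half the digest length in bits."""
    digest_bits = hashlib.new(algorithm).digest_size * 8
    return digest_bits // 2


for name in ("sha224", "sha3_224", "sha256", "sha3_256", "sha512"):
    print(f"{name:9s} {collision_security_bits(name)} bits")
```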


**Q: Can the algorithm name in hashlib.new() come from user input safely?**

Only if validated against a strict allowlist of approved algorithm names. Never pass
a raw user input string to hashlib.new() without validation: users could request a weak
hash such as md5 or sha1, or an unsupported name, which makes hashlib.new() raise ValueError.
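
Both outcomes can be reproduced with a few lines. The function `unsafe_digest` below is a deliberately unvalidated anti-pattern written for this illustration; it is not recommended code.

```python
import hashlib


def unsafe_digest(data: bytes, requested: str) -> str:
    """Anti-pattern, shown for illustration only: `requested` is not validated."""
    return hashlib.new(requested, data).hexdigest()


# Outcome 1: a weak algorithm is accepted without any warning.
weak = unsafe_digest(b"payload", "sha1")
print(len(weak) * 4, "bit digest from a caller-chosen algorithm")

# Outcome 2: an unknown name surfaces as ValueError raised by hashlib itself.
try:
    unsafe_digest(b"payload", "no-such-hash")
except ValueError as exc:
    print("rejected by hashlib:", exc)
```

Checking the name against an allowlist before the call removes the first outcome and turns the second into an error the application raises deliberately.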


**Q: What is the difference between hashlib.new("sha256") and hashlib.sha256()?**

They produce identical results. hashlib.sha256() is more explicit, more readable,
and easier for static analysis tools to identify. hashlib.new("sha256") is useful
when the algorithm name needs to be determined at runtime. For hardcoded algorithms,
always prefer the direct constructor form.
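
A quick check of the equivalence, using an arbitrary sample payload:

```python
import hashlib

data = b"example payload"

via_new = hashlib.new("sha256", data)
via_constructor = hashlib.sha256(data)

# Same algorithm, same input, same digest.
assert via_new.digest() == via_constructor.digest()
assert via_new.name == via_constructor.name == "sha256"
```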


**Q: Does hashlib.new() support all algorithms available on the OpenSSL backend?**

Yes. hashlib.new() can access any hash algorithm that the underlying OpenSSL library
provides, not just the ones with dedicated constructors. This gives access to algorithms
that have no direct constructor, but it also means code may inadvertently use deprecated or
experimental OpenSSL algorithms. hashlib.algorithms_guaranteed lists the algorithms
available on every platform, and hashlib.algorithms_available lists those available in
the running interpreter. Neither set is a security allowlist: both include md5 and sha1.
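
The two sets can be inspected directly. The sketch below uses only documented hashlib attributes; the `WEAK` set is an assumption for illustration and the OpenSSL-only names vary with the OpenSSL build, so no particular output is shown for them.

```python
import hashlib

# Assumed set of weak names for this illustration.
WEAK = {"md5", "sha1"}

# "Guaranteed" means always present, not safe: weak names are included.
guaranteed_weak = sorted(WEAK & hashlib.algorithms_guaranteed)

# Names reachable only through hashlib.new(), supplied by the OpenSSL build.
openssl_only = sorted(hashlib.algorithms_available - hashlib.algorithms_guaranteed)

print("weak but guaranteed:", guaranteed_weak)
print("available only via hashlib.new():", openssl_only)
```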


**Q: How do I find all hashlib.new() calls using insecure algorithms in my codebase?**

Run Code Pathfinder's PYTHON-LANG-SEC-032 rule to find all hashlib.new() calls. Then
manually review each call to check whether the algorithm name is a hardcoded string
and whether it is a strong algorithm. Supplement with a text search for hashlib.new("md5"),
hashlib.new("sha1"), and hashlib.new("sha224") to find obviously weak usages.
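
As a complement to text search, the manual review step can be roughly triaged with the standard-library ast module. This is a simplified sketch and not how Code Pathfinder works internally: it only matches calls written literally as hashlib.new(...), so aliased imports such as `from hashlib import new` are missed. The names `WEAK_NAMES` and `find_hashlib_new_calls` are made up for this example.

```python
import ast

# Names this document treats as weak or below recommendations.
WEAK_NAMES = {"md4", "md5", "sha1", "sha224", "sha3_224"}


def find_hashlib_new_calls(source: str) -> list:
    """Return sorted (line, verdict) pairs for each hashlib.new(...) call.

    verdict is "weak" (literal weak name), "literal" (any other string
    literal), or "dynamic" (name is not a string literal).
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_hashlib_new = (
            isinstance(func, ast.Attribute)
            and func.attr == "new"
            and isinstance(func.value, ast.Name)
            and func.value.id == "hashlib"
        )
        if not is_hashlib_new:
            continue
        # The algorithm name is the first positional argument or name=...
        name_arg = node.args[0] if node.args else None
        if name_arg is None:
            for keyword in node.keywords:
                if keyword.arg == "name":
                    name_arg = keyword.value
        if isinstance(name_arg, ast.Constant) and isinstance(name_arg.value, str):
            verdict = "weak" if name_arg.value.lower() in WEAK_NAMES else "literal"
        else:
            verdict = "dynamic"
        findings.append((node.lineno, verdict))
    return sorted(findings)
```

Calls reported as "dynamic" are the ones that most need manual review, since the algorithm cannot be determined from the call site alone.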


## References

- [CWE-327: Use of a Broken or Risky Cryptographic Algorithm](https://cwe.mitre.org/data/definitions/327.html)
- [Python docs: hashlib.new()](https://docs.python.org/3/library/hashlib.html#hashlib.new)
- [OWASP Cryptographic Storage Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Cryptographic_Storage_Cheat_Sheet.html)
- [NIST SP 800-131A Revision 2](https://csrc.nist.gov/publications/detail/sp/800-131a/rev-2/final)
- [OWASP Top 10 A02:2021 Cryptographic Failures](https://owasp.org/Top10/A02_2021-Cryptographic_Failures/)

