# PYTHON-CRYPTO-SEC-012: Insecure MD5 Hash (PyCryptodome)

> **Severity:** MEDIUM | **CWE:** CWE-327, CWE-328 | **OWASP:** A02:2021

- **Language:** Python
- **Category:** Cryptography
- **URL:** https://codepathfinder.dev/registry/python/cryptography/PYTHON-CRYPTO-SEC-012
- **Detection:** `pathfinder scan --ruleset python/PYTHON-CRYPTO-SEC-012 --project .`

## Description

Detects usage of MD5 via the PyCryptodome or PyCrypto library's `Crypto.Hash.MD5.new()`
or `Cryptodome.Hash.MD5.new()` constructor. MD5 produces a 128-bit digest and has been
considered cryptographically broken since 2004, when Wang et al. demonstrated practical
collisions; chosen-prefix collision attacks followed (Stevens et al.). The Flame malware,
discovered in 2012, used an MD5 collision to forge a Microsoft code-signing certificate and
deliver malicious code through Windows Update, even to fully patched systems.

PyCryptodome is commonly used in Python projects for cryptographic operations and is the
maintained successor to the original PyCrypto library. Both `Crypto.Hash.MD5` (the
`pycryptodome` package, a drop-in replacement for PyCrypto) and `Cryptodome.Hash.MD5` (the
`pycryptodomex` package, installed under a separate namespace) are covered by this rule.

MD5 must not be used for digital signatures, data integrity verification, password hashing,
or HMAC-based authentication. It remains acceptable for non-security checksums such as
cache keys, file deduplication identifiers, or content-addressable storage where an
attacker producing a collision confers no security benefit.


## Vulnerable Code

```python
from Crypto.Hash import MD5

# SEC-012: MD5 via PyCryptodome (flagged by this rule)
h_md5 = MD5.new(b"data")
digest = h_md5.hexdigest()
```

## Secure Code

```python
import os

from Crypto.Hash import HMAC, SHA256, SHA3_256

# SECURE: SHA-256 for integrity checking
h = SHA256.new()
h.update(b"data to hash")
digest = h.hexdigest()

# SECURE: SHA3-256 as an alternative built on a different (sponge) construction
h = SHA3_256.new()
h.update(b"document bytes")
digest = h.hexdigest()

# SECURE: HMAC with SHA-256 for message authentication
key = os.urandom(32)
mac = HMAC.new(key, digestmod=SHA256)
mac.update(b"message to authenticate")
tag = mac.hexdigest()
```
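
The HMAC example above only produces a tag; the receiving side also has to check it, and that comparison should run in constant time. The following is a minimal sketch of that check using the standard library's `hmac` module so that it runs without PyCryptodome installed. The helper names `make_tag` and `verify_tag` are hypothetical. PyCryptodome's `HMAC` objects offer `verify()` and `hexverify()` for the same purpose.

```python
import hashlib
import hmac
import os


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message (hypothetical helper)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = make_tag(key, message)
    return hmac.compare_digest(expected, tag)


key = os.urandom(32)
tag = make_tag(key, b"message to authenticate")
print(verify_tag(key, b"message to authenticate", tag))
print(verify_tag(key, b"tampered message", tag))
```

Comparing tags with `==` can leak timing information, which is why `hmac.compare_digest` is used here.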

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, flows, QueryType
from codepathfinder.presets import PropagationPresets

class PyCryptoHashMD5(QueryType):
    fqns = ["Crypto.Hash.MD5", "Cryptodome.Hash.MD5"]


@python_rule(
    id="PYTHON-CRYPTO-SEC-012",
    name="Insecure MD5 Hash (PyCryptodome)",
    severity="MEDIUM",
    category="cryptography",
    cwe="CWE-327",
    tags="python,pycryptodome,md5,weak-hash,CWE-327",
    message="MD5 is cryptographically broken. Use SHA-256 or SHA-3 instead.",
    owasp="A02:2021",
)
def detect_md5_hash_pycrypto():
    """Detects MD5 in PyCryptodome."""
    return PyCryptoHashMD5.method("new")
```

## How to Fix

- Replace `Crypto.Hash.MD5.new()` with `Crypto.Hash.SHA256.new()` for all integrity and authentication use cases.
- For password hashing, do not use any raw hash function, including SHA-256. Use Argon2 (`argon2-cffi`), bcrypt, or scrypt, which are designed to be slow (and, for Argon2 and scrypt, memory-intensive).
- For message authentication, use `Crypto.Hash.HMAC` with SHA-256 as the digest module instead of bare MD5.
- MD5 may remain in non-security contexts (cache keys, deduplication) where collision resistance carries no security consequence. Add an explicit comment documenting this intent.
- When migrating stored MD5 checksums (e.g., in a database), rehash with SHA-256 on next verified access and deprecate the MD5 code path with a sunset date.
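
The migration step in the last bullet could look roughly like the sketch below. It assumes a hypothetical `record` mapping with optional `"md5"` and `"sha256"` hex-digest fields and uses the standard library's `hashlib` so it runs without PyCryptodome. The function name `verify_and_upgrade` and the record layout are illustrative, not part of any library.

```python
import hashlib
import hmac


def verify_and_upgrade(record: dict, data: bytes) -> bool:
    """Verify data against a stored checksum, upgrading MD5 records to SHA-256.

    `record` is a hypothetical mutable mapping with optional "sha256" and
    "md5" hex-digest fields; a real system would persist the change.
    """
    sha256_hex = hashlib.sha256(data).hexdigest()

    stored_sha256 = record.get("sha256")
    if stored_sha256:
        # Already migrated: only the SHA-256 path is consulted.
        return hmac.compare_digest(stored_sha256, sha256_hex)

    stored_md5 = record.get("md5")
    if not stored_md5:
        return False  # nothing to verify against

    # Legacy path, intended to be removed at the sunset date.
    md5_hex = hashlib.md5(data).hexdigest()
    if not hmac.compare_digest(stored_md5, md5_hex):
        return False

    # Verified once via the legacy checksum: store SHA-256 and drop MD5.
    record["sha256"] = sha256_hex
    record["md5"] = None
    return True
```

Note that the upgraded SHA-256 value only inherits whatever assurance the MD5 check gave at the moment of migration, so records that could have been tampered with beforehand may need re-validation from a trusted source.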

## FAQ

**Q: Is MD5 ever safe to use with PyCryptodome?**

MD5 is acceptable for non-security purposes such as cache keys, file deduplication, or content-addressable storage where an attacker benefiting from a collision is not a concern. It must not be used for signatures, integrity verification, password hashing, or authentication in security contexts.
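
One way to make a non-security use explicit is shown below. This sketch uses the standard library's `hashlib.md5` rather than PyCryptodome, because `hashlib` accepts a `usedforsecurity=False` keyword (Python 3.9+) that records the intent in code. With PyCryptodome, a comment next to the `MD5.new()` call plays the same role. The `cache_key` helper is hypothetical.

```python
import hashlib


def cache_key(payload: bytes) -> str:
    """Derive a cache key from the payload (hypothetical helper).

    Non-security use: a collision here only causes a cache mix-up that the
    caller is assumed to tolerate; it grants an attacker no privilege.
    """
    return hashlib.md5(payload, usedforsecurity=False).hexdigest()


print(cache_key(b"abc"))
```

If an attacker-controlled collision could poison the cache in a way that matters, this is no longer a non-security use and SHA-256 would be the safer choice.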

**Q: What is the difference between this rule and PYTHON-CRYPTO-SEC-010?**

PYTHON-CRYPTO-SEC-010 targets MD5 used via the `cryptography` library hazmat interface (`hashes.MD5()`). This rule (PYTHON-CRYPTO-SEC-012) targets MD5 in PyCryptodome/PyCrypto (`Crypto.Hash.MD5.new()` or `Cryptodome.Hash.MD5.new()`). The underlying weakness is identical; the rules differ by library.

**Q: PyCryptodome still includes MD5 — does that mean it is safe?**

PyCryptodome includes MD5 for legacy compatibility and protocol support, not because it is safe for new security-sensitive applications. The presence of an algorithm in a library does not imply it is recommended.

**Q: Why not use SHA-256 for password hashing?**

SHA-256 is fast by design, which is a property attackers exploit for brute-force and dictionary attacks. Use Argon2id (`argon2-cffi`), bcrypt, or scrypt, which are deliberately slow (and, for Argon2id and scrypt, memory-intensive) to limit attacker throughput.
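
As a rough illustration of the alternative, the sketch below hashes and verifies a password with scrypt through the standard library's `hashlib.scrypt`, which needs no third-party package. The cost parameters and the helper names `hash_password` and `verify_password` are assumptions chosen for the example, not tuned recommendations.

```python
import hashlib
import hmac
import os

# Hypothetical cost parameters for illustration; tune them for your hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage alongside the account."""
    salt = os.urandom(16)  # fresh random salt per password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)
```

Each guess costs an attacker the same memory and CPU work as a legitimate login, which is the property a bare SHA-256 digest lacks.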

**Q: Does this rule detect MD5 used inside another algorithm, such as PBKDF1?**

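This rule specifically matches calls to `Crypto.Hash.MD5.new()` and `Cryptodome.Hash.MD5.new()`. If MD5 is used indirectly, for example passed as the `hashAlgo` module to `Crypto.Protocol.KDF.PBKDF1`, no `new()` call appears at the call site, so a separate rule targeting that KDF would be required.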

**Q: How do I run this rule in CI/CD?**

Run `code-pathfinder scan --ruleset python/cryptography/PYTHON-CRYPTO-SEC-012 --path ./src` in your pipeline. Add `--format sarif` to produce SARIF output compatible with GitHub Advanced Security and similar platforms.

**Q: Should I prefer the cryptography library over PyCryptodome?**

The `cryptography` library is generally recommended for new Python projects due to its active maintenance and explicit hazmat separation. PyCryptodome is also well-maintained and widely used; either library is acceptable when strong algorithms (SHA-256, AES-GCM) are used correctly.

## References

- [CWE-327: Use of a Broken or Risky Cryptographic Algorithm](https://cwe.mitre.org/data/definitions/327.html)
- [CWE-328: Use of Weak Hash](https://cwe.mitre.org/data/definitions/328.html)
- [Wang et al. 2004: How to Break MD5 and Other Hash Functions](https://link.springer.com/chapter/10.1007/978-3-540-28628-8_19)
- [Stevens et al. 2009: Short Chosen-Prefix Collisions for MD5](https://link.springer.com/chapter/10.1007/978-3-642-03356-8_8)
- [NIST SP 800-131A Rev 2: Transitioning the Use of Cryptographic Algorithms](https://csrc.nist.gov/publications/detail/sp/800-131a/rev-2/final)
- [NIST SP 800-107: Recommendation for Applications Using Approved Hash Algorithms](https://csrc.nist.gov/publications/detail/sp/800-107/rev-1/final)
- [OWASP Cryptographic Failures (A02:2021)](https://owasp.org/Top10/A02_2021-Cryptographic_Failures/)
- [PyCryptodome documentation: Crypto.Hash.MD5](https://pycryptodome.readthedocs.io/en/latest/src/hash/md5.html)

---

Source: https://codepathfinder.dev/registry/python/cryptography/PYTHON-CRYPTO-SEC-012
Code Pathfinder — Open source, type-aware SAST with cross-file dataflow analysis
