Insecure MD5 Hash (cryptography)

MEDIUM

MD5 has been cryptographically broken since 2004, when practical collision attacks were demonstrated. Use SHA-256 or SHA-3 instead.

Rule Information

Language

Python

Scan Command

pathfinder scan --ruleset python/PYTHON-CRYPTO-SEC-010 --project .

About This Rule

Understanding the vulnerability and how it is detected

Detects usage of MD5 via the `cryptography` library's hazmat primitives interface (`hashes.MD5()`). MD5 produces a 128-bit digest and has been considered cryptographically broken since 2004, when Wang et al. demonstrated practical collision attacks. Stevens et al. later extended these to chosen-prefix collisions, and in 2008 a rogue CA certificate was forged using that technique. Today, identical-prefix MD5 collisions can be produced in seconds on commodity hardware.

MD5 must not be used for digital signatures, certificate validation, HMAC-based authentication, or data integrity verification in security contexts. It remains acceptable for non-security purposes such as cache keys, file deduplication, or content-addressable storage where collision resistance is not a security requirement.

This rule specifically targets instantiation of `cryptography.hazmat.primitives.hashes.MD5`. The hazmat ("Hazardous Materials") layer assumes the caller understands the risks, but MD5 remains dangerous regardless of the API used to reach it.
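As a rough illustration, the sketch below puts the pattern described above next to a SHA-256 replacement. The function names are hypothetical, and it assumes a reasonably recent `cryptography` release in which `hashes.Hash` no longer needs an explicit backend argument.

```python
from cryptography.hazmat.primitives import hashes


def md5_digest(data: bytes) -> bytes:
    """Pattern the rule is described as reporting: hashes.MD5() is instantiated."""
    digest = hashes.Hash(hashes.MD5())  # MD5 instantiation via the hazmat layer
    digest.update(data)
    return digest.finalize()


def sha256_digest(data: bytes) -> bytes:
    """Same API shape with a collision-resistant algorithm swapped in."""
    digest = hashes.Hash(hashes.SHA256())
    digest.update(data)
    return digest.finalize()
```

Because only the algorithm object changes, callers of such a helper mainly need to account for the longer digest (32 bytes instead of 16).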

Security Implications

Potential attack scenarios if this vulnerability is exploited

How to Fix

Recommended remediation steps

1. Replace `hashes.MD5()` with `hashes.SHA256()` or `hashes.SHA3_256()` for all integrity and signing use cases.
2. For password hashing, do not use any raw hash function. Use a deliberately slow KDF, ideally a memory-hard one, such as Argon2 (argon2-cffi), scrypt, or bcrypt.
3. For HMAC authentication, use HMAC with SHA-256 or SHA-3 (`cryptography.hazmat.primitives.hmac` with `hashes.SHA256()`).
4. MD5 may remain in place for purely non-security uses (cache keys, file deduplication) where collision resistance carries no security consequence. Document this explicitly.
5. When migrating existing MD5-hashed data (e.g., stored checksums), re-hash with SHA-256 on first verified access and deprecate the MD5 path.
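The migration in the last step could be sketched as follows. This is a minimal illustration, not a prescribed implementation: `ChecksumRecord` and `verify_and_upgrade` are hypothetical names, the record layout is an assumption, and the standard library's `hashlib` is used for brevity instead of the `cryptography` package.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChecksumRecord:
    """Hypothetical stored record: a legacy MD5 hex digest and/or a SHA-256 one."""
    md5_hex: Optional[str]
    sha256_hex: Optional[str] = None


def verify_and_upgrade(record: ChecksumRecord, data: bytes) -> bool:
    """Verify data against the record; upgrade MD5-only records on success."""
    if record.sha256_hex is not None:
        # Already migrated: only the SHA-256 path is consulted.
        actual = hashlib.sha256(data).hexdigest()
        return hmac.compare_digest(actual, record.sha256_hex)

    if record.md5_hex is None:
        return False  # nothing to verify against

    legacy = hashlib.md5(data).hexdigest()
    if not hmac.compare_digest(legacy, record.md5_hex):
        return False

    # First verified access: store the SHA-256 digest and retire the MD5 value.
    record.sha256_hex = hashlib.sha256(data).hexdigest()
    record.md5_hex = None
    return True
```

A record upgraded this way is only as trustworthy as the MD5 check that admitted it, so data that an attacker could have influenced before migration may still warrant re-validation from a trusted source.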

Detection Scope

How Code Pathfinder analyzes your code for this vulnerability

Matches any call to `CryptoHashes.method("MD5")` where `CryptoHashes` is a QueryType resolving fully-qualified names under `cryptography.hazmat.primitives.hashes`. This catches `hashes.MD5()` regardless of how the `hashes` module is imported or aliased. The rule fires on instantiation of the MD5 hash object, not on specific method calls made on the resulting digest object.
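To make the import-independence concrete, the sketch below instantiates MD5 through three import styles that all resolve to the same fully-qualified name. Based on the description above, each instantiation would be expected to produce a finding; this has not been checked against the tool itself, and the variable names are illustrative.

```python
import cryptography.hazmat.primitives.hashes
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives import hashes as h

# Plain module import.
direct = hashes.MD5()

# Aliased module import; still cryptography.hazmat.primitives.hashes.MD5.
aliased = h.MD5()

# Fully qualified access through the package path.
qualified = cryptography.hazmat.primitives.hashes.MD5()

# Per the description, later method calls on these objects are not what
# triggers a finding; the instantiation itself is.
algorithms = [direct, aliased, qualified]
```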

Compliance & Standards

Industry frameworks and regulations that require detection of this vulnerability

OWASP Top 10

A02:2021 - Cryptographic Failures

PCI DSS v4.0

Requirement 4.2.1 -- use strong cryptography

NIST SP 800-131A

MD5 and SHA-1 disallowed for digital signatures

NIST SP 800-53

SC-13: Cryptographic Protection

References

External resources and documentation

CWE-327: Use of a Broken or Risky Cryptographic Algorithm
CWE-328: Use of Weak Hash
Wang et al. 2004: How to Break MD5 and Other Hash Functions
Stevens et al. 2009: Short Chosen-Prefix Collisions for MD5
NIST SP 800-131A Rev 2: Transitioning the Use of Cryptographic Algorithms
NIST SP 800-107: Recommendation for Applications Using Approved Hash Algorithms
OWASP Cryptographic Failures (A02:2021)
cryptography library hazmat hashes documentation

Similar Rules

Explore related security rules for Python

HIGH

RC4 (ARC4) Cipher Usage via cryptography Library

Detects use of the RC4 stream cipher through the cryptography library's ARC4 algorithm, which has known keystream biases and is prohibited by RFC 7465.

HIGH

RC4 (ARC4) Cipher Usage via PyCryptodome

Detects use of the RC4 stream cipher through PyCryptodome's ARC4 module, which has known keystream biases and is prohibited by RFC 7465.

HIGH

Blowfish Cipher Usage via cryptography Library

Detects use of the Blowfish cipher through the cryptography library, which has a 64-bit block size making it vulnerable to Sweet32 birthday attacks after approximately 32GB of data.

Frequently Asked Questions

Common questions about Insecure MD5 Hash (cryptography)

Is MD5 ever safe to use?

MD5 is acceptable for non-security checksums such as file deduplication, cache invalidation keys, or content-addressable storage, where a collision gives an attacker no advantage. It must not be used for digital signatures, certificate hashing, HMAC, password storage, or any context where collision resistance matters.

Why not just use SHA-256 for password hashing too?

SHA-256, like every raw hash function, is designed to be fast, and speed helps an attacker running brute-force or dictionary attacks. Password hashing requires a deliberately slow function, ideally a memory-hard one: use Argon2, scrypt, or bcrypt. PBKDF2 with SHA-256 is acceptable when Argon2 is unavailable, but use at least 600,000 iterations, in line with the OWASP Password Storage Cheat Sheet; NIST SP 800-132 asks for an iteration count as large as performance allows.
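As one possible illustration of the slow-KDF approach, the sketch below derives and verifies a password hash with scrypt from the standard library's `hashlib`. The function names and the cost parameters (`n`, `r`, `p`) are assumptions chosen for the example, not values taken from this rule; tune them for your environment.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Illustrative scrypt cost parameters (assumptions, not recommendations
# from this rule): roughly 16 MiB of memory per derivation.
SCRYPT_N = 2**14
SCRYPT_R = 8
SCRYPT_P = 1


def _derive(password: str, salt: bytes) -> bytes:
    """Run scrypt over the UTF-8 password with the given salt."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=SCRYPT_N,
        r=SCRYPT_R,
        p=SCRYPT_P,
        dklen=32,
    )


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage; the salt is random per password."""
    salt = os.urandom(16)
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key and compare in constant time."""
    return hmac.compare_digest(_derive(password, salt), expected)
```

The `cryptography` package also ships a scrypt KDF (`cryptography.hazmat.primitives.kdf.scrypt.Scrypt`), which may be preferable in a codebase that already depends on it.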

I need MD5 for a legacy protocol or format. What should I do?

If MD5 is mandated by an external specification you cannot change, document it clearly, isolate the usage, and add compensating controls (e.g., an outer integrity layer using HMAC-SHA-256). Flag the dependency for removal when the protocol allows migration.
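One way such an outer integrity layer might look is sketched below, using the `cryptography` library's HMAC primitive. The framing (tag appended to the end of the legacy frame) and the names `seal` and `unseal` are assumptions for illustration; a real protocol would define its own envelope and key management.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, hmac

TAG_LEN = 32  # HMAC-SHA-256 produces a 32-byte tag


def seal(key: bytes, legacy_frame: bytes) -> bytes:
    """Append an HMAC-SHA-256 tag to a frame whose inner format still uses MD5."""
    mac = hmac.HMAC(key, hashes.SHA256())
    mac.update(legacy_frame)
    return legacy_frame + mac.finalize()


def unseal(key: bytes, sealed: bytes) -> bytes:
    """Verify the outer tag and return the legacy frame.

    Raises InvalidSignature if the tag does not match.
    """
    if len(sealed) < TAG_LEN:
        raise ValueError("sealed message is shorter than the tag")
    frame, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    mac = hmac.HMAC(key, hashes.SHA256())
    mac.update(frame)
    mac.verify(tag)  # constant-time comparison inside the library
    return frame
```

With this layering, the MD5 value inside the frame is treated as a format requirement only, and the decision to trust the data rests on the outer tag.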

Does this rule fire on hashlib.md5() from the standard library?

No. This rule targets the `cryptography` library's hazmat primitives. For `hashlib.md5()` detection, see the hashlib-specific rules in this ruleset.

How do I run this rule in CI/CD?

Run `code-pathfinder scan --ruleset python/cryptography/PYTHON-CRYPTO-SEC-010 --path ./src` in your pipeline. Add `--format sarif` to produce SARIF output compatible with GitHub Advanced Security and similar platforms.

What is the severity and why MEDIUM rather than HIGH?

MEDIUM reflects that the risk of MD5 is context-dependent: collision attacks are practical, but they require the attacker to influence the data at the point of signing or hashing. Rules targeting MD4 and MD2 are rated HIGH because those algorithms offer no practical security even in constrained scenarios.

Can the cryptography library's hazmat MD5 be used safely for non-cryptographic purposes?

Technically yes, but using the hazmat interface for non-security purposes adds unnecessary complexity. Prefer `hashlib.md5()` for checksums to make the non-security intent explicit. The hazmat interface signals cryptographic use, which increases the chance of future misuse.
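A small sketch of the `hashlib` alternative for a non-security checksum follows. The `cache_key` helper is a hypothetical name, and the `usedforsecurity=False` flag assumes Python 3.9 or later; it marks the call as non-cryptographic, which also documents the intent for reviewers.

```python
import hashlib


def cache_key(content: bytes) -> str:
    """Derive a cache key from content; collisions here have no security impact."""
    # usedforsecurity=False (Python 3.9+) declares the non-security intent.
    return hashlib.md5(content, usedforsecurity=False).hexdigest()
```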

