# PYTHON-CRYPTO-SEC-013: Insecure MD4 Hash (PyCryptodome)

> **Severity:** HIGH | **CWE:** CWE-327, CWE-328 | **OWASP:** A02:2021

- **Language:** Python
- **Category:** Cryptography
- **URL:** https://codepathfinder.dev/registry/python/cryptography/PYTHON-CRYPTO-SEC-013
- **Detection:** `pathfinder scan --ruleset python/PYTHON-CRYPTO-SEC-013 --project .`

## Description

Detects usage of MD4 via PyCryptodome's `Crypto.Hash.MD4.new()` or `Cryptodome.Hash.MD4.new()`
constructor. MD4 was designed by Ron Rivest in 1990 as a faster predecessor to MD5. By 1995,
Hans Dobbertin had demonstrated full collision attacks against MD4. Today, MD4 collisions
can be computed in milliseconds on a laptop using published differential attacks. There is
no scenario in which MD4 provides meaningful security.

MD4 survives in legacy protocols, most notably NTLMv1/NTLMv2 Windows authentication, where
the NT hash is the MD4 digest of the UTF-16LE encoded password. PyCryptodome ships MD4
solely to support parsing and interoperability with these legacy systems, not for new
security implementations.

This rule is rated HIGH severity because MD4 is more thoroughly broken than MD5: the
attack complexity is lower, preimage resistance is weaker, and no context exists where
MD4 is an acceptable security primitive today. If MD4 appears in a codebase, treat it
as a finding that requires either immediate remediation or an explicit, documented
justification (e.g., NTLM protocol compatibility with a defined migration timeline).


## Vulnerable Code

```python
# SEC-013: MD4 in PyCryptodome (flagged by this rule).
# The same call through the `Cryptodome` namespace (pycryptodomex) is flagged too.
from Crypto.Hash import MD4

h_md4 = MD4.new(b"data")
digest = h_md4.hexdigest()  # 128-bit digest; collisions are trivial to produce
```

## Secure Code

```python
import os

from Crypto.Hash import HMAC, SHA256, SHA3_256

# SECURE: SHA-256 for general integrity checking
h = SHA256.new()
h.update(b"data to hash")
digest = h.hexdigest()

# SECURE: SHA3-256 as an alternative built on a different construction (Keccak sponge)
h = SHA3_256.new()
h.update(b"document bytes")
digest = h.hexdigest()

# SECURE: HMAC with SHA-256 for message authentication codes
key = os.urandom(32)  # 256-bit random key; store and distribute it securely
mac = HMAC.new(key, digestmod=SHA256)
mac.update(b"message to authenticate")
tag = mac.hexdigest()
```

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, flows, QueryType
from codepathfinder.presets import PropagationPresets

class PyCryptoHashMD4(QueryType):
    fqns = ["Crypto.Hash.MD4", "Cryptodome.Hash.MD4"]


@python_rule(
    id="PYTHON-CRYPTO-SEC-013",
    name="Insecure MD4 Hash (PyCryptodome)",
    severity="HIGH",
    category="cryptography",
    cwe="CWE-327",
    tags="python,pycryptodome,md4,weak-hash,CWE-327",
    message="MD4 is severely broken. Use SHA-256 or SHA-3 instead.",
    owasp="A02:2021",
)
def detect_md4_hash_pycrypto():
    """Detects MD4 in PyCryptodome."""
    return PyCryptoHashMD4.method("new")
```
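
For a quick manual audit outside the scanner, the idea behind the rule (matching the `Crypto.Hash.MD4` and `Cryptodome.Hash.MD4` names) can be approximated with Python's standard `ast` module. This is a rough sketch, not how Code Pathfinder itself works: the function name `find_md4_imports` is hypothetical, and it only looks at import statements, so it does not resolve aliases or follow calls across files.

```python
import ast

# Fully qualified module names matched by the rule above.
MD4_MODULES = {"Crypto.Hash.MD4", "Cryptodome.Hash.MD4"}
# Parent packages from which MD4 is usually imported.
HASH_PACKAGES = {"Crypto.Hash", "Cryptodome.Hash"}


def find_md4_imports(source: str) -> list:
    """Return sorted line numbers of import statements that bring MD4 into scope."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            # from Crypto.Hash import MD4
            if node.module in HASH_PACKAGES and any(
                alias.name == "MD4" for alias in node.names
            ):
                hits.append(node.lineno)
            # from Crypto.Hash.MD4 import new
            elif node.module in MD4_MODULES:
                hits.append(node.lineno)
        elif isinstance(node, ast.Import):
            # import Crypto.Hash.MD4
            if any(alias.name in MD4_MODULES for alias in node.names):
                hits.append(node.lineno)
    return sorted(hits)
```

Calling `find_md4_imports(open(path).read())` on each file in a repository gives a first list of places to review by hand; the scanner remains the authoritative check.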

## How to Fix

- Replace Crypto.Hash.MD4.new() with Crypto.Hash.SHA256.new() for integrity use cases (see the HMAC item below for message authentication). SHA-256 has a 256-bit digest, roughly 128-bit collision resistance, and no known practical attacks.
- For password hashing, use Argon2id (argon2-cffi), bcrypt, or scrypt — not SHA-256 or any raw hash function, as these are too fast for password storage.
- If MD4 is required by a legacy protocol such as NTLM, document the usage explicitly with a migration plan and compensating controls (e.g., require modern NTLMv2 negotiation, enforce network-level authentication).
- Audit all callers of Crypto.Hash.MD4 to determine whether they are in a security-sensitive code path — given MD4's total brokenness, even non-obvious uses (e.g., as a PRF component) pose risk.
- For message authentication, use Crypto.Hash.HMAC with SHA-256 as the digest module.
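
The message-authentication and password-hashing items above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not a drop-in implementation: it uses `hmac` and `hashlib.scrypt` from the standard library rather than PyCryptodome or argon2-cffi, the helper names are hypothetical, and the scrypt parameters (`n=2**14, r=8, p=1`) are a commonly used starting point that should be tuned for your hardware.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a slow, salted scrypt digest; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Store the salt next to the digest. Comparing with `hmac.compare_digest` rather than `==` avoids leaking how many leading bytes matched through timing.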

## FAQ

**Q: Is MD4 ever safe to use?**

No. Unlike MD5 or SHA-1, there is no non-security context in which MD4 provides a meaningful property. Even for trivial checksums it is better to use a non-broken algorithm for future-proofing. The only legitimate reason to call Crypto.Hash.MD4.new() is to implement or parse a legacy protocol (e.g., NTLM) that you cannot change — and even then, a migration timeline should be documented.

**Q: My application uses NTLM authentication and needs MD4 — what should I do?**

If you are implementing NTLM for Windows domain authentication compatibility, document the MD4 usage explicitly as a protocol-mandated legacy dependency. Ensure NTLMv2 is enforced (not NTLMv1), apply the defense-in-depth controls recommended by Microsoft, and plan migration to Kerberos or modern authentication protocols. Do not extend the use of MD4 beyond the minimum required for protocol compliance.

**Q: How does MD4 compare to MD5 in terms of how broken it is?**

MD4 is significantly more broken than MD5. MD5 collisions became practical in 2004 and require seconds; MD4 collisions were demonstrated in 1995 and now take milliseconds using differential techniques. MD4 also has weaker preimage resistance than MD5. There is no scenario where MD4 is preferable to MD5, and MD5 itself is unacceptable for security.
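
Digest length alone already caps what either algorithm could offer. For an ideal hash with an n-bit output, a generic birthday attack finds a collision in about 2^(n/2) evaluations, so a 128-bit digest can never exceed 64-bit collision resistance. The short sketch below only does that arithmetic; the table and function name are illustrative, and the figures are upper bounds, not the (far lower) cost of the real attacks on MD4 and MD5.

```python
# Digest sizes in bits for the algorithms discussed on this page.
DIGEST_BITS = {"MD4": 128, "MD5": 128, "SHA-256": 256, "SHA3-256": 256}


def generic_collision_bits(digest_bits: int) -> int:
    """Best-case collision resistance of an ideal hash: half the digest length."""
    return digest_bits // 2


for name, bits in DIGEST_BITS.items():
    print(f"{name}: {bits}-bit digest, at most {generic_collision_bits(bits)}-bit collision resistance")
```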

**Q: Why is this rated HIGH while MD5 is rated MEDIUM?**

The severity difference reflects practical attack cost. MD5 collisions require seconds to minutes of computation and some specialized knowledge. MD4 collisions require milliseconds and are trivially reproducible using published tools. The window for exploiting MD4 is significantly wider, and the algorithm has no redeemable security properties.

**Q: How do I run this rule in CI/CD?**

Run `code-pathfinder scan --ruleset python/cryptography/PYTHON-CRYPTO-SEC-013 --path ./src` in your pipeline. Add `--format sarif` to produce SARIF output compatible with GitHub Advanced Security and similar platforms.

**Q: Does PyCryptodome still include MD4?**

Yes, PyCryptodome includes MD4 for legacy protocol support. Its inclusion does not imply it is safe for new security applications.

**Q: What if I am computing an NT hash for testing NTLM handshakes in a security tool?**

Security tooling that intentionally implements broken algorithms for attack simulation or protocol testing should suppress this finding with a documented suppression comment explaining the context. The suppression should be scoped as narrowly as possible and reviewed in code review.

## References

- [CWE-327: Use of a Broken or Risky Cryptographic Algorithm](https://cwe.mitre.org/data/definitions/327.html)
- [CWE-328: Use of Weak Hash](https://cwe.mitre.org/data/definitions/328.html)
- [Dobbertin 1995: Cryptanalysis of MD4](https://link.springer.com/chapter/10.1007/3-540-60590-8_1)
- [RFC 1320: The MD4 Message-Digest Algorithm (historic)](https://www.rfc-editor.org/rfc/rfc1320)
- [NIST SP 800-131A Rev 2: Transitioning the Use of Cryptographic Algorithms](https://csrc.nist.gov/publications/detail/sp/800-131a/rev-2/final)
- [OWASP Cryptographic Failures (A02:2021)](https://owasp.org/Top10/A02_2021-Cryptographic_Failures/)
- [PyCryptodome documentation: Crypto.Hash.MD4](https://pycryptodome.readthedocs.io/en/latest/src/hash/md4.html)

