# PYTHON-CRYPTO-SEC-014: Insecure MD2 Hash (PyCryptodome)

> **Severity:** HIGH | **CWE:** CWE-327, CWE-328 | **OWASP:** A02:2021

- **Language:** Python
- **Category:** Cryptography
- **URL:** https://codepathfinder.dev/registry/python/cryptography/PYTHON-CRYPTO-SEC-014
- **Detection:** `pathfinder scan --ruleset python/PYTHON-CRYPTO-SEC-014 --project .`

## Description

Detects usage of MD2 via PyCryptodome's `Crypto.Hash.MD2.new()` constructor, or `Cryptodome.Hash.MD2.new()`
when the standalone `pycryptodomex` package is used. MD2 was designed by Ron Rivest in 1989 and is the
oldest of the MD family. The IETF retired it in 2011: RFC 6149 moved its specification, RFC 1319, to
Historic status and recommended deprecating MD2 in favor of existing hash algorithms such as SHA-256.

MD2 produces a 128-bit digest from a byte-oriented design built around a 256-entry permutation table,
which makes it extremely slow compared to all modern alternatives. Muller (2004) demonstrated a preimage
attack requiring about 2^104 work, well below the 2^128 expected of an ideal 128-bit hash. Knudsen and
Mathiassen later published further preimage and collision attacks, so MD2 provides neither the preimage
resistance nor the collision resistance that modern use requires.
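To put Muller's result in perspective, the gap between the ideal and the attacked preimage cost is a factor of 2^24, assuming both figures above are counted in the same unit of work (for example, compression-function evaluations):

```latex
\frac{2^{128}}{2^{104}} = 2^{128-104} = 2^{24} = 16{,}777{,}216 \approx 1.7 \times 10^{7}
```

In other words, the attack makes a preimage search roughly 17 million times cheaper than brute force.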

Unlike MD5, which still sees legitimate non-security use as a fast checksum, MD2 is so slow and so
thoroughly broken that there is no context in which it is preferable to a modern algorithm. Any
appearance of MD2 in a codebase indicates either extreme legacy code (e.g., old X.509 certificate
parsing from early-1990s PKI) or a mistake.

This rule is rated HIGH severity because MD2 combines known cryptographic weaknesses
with zero remaining legitimate use cases and no performance rationale for its selection.


## Vulnerable Code

```python
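# SEC-014: MD2 via PyCryptodome (flagged by this rule)
from Crypto.Hash import MD2

h_md2 = MD2.new(b"data")  # MD2 is broken: never use it for integrity or security
print(h_md2.hexdigest())

# The pycryptodomex package exposes the same module as Cryptodome.Hash.MD2,
# which this rule flags as well.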
```

## Secure Code

```python
import os

from Crypto.Hash import HMAC, SHA256, SHA3_256

# SECURE: SHA-256 for general integrity checking
h = SHA256.new()
h.update(b"data to hash")
digest = h.hexdigest()

# SECURE: SHA3-256 when a hash independent of the SHA-2 design is preferred
# (its collision resistance matches SHA-256's 128-bit level)
h = SHA3_256.new()
h.update(b"document bytes")
digest = h.hexdigest()

# SECURE: HMAC with SHA-256 for message authentication
key = os.urandom(32)  # 256-bit random key; keep it secret
mac = HMAC.new(key, digestmod=SHA256)
mac.update(b"message to authenticate")
tag = mac.hexdigest()
```
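If a project only needs hashing and HMAC, Python's standard library (`hashlib`, `hmac`) can cover the same replacements without PyCryptodome. The sketch below is an optional alternative, not part of this rule's guidance; the helper names `sha256_hex`, `sha3_256_hex`, `sign`, and `verify` are hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical helper names; not a PyCryptodome or Code Pathfinder API.


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def sha3_256_hex(data: bytes) -> str:
    """Return the SHA3-256 digest of data as a hex string."""
    return hashlib.sha3_256(data).hexdigest()


def sign(key: bytes, message: bytes) -> str:
    """Compute a hex-encoded HMAC-SHA256 tag for message under key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare with hmac.compare_digest to avoid timing leaks."""
    expected = sign(key, message)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = os.urandom(32)  # 256-bit random key
    tag = sign(key, b"message to authenticate")
    print(sha256_hex(b"data to hash"))
    print(verify(key, b"message to authenticate", tag))  # True
    print(verify(key, b"tampered message", tag))  # False
```

Using `hmac.compare_digest` rather than `==` keeps the comparison time independent of where the tags first differ.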

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import QueryType


# FQNs for both import roots: Crypto.* (pycryptodome) and Cryptodome.* (pycryptodomex)
class PyCryptoHashMD2(QueryType):
    fqns = ["Crypto.Hash.MD2", "Cryptodome.Hash.MD2"]


@python_rule(
    id="PYTHON-CRYPTO-SEC-014",
    name="Insecure MD2 Hash (PyCryptodome)",
    severity="HIGH",
    category="cryptography",
    cwe="CWE-327",
    tags="python,pycryptodome,md2,weak-hash,CWE-327",
    message="MD2 is severely broken. Use SHA-256 or SHA-3 instead.",
    owasp="A02:2021",
)
def detect_md2_hash_pycrypto():
    """Detects MD2 in PyCryptodome."""
    return PyCryptoHashMD2.method("new")
```

## How to Fix

- Replace `Crypto.Hash.MD2.new()` with `Crypto.Hash.SHA256.new()`. SHA-256 is faster than MD2, far stronger, and universally supported.
- For password hashing, use Argon2id (`argon2-cffi`), bcrypt, or scrypt rather than any raw hash function, including SHA-256.
- If the code parses legacy X.509 certificates that use MD2withRSA signatures, rely on a modern TLS/PKI library that correctly rejects MD2-signed certificates rather than reimplementing the verification.
- Audit all call sites: code that uses MD2 is likely very old and may contain other cryptographic weaknesses worth reviewing.
- For message authentication codes, use `Crypto.Hash.HMAC` with `SHA256` as the `digestmod` rather than any MD-family algorithm.
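For the password-hashing recommendation, here is a minimal sketch using scrypt from Python's standard library (`hashlib.scrypt`, available when Python is built against OpenSSL 1.1 or later). The cost parameters `n=2**14, r=8, p=1` are a common baseline and an assumption here, not a requirement of this rule; tune them for your hardware, or prefer Argon2id via `argon2-cffi` where it is available. The helper names `hash_password` and `verify_password` are hypothetical.

```python
import hashlib
import hmac
import os

# Assumed scrypt cost parameters (about 16 MiB of memory per hash); tune for your hardware.
SCRYPT_N = 2**14
SCRYPT_R = 8
SCRYPT_P = 1
SALT_BYTES = 16
KEY_BYTES = 32


def _derive(password: str, salt: bytes) -> bytes:
    """Derive a key from password and salt with the assumed scrypt parameters."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=SCRYPT_N,
        r=SCRYPT_R,
        p=SCRYPT_P,
        dklen=KEY_BYTES,
    )


# Hypothetical helpers: store the salt next to the derived key.
def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for password using a fresh random salt."""
    salt = os.urandom(SALT_BYTES)
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare without timing leaks."""
    return hmac.compare_digest(_derive(password, salt), expected_key)
```

Storing the cost parameters alongside each salt and key makes it possible to raise them later without invalidating existing records.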

## FAQ

**Q: Is MD2 ever safe to use?**

No. MD2 has known preimage attacks, no performance advantage, and was formally deprecated by RFC 6149 in 2011. There is no context — including non-security checksums — where MD2 is preferable to a modern algorithm. Replace all uses unconditionally.

**Q: How is MD2 different from MD4 and MD5?**

MD2 is the oldest (1989) and uses a byte-oriented design with a permutation lookup table, which makes it significantly slower than MD4 and MD5 on general-purpose hardware. Muller's 2004 attack also cut its preimage resistance well below that of MD5. MD4 and MD5 were designed as faster, more modern alternatives to MD2, but both are now broken as well.

**Q: Why would MD2 appear in a modern Python codebase?**

Most commonly: (1) copied from extremely old legacy code without review, (2) parsing or re-implementing early-1990s X.509 certificate formats, or (3) academic or protocol testing code. None of these justify using MD2 in production security paths.

**Q: What does RFC 6149 say about MD2?**

RFC 6149 (2011), titled 'MD2 to Historic Status', retires MD2 and moves its specification, RFC 1319, to Historic status. It cites the published collision and preimage attacks against MD2 and recommends deprecating it in favor of existing hash algorithms such as SHA-256.

**Q: Why is this rated HIGH while MD5 is MEDIUM?**

MD5's collision attacks are practical and cheap, but MD5 retains limited legitimate non-security uses as a fast checksum. MD2 has known preimage attacks (not just collision attacks), no performance benefit, and zero legitimate modern use cases. That combination of deeper weakness and no redeeming use case justifies HIGH severity.

**Q: How do I run this rule in CI/CD?**

Run `pathfinder scan --ruleset python/PYTHON-CRYPTO-SEC-014 --project .` in your pipeline (the same command listed under **Detection** at the top of this page). Add `--format sarif` to produce SARIF output compatible with GitHub Advanced Security and similar platforms.

**Q: I found MD2 in a third-party library I depend on — what should I do?**

Check whether the library uses MD2 in a security-sensitive path. If it does, file a bug with the library maintainer, pin to a version that does not use MD2 if possible, or replace the dependency. If the library uses MD2 only for parsing legacy formats and your code does not pass untrusted data through that path, document the residual risk.
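A quick way to triage a dependency before filing a bug is to look for MD2 imports in its source. The sketch below is a hypothetical, stdlib-only helper (`find_md2_imports` and `scan_tree` are illustrative names, not part of Code Pathfinder). It only inspects import statements, so it misses dynamic imports and attribute access such as `Hash.MD2`, and it is no substitute for running this rule.

```python
import ast
from pathlib import Path

# Hypothetical triage helper; module paths that expose PyCryptodome's MD2.
MD2_MODULES = {"Crypto.Hash.MD2", "Cryptodome.Hash.MD2"}
HASH_PACKAGES = {"Crypto.Hash", "Cryptodome.Hash"}


def find_md2_imports(source: str) -> list[int]:
    """Return sorted line numbers of import statements that pull in MD2."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. "import Crypto.Hash.MD2" or "import Cryptodome.Hash.MD2 as m"
            if any(alias.name in MD2_MODULES for alias in node.names):
                hits.append(node.lineno)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # e.g. "from Crypto.Hash import MD2" or "from Crypto.Hash.MD2 import new"
            if node.module in MD2_MODULES or (
                node.module in HASH_PACKAGES
                and any(alias.name == "MD2" for alias in node.names)
            ):
                hits.append(node.lineno)
    return sorted(hits)


def scan_tree(root: Path) -> list[str]:
    """Return "file:line" entries for every MD2 import found under root."""
    findings = []
    for path in sorted(root.rglob("*.py")):
        try:
            lines = find_md2_imports(path.read_text(encoding="utf-8"))
        except (SyntaxError, ValueError, UnicodeDecodeError):
            continue  # skip files that are not parseable UTF-8 Python
        findings.extend(f"{path}:{lineno}" for lineno in lines)
    return findings
```

For example, calling `scan_tree(Path("path/to/dependency"))` (a placeholder path) on the library's installed source lists each file and line worth inspecting before deciding whether the MD2 use is on a security-sensitive path.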

## References

- [CWE-327: Use of a Broken or Risky Cryptographic Algorithm](https://cwe.mitre.org/data/definitions/327.html)
- [CWE-328: Use of Weak Hash](https://cwe.mitre.org/data/definitions/328.html)
- [RFC 6149: MD2 to Historic Status (IETF, 2011)](https://www.rfc-editor.org/rfc/rfc6149)
- [RFC 1319: The MD2 Message-Digest Algorithm (original spec)](https://www.rfc-editor.org/rfc/rfc1319)
- [Muller 2004: The MD2 Hash Function Is Not One-Way](https://link.springer.com/chapter/10.1007/978-3-540-30539-2_30)
- [NIST SP 800-131A Rev 2: Transitioning the Use of Cryptographic Algorithms](https://csrc.nist.gov/publications/detail/sp/800-131a/rev-2/final)
- [OWASP Cryptographic Failures (A02:2021)](https://owasp.org/Top10/A02_2021-Cryptographic_Failures/)
- [PyCryptodome documentation: Crypto.Hash.MD2](https://pycryptodome.readthedocs.io/en/latest/src/hash/md2.html)

---

Source: https://codepathfinder.dev/registry/python/cryptography/PYTHON-CRYPTO-SEC-014
Code Pathfinder — Open source, type-aware SAST with cross-file dataflow analysis
