# PYTHON-CRYPTO-SEC-001: RC4 (ARC4) Cipher Usage via cryptography Library

> **Severity:** HIGH | **CWE:** CWE-327 | **OWASP:** A02:2021

- **Language:** Python
- **Category:** Cryptography
- **URL:** https://codepathfinder.dev/registry/python/cryptography/PYTHON-CRYPTO-SEC-001
- **Detection:** `pathfinder scan --ruleset python/PYTHON-CRYPTO-SEC-001 --project .`

## Description

This rule detects instantiation of the ARC4 (RC4) algorithm via the `cryptography`
library's `cryptography.hazmat.primitives.ciphers.algorithms.ARC4` class. RC4 was
once widely deployed in SSL/TLS, WEP, and WPA, but it has been cryptographically
broken for over a decade.

RC4 produces a biased keystream: the first few bytes are strongly correlated with
the key, and the output remains measurably non-uniform further into the stream. The
Fluhrer-Mantin-Shamir attack on WEP, the 2013 AlFardan et al. attack on RC4 in TLS,
and RC4NOMORE exploit these biases to recover keys or plaintext from captured
ciphertext. RFC 7465 explicitly prohibits RC4 in all versions of TLS, and RC4 has
never been a NIST-approved algorithm, so it is not acceptable for federal use under
NIST SP 800-131A.
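
The bias described above can be observed directly. The following is a minimal sketch, not part of the rule: it uses a textbook pure-Python RC4 (the helper names `rc4_keystream` and `second_byte_zero_count` are illustrative assumptions) and counts how often the second keystream byte is zero across random keys. A uniform generator would produce zero about once in 256 trials; RC4 is known to produce it roughly twice as often.

```python
import os


def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return the first n keystream bytes of textbook RC4 for the given key."""
    # Key-scheduling algorithm (KSA): permute S under control of the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA): emit one byte per step.
    out = bytearray()
    i = j = 0
    for _ in range(n):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def second_byte_zero_count(trials: int) -> int:
    """Count how often the second keystream byte is 0x00 over random 128-bit keys."""
    return sum(rc4_keystream(os.urandom(16), 2)[1] == 0 for _ in range(trials))


if __name__ == "__main__":
    trials = 20_000
    observed = second_byte_zero_count(trials)
    # Illustration only: the exact count varies from run to run.
    print(f"second byte == 0: {observed} times; uniform expectation: {trials / 256:.0f}")
```

Because the skew is present for every key, an attacker who sees the same plaintext byte encrypted under many keys can infer it from ciphertext statistics alone.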

The rule matches `CryptoCipherAlgorithms.method("ARC4")`, that is, any call to the
ARC4 constructor in the hazmat algorithms module. There is no safe configuration for
RC4; the algorithm itself is the problem, regardless of key size.


## Vulnerable Code

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms
from cryptography.hazmat.backends import default_backend

# ARC4/RC4 is a broken stream cipher with known biases
arc4_key = b'\x00' * 16
cipher = Cipher(algorithms.ARC4(arc4_key), mode=None, backend=default_backend())
encryptor = cipher.encryptor()
ciphertext = encryptor.update(b"secret data")
```

## Secure Code

```python
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import os

# SECURE: AES-256-GCM provides both confidentiality and integrity
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
nonce = os.urandom(12)
ct = aesgcm.encrypt(nonce, b"sensitive data", associated_data=None)
```

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, flows, QueryType
from codepathfinder.presets import PropagationPresets

class CryptoCipherAlgorithms(QueryType):
    fqns = ["cryptography.hazmat.primitives.ciphers.algorithms"]


@python_rule(
    id="PYTHON-CRYPTO-SEC-001",
    name="Insecure ARC4 (RC4) Cipher",
    severity="HIGH",
    category="cryptography",
    cwe="CWE-327",
    tags="python,cryptography,arc4,rc4,weak-cipher,CWE-327",
    message="ARC4/RC4 is a broken stream cipher. Use AES-GCM or ChaCha20Poly1305 instead.",
    owasp="A02:2021",
)
def detect_arc4_cipher():
    """Detects ARC4 cipher usage via the cryptography library."""
    return CryptoCipherAlgorithms.method("ARC4")
```

## How to Fix

- Replace ARC4 with AES-256-GCM (AESGCM from cryptography.hazmat.primitives.ciphers.aead) for authenticated encryption
- Use ChaCha20-Poly1305 (ChaCha20Poly1305) as an alternative when AES hardware acceleration is unavailable
- Never reuse a key with RC4, or a (key, nonce) pair with AES-GCM or ChaCha20-Poly1305; generate keys from a cryptographically secure random source and use a fresh nonce for every message
- Migrate any data previously encrypted with RC4 by decrypting and re-encrypting under AES-256-GCM with a new key
- Run this rule in CI to prevent RC4 from being reintroduced during future development or library upgrades
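
One way to apply the first and third points together is to wrap the AEAD calls in a pair of helpers that always generate a fresh nonce and store it alongside the ciphertext. This is a sketch under stated assumptions: `encrypt_message`, `decrypt_message`, and the nonce-prefix layout are illustrative choices, not something the rule or the library prescribes.

```python
import os
from typing import Optional

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the usual size for GCM


def encrypt_message(key: bytes, plaintext: bytes, aad: Optional[bytes] = None) -> bytes:
    """Encrypt with AES-GCM and return nonce || ciphertext || tag."""
    nonce = os.urandom(NONCE_SIZE)  # fresh nonce for every message
    return nonce + AESGCM(key).encrypt(nonce, plaintext, aad)


def decrypt_message(key: bytes, blob: bytes, aad: Optional[bytes] = None) -> bytes:
    """Split off the nonce and decrypt; raises InvalidTag if anything was altered."""
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, aad)


if __name__ == "__main__":
    key = AESGCM.generate_key(bit_length=256)
    blob = encrypt_message(key, b"sensitive data")
    print(decrypt_message(key, blob))

    # Flipping a single bit makes decryption fail instead of returning altered data.
    tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
    try:
        decrypt_message(key, tampered)
    except InvalidTag:
        print("tampering detected")
```

`ChaCha20Poly1305` from the same `aead` module takes a 32-byte key and a 12-byte nonce and exposes the same `encrypt`/`decrypt` call shape, so it could be swapped in here with minimal changes.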

## Security Implications

- **Biased Keystream Enables Plaintext Recovery:** RC4's keystream is statistically biased from the very first byte. Attackers who
observe enough ciphertext of the same plaintext encrypted under many keys can
statistically recover that plaintext. The RC4NOMORE attack demonstrated recovery
of an HTTPS session cookie in roughly 75 hours by inducing the victim's browser
to send many requests and capturing the resulting encrypted traffic.

- **Prohibited in TLS by RFC 7465:** RFC 7465 (2015) mandates that TLS implementations MUST NOT negotiate any
RC4-based cipher suite. Code that offers RC4 cipher suites in TLS directly
violates this requirement, and RC4 in any network protocol is likely to be
flagged in compliance reviews for systems handling PCI DSS, HIPAA, or FedRAMP data.

- **No Authentication -- Ciphertext is Malleable:** RC4 provides only stream confidentiality, with no built-in integrity or
authentication. An attacker who can flip bits in the ciphertext will produce
predictable, corresponding bit flips in the plaintext, enabling undetected
message tampering without knowledge of the key.

- **Key Reuse Eliminates All Security:** If the same RC4 key is ever reused across two messages, an attacker who XORs
the two ciphertexts removes the keystream entirely, leaving the XOR of the two
plaintexts. Both plaintexts can then be recovered using statistical analysis or
any known plaintext. WEP's 24-bit IV made such keystream reuse unavoidable on
busy networks, and the same risk applies to any RC4-encrypted data at rest or
in transit.
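
The malleability and key-reuse points can be reproduced in a few lines. The sketch below uses a textbook pure-Python RC4 and made-up messages purely for illustration (the `rc4` and `xor` helpers and the demo key are assumptions, not library APIs); the attacker steps never touch the key.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with textbook RC4 (the operation is its own inverse)."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


key = bytes(range(16))  # demo key only

# Key reuse: two messages of equal length encrypted under the same key.
p1 = b"transfer $100 to alice"
p2 = b"meeting moved to 3 pm."
c1, c2 = rc4(key, p1), rc4(key, p2)

# The keystream cancels out, so the attacker learns p1 XOR p2 ...
assert xor(c1, c2) == xor(p1, p2)
# ... and knowing (or guessing) one plaintext reveals the other.
assert xor(xor(c1, c2), p1) == p2

# Malleability: rewrite "$100" to "$999" by flipping ciphertext bits only.
offset = p1.index(b"$100")
delta = xor(b"$100", b"$999")
forged = bytearray(c1)
for k, d in enumerate(delta):
    forged[offset + k] ^= d

# The receiver decrypts the forged ciphertext without any error.
assert rc4(key, bytes(forged)) == b"transfer $999 to alice"
```

An AEAD mode such as AES-GCM rejects the forged ciphertext because the authentication tag no longer verifies.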


## FAQ

**Q: Why is RC4 considered broken if it was used in WPA and SSL for years?**

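RC4 was widely deployed before researchers fully characterized its keystream
biases. The Fluhrer-Mantin-Shamir attack on WEP (2001), the AlFardan et al.
attack on RC4 in TLS (2013), the RC4NOMORE attack (2015), and two decades
of academic analysis have conclusively shown that RC4's output is non-uniform
enough to recover plaintext from realistic traffic volumes. (The BEAST attack
of 2011 targeted CBC-mode cipher suites, not RC4; RC4 was briefly recommended
as a workaround for it.) RFC 7465 formalized the prohibition in 2015. Legacy
deployment does not imply safety.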


**Q: Can I add a random IV or use key derivation to make RC4 safe?**

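No. The weakness is in the keystream generation algorithm itself, not in how
the key is selected or derived. RC4 has no nonce or IV input, so an IV can only
be mixed into the key; WEP did this by concatenating the IV with the key, which
enabled related-key attacks. Deriving a fresh key per session prevents identical
keystreams across sessions, but the per-byte biases in the RC4 output remain
exploitable. The correct fix is to replace the algorithm entirely: use AES-GCM
or ChaCha20-Poly1305.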


**Q: How do I migrate data that was already encrypted with RC4?**

Decrypt all RC4-protected data using the existing key, then re-encrypt with
AES-256-GCM using a freshly generated 256-bit key. Securely destroy the old
RC4 key after migration. For data in transit (e.g., archived TLS sessions),
captured ciphertext cannot be retroactively fixed -- disable RC4 cipher suites
at the TLS negotiation layer and rotate session keys going forward.
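
For data at rest, the decrypt-then-re-encrypt step might look like the sketch below. It assumes the existing RC4 decryption is available as a callable; `migrate_record`, `read_migrated`, `legacy_decrypt`, and the use of a record identifier as associated data are all illustrative choices rather than a prescribed procedure.

```python
import os
from typing import Callable

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def migrate_record(
    legacy_ciphertext: bytes,
    legacy_decrypt: Callable[[bytes], bytes],
    new_key: bytes,
    record_id: bytes,
) -> bytes:
    """Decrypt one legacy record and re-encrypt it under AES-256-GCM.

    legacy_decrypt is a hypothetical wrapper around the existing RC4 decryption
    path. record_id is bound as associated data so that a ciphertext cannot be
    moved to a different record without failing authentication.
    """
    plaintext = legacy_decrypt(legacy_ciphertext)
    nonce = os.urandom(12)  # fresh nonce per record
    return nonce + AESGCM(new_key).encrypt(nonce, plaintext, record_id)


def read_migrated(blob: bytes, new_key: bytes, record_id: bytes) -> bytes:
    """Read back a migrated record; raises InvalidTag on tampering or a wrong record_id."""
    return AESGCM(new_key).decrypt(blob[:12], blob[12:], record_id)
```

The one-off migration script that wraps the old ARC4 call would presumably trigger this rule itself, so it is worth keeping it out of the main codebase and deleting it, together with the old key, once migration is verified.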


**Q: Does this rule fire on RC4 mentioned in comments or variable names?**

No. The rule matches the ARC4 constructor call, not string literals or comments.
A variable named arc4_key or a comment explaining RC4's weaknesses will not
trigger a finding. Only actual instantiation of the ARC4 class fires.


**Q: What does failing a compliance audit due to RC4 usage look like?**

PCI DSS QSAs can flag RC4 as a violation of the strong-cryptography requirement
for data in transit (Requirement 4.2.1) during assessments. FedRAMP assessors
are likely to raise it under NIST SP 800-53 control SC-13, since RC4 is not a
FIPS-approved algorithm. For TLS, external scanners typically cap or fail the
grade of a server on which RC4 cipher suites are negotiable. Running this rule
in CI provides auditable evidence that the `cryptography` ARC4 class is not
instantiated in the scanned codebase.


**Q: Is there a performance argument for RC4 over modern AEAD ciphers?**

No. ChaCha20-Poly1305 is generally at least as fast as RC4 in software on
hardware without AES-NI instruction support, and AES-GCM with AES-NI
acceleration is typically faster still. The cryptography library provides both
via its AEAD primitives. There is no performance justification for choosing
RC4 on any modern platform.


## References

- [CWE-327: Use of a Broken or Risky Cryptographic Algorithm](https://cwe.mitre.org/data/definitions/327.html)
- [RFC 7465: Prohibiting RC4 Cipher Suites](https://www.rfc-editor.org/rfc/rfc7465)
- [RC4NOMORE: Practically breaking WPA-TKIP and TLS using RC4 biases](https://www.rc4nomore.com/)
- [NIST SP 800-131A Rev 2: Transitioning the Use of Cryptographic Algorithms](https://csrc.nist.gov/publications/detail/sp/800-131a/rev-2/final)
- [OWASP Cryptographic Failures](https://owasp.org/Top10/A02_2021-Cryptographic_Failures/)

