# PYTHON-LANG-SEC-100: UUID Version 1 Leaks MAC Address

> **Severity:** LOW | **CWE:** CWE-200 | **OWASP:** A05:2021

- **Language:** Python
- **Category:** Python Core
- **URL:** https://codepathfinder.dev/registry/python/lang/PYTHON-LANG-SEC-100
- **Detection:** `pathfinder scan --ruleset python/PYTHON-LANG-SEC-100 --project .`

## Description

Python's uuid.uuid1() generates a UUID from the current timestamp, a clock sequence, and
the host machine's MAC address (or a random node if the MAC address cannot be determined).
The MAC address occupies the node field, the final 48 bits of the UUID, and is visible in
plain text as the last 12 hexadecimal characters of the canonical string form.
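
The position of the node field can be checked with the standard library. The sketch below passes a made-up node value (`EXAMPLE_NODE`, an assumption chosen so the output is reproducible) instead of reading a real interface, then recovers it from the string form. The helper `node_to_mac` is a hypothetical name used only for illustration.

```python
import uuid

# Hypothetical node value used for illustration; without the node argument,
# uuid1() would use the host's MAC address or a random fallback.
EXAMPLE_NODE = 0x001122334455


def node_to_mac(node: int) -> str:
    """Format a 48-bit node value as a colon-separated MAC address."""
    return ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


u = uuid.uuid1(node=EXAMPLE_NODE)

# The node occupies the final 12 hex characters of the canonical string.
node_hex = str(u)[-12:]
print(u.version)            # 1
print(node_hex)             # 001122334455
print(node_to_mac(u.node))  # 00:11:22:33:44:55
```

Anyone who can read the UUID string can perform the same extraction; no access to the generating host is needed.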

This leaks hardware identity information that can be used to track the physical machine
generating UUIDs across different sessions or deployments. In cloud environments, the
MAC address may identify the specific VM instance or network interface. In on-premises
deployments, it directly identifies physical hardware. When UUIDs are used as session
tokens, CSRF tokens, or other security-sensitive identifiers, the MAC address embedded
in them also provides attackers with predictability information since the node component
is constant.
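
To illustrate the predictability concern, the following sketch generates two version 1 UUIDs and recovers the embedded generation time from one of them. The node value and the helper name `uuid1_created_at` are assumptions made for this example; the epoch constant follows the UUID v1 timestamp definition (100-nanosecond intervals since 1582-10-15).

```python
import uuid
from datetime import datetime, timedelta, timezone

# UUID v1 timestamps count 100-nanosecond intervals since 1582-10-15 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
EXAMPLE_NODE = 0x001122334455  # hypothetical node, for reproducibility


def uuid1_created_at(u: uuid.UUID) -> datetime:
    """Recover the generation time embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # Integer-divide by 10 to convert 100 ns ticks to microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


first = uuid.uuid1(node=EXAMPLE_NODE)
second = uuid.uuid1(node=EXAMPLE_NODE)

print(first.node == second.node)  # the node field is identical in both values
print(first.time < second.time)   # CPython keeps timestamps increasing within a process
print(uuid1_created_at(first))    # roughly the current UTC time
```

With the node fixed and the timestamp recoverable, the remaining unknown in a neighbouring UUID is mostly the 14-bit clock sequence and the exact tick at which it was generated.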

The safe alternative is uuid.uuid4(), which CPython builds from random bytes supplied by
os.urandom() and which contains no hardware-derived or timestamp components, providing
both better privacy and better unpredictability for security-sensitive use cases.


## Vulnerable Code

```python
import uuid

# SEC-100: uuid1() embeds the host MAC address in the generated value
uid = uuid.uuid1()
```

## Secure Code

```python
import uuid

# SECURE: Use uuid4() for random UUIDs with no hardware-derived components
def generate_session_id() -> str:
    return str(uuid.uuid4())

def generate_api_key() -> str:
    return uuid.uuid4().hex  # 32-character hex string, no MAC address

def generate_request_id() -> str:
    # uuid4 is cryptographically random and contains no identifying information
    return str(uuid.uuid4())

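# SECURE: If you need time-ordered UUIDs, use UUID v7, which has no MAC address.
# Python 3.14+ ships uuid.uuid7(); on earlier versions, the third-party
# uuid6 package (pip install uuid6) provides the same function:
# from uuid6 import uuid7
# def generate_ordered_id() -> str:
#     return str(uuid7())  # time-ordered, no MAC address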
```

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, QueryType

class UUIDModule(QueryType):
    fqns = ["uuid"]


@python_rule(
    id="PYTHON-LANG-SEC-100",
    name="Insecure UUID Version (uuid1)",
    severity="LOW",
    category="lang",
    cwe="CWE-200",
    tags="python,uuid,mac-address,insufficiently-random,CWE-200",
    message="uuid.uuid1() leaks the host MAC address and uses predictable timestamps. Use uuid.uuid4() for random UUIDs.",
    owasp="A05:2021",
)
def detect_uuid1():
    """Detects uuid.uuid1() which leaks MAC address."""
    return UUIDModule.method("uuid1")
```

## How to Fix

- Replace uuid.uuid1() with uuid.uuid4() for all security-sensitive identifiers including session tokens, API keys, CSRF tokens, and password reset links.
- Audit all UUID generation in your codebase and replace uuid1 calls where the MAC address disclosure is unacceptable.
- If time-ordering is required, consider UUID v7 (uuid.uuid7() in Python 3.14 and later, or third-party libraries on earlier versions), which combines a Unix timestamp with random bits and has no MAC-derived node field.
- Review API responses, logs, and database schemas for existing uuid1-generated values that may have already leaked MAC addresses.
- Document any intentional use of uuid1 where MAC address embedding is acceptable, and add a comment explaining the justification.
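
For the review step, one possible approach is to scan exported logs or API payloads for UUID strings and keep only those whose version field is 1. The sketch below is a minimal illustration, not part of the Code Pathfinder tooling: the function name `find_uuid1_values` and the sample log lines are made up, and the two UUID values are the example outputs shown in the Python uuid documentation.

```python
import re
import uuid

# Matches the canonical 8-4-4-4-12 hex form; the version is checked after parsing.
UUID_PATTERN = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def find_uuid1_values(text: str) -> list[uuid.UUID]:
    """Return every RFC 4122 version 1 UUID found in the given text."""
    found = []
    for match in UUID_PATTERN.finditer(text):
        candidate = uuid.UUID(match.group(0))
        if candidate.variant == uuid.RFC_4122 and candidate.version == 1:
            found.append(candidate)
    return found


# Hypothetical log excerpt: the first value is version 1, the second is version 4.
sample_log = (
    "req=a8098c1a-f86e-11da-bd1a-00112444be1e status=200\n"
    "req=16fd2706-8baf-433b-82eb-8c7fada847da status=200\n"
)
for hit in find_uuid1_values(sample_log):
    print(hit, f"node={hit.node:012x}")
```

A hit only shows that a value has the version 1 layout; whether its node is a real MAC address or a random fallback still needs to be judged per system.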

## Security Implications

- **Hardware Identity Disclosure:** The MAC address embedded in uuid1() UUIDs identifies the network interface that generated
them. If UUIDs are exposed in API responses, logs, URLs, or database exports, an attacker
can extract the MAC address and use it to track the specific machine or container across
deployments, infer the hardware vendor or virtualization platform, or correlate activity across sessions.

- **Predictability in Security Tokens:** Because the node component (MAC address) of uuid1() is constant for a given host and
the timestamp component is sequential, uuid1() output is partially predictable. If
uuid1() is used to generate session tokens, password reset tokens, or API keys, an
attacker who observes one token can narrow the search space for other tokens generated
around the same time.

- **Privacy Regulation Violations:** GDPR and other privacy regulations can treat hardware identifiers such as MAC addresses as
personal data when they can be linked to an individual. Embedding MAC addresses in
user-visible UUIDs, or logging and transmitting them, may therefore create compliance
obligations such as a data protection impact assessment or user notification.

- **Container and Infrastructure Fingerprinting:** In containerized environments, the MAC address in uuid1() can help attackers distinguish
between different nodes in a cluster, track which node processed a given request, or
identify infrastructure layout when combined with other signals. This information assists
in lateral movement planning and targeted attacks against specific nodes.


## FAQ

**Q: Is uuid.uuid1() always a security problem?**

uuid.uuid1() is a low-severity concern rather than an immediate critical vulnerability.
The main risks are information disclosure (MAC address leakage) and partial predictability.
If UUIDs are used for internal database primary keys that are never exposed externally,
the risk is lower. The concern escalates significantly when uuid1() is used for security
tokens such as session IDs, API keys, or password reset links.


**Q: What information is actually revealed by uuid1()?**

The last 12 hexadecimal characters (6 bytes) of a uuid1() string represent the node,
which is typically the host's MAC address. For example, in
"a8098c1a-f86e-11da-bd1a-00112444be1e", the leading "1" of the third group marks the
UUID as version 1 and the node is "00112444be1e". The first three bytes of the node can
be looked up in IEEE OUI databases to identify the hardware vendor and, for cloud
instances, often the virtualization platform or provider.


**Q: Does Python always use the real MAC address?**

Python first attempts to find a real MAC address using platform-specific methods.
If it cannot determine the MAC address, it falls back to a randomly generated 48-bit
node with the multicast bit set. In many cloud and container environments, Python
successfully finds the virtual network interface MAC address, so the leakage risk
is real in those environments.
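
Because hardware MAC addresses are unicast, the multicast bit gives a rough hint about which case applies on a given host. The sketch below is a heuristic under that assumption, and `is_random_fallback_node` is a hypothetical helper name; a cleared bit does not prove the value is a physical address, since virtual interfaces also use unicast addresses.

```python
import uuid

# The multicast bit is the least significant bit of the first octet of the node.
MULTICAST_BIT = 1 << 40


def is_random_fallback_node(node: int) -> bool:
    """Return True if the node has the multicast bit set.

    A set bit suggests the node was randomly generated rather than read
    from a network interface.
    """
    return bool(node & MULTICAST_BIT)


host_node = uuid.getnode()
print(f"{host_node:012x}", is_random_fallback_node(host_node))
```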


**Q: Can I use uuid.uuid1() with a custom node to avoid MAC leakage?**

Yes, you can pass a random integer as the node parameter, for example
uuid.uuid1(node=secrets.randbits(48) | (1 << 40)), where setting the multicast bit keeps
the value distinct from any real hardware address. However, the timestamp component
remains sequential, so the temporal predictability concern is unchanged. For
security-sensitive use cases, uuid.uuid4() is simpler and avoids both concerns entirely.
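
A minimal sketch of the custom-node approach follows. The function name `uuid1_with_random_node` is hypothetical, and the choice to set the multicast bit is an assumption based on the convention for randomly generated nodes.

```python
import secrets
import uuid


def uuid1_with_random_node() -> uuid.UUID:
    """Generate a version 1 UUID whose node is random rather than a MAC address."""
    # Set the multicast bit so the value cannot match a real hardware address.
    node = secrets.randbits(48) | (1 << 40)
    return uuid.uuid1(node=node)


u = uuid1_with_random_node()
print(u.version)                  # 1
print(bool((u.node >> 40) & 1))   # True: the multicast bit is set
```

The result is still a version 1 UUID, so its generation time can be read back from the value by anyone who sees it.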


**Q: What about UUID v5 or v3?**

UUID v3 and v5 are name-based and deterministic: they generate the same UUID for
the same namespace and name. They do not embed a MAC address, but they are not suitable
as random security tokens. Use them only when you need a stable, reproducible identifier
for a known namespace and name. For unpredictable security tokens, use uuid.uuid4().
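
The determinism is straightforward to demonstrate. In the short sketch below, the names "example.com" and "example.org" are arbitrary inputs chosen for illustration.

```python
import uuid

# Name-based UUIDs are a pure function of (namespace, name).
first = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
second = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

print(first == second)  # True: same inputs, same UUID
print(first.version)    # 5
print(first == uuid.uuid5(uuid.NAMESPACE_DNS, "example.org"))  # False
```

Anyone who knows or can guess the namespace and name can compute the same value, which is why these versions are unsuitable as secrets.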


## References

- [Python docs: uuid module](https://docs.python.org/3/library/uuid.html)
- [RFC 4122: UUID Format Specification](https://datatracker.ietf.org/doc/html/rfc4122)
- [CWE-200: Exposure of Sensitive Information](https://cwe.mitre.org/data/definitions/200.html)
- [RFC 9562: Universally Unique IDentifiers (UUIDs), including UUID v7](https://datatracker.ietf.org/doc/html/rfc9562)
- [OWASP Information Exposure](https://owasp.org/www-community/vulnerabilities/Information_exposure_through_query_strings_in_url)

