# PYTHON-LANG-SEC-094: csv.writer Audit (Formula Injection Risk)

> **Severity:** LOW | **CWE:** CWE-1236 | **OWASP:** A03:2021

- **Language:** Python
- **Category:** Python Core
- **URL:** https://codepathfinder.dev/registry/python/lang/PYTHON-LANG-SEC-094
- **Detection:** `pathfinder scan --ruleset python/PYTHON-LANG-SEC-094 --project .`

## Description

CSV formula injection (also known as CSV injection) occurs when user-supplied data
beginning with a spreadsheet formula character (=, +, -, @, tab, or carriage return)
is exported to CSV format and subsequently opened in a spreadsheet application such as
Microsoft Excel or Google Sheets.

Spreadsheet applications interpret cells starting with =, +, -, or @ as formulas and
execute them. An attacker who can inject data starting with =HYPERLINK("http://evil.com/",
"Click me") or =cmd|'/C calc.exe'!A0 (on Windows with DDE enabled) can cause the
spreadsheet to make outbound network requests, execute system commands, or display
deceptive content when the CSV is opened.

Python's csv.writer() does not protect against formula injection. Use defusedcsv or
sanitize field values by prepending a single quote or tab to fields starting with
formula characters.
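
As a minimal sketch of the behavior described above, the following writes a formula-like value with the standard library writer and shows that it reaches the output unchanged. The helper name `export_rows` and the sample values are illustrative assumptions, not part of the rule.

```python
import csv
import io


def export_rows(rows):
    """Write rows with the standard csv.writer and return the CSV text."""
    output = io.StringIO()
    csv.writer(output).writerows(rows)
    return output.getvalue()


# Hypothetical attacker-controlled value in the second column.
csv_text = export_rows([["alice", "=1+1"]])

# The leading "=" is written as-is; csv.writer applies no formula escaping.
print(repr(csv_text))  # 'alice,=1+1\r\n'
```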


## Vulnerable Code

```python
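import csv
import io

# SEC-094: csv.writer and csv.DictWriter emit field values unmodified
user_comment = "=1+1"  # attacker-controlled value that a spreadsheet would evaluate

writer = csv.writer(io.StringIO())
writer.writerow(["alice", user_comment])

dict_writer = csv.DictWriter(io.StringIO(), fieldnames=["a", "b"])
dict_writer.writerow({"a": "alice", "b": user_comment})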
```

## Secure Code

```python
import csv
import io

# INSECURE: No sanitization of formula-starting characters
# writer.writerow([user_name, user_comment])  # comment could start with =

# SECURE: Sanitize fields that could start with formula characters
FORMULA_PREFIXES = ('=', '+', '-', '@', '\t', '\r')

def sanitize_csv_field(value: str) -> str:
    """Prepend a tab to fields starting with formula characters."""
    if isinstance(value, str) and value.startswith(FORMULA_PREFIXES):
        return f"\t{value}"  # Prepend tab to neutralize formula interpretation
    return value

def export_user_data_csv(users: list) -> str:
    output = io.StringIO()
    writer = csv.writer(output, quoting=csv.QUOTE_ALL)
    writer.writerow(["ID", "Username", "Email", "Comment"])
    for user in users:
        writer.writerow([
            user["id"],
            sanitize_csv_field(user["username"]),
            sanitize_csv_field(user["email"]),
            sanitize_csv_field(user["comment"]),
        ])
    return output.getvalue()

# SECURE: Use defusedcsv library for automatic protection
# pip install defusedcsv
# import defusedcsv
# writer = defusedcsv.writer(output)

```

## Detection Rule (Python SDK)

```python
from rules.python_decorators import python_rule
from codepathfinder import calls, QueryType

class CSVModule(QueryType):
    fqns = ["csv"]


@python_rule(
    id="PYTHON-LANG-SEC-094",
    name="csv.writer Without defusedcsv",
    severity="LOW",
    category="lang",
    cwe="CWE-1236",
    tags="python,csv,csv-injection,defusedcsv,CWE-1236",
    message="csv.writer() detected. Consider defusedcsv to prevent formula injection.",
    owasp="A03:2021",
)
def detect_csv_writer():
    """Detects csv.writer usage."""
    return CSVModule.method("writer", "DictWriter")
```

## How to Fix

- Sanitize all user-controlled CSV field values by prepending a tab character or single quote to values starting with =, +, -, @, tab, or carriage return.
- Use the defusedcsv library as a drop-in replacement for csv.writer() that automatically handles formula injection protection.
- Apply csv.QUOTE_ALL quoting mode so that every field is quoted, which prevents newline injection in CSV fields. Quoting alone does not stop formula injection, so combine it with sanitization.
- Include a Content-Disposition: attachment header and Content-Type: text/csv header when serving CSV files to prevent browsers from rendering them.
- Educate users about the risks of opening CSV files from untrusted sources in spreadsheet applications.
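
The header recommendation above can be sketched in a framework-agnostic way. The function name `build_csv_download`, the default filename, and the returned `(headers, body)` shape are assumptions for illustration. The rows are assumed to be sanitized already, and the filename is assumed to be a trusted constant rather than user input.

```python
import csv
import io


def build_csv_download(rows, filename="export.csv"):
    """Return (headers, body) for serving rows as a CSV attachment.

    Assumes `rows` were already sanitized against formula injection and
    that `filename` is a trusted, fixed value.
    """
    output = io.StringIO()
    csv.writer(output, quoting=csv.QUOTE_ALL).writerows(rows)
    body = output.getvalue().encode("utf-8")
    headers = {
        # Declare the payload as CSV rather than HTML or plain text.
        "Content-Type": "text/csv; charset=utf-8",
        # Ask the browser to download the file instead of rendering it.
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_csv_download([["ID", "Comment"], [1, "hello"]])
```

Most web frameworks accept a header mapping and a bytes body in some form, so the returned pair should be adaptable to whichever response object the application uses.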

## Security Implications

- **Remote Code Execution via DDE in Excel:** Microsoft Excel supports Dynamic Data Exchange (DDE) which allows cells to execute
system commands. Formula injection payloads like =cmd|'/C calc.exe'!A0 can execute
arbitrary commands on Windows systems with DDE enabled (disabled by default in
recent Excel versions but still a risk in legacy environments).

- **Phishing via Hyperlinks:** =HYPERLINK() formula injection can embed clickable links in spreadsheet cells that
send users to attacker-controlled URLs when clicked, enabling phishing attacks
or tracking of user behavior.

- **Data Exfiltration via Spreadsheet Formulas:** Formulas that reference other cells or perform lookups can exfiltrate data visible
in the spreadsheet by encoding it in URL parameters of outbound hyperlink requests,
potentially exposing sensitive information to attackers.

- **Deceptive Content Injection:** Formula injection can display deceptive values in spreadsheet cells that differ from
the actual CSV data, potentially misleading users about financial figures, status
codes, or other important data when they open the exported file.


## FAQ

**Q: Is CSV injection always a critical security issue?**

CSV injection severity depends on the use case. If the CSV is exported and opened
by internal users in a controlled enterprise environment with modern Excel (DDE
disabled), the risk is low. If the CSV is downloaded by external users or contains
data from untrusted third parties, the risk of phishing via hyperlink injection
and potential code execution in older Excel versions is significant.


**Q: Does quoting fields with csv.QUOTE_ALL prevent formula injection?**

CSV quoting with double-quotes prevents newline injection but does not prevent
formula injection. A quoted cell like "=cmd..." is still interpreted as a formula
by Excel and Google Sheets since they strip quotes before interpreting the cell
content. Sanitization of formula-starting characters is required.
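
A short check of this point, using only the standard library: with csv.QUOTE_ALL the payload is wrapped in quotes in the file, but any CSV parser (used here as a stand-in for what a spreadsheet does on import) recovers the original value with its leading = intact. The helper name `write_quoted` is an illustrative assumption.

```python
import csv
import io


def write_quoted(row):
    """Write one row with QUOTE_ALL and return the resulting CSV line."""
    output = io.StringIO()
    csv.writer(output, quoting=csv.QUOTE_ALL).writerow(row)
    return output.getvalue()


line = write_quoted(["bob", "=1+1"])
print(repr(line))  # '"bob","=1+1"\r\n'

# Parsing removes the quotes again, so the cell content still starts with "=".
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)  # ['bob', '=1+1']
```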


**Q: What is the safest mitigation for CSV formula injection?**

Prepend a tab character (\t) or single quote (') to any field starting with =, +,
-, @, tab, or carriage return. The tab approach is recommended over the single quote
because a leading single quote can remain visible to users in some spreadsheet
applications. The defusedcsv library handles this automatically and is the simplest
approach.
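
Because the rule also flags csv.DictWriter, one way to apply the tab-prefix mitigation there is a small subclass, sketched below. The names `neutralize` and `SanitizingDictWriter` are hypothetical, and this is not how defusedcsv is implemented; it only illustrates the manual approach described above.

```python
import csv
import io

FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")


def neutralize(value):
    """Prepend a tab to string values that start with a formula character."""
    if isinstance(value, str) and value.startswith(FORMULA_PREFIXES):
        return "\t" + value
    return value


class SanitizingDictWriter(csv.DictWriter):
    """Hypothetical DictWriter that neutralizes every value before writing."""

    def writerow(self, rowdict):
        return super().writerow({k: neutralize(v) for k, v in rowdict.items()})

    def writerows(self, rowdicts):
        # Route each row through writerow so sanitization is never skipped.
        for rowdict in rowdicts:
            self.writerow(rowdict)


output = io.StringIO()
writer = SanitizingDictWriter(output, fieldnames=["name", "comment"])
writer.writeheader()
writer.writerows([
    {"name": "alice", "comment": "=1+1"},
    {"name": "bob", "comment": "hello"},
])
exported = output.getvalue()
```

Overriding both methods is a deliberate choice in this sketch: relying on writerows() to call writerow() internally would depend on an implementation detail of the csv module.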


**Q: Does this apply to CSV files that are only read by Python code?**

No. Formula injection is only a risk when CSV files are opened by spreadsheet
applications that evaluate formulas. If the CSV is only ever read by Python code
(csv.reader()) or by other programmatic parsers, there is no formula injection risk.
Suppress this finding if the CSV is never exposed to spreadsheet applications.
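
To illustrate, csv.reader returns formula-like cells as ordinary strings and never evaluates them. The sketch below also shows a side effect worth knowing: if the export was sanitized with a tab prefix, programmatic consumers receive that prefix as part of the value. The helper `strip_guard` is a hypothetical way to undo it and is not part of the rule.

```python
import csv
import io


def read_back(csv_text):
    """Parse CSV text into a list of rows using the standard csv.reader."""
    return list(csv.reader(io.StringIO(csv_text)))


def strip_guard(value):
    """Hypothetical helper: remove a tab guard placed before a formula character."""
    if value.startswith("\t") and value[1:].startswith(("=", "+", "-", "@")):
        return value[1:]
    return value


rows = read_back("alice,=1+1\r\nbob,\t=2+2\r\n")

print(rows[0][1])        # =1+1 (a plain string, nothing is evaluated)
print(repr(rows[1][1]))  # '\t=2+2' (the tab guard is still part of the data)
print(strip_guard(rows[1][1]))  # =2+2
```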


**Q: Are Excel XLSX exports also vulnerable?**

XLSX exports using libraries like openpyxl or xlsxwriter set cell values directly,
but that does not make them safe by default. openpyxl treats a string value that
starts with = as a formula, and xlsxwriter's generic write() method does the same
unless the strings_to_formulas workbook option is disabled or write_string() is
used. Review the library's documentation for how cell values starting with = are
handled, and write untrusted values explicitly as strings.


**Q: How do I handle CSV export to Google Sheets via API?**

The Google Sheets API controls how submitted values are interpreted through the
valueInputOption parameter. With valueInputOption="RAW", values are stored as
literal text without formula interpretation. Avoid USER_ENTERED for untrusted data,
because it parses values as if they had been typed into the UI. For CSV file
imports, apply the same sanitization as for Excel exports.
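
A minimal sketch of the RAW option, assuming the google-api-python-client library and its `spreadsheets().values().update(...)` call. The function name `build_raw_update_request` and the placeholder spreadsheet ID are hypothetical, and the code only builds the request arguments without contacting the API.

```python
def build_raw_update_request(spreadsheet_id, cell_range, rows):
    """Build keyword arguments for a Sheets values.update call using RAW input."""
    return {
        "spreadsheetId": spreadsheet_id,
        "range": cell_range,
        # RAW stores values as-is; USER_ENTERED would parse "=..." as a formula.
        "valueInputOption": "RAW",
        "body": {"values": rows},
    }


request = build_raw_update_request(
    "SPREADSHEET_ID_PLACEHOLDER",  # hypothetical ID
    "Sheet1!A1",
    [["alice", "=1+1"]],
)

# Assumed usage with an authenticated google-api-python-client service object:
# service.spreadsheets().values().update(**request).execute()
```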


## References

- [CWE-1236: Improper Neutralization of Formula Elements in a CSV File](https://cwe.mitre.org/data/definitions/1236.html)
- [OWASP: Testing for CSV Injection](https://owasp.org/www-community/attacks/CSV_Injection)
- [Python docs: csv module](https://docs.python.org/3/library/csv.html)
- [defusedcsv library](https://pypi.org/project/defusedcsv/)
- [OWASP Top 10 A03:2021 Injection](https://owasp.org/Top10/A03_2021-Injection/)

---

Source: https://codepathfinder.dev/registry/python/lang/PYTHON-LANG-SEC-094
Code Pathfinder — Open source, type-aware SAST with cross-file dataflow analysis
