Complete Data Science Breakdown & Reproducible Methods
This page documents the complete statistical analysis behind our dark money investigation. All calculations, formulas, and raw numbers are presented for peer review and reproducibility.
Source: data/campaign_finance/dark_money_swing_districts_2024.csv
Records: 242 congressional districts
Variables: 16 columns (5 numeric, 11 categorical/text)
Analysis Date: May 23, 2026
Download: Raw CSV
| Column Name | Data Type | Description |
|---|---|---|
| district | object | Congressional district identifier (e.g., "CO-08") |
| state | object | Two-letter state abbreviation |
| district_num | int64 | Numeric district number within state |
| dem_candidate | object | Democratic candidate name (last, first format) |
| rep_candidate | object | Republican candidate name (last, first format) |
| total_spending | float64 | Total dark money spent in district (USD) |
| spending_for | float64 | Money spent supporting any candidate (USD) |
| spending_against | float64 | Money spent attacking any candidate (USD) |
| dem_support | float64 | Money spent supporting Democratic candidate (USD) |
| dem_oppose | float64 | Money spent attacking Democratic candidate (USD) |
| rep_support | float64 | Money spent supporting Republican candidate (USD) |
| rep_oppose | float64 | Money spent attacking Republican candidate (USD) |
| net_dem_advantage | float64 | (dem_support + rep_oppose) - (dem_oppose + rep_support) |
| num_transactions | int64 | Number of separate dark money payments in district |
| top_spenders | object | Top 3 organizations and amounts (pipe-separated) |
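The `net_dem_advantage` definition in the table can be checked row by row. The following is a minimal sketch using only the Python standard library; the two sample rows, the `XX-01`/`XX-02` identifiers, and the helper names `net_dem_advantage` and `find_mismatches` are hypothetical and are not taken from the published dataset.

```python
import csv
import io

# Hypothetical sample rows using the documented column names; values are invented.
SAMPLE_CSV = """district,dem_support,dem_oppose,rep_support,rep_oppose,net_dem_advantage
XX-01,1000000.0,2500000.0,1500000.0,2000000.0,-1000000.0
XX-02,3000000.0,500000.0,1000000.0,1500000.0,3000000.0
"""


def net_dem_advantage(row):
    """(dem_support + rep_oppose) - (dem_oppose + rep_support), per the column table."""
    pro_dem = float(row["dem_support"]) + float(row["rep_oppose"])
    pro_rep = float(row["dem_oppose"]) + float(row["rep_support"])
    return pro_dem - pro_rep


def find_mismatches(csv_text, tolerance=0.01):
    """Return district IDs whose stored net_dem_advantage differs from the formula."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        stored = float(row["net_dem_advantage"])
        if abs(net_dem_advantage(row) - stored) > tolerance:
            mismatches.append(row["district"])
    return mismatches


print(find_mismatches(SAMPLE_CSV))
```

To run the same check against the real file, the text of the CSV named in the Source line could be passed to `find_mismatches` in place of `SAMPLE_CSV`; an empty list would mean every stored value agrees with the formula.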

Summary of support and opposition spending across districts:

| Metric | Support Spending | Opposition Spending | Ratio (Against:For) |
|---|---|---|---|
| Mean | $2,807,894 | $4,850,187 | 1.73:1 |
| Median | $3,154,415 | $5,297,430 | 1.68:1 |
| Total (all districts) | $679,510,000 | $1,173,745,000 | 1.73:1 |
# Python calculation
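The original calculation is not reproduced on this page. As a hedged stand-in, the sketch below recomputes the Against:For ratios from the figures published in the summary table and cross-checks each total against mean times district count. The names `SUMMARY`, `against_for_ratio`, and `total_matches_mean` are illustrative assumptions, and the check assumes the published totals are rounded to the nearest thousand dollars.

```python
# Figures copied from the summary table (USD): (support, opposition).
SUMMARY = {
    "Mean": (2_807_894, 4_850_187),
    "Median": (3_154_415, 5_297_430),
    "Total (all districts)": (679_510_000, 1_173_745_000),
}
N_DISTRICTS = 242  # record count stated in the dataset overview


def against_for_ratio(support, opposition):
    """Opposition spending per dollar of support spending, to two decimals."""
    return round(opposition / support, 2)


def total_matches_mean(mean_value, total_value, n=N_DISTRICTS):
    """Check that mean * n agrees with the published total after rounding
    to the nearest thousand dollars (an assumption about how totals were rounded)."""
    return round(mean_value * n, -3) == total_value


for metric, (support, opposition) in SUMMARY.items():
    print(f"{metric}: {against_for_ratio(support, opposition):.2f}:1")

mean_for, mean_against = SUMMARY["Mean"]
total_for, total_against = SUMMARY["Total (all districts)"]
print(total_matches_mean(mean_for, total_for))
print(total_matches_mean(mean_against, total_against))
```

Starting from the raw CSV, the per-district `spending_for` and `spending_against` columns would first be reduced with `statistics.mean` and `statistics.median` (or the Pandas equivalents) before the same ratio is taken.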
Analytical Tools & Software
This analysis was conducted using industry-standard statistical software and packages:
- Statistical Computing: Python 3.9+ with SciPy (statistical tests), NumPy (numerical operations), and Pandas (data manipulation)
- Visualization: Custom charting libraries for interactive data visualization
- Tests Applied: Chi-square tests, regression analysis, descriptive statistics, distribution analysis
- Verification: All calculations independently verified using multiple methods
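This page does not state which variables the chi-square tests were applied to, so the following is only an illustrative sketch of a chi-square test of independence on a 2x2 table of counts. The counts and the function name `chi_square_2x2` are hypothetical. It uses the standard library alone, without a continuity correction; SciPy's `scipy.stats.chi2_contingency(observed, correction=False)` is the library counterpart.

```python
import math


def chi_square_2x2(observed):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    With 1 degree of freedom the p-value equals erfc(sqrt(statistic / 2)).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected

    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts of districts, NOT taken from the dataset:
# rows = two groups of districts, columns = opposition > support (yes / no).
hypothetical_counts = [[30, 10], [20, 40]]
statistic, p_value = chi_square_2x2(hypothetical_counts)
print(f"chi2 = {statistic:.3f}, p = {p_value:.2e}")
```

The test operates on counts of districts rather than dollar amounts, since the chi-square statistic assumes categorical frequency data.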
Note on Reproducibility: We document our statistical methods, formulas, and data sources in full, but we do not disclose our proprietary analytical pipeline or its specific implementation details. This protects our competitive methodology while remaining transparent about which tests we ran and how to interpret the results.