Who We Are
About RecordsReveal

We believe the most important stories are hiding in plain sight — buried in government databases that the public paid for but almost nobody reads. Our job is to find them.

RecordsReveal is an independent investigative data journalism publication. We download public government datasets — from the FAA, FBI, USDA, NHTSA, CMS, and other agencies — run them through machine learning models, and publish what we find in plain English.

No jargon. No agenda. No financial stake in any outcome. Just what the records actually reveal.

Our Methodology

Every investigation follows the same rigorous process. We believe transparency about how we work is as important as the findings themselves.

01
Data Sourcing
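We use only publicly available government databases, such as the FAA Wildlife Strike Database, FBI Uniform Crime Reporting (UCR) data, USDA FoodData Central, NHTSA's Fatality Analysis Reporting System (FARS), and CMS Hospital Compare. Every investigation links to its sources.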
02
Cleaning & Preparation
Raw government data is often messy. We document every cleaning decision (how we handle missing values, how we treat outliers, and how we encode categorical variables) so that anyone can reproduce our work.
03
Machine Learning Analysis
For supervised learning we use regression models (linear, ridge, lasso, and random forest). For unsupervised pattern discovery we use K-Means clustering combined with principal component analysis (PCA). All of it is written in Python with scikit-learn.
04
Plain English Translation
Statistical findings mean nothing if nobody can understand them. We write for a general audience — every investigation is reviewed to ensure it communicates clearly without sacrificing accuracy.
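Step 02 above promises that every cleaning decision is documented. As a minimal sketch of what that can look like in pandas, the hypothetical helper `clean_with_log` below applies one decision of each kind named there (missing values, outliers, encoding) and returns a plain-English log alongside the cleaned data. The specific rules (median imputation, clipping at 1.5 * IQR, one-hot encoding) and the toy `cost` and `phase` columns are illustrative assumptions, not the rules used in any particular investigation.

```python
import numpy as np
import pandas as pd


def clean_with_log(df: pd.DataFrame, numeric_col: str, category_col: str):
    """Apply three illustrative cleaning steps and record each decision.

    Hypothetical helper: median imputation, 1.5 * IQR clipping and one-hot
    encoding are example choices, not a fixed house rule.
    Returns (cleaned DataFrame, list of log strings).
    """
    log = []
    out = df.copy()

    # 1. Missing values: fill numeric gaps with the column median.
    n_missing = int(out[numeric_col].isna().sum())
    median = out[numeric_col].median()
    out[numeric_col] = out[numeric_col].fillna(median)
    log.append(f"{numeric_col}: filled {n_missing} missing values with median {median:g}")

    # 2. Outliers: clip anything beyond 1.5 * IQR outside the quartiles.
    q1, q3 = out[numeric_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    n_clipped = int(((out[numeric_col] < lo) | (out[numeric_col] > hi)).sum())
    out[numeric_col] = out[numeric_col].clip(lo, hi)
    log.append(f"{numeric_col}: clipped {n_clipped} values to [{lo:g}, {hi:g}]")

    # 3. Encoding: one-hot encode the categorical column (missing -> all zeros).
    levels = sorted(out[category_col].dropna().unique())
    out = pd.get_dummies(out, columns=[category_col], prefix=category_col)
    log.append(f"{category_col}: one-hot encoded {len(levels)} levels {levels}")

    return out, log


# Toy placeholder data, not drawn from any real dataset.
raw = pd.DataFrame({
    "cost": [100.0, 120.0, np.nan, 110.0, 5000.0],
    "phase": ["landing", "takeoff", "landing", None, "climb"],
})
clean, log = clean_with_log(raw, "cost", "phase")
for entry in log:
    print(entry)
```

Keeping the log as ordinary sentences means it can be pasted into an investigation's methodology notes without translation.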

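The modeling described in step 03 can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions. The helper names `compare_regressors` and `cluster_with_pca`, the hyperparameters (`alpha`, `n_estimators`, `n_clusters`), the use of cross-validated R² as the comparison metric, and the synthetic data from `make_regression` and `make_blobs` are all placeholders, not the settings behind any published finding.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_regressors(X, y, cv=5):
    """Return mean cross-validated R^2 for the four model families (hypothetical helper)."""
    models = {
        # Scaling matters for the penalized models; it does not change OLS predictions.
        "Linear": make_pipeline(StandardScaler(), LinearRegression()),
        "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
        # Tree ensembles split on thresholds, so they do not need scaled inputs.
        "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
        for name, model in models.items()
    }


def cluster_with_pca(X, n_clusters=3, n_components=2):
    """Standardize, project onto principal components, then run K-Means (hypothetical helper)."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components, random_state=0)
    X_pca = pca.fit_transform(X_scaled)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(X_pca)
    # Share of total variance kept by each component, useful for reporting.
    return labels, pca.explained_variance_ratio_


# Synthetic stand-in data; a real investigation would pass a cleaned agency dataset.
X_reg, y_reg = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
scores = compare_regressors(X_reg, y_reg)
print(scores)

X_clu, _ = make_blobs(n_samples=300, n_features=6, centers=3, random_state=0)
labels, explained = cluster_with_pca(X_clu)
print(explained)
```

The scaler sits inside each pipeline so that, during cross-validation, it is fit only on the training folds. Standardizing before PCA keeps columns measured in large units from dominating the components.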
What We Are Not

We are not advocates for any political position. We are not funded by any organization with a stake in our findings. We are not trying to prove a predetermined conclusion. When the data surprises us — and it often does — we report the surprise.

We are also not infallible. Our corrections policy is simple: if we make an error, we correct it prominently and transparently. If you spot a mistake in our methodology or findings, please contact us.

Our Tech Stack

All analysis is conducted in Python 3.12 using pandas, NumPy, scikit-learn, Plotly, and Matplotlib. Models run on a local workstation with an NVIDIA GTX 1060 GPU. For code generation and pattern interpretation, we use local AI assistance through Ollama with the qwen2.5-coder:7b and llama3.2 models.

All visualizations are built with Plotly and rendered as interactive charts embedded directly in our investigations. The website is built with plain HTML, CSS, and JavaScript. The only tracking on the site comes from Google AdSense, and we do not share data with any other third party.

Georgia Tech Connection

RecordsReveal grew out of a regression analysis class project at Georgia Tech. The first investigation, the FAA Wildlife Strike Database analysis, was originally conducted as a graduate-level data science exercise. The quality of the findings and the public interest they generated led to the creation of this publication.

Get in Touch

Have a dataset you think we should investigate? Found an error in our work? Want to advertise?

tips@recordsreveal.com