Technical Deep Dive · Investigation #001
Georgia Tech · Regression Analysis · 2026

WHAT'S REALLY FLYING INTO OUR PLANES?

35 years of FAA wildlife strike data reveal surprising patterns in how animals threaten commercial aviation, and what predicts the damage they cause.

316K Strikes Analyzed
82.6% R² Score
35 Years of Data
81,663 Strikes on Approach
#8 Atlanta (Hartsfield-Jackson)
52,122 Unknown Small Birds
0.25 RMSE (Damage Prediction)
4 Risk Clusters Found
01 — Exploratory Analysis

35 YEARS OF STRIKE DATA

Reported wildlife strikes have risen sharply since 1990 as air traffic expanded. The data show clear seasonal, geographic, and species-based patterns.

Annual Strike Frequency 1990–2025
Source: FAA Wildlife Strike Database · 316,839 incidents
Top 15 Airports by Strike Frequency
Denver leads with 10,630 incidents · Atlanta #8 at 3,961
Strikes by Phase of Flight
Approach phase dominates — 81,663 incidents · 25.8% of total
Top 15 Species Involved
Most strikes involve unidentified small birds · Mourning dove leads known species
Damage Level Distribution
89.9% of strikes cause no damage · Destroyed aircraft are extremely rare
✈️
Approach Sees the Most Strikes
81,663 strikes occur during approach, 25.8% of all incidents. Landing roll (35k) and takeoff run (32k) follow. Aircraft are most exposed at low altitude, where wildlife activity is concentrated.
🐦
The Unknown Bird Problem
52,122 strikes involve unidentified small birds, the single largest species category. Better species identification at airports could substantially improve wildlife management programs.
🗺️
Atlanta in the Top 10
Hartsfield-Jackson ranks #8 nationally with 3,961 strikes, behind airports such as Denver (10,630) and DFW (8,372). Georgia's position on major migratory flyways likely contributes to the elevated risk.
02 — Supervised Learning

PREDICTING DAMAGE SEVERITY

Four regression models were trained on 163,832 records to predict an ordinal damage level (None → Minor → Possible → Substantial → Destroyed). All models achieved R² > 0.80.

Linear Regression · R² Score 0.8261 · RMSE 0.2493
Ridge Regression · R² Score 0.8261 · Best Alpha 0.100
Lasso Regression · R² Score 0.8261 · Features Eliminated 2 of 10
Random Forest · R² Score 0.8066 · RMSE 0.2629
Feature Importance — What Predicts Damage?
INDICATED_DAMAGE dominates all models · Lasso eliminated NUM_ENGS and INCIDENT_MONTH
Lasso Feature Coefficients
Positive = increases damage severity · Negative = reduces damage risk
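How Lasso ends up "eliminating" features can be illustrated with a minimal sketch on synthetic data. Nothing here comes from the FAA dataset: the feature names, the sample size, and `alpha=0.1` are illustrative assumptions. The L1 penalty shrinks every coefficient toward zero and sets the ones with too little signal to exactly zero, while the sign of each surviving coefficient gives the direction of its effect.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples = 2000
# Hypothetical feature names, used only for this synthetic illustration.
feature_names = ["signal_up", "signal_down", "noise_a", "noise_b"]

# Standard-normal features; only the first two influence the target.
X = rng.normal(size=(n_samples, len(feature_names)))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)

# alpha is an arbitrary illustrative penalty strength, not a tuned value.
model = Lasso(alpha=0.1).fit(X, y)

coefficients = dict(zip(feature_names, model.coef_))
eliminated = [name for name, coef in coefficients.items() if coef == 0.0]

for name, coef in coefficients.items():
    print(f"{name:12s} {coef:+.3f}")
print("Eliminated:", eliminated)
```

In this toy setup the two noise features are dropped and the two signal features keep coefficients with opposite signs, which is the same reading applied to the coefficient chart: positive raises predicted severity, negative lowers it.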
Model Comparison — R² Scores
All models achieve R² > 0.80, indicating strong predictive power for ordinal damage regression
03 — Unsupervised Learning

4 TYPES OF WILDLIFE STRIKES

K-Means clustering (K=4) on aircraft characteristics reveals distinct incident profiles. The first two principal components explain 58% of the variance.

K-Means Cluster Visualization (PCA 2D)
204,790 incidents mapped to 2 principal components · 58% variance explained
Cluster 0 · General Aviation · Minimal Damage · 90,377 incidents · 44.1% of total
Smaller aircraft in open areas with grassland or agricultural surroundings. Low damage probability. Common bird species in rural approach paths.
Cluster 1 · High Traffic · Takeoff & Landing · 82,643 incidents · 40.4% of total
Larger aircraft at major commercial airports during high-risk phases. Elevated strike frequency due to heavy traffic volumes and bird activity near runways.
Cluster 2 · Low Altitude · Small Aircraft · 14,908 incidents · 7.3% of total
Helicopters and small general aviation planes at low altitude. Higher bat and bird encounter rates. Some confirmed damage cases despite smaller aircraft size.
Cluster 3 · Commercial Jets · Confirmed Damage · 16,862 incidents · 8.2% of total
Heavy commercial aircraft with confirmed damage (INDICATED_DAMAGE = 1.0). Highest-severity cluster. Large jet engines most vulnerable to bird ingestion events.
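As a quick consistency check on the cluster cards, the sizes quoted above can be summed and converted to shares. This is plain arithmetic on the published figures; the variable names are illustrative.

```python
# Cluster sizes as quoted on the cluster cards.
cluster_sizes = {0: 90_377, 1: 82_643, 2: 14_908, 3: 16_862}

# The total should match the number of incidents mapped in the PCA plot.
total = sum(cluster_sizes.values())

# Share of each cluster as a percentage, rounded to one decimal place.
shares = {cluster: round(100 * size / total, 1) for cluster, size in cluster_sizes.items()}

print(f"Total clustered incidents: {total:,}")
for cluster, share in shares.items():
    print(f"Cluster {cluster}: {share}% of total")
```

The four clusters add up to the 204,790 incidents shown in the cluster visualization, and the rounded shares match the percentages on the cards.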
Elbow Curve — Optimal K Selection
Inertia drops sharply from K=2 to K=4, then flattens — confirming K=4 as optimal
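The elbow method behind this chart can be sketched as follows. The data are synthetic blobs from `make_blobs`, not the strike records, and the range of K values and the seeds are assumptions made for the illustration. The idea is to fit K-Means for several values of K, record the inertia (within-cluster sum of squared distances), and look for the point where adding another cluster stops paying off.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with four well-separated groups.
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.6, random_state=0)

# Fit K-Means for K = 1..8 and record the inertia of each fit.
inertia_by_k = {}
for k in range(1, 9):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertia_by_k[k] = float(kmeans.inertia_)

# Print the inertia and the improvement gained by each extra cluster.
previous = None
for k, inertia in inertia_by_k.items():
    gain = "" if previous is None else f"  (drop: {previous - inertia:.1f})"
    print(f"K={k}: inertia={inertia:.1f}{gain}")
    previous = inertia
```

On data like this, the drop from K=3 to K=4 is much larger than the drop from K=4 to K=5, which is the flattening an elbow chart makes visible. The choice of K remains a judgment call informed by the curve rather than a formal test.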
04 — Methodology

HOW WE DID THIS

A complete supervised and unsupervised learning pipeline built for Georgia Tech's regression analysis curriculum.

01

Data Acquisition & Cleaning

Downloaded 316,839 FAA wildlife strike records (1990–2025) from Kaggle. Identified columns with <30% missing values as viable features. Applied median imputation for numeric columns and "Unknown" fill for categoricals.

Pandas · NumPy · FAA Database · 204,790 Clean Records
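The cleaning rules described in this step can be sketched with pandas roughly as follows. This is a minimal illustration on a toy frame, not the project's actual code: the helper name `clean_strikes`, the toy values, and the choice of columns are assumptions.

```python
import numpy as np
import pandas as pd

def clean_strikes(df: pd.DataFrame, max_missing: float = 0.30) -> pd.DataFrame:
    """Drop sparse columns, then impute what remains (hypothetical helper)."""
    # Keep only columns whose share of missing values is below the threshold.
    missing_share = df.isna().mean()
    kept = df.loc[:, missing_share < max_missing].copy()

    # Median imputation for numeric columns, "Unknown" for the categorical ones.
    numeric_cols = kept.select_dtypes(include="number").columns
    other_cols = kept.columns.difference(numeric_cols)
    kept[numeric_cols] = kept[numeric_cols].fillna(kept[numeric_cols].median())
    kept[other_cols] = kept[other_cols].fillna("Unknown")
    return kept

# Toy data standing in for the real strike records.
toy = pd.DataFrame({
    "HEIGHT": [0.0, 500.0, np.nan, 1500.0, 200.0],
    "PHASE_OF_FLIGHT": ["Approach", None, "Climb", "Approach", "Take-off Run"],
    "REMARKS": [None, None, None, "bird remains found", None],  # 80% missing
})

cleaned = clean_strikes(toy)
print(cleaned)
```

Here `REMARKS` is dropped because 80% of its values are missing, the gap in `HEIGHT` is filled with the column median, and the missing flight phase becomes "Unknown".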
02

Feature Engineering & Encoding

Ordinal encoded target variable DAMAGE_LEVEL as N=0, M=1, M?=2, S=3, D=4. Label encoded 8 categorical features. Selected 10 features based on domain relevance and data coverage threshold.

LabelEncoder · OrdinalEncoder · 80/20 Train-Test Split · 10 Features
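A sketch of the encoding and split described in this step, using the scikit-learn classes named in the tags. The toy rows, the two feature columns, and `random_state=42` are assumptions for illustration; only the damage code order (N, M, M?, S, D) comes from the text above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Damage codes in ascending severity, so they map to 0, 1, 2, 3, 4.
DAMAGE_ORDER = ["N", "M", "M?", "S", "D"]

# Toy rows standing in for the cleaned strike records.
toy = pd.DataFrame({
    "DAMAGE_LEVEL": ["N", "M", "N", "S", "M?", "D", "N", "N", "M", "N"],
    "PHASE_OF_FLIGHT": ["Approach", "Climb", "Approach", "Take-off Run", "Approach",
                        "Climb", "Landing Roll", "Approach", "Climb", "Approach"],
    "HEIGHT": [100, 2000, 50, 0, 300, 1500, 0, 400, 2500, 150],
})

# Ordinal-encode the target with an explicit category order.
ordinal = OrdinalEncoder(categories=[DAMAGE_ORDER])
y = ordinal.fit_transform(toy[["DAMAGE_LEVEL"]]).ravel()

# Label-encode a categorical feature (integer codes in alphabetical order).
X = toy[["PHASE_OF_FLIGHT", "HEIGHT"]].copy()
X["PHASE_OF_FLIGHT"] = LabelEncoder().fit_transform(X["PHASE_OF_FLIGHT"])

# 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), "training rows,", len(X_test), "test rows")
```

Passing the category order explicitly matters: left to infer it, the encoder would sort the codes alphabetically and lose the severity ranking. Applied to the 204,790 clean records, an 80/20 split is consistent with the 163,832 training samples quoted elsewhere on this page.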
03

Supervised Learning — Regression Models

Trained Linear Regression, Ridge (CV α selection), and Lasso (CV α selection) on 163,832 training samples. Evaluated with RMSE and R². Random Forest provided feature importance rankings. All models achieved R² > 0.80.

LinearRegression · RidgeCV · LassoCV · RandomForestRegressor · R² = 0.8261
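The training and evaluation loop for the four models might look something like the sketch below. It runs on synthetic data, so the scores it prints are not the project's results; the alpha grid, the number of trees, and the seeds are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data: three informative features and one pure-noise feature.
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 4))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(scale=0.3, size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# RidgeCV and LassoCV pick their own alpha by cross-validation.
models = {
    "Linear Regression": LinearRegression(),
    "Ridge": RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]),
    "Lasso": LassoCV(cv=5),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

# Fit each model and score it on the held-out 20%.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, predictions))),
        "r2": float(r2_score(y_test, predictions)),
    }

for name, scores in results.items():
    print(f"{name:18s} RMSE={scores['rmse']:.4f}  R2={scores['r2']:.4f}")
```

After fitting, the selected penalties are available as `models["Ridge"].alpha_` and `models["Lasso"].alpha_`, and the forest's rankings as `models["Random Forest"].feature_importances_`.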
04

Unsupervised Learning — K-Means Clustering

StandardScaler normalization on 5 clustering features. Elbow curve analysis confirmed K=4 as optimal. PCA reduced to 2 dimensions for visualization (58% variance explained). Cluster profiles interpreted with Ollama LLM assistance.

KMeans · StandardScaler · PCA · Elbow Method · 4 Clusters Found
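The clustering pipeline in this step (scale, cluster, project to 2D) can be sketched as below. The five synthetic features from `make_blobs` stand in for the five clustering features, which the page does not list, and the seeds and `n_init` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the five clustering features.
X, _ = make_blobs(n_samples=500, n_features=5, centers=4, random_state=0)

# 1. Standardize so every feature contributes on the same scale.
X_scaled = StandardScaler().fit_transform(X)

# 2. Cluster in the full five-dimensional space.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# 3. Project to two principal components for plotting only.
pca = PCA(n_components=2)
coords = pca.fit_transform(X_scaled)
variance_explained = float(pca.explained_variance_ratio_.sum())

cluster_sizes = np.bincount(labels, minlength=4)
print(f"Variance explained by 2 components: {variance_explained:.1%}")
print("Cluster sizes:", cluster_sizes.tolist())
```

In this sketch the clusters are fitted on the scaled features and PCA is used only to draw them, so a 2D plot that explains part of the variance can show overlap between clusters that are separate in the original space.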
05

Tech Stack & Infrastructure

Analysis run on a Linux PC (Intel i5-4590, NVIDIA GTX 1060 6GB) with GPU-accelerated Ollama for AI-assisted interpretation. Python data science stack with Jupyter notebooks. Website built in pure HTML/CSS/JS.

Python 3.12 · scikit-learn · Plotly · Ollama qwen2.5-coder:7b · GTX 1060 GPU