Thirty-five years of FAA wildlife strike data reveal patterns in how animals threaten commercial aviation, and in what predicts the damage they cause.
Wildlife strikes have increased dramatically since 1990 as air traffic expanded. The data reveals clear seasonal, geographic, and species-based patterns.
Four regression models were trained on 163,832 records to predict an ordinal damage level (None → Minor → Possible → Substantial → Destroyed). All models achieved R² > 0.80.
K-Means clustering (K=4) on aircraft characteristics reveals distinct incident profiles. The first two principal components from PCA explain 58% of the variance.
A complete supervised and unsupervised learning pipeline built for Georgia Tech's regression analysis curriculum.
Downloaded 316,839 FAA wildlife strike records (1990–2025) from Kaggle. Identified columns with <30% missing values as viable features. Applied median imputation for numeric columns and "Unknown" fill for categoricals.
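The column filter and imputation described above might look roughly like the following pandas sketch. The function name `select_and_impute` and the toy column names are hypothetical and do not reflect the real FAA schema; only the 30% threshold, the median imputation, and the "Unknown" fill come from the description.

```python
import pandas as pd


def select_and_impute(df: pd.DataFrame, max_missing: float = 0.30) -> pd.DataFrame:
    """Keep columns with under `max_missing` share of missing values, then impute."""
    # Share of missing values per column (0.0 to 1.0).
    missing_share = df.isna().mean()
    keep = missing_share[missing_share < max_missing].index
    out = df[keep].copy()

    numeric_cols = out.select_dtypes(include="number").columns
    categorical_cols = out.columns.difference(numeric_cols)

    # Median for numeric columns, the literal "Unknown" for everything else.
    fill_values = {col: out[col].median() for col in numeric_cols}
    fill_values.update({col: "Unknown" for col in categorical_cols})
    return out.fillna(fill_values)


# Toy frame with hypothetical column names (assumption, for illustration only).
toy = pd.DataFrame({
    "HEIGHT": [100.0, None, 300.0, 500.0],
    "SPECIES": ["Gull", None, "Hawk", "Gull"],
    "REMARKS": [None, None, None, "bird seen"],
})
clean = select_and_impute(toy)
```

In this toy frame, `REMARKS` is 75% missing and is dropped, while the single gaps in `HEIGHT` and `SPECIES` are filled.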
Ordinal-encoded the target variable DAMAGE_LEVEL as N=0, M=1, M?=2, S=3, D=4. Label-encoded 8 categorical features. Selected 10 features based on domain relevance and the data coverage threshold.
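As a rough illustration of this encoding step, the sketch below maps DAMAGE_LEVEL with the stated order and integer-codes the remaining non-numeric columns. It uses pandas category codes as a stand-in for a label encoder (both assign integers to the sorted unique values); the helper name `encode_features` and the `PHASE` column are assumptions made for the example.

```python
import pandas as pd

# Order taken from the description: N=0, M=1, M?=2, S=3, D=4.
DAMAGE_ORDER = {"N": 0, "M": 1, "M?": 2, "S": 3, "D": 4}


def encode_features(df: pd.DataFrame, target: str = "DAMAGE_LEVEL") -> pd.DataFrame:
    """Ordinal-encode the target and integer-code every non-numeric feature."""
    out = df.copy()
    out[target] = out[target].map(DAMAGE_ORDER)
    for col in out.columns.drop(target):
        if not pd.api.types.is_numeric_dtype(out[col]):
            # Category codes follow the sorted unique values of the column.
            out[col] = out[col].astype("category").cat.codes
    return out


# Toy data; PHASE is a hypothetical categorical feature.
toy = pd.DataFrame({
    "DAMAGE_LEVEL": ["N", "M?", "S", "D", "M"],
    "PHASE": ["Approach", "Climb", "Approach", "Landing Roll", "Climb"],
})
encoded = encode_features(toy)
```

Unlike the target mapping, the integer codes for the features carry no meaningful order, which matters more for the linear models than for a tree ensemble.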
Trained Linear Regression, Ridge, and Lasso (α selected by cross-validation for Ridge and Lasso) on 163,832 training samples. Evaluated with RMSE and R². A Random Forest provided feature importance rankings. All models achieved R² > 0.80.
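A minimal scikit-learn sketch of this training and evaluation loop follows. It runs on synthetic data rather than the FAA records, and the α grid, the 80/20 split, and the forest size are assumptions chosen for the example, not the project's actual settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10 features, a target driven mainly by the first two.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "linear": LinearRegression(),
    "ridge": RidgeCV(alphas=np.logspace(-3, 3, 13)),  # alpha picked by CV
    "lasso": LassoCV(cv=5, random_state=0),           # alpha picked by CV
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }

# Impurity-based importances from the forest, one value per feature.
importances = models["forest"].feature_importances_
```

The `results` dictionary holds one RMSE and R² pair per model, and `importances` can be sorted to rank the features.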
Applied StandardScaler normalization to 5 clustering features. Elbow curve analysis supported K=4 as the number of clusters. PCA reduced the data to 2 dimensions for visualization (58% of variance explained). Cluster profiles were interpreted with assistance from an LLM run through Ollama.
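The clustering steps could be sketched as below with scikit-learn. The input is random synthetic data with 5 columns standing in for the clustering features, and the candidate range of K from 1 to 8 is an assumption; only the scaling, K=4, and the 2-component PCA come from the description.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 5 clustering features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))

# Zero mean, unit variance per feature so no single feature dominates distances.
X_scaled = StandardScaler().fit_transform(X)

# Elbow curve: within-cluster sum of squares (inertia) for each candidate K.
inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = float(km.inertia_)

# Final clustering with K=4, as stated in the text.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)

# Project to 2 dimensions for plotting and report the variance retained.
pca = PCA(n_components=2)
coords = pca.fit_transform(X_scaled)
explained = float(pca.explained_variance_ratio_.sum())
```

Plotting `inertias` against K gives the elbow curve, and `coords` colored by `labels` gives the 2D cluster view; `explained` is the counterpart of the 58% figure quoted above.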
Analysis run on a Linux PC (Intel i5-4590, NVIDIA GTX 1060 6GB) with GPU-accelerated Ollama for AI-assisted interpretation. Python data science stack with Jupyter notebooks. Website built in pure HTML/CSS/JS.