PROFESSIONAL SUMMARY

Data Scientist with an MS in Data Science who builds things that actually work. I build and train models on my own multi-node, multi-GPU homelab, which also runs production applications and MLflow, and I enjoy deploying new techniques and finding unconventional ways to solve problems. Whether the task is analysis, research, or projections, I dig into how something ticks, figure out why, and extract information that drives better outcomes. 20+ projects on GitHub covering deep learning, medical imaging, NLP, computer vision, and analytics.

FEATURED DATA SCIENCE PROJECTS
NFL Big Data Bowl 2026 - Kaggle Bronze Medal
Deep Learning Player Trajectory Prediction | Top 8% of 1,134 teams
github.com/XxRemsteelexX/NFL-Big-Data-Bowl-2026-
  • Earned a Bronze Medal in a Kaggle competition on predicting NFL player trajectories from tracking data
  • Ran 847+ experiments across 15+ neural network architectures in a systematic search
  • Best single model: 6-Layer Spatial-Temporal Transformer achieving 0.547 Public LB score
  • Best ensemble: 3-model blend (ST Transformer + CNN + GRU) achieving 0.540 Public LB with architecture diversity
  • Engineered 167 features including kinematics, ball-relative positions, temporal patterns, and geometric features with Voronoi tessellation
  • Implemented novel geometric attention with spatial distance modulation and frozen encoder fine-tuning
  • Utilized multi-GPU training, mixed precision (FP16), and test-time augmentation for +0.005-0.010 improvement
RSNA Intracranial Aneurysm Detection - Kaggle Competition
3D Deep Learning Medical Imaging | 105 Models Trained
github.com/XxRemsteelexX/RSNA-Intracranial-Aneurysm-Detection-Kaggle
  • Trained 105 deep learning models (21 architectures × 5 folds) for CT angiography aneurysm detection
  • Tested 51 ensemble configurations; best ensemble META_E_top3_weighted achieved AUC 0.8624
  • Key discovery: Smaller models (SE-ResNet18) statistically outperform larger models on limited medical data (r=-0.42, p<0.01)
  • Built complete pipeline: DICOM→NIfTI→ROI extraction→Training→Ensemble across 4 GPUs simultaneously
  • Multi-label classification across 14 classes with severe class imbalance handling (1.2% to 42.8%)
Apollo Healthcare Connect
Production Multi-modal AI Healthcare Triage System | MS Capstone
apollohealthcareconnect.com
  • Built and deployed live production healthcare AI triage system with sub-second response times
  • Achieved 93.8% combined multi-modal accuracy and 98.0% burn classification accuracy
  • Implemented 5-model ensemble architecture combining DistilBERT (NLP) and CNNs (Computer Vision)
  • Successfully handled extreme class imbalance (29.7:1 ratio) with advanced loss functions
  • Built production pipeline with Flask API, AWS S3 integration, and comprehensive safety protocols
Missing Persons Outlier Detection
Geospatial Crime Pattern Analysis | 41,200 NamUs Cases
github.com/XxRemsteelexX/missing-persons-outlier-detection
  • 7 statistical methods + 3 ML models applied to 41,200 cases across 101 years (9,204 county-decade combinations)
  • Kenedy County, TX: 46.86σ composite z-score persists after Bayes shrinkage, OLS, Random Forest, and FDR correction
  • I-35 corridor: 170% increase in missing persons (193 to 521 cases), structural break at 2020 (p < 0.001)
  • Spatial autocorrelation confirmed (Moran's I = 0.22, z = 26.03) with LISA hotspot clustering along TX border
  • Validated against known serial killers (Ridgway: 4.38σ, Gacy: 1.34σ)
  • Live 7-page Streamlit dashboard with geospatial visualization and ARIMA 5-year forecasting
Opportunity Intelligence Assistant
Agentic AI Market Analysis | Senior Living Opportunity Evaluation
github.com/XxRemsteelexX/opportunity-intelligence
  • Agentic 3-call LLM pipeline that plans, executes, and synthesizes 22 statistical methods across Census and CMS data
  • First pass planned 9 analyses, follow-up requested 5 more for 14 total; full pipeline cost about 6 cents in LLM tokens
  • Market scored 48.3/100 overall (demand pressure 53.6, competitive position 40.3) across 10 nursing facilities with 896 beds
  • Chi-square confirmed significant link between ownership type and quality rating (p = 0.0067)
  • 622-word executive briefing with source citations on every claim, traceable to Census Bureau and CMS Care Compare
NFL Rookie Wide Receiver Performance Prediction
Advanced ML Analysis with Feature Optimization | BS Capstone
github.com/XxRemsteelexX/NFL_Rookie_WR_1K_Analysis
  • Developed a model predicting 1,000+ yard seasons, achieving 90.9% ROC AUC when validated on future data
  • Reduced overfitting gap from 18.5% to 0.4% (97.8% reduction) through feature optimization (46→20 features)
  • Implemented temporal validation strategy ensuring model generalization to future NFL seasons
  • Created production-ready ensemble model for NFL draft analysis with comprehensive data pipeline (2006-2024)
OceanEterna RAG Engine (In Progress)
High-Performance Local Retrieval-Augmented Generation System | C++17 | CPU-Only
  • Prototyped in Python, then rewritten in C++17 for performance; evolved through 4 major versions with systematic optimization
  • Indexes 2.45 billion tokens across 5M+ chunks with 0-42ms search (avg 12ms, down from 500ms in v1), CPU-only with minimal RAM
  • Dual LZ4/Zstd compression, 15 REST API endpoints, 47 tests at near 100% accuracy; conversations and files continuously indexed
  • Building LLM chat interface and MCP tool for terminal-based AI workflows and project knowledge management
OE-OS (In Progress)
Distributed AI Orchestration Platform | Python / FastAPI
  • Three-tier LLM routing (local Ollama, then low-cost API, then premium models) that cuts costs by sending ~80% of requests to free local models
  • Triple-layer RAG memory: BM25 over 5M+ chunks, ChromaDB semantic search, Redis session cache with graceful degradation
  • 18 MCP-compatible tools and multi-agent sandbox where 4 LLM personas deliberate at zero API cost
  • 4,200+ lines of async Python on FastAPI for a private multi-node GPU cluster
AI Homelab & Active Memory Network
Multi-Tier AI Infrastructure | 10Gb Network + RAG Pipeline
glenndalbey.com/infrastructure
  • Designed and operate multi-tier AI homelab: dual RTX 5090 training node + RTX 3090 Ti/3090 secondary node
  • Built 256GB unified memory LLM inference cluster (2× Ryzen AI Max+ 395) running Kimi K2, Qwen 3, GLM 4.6
  • Implemented automated active-memory pipeline with n8n orchestration, RAG storage, and hot/warm/cold tiering
  • Deployed Proxmox VE backbone with pfSense firewall, VLAN segmentation, and 10Gb networking (100TB+ storage)

PROFESSIONAL EXPERIENCE
Freelance Data Science Consultant
Thompson Parking & Mobility Consultants
Current
  • Provide data science and analytics consulting services for business intelligence initiatives
  • Develop AI-powered Excel analytics platform enabling natural language data queries
  • Design custom analytical solutions and machine learning models for client-specific challenges
  • Support data-driven decision making through advanced analytics and predictive modeling
Continuous Improvement Leader & Material Flow Specialist
John Deere, Waterloo Works & Ankeny Works
2005-2020, 2021-Present
  • CI Department Representative leading process improvement and operational efficiency initiatives
  • Developed comprehensive training curriculum for warehouse personnel, improving onboarding efficiency
  • Designed and implemented the Zones Project, modernizing material flow training systems
  • Led departmental CI mapping initiatives to improve operational efficiency and reduce cycle times
  • Optimized material replenishment processes using bin methodology, reducing operational inefficiencies
  • Managed supply chain logistics and SAP-integrated inventory management
  • Supported engineering teams in workflow re-splits and cycle time analysis for production optimization