Sports Data Engineer

Ivan Gruber

I build production-grade data pipelines for professional sports, from real-time MLB pitch tracking at Sportradar to cloud analytics on GCP and AWS. I have spent 5+ years turning raw game data into decisions.

5+ Years Experience
99.7% Pipeline Uptime
45% Faster Insights
20+ Production Integrations
Ivan F. Gruber
Available
01

About

I'm a Data Engineer based in Corpus Christi, TX, focused on sports data infrastructure and cloud analytics. My career bridges raw event data and production-ready insights, from pitch-by-pitch MLB tracking at Sportradar to GCP analytical models that cut client time-to-insight by 45%.

At Synergy Sports (Sportradar), I captured and structured live MLB game data that fed broadcast networks and franchise analytics platforms. At Vikua, I led technical delivery across 6 client environments, engineering pipelines with 99.7% uptime and building SQL models that cut cloud compute costs by 18%.

I am currently completing the MIT MicroMasters in Statistics and Data Science and expanding into statistical modeling. I am fluent in English and Spanish, with working knowledge of Italian, and open to relocation and fully remote roles.

Cloud Platforms
GCP · BigQuery · Dataflow · Cloud Storage · AWS S3 · Azure SQL · Cloud Composer
Languages & Frameworks
Python · SQL (Advanced) · Ruby · Pandas · DuckDB · pybaseball
Orchestration & DevOps
Apache Airflow · Terraform · GitHub Actions · Tray.io · Zapier · CI/CD
BI & Visualization
Power BI · Streamlit · Plotly · Zoho Analytics
Sports Data
Sportradar Platform · Statcast · Live Game Feeds · FanGraphs API · Baseball Savant
02

Experience

Feb 2025 – Jan 2026
Vikua

Lead Technical Account Manager, Analytics & Cloud

  • Designed and deployed GCP analytical models across 6 client environments, cutting time-to-insight by 45% and driving KPI adoption to 80% within 60 days.
  • Engineered ERP and operational centralization pipelines on Cloud Composer/Airflow, improving data freshness from daily to hourly with 99.7% uptime.
  • Built SQL models in BigQuery that improved query performance by 40% and reduced cloud compute cost by 18%.
  • Managed 6 concurrent delivery accounts, shortening the design-to-go-live cycle by 30% and reducing support tickets by 25% quarter over quarter.
Jul 2023 – Present
Spiro.ai

Data Integration Engineer

  • Architected and deployed 20 production-grade integrations in a 3-month window across ERP, CRM, and marketing platforms using Tray.io, Zapier, and Ruby pipelines.
  • Led full migration from Tray.io to a centralized code-based framework, improving observability, maintainability, and scalability across all environments.
  • Reduced average ticket resolution time from 3 hours to under 1 hour through standardized triage and escalation workflows.
Jan 2023 – Feb 2025
Sportradar

Baseball Data Engineer, Level 2 · Synergy Sports

  • Captured and structured real-time MLB game data (pitch-by-pitch, batter movements, batted ball classifications, play outcomes) for live stats and broadcast reporting.
  • Delivered structured datasets consumed by downstream performance analytics pipelines and broadcast networks.
  • Performed Level 2 QA reconciling platform logs against official play-by-play records under real-time delivery pressure.
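The Level 2 QA reconciliation described above can be illustrated with a small sketch. This is not the Sportradar tooling; the event shape (a `key` of inning, half, and pitch sequence plus an `outcome`) and the function name `reconcile` are hypothetical, and the sample rows are made up.

```python
def reconcile(platform_events, official_events):
    """Compare two event feeds and report where they disagree.

    Each event is a dict with a hypothetical "key" tuple
    (inning, half, pitch_sequence) and an "outcome" string.
    """
    platform = {e["key"]: e["outcome"] for e in platform_events}
    official = {e["key"]: e["outcome"] for e in official_events}

    # Events the official record has but the platform log lacks.
    missing = sorted(official.keys() - platform.keys())
    # Events logged on the platform with no official counterpart.
    extra = sorted(platform.keys() - official.keys())
    # Events present in both feeds whose outcomes differ.
    mismatched = sorted(
        k for k in platform.keys() & official.keys()
        if platform[k] != official[k]
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


# Made-up sample data for illustration only.
platform_log = [
    {"key": (1, "top", 1), "outcome": "ball"},
    {"key": (1, "top", 2), "outcome": "strike"},
    {"key": (1, "top", 4), "outcome": "single"},
]
official_pbp = [
    {"key": (1, "top", 1), "outcome": "ball"},
    {"key": (1, "top", 2), "outcome": "foul"},
    {"key": (1, "top", 3), "outcome": "ball"},
]

report = reconcile(platform_log, official_pbp)
for category, keys in report.items():
    print(category, keys)
```

Keying both feeds on the same tuple keeps the comparison to set operations, so one pass yields the three categories a reviewer would triage: missing, extra, and mismatched events.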
Oct 2020 – Mar 2023
Elite Cargo

Business Intelligence Analyst / Logistics BI Coordinator

  • Designed executive Power BI dashboards tracking OTIF, lead time, route compliance, and load efficiency for C-suite stakeholders.
  • Integrated GPS telemetry, route logs, load orders, and customs data into unified analytics-ready models.
  • Promoted to BI Coordinator within 18 months; led a team of 2 analysts.
03

Projects

🏥

Sports Injury Risk Intelligence

Production ML pipeline for injury risk in professional football, built on 88 actual injury events from Real Madrid (2021-2025). Full Medallion Architecture with 13 point-in-time-correct features, leakage guards in code and tests, a 5-job CI/CD pipeline, an 80% coverage gate, and ML outputs that always carry confidence intervals and out-of-distribution (OOD) flags.

Python · Scikit-learn · Great Expectations · GitHub Actions · Delta Lake · Pytest
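To make "point-in-time correct" and "leakage guard" concrete, here is a minimal standard-library sketch. It is not the project's code: the `InjuryEvent` type, the `prior_injuries` feature, the 365-day window, and the sample rows are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class InjuryEvent:
    player: str
    injured_on: date


def prior_injuries(events, player, as_of, window_days=365):
    """Count a player's injuries in the window ending strictly before as_of."""
    start = as_of - timedelta(days=window_days)
    return sum(
        1
        for e in events
        if e.player == player and start <= e.injured_on < as_of
    )


def assert_no_leakage(events_used, as_of):
    """Leakage guard: fail if any input event is dated on or after as_of."""
    late = [e for e in events_used if e.injured_on >= as_of]
    if late:
        raise ValueError(f"{len(late)} event(s) on or after {as_of} would leak")


# Made-up sample data, not the project's dataset.
events = [
    InjuryEvent("player_a", date(2023, 3, 1)),
    InjuryEvent("player_a", date(2023, 9, 15)),
    InjuryEvent("player_a", date(2024, 2, 10)),  # the event being predicted
    InjuryEvent("player_b", date(2023, 11, 5)),
]

as_of = date(2024, 2, 10)
# Only history strictly before the prediction date may feed the feature.
history = [e for e in events if e.injured_on < as_of]
assert_no_leakage(history, as_of)
feature = prior_injuries(history, "player_a", as_of)
print(feature)
```

The guard is deliberately separate from the feature function so the same check can run both in the pipeline and in a test suite.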
☁️

GCP Data Architecture with PII Anonymization

Automated PII-safe pipeline on GCP integrating banking, POS, and university sources into a Master User Model via Bronze-Silver-Gold layering in BigQuery. SHA-256 hashing and boolean masking for PII compliance, orchestrated with Cloud Composer/Airflow. Full Terraform IaC.

GCP · BigQuery · Cloud Composer · Airflow · Terraform · SHA-256
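A minimal sketch of the SHA-256 hashing and boolean masking mentioned above, written in plain Python rather than BigQuery SQL. The field names, the `SALT` constant, and the reading of "boolean masking" as replacing a sensitive value with a presence flag are assumptions made for illustration, not details taken from the project.

```python
import hashlib

# Hypothetical salt; a real deployment would load this from a secret store.
SALT = "example-salt-not-a-real-secret"


def hash_identifier(value: str) -> str:
    """Return a salted SHA-256 hex digest of a normalized identifier."""
    normalized = value.strip().lower()
    return hashlib.sha256((SALT + normalized).encode("utf-8")).hexdigest()


def mask_presence(value) -> bool:
    """Boolean masking: keep only whether a value was provided."""
    return value is not None and str(value).strip() != ""


def anonymize(record: dict) -> dict:
    """Build a PII-free row: hashed join key, presence flag, source label."""
    return {
        "user_key": hash_identifier(record["email"]),
        "has_phone": mask_presence(record.get("phone")),
        "source": record["source"],
    }


# Made-up rows standing in for two of the source systems.
bank_row = {"email": "Ana@Example.com ", "phone": "555-0100", "source": "banking"}
pos_row = {"email": "ana@example.com", "phone": None, "source": "pos"}

bank_out = anonymize(bank_row)
pos_out = anonymize(pos_row)

# The same person yields the same key in both sources, so rows can still be joined.
print(bank_out["user_key"] == pos_out["user_key"])
```

Normalizing before hashing is what lets records from different sources resolve to one user key without the raw email ever reaching the downstream layers.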
04

Education

Completed
MIT via edX

Data Analysis: Statistical Modeling & Computation in Applications

6.419x · Issued Jan 7, 2026
In Progress
MIT via edX

Fundamentals of Statistics

18.6501x · Access expires May 2026
In Progress
MIT via edX

Probability: The Science of Uncertainty and Data

6.431x · Access expires May 2026
Completed
Lourtec · 2021

Azure Database Administration (DP-300) + SQL Server MOC

20764C · 20761C
Completed
CITI Program · Dec 2025

Biomedical Research Certification

CITI Program
Completed
Universidad Católica Andrés Bello · 2016-2021

Bachelor of Business Administration

Caracas, Venezuela

Let's build something

Get in Touch

ivanfgruber@gmail.com · LinkedIn · GitHub · +1 (361) 253-1361