Core Algorithms · Active

EIL Training Stack — Metaflow on AWS Batch

Our shared ML platform: Metaflow flows, AWS Batch compute, MLflow tracking, and reproducible PyPI dependency management across local and cloud runs.

Metaflow · AWS Batch · MLflow · Infra

A pragmatic training platform optimized for a small lab. The goal is simple: one command spins up a tracked, reproducible flow that runs identically on a laptop and on AWS Batch.

Components

  • ml-models — model definitions, training flows, evaluation utilities, and a utils.batch_deps module that keeps local and Batch dependencies in lockstep via Metaflow’s @pypi decorator.
  • eil-infra — CDK-managed AWS infrastructure: ECS-hosted MLflow, SSM tunnels for safe access, Batch compute environments sized for g4dn.xlarge.
  • infra-bootstrap — shared submodule that initializes the Metaflow config and SSM tunnel on first run.
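
To make the "lockstep" idea concrete, here is a minimal sketch of what a helper in the spirit of utils.batch_deps could look like. It is not the actual ml-models code: the names SHARED_PINS, PYTHON_VERSION, and pypi_kwargs, and the version numbers, are all assumptions made for illustration. The only thing taken from Metaflow itself is that @pypi accepts python and packages arguments.

```python
"""Hypothetical sketch of a batch_deps-style helper; not the real ml-models code."""
from __future__ import annotations

# Illustrative pins only (assumed); the real versions live in the ml-models repo.
SHARED_PINS: dict[str, str] = {
    "torch": "2.3.1",
    "mlflow": "2.14.1",
}
PYTHON_VERSION = "3.11"  # assumed interpreter version


def pypi_kwargs(extra: dict[str, str] | None = None) -> dict:
    """Build keyword arguments for Metaflow's @pypi from one shared pin table.

    Step-specific extras are merged in, but an extra may not override a
    shared pin, so local and Batch runs resolve the same versions.
    """
    packages = dict(SHARED_PINS)  # copy so the shared table is never mutated
    for name, version in (extra or {}).items():
        pinned = packages.get(name)
        if pinned is not None and pinned != version:
            raise ValueError(
                f"{name} is pinned to {pinned}, but the step asked for {version}"
            )
        packages[name] = version
    return {"python": PYTHON_VERSION, "packages": packages}
```

Under these assumptions, a step in a flow could be decorated with `@pypi(**pypi_kwargs())`, and the same file would run locally with `python flow.py --environment=pypi run` or on Batch by appending `--with batch`. Keeping the pins in one table means a version bump is a one-line change that applies to both code paths.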

Why we built it ourselves

Off-the-shelf platforms tend to optimize for either local dev or cloud scale, rarely both with the same code path. Our stack is small, opinionated, and lets a four-person lab run robust, reproducible training without an MLOps team.