Building Reproducible ML Processes with an Open Source Stack – Einat Orr, Treeverse
Machine learning experiments consist of Data + Code + Environment. While MLflow Projects are a great way to ensure the reproducibility of data science code, they cannot ensure the reproducibility of the input data used by that code. In this talk, we’ll go over the trifecta required for truly reproducible experiments: Code (Kubeflow and Git), Data (MinIO + lakeFS) and Environment (infrastructure as code). The talk will include a hands-on code demonstration of reproducing an experiment while ensuring we use exactly the same input data, code and processing environment as a previous run. We will demonstrate programmatic ways to tie all the moving parts together: from creating commits that snapshot the input data, to tagging and traversing the history of both code and data in tandem.
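As a rough illustration of what tying code, data and environment together might look like, here is a minimal sketch in plain Python. It is not the demo from the talk and does not call any lakeFS, MinIO, Kubeflow or MLflow API. The `RunRecord` class, its field names and the `same_inputs` helper are all hypothetical; the sketch only assumes that each layer can hand back an immutable identifier (a Git commit SHA, a data commit ID, an image digest).

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunRecord:
    """Pins the three inputs of one experiment run (hypothetical structure)."""
    code_commit: str  # assumed: a Git commit SHA for the training code
    data_commit: str  # assumed: a commit ID from the data versioning layer
    env_digest: str   # assumed: a digest identifying the runtime environment

    def fingerprint(self) -> str:
        # Serialize with sorted keys so the hash does not depend on field order.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def same_inputs(a: RunRecord, b: RunRecord) -> bool:
    """Two runs are comparable only if code, data and environment all match."""
    return a.fingerprint() == b.fingerprint()
```

Storing the fingerprint next to each run's metrics would let a later run check that it is reusing the same three inputs before its results are compared with an earlier one.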
by The Linux Foundation