4. DVC

DVC is a version control system for machine learning projects.

Previously, we added a dataset to DVC. Now we use DVC's second feature: reproducibility.

A stage wraps a command and tracks its dependencies and outputs, so that DVC can detect when the stage is out of date and rerun it.

You create a stage with this command:

dvc run \
    -n NAME_OF_THE_STAGE \
    -d DEPENDENCY_1 \
    -d DEPENDENCY_2 \
    -o OUTPUT_1 \
    -o OUTPUT_2 \
    COMMAND

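DVC runs COMMAND and records the stage in dvc.yaml. It also writes dvc.lock, which stores the hashes of the stage's dependencies and outputs, so that dvc repro can later detect which stages are out of date and rerun only those.

Note that dvc run was removed in DVC 3.0. On newer versions, dvc stage add takes the same -n, -d and -o options but only writes the stage to dvc.yaml; run dvc repro afterwards to execute it and create dvc.lock.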

A dvc.yaml file could look like this:

stages:
    NAME_OF_THE_STAGE:
        cmd: COMMAND
        deps:
        - DEPENDENCY_1
        - DEPENDENCY_2
        outs:
        - OUTPUT_1
        - OUTPUT_2

Your Task

Create DVC stages for training and testing.
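A minimal sketch of what the resulting dvc.yaml might look like. All names here are hypothetical stand-ins, not part of this project: the scripts src/train.py and src/test.py, the data files data/train.csv and data/test.csv, the model file model.pkl and the metrics file metrics.json. Replace them with your own paths. The key point is that the test stage lists the train stage's output (model.pkl) as a dependency, which chains the two stages into a pipeline.

```yaml
stages:
    train:
        # hypothetical training script and data; replace with your own paths
        cmd: python src/train.py
        deps:
        - src/train.py
        - data/train.csv
        outs:
        - model.pkl
    test:
        # hypothetical evaluation script; reads the trained model
        cmd: python src/test.py
        deps:
        - src/test.py
        - data/test.csv
        - model.pkl   # output of the train stage, so test runs after train
        outs:
        - metrics.json
```

With dvc run, each stage is one call, for example dvc run -n train -d src/train.py -d data/train.csv -o model.pkl python src/train.py. Afterwards, dvc repro reruns train when its code or data changes, and reruns test as well if that produces a different model.pkl.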