4. DVC
DVC is a version control system for machine learning projects.
Previously, we added a dataset to DVC. Now we use DVC's second feature: reproducibility.
You can create stages, each of which executes a command and tracks that command's dependencies and outputs.
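On DVC versions before 3.0, a stage is created with the dvc run command (DVC 3.0 removed it in favor of dvc stage add, which takes the same -n, -d, and -o options, followed by dvc repro):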
dvc run \
    -n NAME_OF_THE_STAGE \
    -d DEPENDENCY_1 \
    -d DEPENDENCY_2 \
    -o OUTPUT_1 \
    -o OUTPUT_2 \
    COMMAND
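DVC automatically creates a dvc.yaml file and a dvc.lock file to track your dependencies and outputs.
The dvc.yaml file describes the stage; the dvc.lock file records the hashes of its dependencies and outputs.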
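To make the idea of dependency tracking concrete, here is a minimal Python sketch of how recorded hashes can tell whether a stage needs to run again. It is a simplified illustration, not DVC's actual implementation: the function names (file_md5, record_hashes, stage_is_stale) and the file data.csv are hypothetical, and the recorded hashes live in a plain dictionary rather than in a real dvc.lock file.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def record_hashes(deps: list[Path]) -> dict[str, str]:
    """Snapshot the current hash of every dependency (a stand-in for a lock file)."""
    return {str(dep): file_md5(dep) for dep in deps}


def stage_is_stale(deps: list[Path], recorded: dict[str, str]) -> bool:
    """A stage is stale if any dependency's hash differs from the recorded one."""
    return any(recorded.get(str(dep)) != file_md5(dep) for dep in deps)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dep = Path(tmp) / "data.csv"  # hypothetical dependency
        dep.write_text("a,b\n1,2\n")

        lock = record_hashes([dep])
        print(stage_is_stale([dep], lock))  # dependency unchanged

        dep.write_text("a,b\n1,3\n")
        print(stage_is_stale([dep], lock))  # dependency modified
```

Because only content hashes are compared, merely touching a file without changing its contents does not mark the stage as stale in this sketch.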
A dvc.yaml file could look like this:
stages:
  NAME_OF_THE_STAGE:
    cmd: COMMAND
    deps:
      - DEPENDENCY_1
      - DEPENDENCY_2
    outs:
      - OUTPUT_1
      - OUTPUT_2
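The command-line flags and the dvc.yaml fields correspond one to one: -n becomes the stage name, each -d becomes an entry under deps, each -o an entry under outs, and the trailing command becomes cmd. The Python sketch below spells out that mapping. The helper names (build_dvc_run_command, to_dvc_yaml_entry) and the example stage "prepare" with its file paths are hypothetical and chosen only for illustration; the code only builds the argument list and does not call DVC.

```python
import shlex


def build_dvc_run_command(name, cmd, deps=(), outs=()):
    """Build the argument list for a `dvc run` call describing one stage."""
    args = ["dvc", "run", "-n", name]
    for dep in deps:
        args += ["-d", dep]  # one -d flag per dependency
    for out in outs:
        args += ["-o", out]  # one -o flag per output
    args.append(cmd)  # the stage command goes last, as a single argument
    return args


def to_dvc_yaml_entry(name, cmd, deps=(), outs=()):
    """Return the same stage as the nested structure found in dvc.yaml."""
    return {"stages": {name: {"cmd": cmd, "deps": list(deps), "outs": list(outs)}}}


if __name__ == "__main__":
    # Hypothetical preprocessing stage; all names and paths are placeholders.
    stage = dict(
        name="prepare",
        cmd="python src/prepare.py",
        deps=["src/prepare.py", "data/raw.csv"],
        outs=["data/prepared.csv"],
    )
    print(shlex.join(build_dvc_run_command(**stage)))
    print(to_dvc_yaml_entry(**stage))
```

Listing the script itself as a dependency, as in this example, means that a change to the code is tracked in the same way as a change to the data.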
Your Task
Create DVC stages for training and testing.