🌜
🌞
@dvcorg/cml

@dvcorg/cml

v0.17.2

<p align="center"> <img src="https://static.iterative.ai/img/cml/title_strip_trim.png" width=400> </p>

npm install @dvcorg/cml

README

GHA npm

What is CML? Continuous Machine Learning (CML) is an open-source CLI tool for implementing continuous integration & delivery (CI/CD) with a focus on MLOps. Use it to automate development workflows — including machine provisioning, model training and evaluation, comparing ML experiments across project history, and monitoring changing datasets.

CML can help train and evaluate models — and then generate a visual report with results and metrics — automatically on every pull request.

An example report for a neural style transfer model.

CML principles:

  • GitFlow for data science. Use GitLab or GitHub to manage ML experiments, track who trained ML models or modified data and when. Codify data and models with DVC instead of pushing to a Git repo.
  • Auto reports for ML experiments. Auto-generate reports with metrics and plots in each Git pull request. Rigorous engineering practices help your team make informed, data-driven decisions.
  • No additional services. Build your own ML platform using GitLab, Bitbucket, or GitHub. Optionally, use cloud storage as well as either self-hosted or cloud runners (such as AWS EC2 or Azure). No databases, services or complex setup needed.

:question: Need help? Just want to chat about continuous integration for ML? Visit our Discord channel!

:play_or_pause_button: Check out our YouTube video series for hands-on MLOps tutorials using CML!

Table of Contents

  1. Setup (GitLab, GitHub, Bitbucket)
  2. Usage
  3. Getting started (tutorial)
  4. Using CML with DVC
  5. Advanced Setup (Self-hosted, local package)
  6. Example projects

Setup

You'll need a GitLab, GitHub, or Bitbucket account to begin. Users may wish to familiarize themselves with Github Actions or GitLab CI/CD. Here, will discuss the GitHub use case.

GitLab

Please see our docs on CML with GitLab CI/CD and in particular the personal access token requirement.

Bitbucket

Please see our docs on CML with Bitbucket Cloud.

GitHub

The key file in any CML project is .github/workflows/cml.yaml:

name: your-workflow-name
on: [push]
jobs:
  run:
    runs-on: ubuntu-latest
    # optionally use a convenient Ubuntu LTS + DVC + CML image
    # container: docker://ghcr.io/iterative/cml:0-dvc2-base1
    steps:
      - uses: actions/[email protected]
      # may need to setup NodeJS & Python3 on e.g. self-hosted
      # - uses: actions/[email protected]
      #   with:
      #     node-version: '16'
      # - uses: actions/[email protected]
      #   with:
      #     python-version: '3.x'
      - uses: iterative/[email protected]
      - name: Train model
        run: |
          # Your ML workflow goes here
          pip install -r requirements.txt
          python train.py
      - name: Write CML report
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Post reports as comments in GitHub PRs
          cat results.txt >> report.md
          cml send-comment report.md

Usage

We helpfully provide CML and other useful libraries pre-installed on our custom Docker images. In the above example, uncommenting the field container: docker://ghcr.io/iterative/cml:0-dvc2-base1) will make the runner pull the CML Docker image. The image already has NodeJS, Python 3, DVC and CML set up on an Ubuntu LTS base for convenience.

CML Functions

CML provides a number of functions to help package the outputs of ML workflows (including numeric data and visualizations about model performance) into a CML report.

Below is a table of CML functions for writing markdown reports and delivering those reports to your CI system.

Function Description Example Inputs
cml runner Launch a runner locally or hosted by a cloud provider See Arguments
cml publish Publicly host an image for displaying in a CML report <path to image> --title <image title> --md
cml send-comment Return CML report as a comment in your GitLab/GitHub workflow <path to report> --head-sha <sha>
cml send-github-check Return CML report as a check in GitHub <path to report> --head-sha <sha>
cml pr Commit the given files to a new branch and create a pull request <path>...
cml tensorboard-dev Return a link to a Tensorboard.dev page --logdir <path to logs> --title <experiment title> --md

CML Reports

The cml send-comment command can be used to post reports. CML reports are written in markdown (GitHub, GitLab, or Bitbucket flavors). That means they can contain images, tables, formatted text, HTML blocks, code snippets and more — really, what you put in a CML report is up to you. Some examples:

:spiral_notepad: Text Write to your report using whatever method you prefer. For example, copy the contents of a text file containing the results of ML model training:

cat results.txt >> report.md

:framed_picture: Images Display images using the markdown or HTML. Note that if an image is an output of your ML workflow (i.e., it is produced by your workflow), you will need to use the cml publish function to include it a CML report. For example, if graph.png is output by python train.py, run:

cml publish graph.png --md >> report.md

Getting Started

  1. Fork our example project repository.

:warning: Note that if you are using GitLab, you will need to create a Personal Access Token for this example to work.

getting started

:warning: The following steps can all be done in the GitHub browser interface. However, to follow along with the commands, we recommend cloning your fork to your local workstation:

git clone https://github.com/<your-username>/example_cml
  1. To create a CML workflow, copy the following into a new file, .github/workflows/cml.yaml:
name: model-training
on: [push]
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/[email protected]
      - uses: actions/[email protected]
      - uses: iterative/[email protected]
      - name: Train model
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          pip install -r requirements.txt
          python train.py

          cat metrics.txt >> report.md
          cml publish plot.png --md >> report.md
          cml send-comment report.md
  1. In your text editor of choice, edit line 16 of train.py to depth = 5.

  2. Commit and push the changes:

git checkout -b experiment
git add . && git commit -m "modify forest depth"
git push origin experiment
  1. In GitHub, open up a pull request to compare the experiment branch to master.

getting started

Shortly, you should see a comment from github-actions appear in the pull request with your CML report. This is a result of the cml send-comment function in your workflow.

getting started

This is the outline of the CML workflow:

  • you push changes to your GitHub repository,
  • the workflow in your .github/workflows/cml.yaml file gets run, and
  • a report is generated and posted to GitHub.

CML functions let you display relevant results from the workflow — such as model performance metrics and visualizations — in GitHub checks and comments. What kind of workflow you want to run, and want to put in your CML report, is up to you.

Using CML with DVC

In many ML projects, data isn't stored in a Git repository, but needs to be downloaded from external sources. DVC is a common way to bring data to your CML runner. DVC also lets you visualize how metrics differ between commits to make reports like this:

using cml with dvc

The .github/workflows/cml.yaml file used to create this report is:

name: model-training
on: [push]
jobs:
  run:
    runs-on: ubuntu-latest
    container: docker://ghcr.io/iterative/cml:0-dvc2-base1
    steps:
      - uses: actions/[email protected]
      - name: Train model
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          # Install requirements
          pip install -r requirements.txt

          # Pull data & run-cache from S3 and reproduce pipeline
          dvc pull data --run-cache
          dvc repro

          # Report metrics
          echo "## Metrics" >> report.md
          git fetch --prune
          dvc metrics diff master --show-md >> report.md

          # Publish confusion matrix diff
          echo "## Plots" >> report.md
          echo "### Class confusions" >> report.md
          dvc plots diff --target classes.csv --template confusion -x actual -y predicted --show-vega master > vega.json
          vl2png vega.json -s 1.5 > plot.png
          cml publish --md plot.png >> report.md

          # Publish regularization function diff
          echo "### Effects of regularization" >> report.md
          dvc plots diff --target estimators.csv -x Regularization --show-vega master > vega.json
          vl2png vega.json -s 1.5 > plot.png
          cml publish --md plot.png >> report.md

          cml send-comment report.md

:warning: If you're using DVC with cloud storage, take note of environment variables for your storage format.

Configuring Cloud Storage Providers

There are many supported could storage providers. Here are a few examples for some of the most frequently used providers:

S3 and S3-compatible storage (Minio, DigitalOcean Spaces, IBM Cloud Object Storage...)
# Github
env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_SESSION_TOKEN: ${{ secrets.AWS_SESSION_TOKEN }}

:point_right: AWS_SESSION_TOKEN is optional.

:point_right: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY can also be used by cml runner to launch EC2 instances. See [Environment Variables].

Azure
env:
  AZURE_STORAGE_CONNECTION_STRING:
    ${{ secrets.AZURE_STORAGE_CONNECTION_STRING }}
  AZURE_STORAGE_CONTAINER_NAME: ${{ secrets.AZURE_STORAGE_CONTAINER_NAME }}
Aliyun
env:
  OSS_BUCKET: ${{ secrets.OSS_BUCKET }}
  OSS_ACCESS_KEY_ID: ${{ secrets.OSS_ACCESS_KEY_ID }}
  OSS_ACCESS_KEY_SECRET: ${{ secrets.OSS_ACCESS_KEY_SECRET }}
  OSS_ENDPOINT: ${{ secrets.OSS_ENDPOINT }}
Google Storage

:warning: Normally, GOOGLE_APPLICATION_CREDENTIALS is the path of the json file containing the credentials. However in the action this secret variable is the contents of the file. Copy the json contents and add it as a secret.

env:
  GOOGLE_APPLICATION_CREDENTIALS: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
Google Drive

:warning: After configuring your Google Drive credentials you will find a json file at your_project_path/.dvc/tmp/gdrive-user-credentials.json. Copy its contents and add it as a secret variable.

env:
  GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}

Advanced Setup

Self-hosted (On-premise or Cloud) Runners

GitHub Actions are run on GitHub-hosted runners by default. However, there are many great reasons to use your own runners: to take advantage of GPUs, orchestrate your team's shared computing resources, or train in the cloud.

:point_up: Tip! Check out the official GitHub documentation to get started setting up your own self-hosted runner.

Allocating Cloud Compute Resources with CML

When a workflow requires computational resources (such as GPUs), CML can automatically allocate cloud instances using cml runner. You can spin up instances on AWS, Azure, GCP, or Kubernetes.

For example, the following workflow deploys a g4dn.xlarge instance on AWS EC2 and trains a model on the instance. After the job runs, the instance automatically shuts down.

You might notice that this workflow is quite similar to the basic use case above. The only addition is cml runner and a few environment variables for passing your cloud service credentials to the workflow.

Note that cml runner will also automatically restart your jobs (whether from a GitHub Actions 35-day workflow timeout or a AWS EC2 spot instance interruption).

name: Train-in-the-cloud
on: [push]
jobs:
  deploy-runner:
    runs-on: ubuntu-latest
    steps:
      - uses: iterative/[email protected]
      - uses: actions/[email protected]
      - name: Deploy runner on EC2
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          cml runner \
            --cloud=aws \
            --cloud-region=us-west \
            --cloud-type=g4dn.xlarge \
            --labels=cml-gpu
  train-model:
    needs: deploy-runner
    runs-on: [self-hosted, cml-gpu]
    timeout-minutes: 50400 # 35 days
    container:
      image: docker://iterativeai/cml:0-dvc2-base1-gpu
      options: --gpus all
    steps:
      - uses: actions/[email protected]
      - name: Train model
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
        run: |
          pip install -r requirements.txt
          python train.py

          cat metrics.txt > report.md
          cml send-comment report.md

In the workflow above, the deploy-runner step launches an EC2 g4dn.xlarge instance in the us-west region. The model-training step then runs on the newly-launched instance. See [Environment Variables] below for details on the secrets required.

:tada: Note that jobs can use any Docker container! To use functions such as cml send-comment from a job, the only requirement is to have CML installed.

Docker Images

The CML Docker image (docker://iterativeai/cml) comes loaded with Python, CUDA, git, node and other essentials for full-stack data science. Different versions of these essentials are available from different iterativeai/cml image tags. The tag convention is {CML_VER}-dvc{DVC_VER}-base{BASE_VER}{-gpu}:

{BASE_VER} Software included (-gpu)
0 Ubuntu 18.04, Python 2.7 (CUDA 10.1, CuDNN 7)
1 Ubuntu 20.04, Python 3.8 (CUDA 11.0.3, CuDNN 8)

For example, docker://iterativeai/cml:0-dvc2-base1-gpu, or docker://ghcr.io/iterative/cml:0-dvc2-base1.

Arguments

The cml runner function accepts the following arguments:

--help                      Show help                                [boolean]
--version                   Show version number                      [boolean]
--log                       Maximum log level
                 [choices: "error", "warn", "info", "debug"] [default: "info"]
--labels                    One or more user-defined labels for this runner
                            (delimited with commas)           [default: "cml"]
--idle-timeout              Seconds to wait for jobs before shutting down. Set
                            to -1 to disable timeout            [default: 300]
--name                      Name displayed in the repository once registered
                            cml-{ID}
--no-retry                  Do not restart workflow terminated due to instance
                            disposal or GitHub Actions timeout
                                                    [boolean] [default: false]
--single                    Exit after running a single job
                                                    [boolean] [default: false]
--reuse                     Don't launch a new runner if an existing one has
                            the same name or overlapping labels
                                                    [boolean] [default: false]
--driver                    Platform where the repository is hosted. If not
                            specified, it will be inferred from the
                            environment          [choices: "github", "gitlab"]
--repo                      Repository to be used for registering the runner.
                            If not specified, it will be inferred from the
                            environment
--token                     Personal access token to register a self-hosted
                            runner on the repository. If not specified, it
                            will be inferred from the environment
                                                            [default: "infer"]
--cloud                     Cloud to deploy the runner
                                [choices: "aws", "azure", "gcp", "kubernetes"]
--cloud-region              Region where the instance is deployed. Choices:
                            [us-east, us-west, eu-west, eu-north]. Also
                            accepts native cloud regions  [default: "us-west"]
--cloud-type                Instance type. Choices: [m, l, xl]. Also supports
                            native types like i.e. t2.micro
--cloud-gpu                 GPU type.
                                    [choices: "nogpu", "k80", "v100", "tesla"]
--cloud-hdd-size            HDD size in GB
--cloud-ssh-private         Custom private RSA SSH key. If not provided an
                            automatically generated throwaway key will be used
                                                                 [default: ""]
--cloud-spot                Request a spot instance                  [boolean]
--cloud-spot-price          Maximum spot instance bidding price in USD.
                            Defaults to the current spot bidding price
                                                               [default: "-1"]
--cloud-startup-script      Run the provided Base64-encoded Linux shell script
                            during the instance initialization   [default: ""]
--cloud-aws-security-group  Specifies the security group in AWS  [default: ""]

Environment Variables

:warning: You will need to create a personal access token (PAT) with repository read/write access and workflow privileges. In the example workflow, this token is stored as PERSONAL_ACCESS_TOKEN.

:information_source: If using the --cloud option, you will also need to provide access credentials of your cloud compute resources as secrets. In the above example, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (with privileges to create & destroy EC2 instances) are required.

For AWS, the same credentials can also be used for configuring cloud storage.

Proxy support

CML support proxy via known environment variables http_proxy and https_proxy.

On-premise (Local) Runners

This means using on-premise machines as self-hosted runners. The cml runner function is used to set up a local self-hosted runner. On a local machine or on-premise GPU cluster, install CML as a package and then run:

cml runner \
  --repo=$your_project_repository_url \
  --token=$PERSONAL_ACCESS_TOKEN \
  --labels="local,runner" \
  --idle-timeout=180

The machine will listen for workflows from your project repository.

Local Package

In the examples above, CML is installed by the setup-cml action, or comes pre-installed in a custom Docker image pulled by a CI runner. You can also install CML as a package:

npm i -g @dvcorg/cml

You may need to install additional dependencies to use DVC plots and Vega-Lite CLI commands:

sudo apt-get install -y libcairo2-dev libpango1.0-dev libjpeg-dev libgif-dev \
                        librsvg2-dev libfontconfig-dev
npm install -g vega-cli vega-lite

CML and Vega-Lite package installation require the NodeJS package manager (npm) which ships with NodeJS. Installation instructions are below.

Install NodeJS

  • GitHub: This is probably not necessary when using GitHub's default containers or one of CML's Docker containers. Self-hosted runners may need to use a set up action to install NodeJS:
uses: actions/[email protected]
  with:
    node-version: '16'
  • GitLab: Requires direct installation.
curl -sL https://deb.nodesource.com/setup_12.x | bash
apt-get update
apt-get install -y nodejs

See Also

These are some example projects using CML.

:key: needs a PAT.

Release Notes

0.17.2
By Olivaw[bot] • Published on August 3, 2022
0.17.1
By Olivaw[bot] • Published on July 31, 2022
  • Use JSON instead of HCL for runner templates (#1100)
  • Fix cml ci pr (#1108)
0.17.0
By Olivaw[bot] • Published on July 13, 2022
  • Add send-comment --publish (#1026) and --url (#1075)
    • Support relative paths in Markdown (#1089)
  • Add runner --reuse-idle (#1070)
  • Add pr --title --body --message --branch (#1063)
  • Fix pr --skip-ci on GitLab (#1056, #1062)
  • Misc runner edge cases improvements (#1030)
  • Detect runner ACPI shutdown (#1068)
  • Increase GHA workflow timeout to 35 days from 72h (#1067)
  • Add devcontainer.json (#1082, #1085)
  • Fix external test triggers (#1061, #1078)
  • Fix packages (#1079)
  • Deps update and lock file check (#1081)
0.16.1
By Olivaw[bot] • Published on June 14, 2022
  • Add pr --skip-ci (#1035)
  • Faster runner termination (#1021)
  • Support native GPU types (#1049)
  • Preserve plain HTTP in Git remote URL (#1043)
  • Live runner logging & debug error capture (#1052)
  • Fix send-comment --pr for GitLab (#1040)
  • Soft-deprecate --file (#1059)
  • Remove tfFile (#1029)
  • Correct Bitbucket Server mention (#1024)
  • Fix TensorBoard protobuf tests (#1031)
  • Trigger example repos/external tests (#988)
  • Update dependencies (#998, #999, #1007, #1033)
0.16.0
By Olivaw[bot] • Published on May 25, 2022
  • Add cml runner --driver=bitbucket support for Bitbucket Cloud Pipelines (#798)
  • Try fallback auto-merge when unknown protection (#968)
  • Default to HEAD on send-comment --commit-sha (#992)
  • Fix runner timer restarting on spot instances (#1009)
  • Fix NVIDIA APT GPG keys (#994, #995)
  • Refine safe directory code (#986)
  • Remove cml publish pipe from examples (#1004)
  • Remove docker machine (#1011)
0.15.2
By Olivaw[bot] • Published on April 22, 2022
  • Remove standard input detection from cml publish (#983)
  • Add CWD to Git's safe.directory with cml ci (#974)
  • Document Bitbucket cml publish --native (#976)
0.15.1
By Olivaw[bot] • Published on April 20, 2022
  • Revert client-side GitHub Actions job filtering (#980)
  • Add cml runner --tpi-version (#885, #967)
0.15.0
By Olivaw[bot] • Published on April 15, 2022
  • Build pkg standalone binaries for more platforms (#954)
  • Revert conditional chaining (#966)
0.14.2
By Olivaw[bot] • Published on April 14, 2022
  • ci: add --unshallow (#957)
  • Fix pull request auto-merge edge cases (#959)
  • bitbucket Driver: body can be undefined (#962)
  • Update node engine version constraint (#963)
0.14.1
By Olivaw[bot] • Published on April 12, 2022
  • Fix GitHub actions.listWorkflowRunsForRepo status filtering (#950)
  • Paginate octokit requests (#948)

General

License
Apache-2.0
Typescript Types
None found
Tree-shakeable
No

Popularity

GitHub Stargazers
3,488
Community Interest
3,064
Number of Forks
262

Maintenance

Commits
11/2110/22030
Last Commit
Open Issues
109
Closed Issues
446
Open Pull Requests
8
Closed Pull Requests
143

Versions

Versions Released
11/2110/2207
Latest Version Released
Aug 3, 2022
Current Tags
latest0.17.2

Contributors

DavidGOrtega
DavidGOrtega
Commits: 145
0x2b3bfa0
0x2b3bfa0
Commits: 69
casperdcl
casperdcl
Commits: 49
elleobrien
elleobrien
Commits: 33
dmpetrov
dmpetrov
Commits: 7
dacbd
dacbd
Commits: 3
JamesMowatt
JamesMowatt
Commits: 2
snyk-bot
snyk-bot
Commits: 2
courentin
courentin
Commits: 2
shcheklein
shcheklein
Commits: 2
restyled-commits
restyled-commits
Commits: 1
josemaia
josemaia
Commits: 1
iterative-olivaw
iterative-olivaw
Commits: 1
vincent-leonardo
vincent-leonardo
Commits: 1
jorgeorpinel
jorgeorpinel
Commits: 1