Monitoring Azure Experiments

Azure Cloud is a popular work environment for many data scientists, yet many features remain poorly documented. This note shows how to monitor Azure experiments in a more handy and detailed way than through web or cl interface.

The trick is to create a dashborad of experiments and their respective runs, up to a desired level of detail, from Python. The workhorse is the following handy utility function:

from collections import namedtuple

def get_runs_summary(ws):
    """Summarise all runs under a given workspace, with experiment name, run id and run status
    Args:
        ws (azureml.core.Workspace): Azure workspace to look into
    """
    # NOTE: extend the scope of run details if needed
    record = namedtuple('Run_Description',['job_name','run_id','run_status'])
    for exp_name,exp_obj in ws.experiments.items():
        for run_obj in exp_obj.get_runs():
            yield(record(exp_name,run_obj.id,run_obj.status))

Now it’s time to see it in action 😎

# get the default workspace
from azureml.core import Workspace
import pandas as pd

ws = Workspace.from_config()

# generate the job dashboard and inspect
runs = get_runs_summary(ws)
summary_df = pd.DataFrame(runs)
summary_df.head()
# count jobs by status
summary_df.groupby('run_status').size()

Use the dashboard for to automatically manage experiments. For example, to kill running jobs:

from azureml.core import Experiment, Run

for exp_name,run_id in summary_df.loc[summary_df.run_status=='Running',['job_name','run_id']].values:
    exp = Experiment(ws,exp_name)
    run = Run(exp,run_id)
    run.cancel()

Check the jupyter notebook in my repository for a one-click demo.

Dodaj komentarz

Twój adres e-mail nie zostanie opublikowany.