Usage

Initialization

Initialize a `kedro` project

This plugin must be used in an existing kedro project. If you already have one, you can skip this step. Otherwise, run the following command to create the default kedro starter project in the current directory.

kedro new

Initialize `kedro-aim`

To initialize kedro-aim run the following command.

kedro aim init

This will create a new aim.yml config file in the conf directory of your kedro project. For a more detailed explanation of the aim.yml config file, see Configuration. When you run this command for the first time, it will create a new aim repository in the conf directory of your kedro project. This repository is used to store the experiments. You can change the path of the repository in the aim.yml config file. The repository path must be relative to the conf directory.

Tracking

Option 1: Track via `run`

To use aim inside a node, pass the run object as an argument of the node function. Inside the function you can then use the run object to log metrics and parameters.

# nodes.py
import pandas as pd
from aim import Run


def logging_in_node(run: Run, data: pd.DataFrame) -> None:
    # track metric
    run.track(0.5, "score")

    # track parameter
    run["parameter"] = "abc"

When defining the pipeline, pass the run dataset as an input to the node. kedro-aim creates the run dataset automatically and adds it to the DataCatalog, so it is passed to the node as an argument like any other input.

# pipeline.py
from kedro.pipeline import node, Pipeline
from kedro.pipeline.modular_pipeline import pipeline

# assumes nodes.py sits in the same package as pipeline.py
from .nodes import logging_in_node


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=logging_in_node,
                inputs=["run", "input_data"],
                outputs=None,
                name="logging_in_node",
            )
        ]
    )
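
Because the node receives the run as an ordinary argument, its logging can be checked without starting a pipeline or an aim repository. The sketch below assumes a hypothetical stand-in class, `FakeRun`, that mimics only the two operations used above (`track` and item assignment); it is not part of aim or kedro-aim. The `data` argument is dropped here to keep the example free of dependencies.

```python
from typing import Any, Dict, List, Tuple


class FakeRun:
    """Hypothetical stand-in for aim.Run that records what a node logs."""

    def __init__(self) -> None:
        self.metrics: List[Tuple[str, Any]] = []
        self.params: Dict[str, Any] = {}

    def track(self, value: Any, name: str) -> None:
        # Mirrors the call shape used in the node: run.track(value, name)
        self.metrics.append((name, value))

    def __setitem__(self, key: str, value: Any) -> None:
        # Mirrors the parameter assignment: run["parameter"] = "abc"
        self.params[key] = value


def logging_in_node(run: Any) -> None:
    # Same body as the node above, minus the unused DataFrame input
    run.track(0.5, "score")
    run["parameter"] = "abc"


fake_run = FakeRun()
logging_in_node(fake_run)
print(fake_run.metrics)  # [('score', 0.5)]
print(fake_run.params)   # {'parameter': 'abc'}
```

A stand-in like this only verifies that the node calls the run as expected; it says nothing about how aim stores the values.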

Option 2: Track via dataset

The second option for tracking artifacts is to use the aim dataset. For this, kedro-aim introduces a new dataset called AimArtifactDataSet. Everything that is written to that dataset will be logged by the aim Run. To use this feature you need to create a new entry in the catalog.yml.

Assume you have a dataset text_artifact already in your catalog which saves its input to disk.

# catalog.yml
text_artifact:
    type: kedro.extras.datasets.text.TextDataSet
    filepath: test.md

You can change it to:

# catalog.yml
text_artifact:
  type: kedro_aim.io.artifacts.AimArtifactDataSet
  artifact_type: text # <- one of "text", "image", "figure", "audio"
  name: funny_joke
  data_set:
    type: kedro.extras.datasets.text.TextDataSet
    filepath: test.md

and this artifact will be tracked automatically by aim at each pipeline execution, in addition to still being stored at the location specified under the data_set key. You can then write to and read from this dataset as if it were any other dataset:

# nodes.py


def write_text_to_dataset() -> str:
    return "Super interesting text artifact."

# pipeline.py
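from typing import Any

from kedro.pipeline import node, Pipeline
from kedro.pipeline.modular_pipeline import pipeline

# assumes nodes.py sits in the same package as pipeline.py
from .nodes import write_text_to_dataset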


def create_pipeline(**kwargs: Any) -> Pipeline:
    return pipeline(
        [
            node(
                func=write_text_to_dataset,
                inputs=None,
                outputs="text_artifact",  # single return value: pass the name, not a list
                name="write_text_to_dataset",
            )
        ]
    )
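
Conceptually, a dataset like this wraps another dataset: a save is forwarded to the wrapped dataset, and the same data is also handed to the tracker. The sketch below illustrates that wrapping pattern with hypothetical classes (`InMemoryText`, `RecordingTracker`, `TrackedDataset`). It is an assumption about the general approach, not the actual `AimArtifactDataSet` implementation, and it uses neither kedro nor aim.

```python
from typing import List, Optional, Tuple


class InMemoryText:
    """Hypothetical inner dataset: keeps text in memory instead of on disk."""

    def __init__(self) -> None:
        self._data: Optional[str] = None

    def save(self, data: str) -> None:
        self._data = data

    def load(self) -> Optional[str]:
        return self._data


class RecordingTracker:
    """Hypothetical tracker: records what would be sent to an experiment run."""

    def __init__(self) -> None:
        self.logged: List[Tuple[str, str, str]] = []

    def log(self, name: str, artifact_type: str, data: str) -> None:
        self.logged.append((name, artifact_type, data))


class TrackedDataset:
    """Wraps an inner dataset; every save is stored and also tracked."""

    def __init__(
        self,
        inner: InMemoryText,
        tracker: RecordingTracker,
        name: str,
        artifact_type: str,
    ) -> None:
        self._inner = inner
        self._tracker = tracker
        self._name = name
        self._artifact_type = artifact_type

    def save(self, data: str) -> None:
        # Store the data first, then report it to the tracker
        self._inner.save(data)
        self._tracker.log(self._name, self._artifact_type, data)

    def load(self) -> Optional[str]:
        # Loading is delegated unchanged to the wrapped dataset
        return self._inner.load()


tracker = RecordingTracker()
dataset = TrackedDataset(
    InMemoryText(), tracker, name="funny_joke", artifact_type="text"
)
dataset.save("Super interesting text artifact.")
loaded = dataset.load()
print(loaded)
print(tracker.logged)
```

The `name` and `artifact_type` arguments mirror the keys of the catalog entry above; the nested `data_set` entry plays the role of the inner dataset.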

UI

The results of the experiments can be visualized using the aim UI. To start it, run the following command; the UI can then be accessed at http://localhost:8080. For more information on the aim UI, see the documentation.

kedro aim ui