Usage
Initialization
Initialize a kedro project
This plugin must be used in an existing kedro project. If you already have a kedro project you can skip this step. To create a new kedro project run the following command. This will create the default kedro starter project in the current directory.
kedro new
Initialize kedro-aim
To initialize kedro-aim, run the following command.
kedro aim init
This will create a new aim.yml config file in the conf directory of your kedro project.
For a more detailed explanation of the aim.yml config file, see Configuration.
When you run this command for the first time, it will create a new aim repository in the conf directory of your kedro project.
This repository is used to store the experiments.
You can change the path of the repository in the aim.yml config file.
The repository path must be relative to the conf directory.
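To make the relative-path rule concrete, here is a minimal sketch of how such a path could be resolved against the conf directory. It is not kedro-aim's own code: the helper name resolve_repository_path and the example paths are hypothetical and only illustrate the rule stated above.

```python
from pathlib import Path


def resolve_repository_path(conf_dir: Path, repository_path: str) -> Path:
    """Resolve a repository path from aim.yml against the conf directory.

    Hypothetical helper for illustration only; kedro-aim may resolve
    the path differently internally.
    """
    candidate = Path(repository_path)
    if candidate.is_absolute():
        # The rule above only allows paths relative to the conf directory.
        raise ValueError("The repository path must be relative to the conf directory.")
    return conf_dir / candidate


# Example paths are made up for this sketch.
conf_dir = Path("my_project") / "conf"
print(resolve_repository_path(conf_dir, "aim_repo"))
```

Rejecting absolute paths early is a design choice of this sketch; it simply surfaces a misconfigured path before anything is written to disk.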
Tracking
Option 1: Track via run
To use aim inside a node, pass the run object as an argument of the node function.
Inside the function you can then use the run object to log metrics and parameters.
# nodes.py
import pandas as pd
from aim import Run


def logging_in_node(run: Run, data: pd.DataFrame) -> None:
    # track metric
    run.track(0.5, "score")
    # track parameter
    run["parameter"] = "abc"
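Because the node only calls run.track(value, name) and assigns run[key] = value, it can be exercised without an aim repository by passing in a stand-in object. The sketch below assumes nothing about aim beyond those two calls; the FakeRun class is hypothetical and exists only for this illustration, and the node is typed loosely so the example runs without aim or pandas installed.

```python
from typing import Any, Dict, List, Tuple


class FakeRun:
    """Hypothetical stand-in that records what a node logs.

    It mimics only the two calls used in the node above:
    run.track(value, name) and run[key] = value.
    """

    def __init__(self) -> None:
        self.metrics: List[Tuple[str, Any]] = []
        self.params: Dict[str, Any] = {}

    def track(self, value: Any, name: str) -> None:
        # Record the metric instead of writing it to an aim repository.
        self.metrics.append((name, value))

    def __setitem__(self, key: str, value: Any) -> None:
        # Record the parameter instead of attaching it to a real run.
        self.params[key] = value


def logging_in_node(run: Any, data: Any) -> None:
    # Same body as the node above, with loose type hints for this sketch.
    run.track(0.5, "score")
    run["parameter"] = "abc"


fake_run = FakeRun()
logging_in_node(fake_run, data=None)
print(fake_run.metrics, fake_run.params)
```

A stand-in like this keeps unit tests for nodes independent of the tracking backend.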
When defining the pipeline, you need to pass the run dataset as an input to the node.
The run dataset will be automatically created by kedro-aim and added to the DataCatalog.
As a result, the run dataset will be passed to the node as an argument.
# pipeline.py
from kedro.pipeline import node, Pipeline
from kedro.pipeline.modular_pipeline import pipeline

# assumes nodes.py lives in the same package as pipeline.py
from .nodes import logging_in_node


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=logging_in_node,
                inputs=["run", "input_data"],
                outputs=None,
                name="logging_in_node",
            )
        ]
    )
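The way the "run" entry reaches the node can be pictured with a toy model: each name in inputs is looked up in a catalog and the values are passed to the function in order. The sketch below is only that, a toy model. The run_node helper, the describe function, and the dictionary catalog are hypothetical and are not how Kedro's runner or kedro-aim are implemented.

```python
from typing import Any, Callable, Dict, List


def run_node(func: Callable[..., Any], inputs: List[str], catalog: Dict[str, Any]) -> Any:
    """Look up each input name in the catalog and pass the values positionally."""
    return func(*(catalog[name] for name in inputs))


def describe(run: Any, data: Any) -> str:
    # Hypothetical node used only to show which values arrive in which argument.
    return f"run={run!r}, rows={len(data)}"


# "run" stands in for the entry that kedro-aim registers; here it is just a string.
catalog = {"run": "<aim run placeholder>", "input_data": [1, 2, 3]}
result = run_node(describe, ["run", "input_data"], catalog)
print(result)
```

The point of the model is that the order of names in inputs determines the order of the function arguments, so "run" must be listed in the position where the function expects the run object.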
Option 2: Track via dataset
The second option for tracking artifacts is to use the aim dataset.
For that, kedro-aim introduces a new dataset called AimArtifactDataSet.
Everything that is written to that dataset will be logged by the aim Run.
To use this feature you need to create a new entry in the catalog.yml.
Assume you have a dataset text_artifact already in your catalog which saves its input to disk.
# catalog.yml
text_artifact:
  type: kedro.extras.datasets.text.TextDataSet
  filepath: test.md
You can change it to:
# catalog.yml
text_artifact:
  type: kedro_aim.io.artifacts.AimArtifactDataSet
  artifact_type: text # <- Could be either "text", "image", "figure", "audio"
  name: funny_joke
  data_set:
    type: kedro.extras.datasets.text.TextDataSet
    filepath: test.md
With this change, the artifact will be automatically tracked by aim at each pipeline execution, while the data is still stored at the location specified under the data_set key.
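The behaviour described above, tracking a value while still saving it through the wrapped dataset, follows a common wrapper pattern. The sketch below illustrates that pattern in plain Python. It is not the AimArtifactDataSet implementation: the classes InMemoryDataset and TrackedDataset and the tracked dictionary are hypothetical stand-ins for the underlying dataset and the aim run.

```python
from typing import Any, Dict, List


class InMemoryDataset:
    """Hypothetical underlying dataset that keeps its data in memory."""

    def __init__(self) -> None:
        self._data: Any = None

    def save(self, data: Any) -> None:
        self._data = data

    def load(self) -> Any:
        return self._data


class TrackedDataset:
    """Hypothetical wrapper: saves through the wrapped dataset and records the value."""

    def __init__(self, name: str, wrapped: InMemoryDataset, tracked: Dict[str, List[Any]]) -> None:
        self._name = name
        self._wrapped = wrapped
        self._tracked = tracked  # stands in for the tracking backend

    def save(self, data: Any) -> None:
        # Store the data where the wrapped dataset would normally put it ...
        self._wrapped.save(data)
        # ... and additionally record it under the artifact name.
        self._tracked.setdefault(self._name, []).append(data)

    def load(self) -> Any:
        # Loading is delegated unchanged to the wrapped dataset.
        return self._wrapped.load()


tracked: Dict[str, List[Any]] = {}
dataset = TrackedDataset("funny_joke", InMemoryDataset(), tracked)
dataset.save("Super interesting text artifact.")
print(dataset.load(), tracked)
```

Delegating load unchanged is what lets the rest of a pipeline treat the wrapper like any other dataset.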
You can then write to and read from this dataset as if it were any other dataset:
# nodes.py
def write_text_to_dataset() -> str:
    # the returned string is saved to the text_artifact dataset
    return "Super interesting text artifact."
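# pipeline.py
from typing import Any

from kedro.pipeline import node, Pipeline
from kedro.pipeline.modular_pipeline import pipeline

# assumes nodes.py lives in the same package as pipeline.py
from .nodes import write_text_to_dataset


def create_pipeline(**kwargs: Any) -> Pipeline:
    return pipeline(
        [
            node(
                func=write_text_to_dataset,
                inputs=None,
                # a single return value maps to a single output name
                outputs="text_artifact",
                name="write_text_to_dataset",
            )
        ]
    )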
UI
The results of the experiments can be visualized using the aim UI.
To start the aim UI, run the command below; the UI can then be accessed at http://localhost:8080.
For more information on the aim UI, see the documentation.
kedro aim ui