Metaflow
Overview
Metaflow is a framework created by Netflix for building and running ML workflows.
This integration lets users apply decorators to Metaflow steps and flows to automatically log parameters and artifacts to W&B.
- Decorating a step will turn logging off or on for certain types within that step.
- Decorating the flow will turn logging off or on for every step in the flow.
Quickstart
Install W&B and log in
In a notebook:

!pip install -Uqqq metaflow fastcore wandb

import wandb
wandb.login()

From the command line:

pip install -Uqqq metaflow fastcore wandb
wandb login
Decorate your flows and steps
Decorating a step turns logging off or on for certain types within that step.
In this example, all datasets and models in start will be logged.
import pandas as pd
import torch
import wandb
from metaflow import FlowSpec, step
from wandb.integration.metaflow import wandb_log

class WandbExampleFlow(FlowSpec):
    @wandb_log(datasets=True, models=True, settings=wandb.Settings(...))
    @step
    def start(self):
        self.raw_df = pd.read_csv(...)    # pd.DataFrame -> upload as dataset
        self.model_file = torch.load(...) # nn.Module -> upload as model
        self.next(self.transform)
Decorating a flow is equivalent to decorating all of its steps with a default.
In this case, every step in WandbExampleFlow defaults to logging datasets and models, just as if each step were decorated with @wandb_log(datasets=True, models=True).
import pandas as pd
import torch
from metaflow import FlowSpec, step
from wandb.integration.metaflow import wandb_log

@wandb_log(datasets=True, models=True) # decorate all @step
class WandbExampleFlow(FlowSpec):
    @step
    def start(self):
        self.raw_df = pd.read_csv(...)    # pd.DataFrame -> upload as dataset
        self.model_file = torch.load(...) # nn.Module -> upload as model
        self.next(self.transform)
Decorating the flow is equivalent to decorating all steps with a default. That means if you later decorate a step with another @wandb_log, it overrides the flow-level decoration.
In this example:
- start and mid log both datasets and models.
- end logs neither datasets nor models.
import pandas as pd
import torch
from metaflow import FlowSpec, step
from wandb.integration.metaflow import wandb_log

@wandb_log(datasets=True, models=True) # same as decorating start and mid
class WandbExampleFlow(FlowSpec):
    # this step will log datasets and models
    @step
    def start(self):
        self.raw_df = pd.read_csv(...)    # pd.DataFrame -> upload as dataset
        self.model_file = torch.load(...) # nn.Module -> upload as model
        self.next(self.mid)

    # this step will also log datasets and models
    @step
    def mid(self):
        self.raw_df = pd.read_csv(...)    # pd.DataFrame -> upload as dataset
        self.model_file = torch.load(...) # nn.Module -> upload as model
        self.next(self.end)

    # this step is overridden and will NOT log datasets OR models
    @wandb_log(datasets=False, models=False)
    @step
    def end(self):
        self.raw_df = pd.read_csv(...)
        self.model_file = torch.load(...)
Access your data programmatically
You can access the information we've captured in three ways: inside the original Python process being logged using the wandb client library, with the web app UI, or programmatically using our Public API. Parameters are saved to W&B's config and can be found in the Overview tab. datasets, models, and others are saved to W&B Artifacts and can be found in the Artifacts tab. Base Python types are saved to W&B's summary dict and can be found in the Overview tab. See our guide to the Public API for details on using the API to get this information programmatically.
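For example, here is a minimal sketch of reading those values back with the Public API; the entity, project, and run ID are placeholders you would replace with your own:

import wandb

api = wandb.Api()

# Placeholder run path: replace with your own entity/project/run_id
run = api.run("my-entity/my-project/abc123")

print(run.config)   # Parameters logged from the flow
print(run.summary)  # Base Python types logged from steps

# Artifacts (datasets, models, others) logged by this run
for artifact in run.logged_artifacts():
    print(artifact.name, artifact.type)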
Cheat sheet
Data | Client library | UI |
---|---|---|
Parameter(...) | wandb.config | Overview tab, Config |
datasets, models, others | wandb.use_artifact("{var_name}:latest") | Artifacts tab |
Base Python types (dict, list, str, etc.) | wandb.summary | Overview tab, Summary |
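For instance, assuming a step logged an instance variable named raw_df as a dataset, a later script could fetch it with the client library; the project name here is a placeholder:

import wandb

# Placeholder project; use the project your flow logged to
run = wandb.init(project="my-metaflow-project")

# "{var_name}:latest" -> the artifact logged for the instance variable raw_df
artifact = run.use_artifact("raw_df:latest")
artifact_dir = artifact.download()
print("downloaded to", artifact_dir)

run.finish()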
wandb_log kwargs
kwarg | Options |
---|---|
datasets | True or False |
models | True or False |
others | True or False |
settings | wandb.Settings(...) or None |
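As a sketch of passing explicit settings at the step level (the project name is a placeholder, and the flow itself is illustrative):

import wandb
from metaflow import FlowSpec, step
from wandb.integration.metaflow import wandb_log

class SettingsExampleFlow(FlowSpec):
    # Placeholder project name; any wandb.Settings fields can be supplied here
    @wandb_log(datasets=True, models=True, settings=wandb.Settings(project="my-metaflow-project"))
    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    SettingsExampleFlow()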
Frequently Asked Questions
What exactly do you log? Do you log all instance and local variables?
wandb_log only logs instance variables. Local variables are NEVER logged. This is useful to avoid logging unnecessary data.
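As an illustrative sketch (the flow and variable names are made up), only what you assign to self is captured:

from metaflow import FlowSpec, step
from wandb.integration.metaflow import wandb_log

class VariableScopeFlow(FlowSpec):
    @wandb_log(datasets=True)
    @step
    def start(self):
        accuracy = 0.95           # local variable: never logged
        self.accuracy = accuracy  # instance variable: logged to W&B's summary
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    VariableScopeFlow()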
Which data types get logged?
We currently support these types:
Logging Setting | Type |
---|---|
default (always on) | Base Python types (dict, list, str, etc.) |
datasets | pd.DataFrame |
models | nn.Module |
others | Other serializable Python objects |
How can I configure logging behavior?
Kind of Variable | Behavior | Example | Data Type |
---|---|---|---|
Instance | Auto-logged | self.accuracy | float |
Instance | Logged if datasets=True | self.df | pd.DataFrame |
Instance | Not logged if datasets=False | self.df | pd.DataFrame |
Local | Never logged | accuracy | float |
Local | Never logged | df | pd.DataFrame |
Is artifact lineage tracked?
Yes. If you have an artifact that is an output of step A and an input to step B, we automatically construct the lineage DAG for you.
For an example of this behavior, please see this notebook and its corresponding W&B Artifacts page.
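As a sketch of the pattern described above (the flow and variable names are illustrative), an artifact produced in one step and consumed in the next is enough for the lineage DAG to be recorded:

import pandas as pd
from metaflow import FlowSpec, step
from wandb.integration.metaflow import wandb_log

@wandb_log(datasets=True)
class LineageExampleFlow(FlowSpec):
    @step
    def start(self):
        # Output of step A: logged as a dataset artifact
        self.raw_df = pd.DataFrame({"x": [1, 2, 3]})
        self.next(self.transform)

    @step
    def transform(self):
        # Input to step B: reusing self.raw_df links the two steps in the lineage DAG
        self.clean_df = self.raw_df.dropna()
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    LineageExampleFlow()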