Is your feature request related to a problem?
When using a nested DataCatalog of the kind

    from pytask import DataCatalog

    MODEL_NAMES = ("ols", "logistic_regression")
    DATA_NAMES = ("data_1", "data_2")

    # One DataCatalog per (model, dataset) pair, keyed first by model, then by dataset.
    nested_data_catalogs = {
        model_name: {
            data_name: DataCatalog(name=f"{model_name}-{data_name}")
            for data_name in DATA_NAMES
        }
        for model_name in MODEL_NAMES
    }

and adding products to a DataCatalog e.g. via the following task:

    from pathlib import Path
    from typing import Any

    from pytask import task
    from typing_extensions import Annotated

    from my_project.config import DATA_NAMES
    from my_project.config import MODEL_NAMES
    from my_project.config import nested_data_catalogs

    for model_name in MODEL_NAMES:
        for data_name in DATA_NAMES:

            # The return annotation stores the result under "fitted_model"
            # in the catalog that belongs to this (model, dataset) pair.
            @task
            def fit_model(
                path: Path = Path("...", data_name)
            ) -> Annotated[
                Any, nested_data_catalogs[model_name][data_name]["fitted_model"]
            ]:
                data = ...
                fitted_model = ...
                return fitted_model

as described in the extended DataCatalog guide, I would expect the DAG to reflect the nested structure of the DataCatalog.
Currently, only the PickleNode's name, "fitted_model" in the example, is used in the representation of the DAG. With multiple models and datasets, the label "fitted_model" is insufficient to tell the nodes apart, and it produces a DAG that implies the wrong structure and dependencies.
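To illustrate the ambiguity, here is a minimal sketch in plain Python. It does not use pytask's rendering code. The task labels are hypothetical, and it assumes that a label-only view of the graph identifies each product by its label alone:

```python
from itertools import product

MODEL_NAMES = ("ols", "logistic_regression")
DATA_NAMES = ("data_1", "data_2")

# Hypothetical (task label, product label) edges as a label-only view would show them.
edges = [
    (f"fit_model[{model_name}-{data_name}]", "fitted_model")
    for model_name, data_name in product(MODEL_NAMES, DATA_NAMES)
]

# Group the producing tasks by product label, as a graph keyed on labels would.
producers_by_label = {}
for task_label, product_label in edges:
    producers_by_label.setdefault(product_label, []).append(task_label)

for product_label, task_labels in producers_by_label.items():
    print(product_label, "<-", task_labels)
```

Under that assumption, all four tasks appear to feed a single "fitted_model" node, although each one writes to a different DataCatalog.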
Describe the solution you'd like
I would like the DAG to reflect the nested structure of the DataCatalog instead of using only the PickleNode's name. One approach would be to display the name of the DataCatalog together with the name of the PickleNode, e.g. ols-data_1-fitted_model. Another approach would be to join the keys of nested_data_catalogs with the PickleNode's name. This gives a similar result in the example above, but guarantees a more informative name in general, because it does not depend on how the DataCatalog was named.
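A rough sketch of the second approach, in plain Python and independent of pytask internals. The helpers `iter_catalog_paths` and `qualified_label` are hypothetical names introduced only for illustration. The catalogs are replaced by stand-in objects, and the sketch assumes the leaves of the nested dictionary are not themselves mappings:

```python
from collections.abc import Mapping

MODEL_NAMES = ("ols", "logistic_regression")
DATA_NAMES = ("data_1", "data_2")


def iter_catalog_paths(nested, prefix=()):
    """Yield (key_path, leaf) for every non-mapping leaf of a nested mapping."""
    for key, value in nested.items():
        path = (*prefix, str(key))
        if isinstance(value, Mapping):
            yield from iter_catalog_paths(value, path)
        else:
            yield path, value


def qualified_label(key_path, node_name, sep="-"):
    """Join the nesting keys with the node's name to form a unique label."""
    return sep.join((*key_path, node_name))


# Stand-in leaves; in the real project these would be DataCatalog instances.
nested = {
    model_name: {data_name: object() for data_name in DATA_NAMES}
    for model_name in MODEL_NAMES
}

labels = [
    qualified_label(key_path, "fitted_model")
    for key_path, _ in iter_catalog_paths(nested)
]
print(labels)
```

Each of the four products then gets its own label, such as ols-data_1-fitted_model, without relying on the name passed to the DataCatalog.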