1. Context
The Notebook Migration tool is an in-development feature that uses an LLM to migrate user-uploaded Python Jupyter notebooks to Texera workflows. When the user uploads a notebook, the LLM returns the generated workflow and a mapping. The mapping is the link between the notebook and the workflow: it records which notebook cells correspond to which workflow operators and vice versa. The tool requires two pieces of information to be stored in the Texera database, the uploaded notebook and the mapping, both of which are JSON documents. This discussion focuses on the database schema design for these two pieces of data.
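To make the "cells to operators and vice versa" idea concrete, here is a minimal sketch of what such a mapping could look like. The shape of the JSON, the cell IDs, and the operator IDs are all assumptions made for illustration; the actual format returned by the LLM is not specified in this post.

```python
from collections import defaultdict

# Hypothetical mapping shape: notebook cell ID -> list of workflow operator IDs.
# All IDs below are made up for illustration.
cell_to_operators = {
    "cell-1": ["CSVFileScan-1"],
    "cell-2": ["Filter-1", "Projection-1"],
    "cell-3": ["Projection-1"],
}


def invert_mapping(cell_to_ops):
    """Build the reverse direction: operator ID -> list of cell IDs."""
    operator_to_cells = defaultdict(list)
    for cell_id, operator_ids in cell_to_ops.items():
        for operator_id in operator_ids:
            operator_to_cells[operator_id].append(cell_id)
    return dict(operator_to_cells)


operator_to_cells = invert_mapping(cell_to_operators)
```

Storing only one direction and deriving the other, as above, is one option; storing both directions in the JSON would trade space for simpler lookups.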
2. Current Design
The current design adds two new tables: workflow_notebook_source and workflow_notebook_mapping. The former stores the notebook and the latter stores the mapping.
2a. workflow_notebook_source
This table relates to the existing workflow table, uses wid as its primary key, and stores the notebook as binary JSON. It does not need to be a standalone table: the notebook could instead be merged into the workflow table as a nullable column that defaults to null. Because wid is the primary key, at most one notebook can ever be tied to a workflow (this limitation is discussed in 2c).
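The one-notebook-per-workflow consequence can be seen in a small sketch. It uses an in-memory SQLite database purely for illustration, not the real Texera database; the workflow table is reduced to two columns, and the notebook is stored as JSON text rather than binary JSON. Column names other than wid and notebook are assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified stand-in for the existing workflow table.
conn.execute("CREATE TABLE workflow (wid INTEGER PRIMARY KEY, name TEXT)")

# wid is both the primary key and a reference to workflow,
# so each workflow can have at most one notebook row.
conn.execute(
    """
    CREATE TABLE workflow_notebook_source (
        wid INTEGER PRIMARY KEY REFERENCES workflow (wid),
        notebook TEXT NOT NULL
    )
    """
)

conn.execute("INSERT INTO workflow (wid, name) VALUES (1, 'migrated workflow')")

original = {"cells": [{"id": "cell-1", "source": "import pandas as pd"}]}
conn.execute(
    "INSERT INTO workflow_notebook_source (wid, notebook) VALUES (?, ?)",
    (1, json.dumps(original)),
)

# A second notebook for the same workflow violates the primary key.
edited = {"cells": [{"id": "cell-1", "source": "import polars as pl"}]}
try:
    conn.execute(
        "INSERT INTO workflow_notebook_source (wid, notebook) VALUES (?, ?)",
        (1, json.dumps(edited)),
    )
    second_insert_rejected = False
except sqlite3.IntegrityError:
    second_insert_rejected = True

row_count = conn.execute(
    "SELECT COUNT(*) FROM workflow_notebook_source"
).fetchone()[0]
```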
2b. workflow_notebook_mapping
This table relates to the existing workflow_version table and uses (wid, vid) as its composite primary key. The mapping relates to workflow_version rather than workflow because, when the user edits the workflow, we want to generate another mapping between the new workflow version and the original notebook. This design lets us store one mapping for every workflow version.
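A rough sketch of the per-version mapping table follows, again using in-memory SQLite as a stand-in for the real database. The workflow_version table is simplified to two columns, and the mapping contents and operator IDs are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified stand-in for the existing workflow_version table.
conn.execute(
    """
    CREATE TABLE workflow_version (
        vid INTEGER PRIMARY KEY,
        wid INTEGER NOT NULL,
        UNIQUE (wid, vid)
    )
    """
)

# One mapping row per workflow version, keyed by (wid, vid).
conn.execute(
    """
    CREATE TABLE workflow_notebook_mapping (
        wid INTEGER NOT NULL,
        vid INTEGER NOT NULL,
        mapping TEXT NOT NULL,
        PRIMARY KEY (wid, vid),
        FOREIGN KEY (wid, vid) REFERENCES workflow_version (wid, vid)
    )
    """
)

# Two versions of the same workflow (wid = 1).
conn.executemany(
    "INSERT INTO workflow_version (vid, wid) VALUES (?, ?)",
    [(1, 1), (2, 1)],
)

# Hypothetical mappings: the workflow edit in version 2 added an operator.
mapping_v1 = {"cell-1": ["Filter-1"]}
mapping_v2 = {"cell-1": ["Filter-1", "Sort-1"]}
conn.executemany(
    "INSERT INTO workflow_notebook_mapping (wid, vid, mapping) VALUES (?, ?, ?)",
    [(1, 1, json.dumps(mapping_v1)), (1, 2, json.dumps(mapping_v2))],
)

# Look up the mapping that belongs to a specific workflow version.
latest = json.loads(
    conn.execute(
        "SELECT mapping FROM workflow_notebook_mapping WHERE wid = ? AND vid = ?",
        (1, 2),
    ).fetchone()[0]
)
```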
2c. Limitations of this design
As mentioned earlier, this design allows only one notebook to ever be associated with a workflow, which effectively assumes that notebooks are immutable after workflow generation. This is problematic: if the user edits the notebook after generation, the stored mappings no longer match it and stop working. We also can't store the notebook in the same table as the mapping, because changing the notebook without changing the workflow produces no new workflow version, and therefore no new (wid, vid) key under which to store it.
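One way a mapping can break after a notebook edit is sketched below. It assumes the mapping is keyed by notebook cell ID and that cells carry an id field; the mapping shape, the IDs, and both helper functions are hypothetical. Note that this check only catches added or removed cells, not cells whose content changed in place.

```python
def find_stale_cells(notebook, cell_to_operators):
    """Cell IDs the mapping refers to that no longer exist in the notebook."""
    current_ids = {cell["id"] for cell in notebook["cells"]}
    return sorted(cid for cid in cell_to_operators if cid not in current_ids)


def find_unmapped_cells(notebook, cell_to_operators):
    """Cell IDs in the notebook that the mapping knows nothing about."""
    return sorted(
        cell["id"] for cell in notebook["cells"] if cell["id"] not in cell_to_operators
    )


# Mapping generated against the original notebook (cells c1, c2, c3).
cell_to_operators = {
    "c1": ["CSVFileScan-1"],
    "c2": ["Filter-1"],
    "c3": ["Projection-1"],
}

# The user later deletes c2 and adds c4, without touching the workflow,
# so no new (wid, vid) exists to hold a regenerated mapping.
edited_notebook = {
    "cells": [
        {"id": "c1", "source": "df = pd.read_csv('data.csv')"},
        {"id": "c3", "source": "df = df[['a', 'b']]"},
        {"id": "c4", "source": "df = df.sort_values('a')"},
    ]
}

stale = find_stale_cells(edited_notebook, cell_to_operators)
unmapped = find_unmapped_cells(edited_notebook, cell_to_operators)
```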
3. Possible Solutions/Workarounds
A workaround that keeps the current design is to create a new workflow every time the notebook is modified. This way, each workflow's mappings remain valid because they always refer to the notebook that workflow was generated from. However, this is not desirable, as it pushes unnecessary complexity onto the user, who now has to manage multiple workflows for the same project.
Other solutions would likely require modifying existing tables or adding new ones to support multiple notebook versions while keeping the mappings valid. I am currently brainstorming better alternative designs.