Notebook Migration tool: database schema design #4175

zyratlo · 2026-01-24T00:47:44Z


1. Context

The Notebook Migration tool is an in-development feature that uses an LLM to migrate user-uploaded Python Jupyter notebooks to Texera workflows. When the user uploads a notebook, the LLM returns the generated workflow and a mapping. The mapping links the notebook to the workflow: it records which notebook cells correspond to which workflow operators, and vice versa. The tool needs two pieces of information stored in the Texera database: the uploaded notebook and the mapping, both of which are JSON documents. This discussion focuses on the database schema design for these two pieces of data.
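To make the "cells to operators and vice versa" idea concrete, here is a minimal sketch of what a mapping could look like. The JSON shape, the cell IDs, and the operator IDs are all hypothetical, since the actual format returned by the LLM is not specified in this post.

```python
from collections import defaultdict

# Hypothetical forward mapping: notebook cell ID -> workflow operator IDs.
# The real mapping format is not defined here; these IDs are made up.
cell_to_operators = {
    "cell-1": ["CSVFileScan-1"],
    "cell-2": ["Filter-1", "Projection-1"],
    "cell-3": ["Projection-1"],
}


def invert_mapping(forward):
    """Build the reverse index: operator ID -> list of notebook cell IDs."""
    reverse = defaultdict(list)
    for cell_id, operator_ids in forward.items():
        for operator_id in operator_ids:
            reverse[operator_id].append(cell_id)
    return dict(reverse)


operator_to_cells = invert_mapping(cell_to_operators)
```

Storing only one direction and deriving the other on load is one option; storing both directions in the JSON is another. Either way the relationship is many-to-many, which this sketch assumes.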

2. Current Design

The current design adds two new tables: workflow_notebook_source and workflow_notebook_mapping. The former stores the notebook and the latter stores the mapping.

2a. workflow_notebook_source

This table relates to the existing workflow table, uses wid as its primary key, and stores the notebook as binary JSON. Note that it does not need to be a standalone table: the notebook could instead be merged into the workflow table as a column that defaults to null. As a consequence of this design, only one notebook can ever be tied to a workflow (this limitation is discussed later).
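A rough sketch of this table, using an in-memory SQLite database as a stand-in for the Texera database. The column name notebook, the reduced workflow table, and the use of JSON text instead of a binary JSON type are assumptions made for illustration only.

```python
import json
import sqlite3


def create_source_schema(conn):
    """Create a reduced workflow table plus workflow_notebook_source."""
    conn.execute("PRAGMA foreign_keys = ON")
    # Stand-in for the existing workflow table (real columns omitted).
    conn.execute("CREATE TABLE workflow (wid INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        """
        CREATE TABLE workflow_notebook_source (
            wid INTEGER PRIMARY KEY REFERENCES workflow (wid),
            notebook TEXT NOT NULL  -- JSON text here; assumed binary JSON in the real DB
        )
        """
    )


def attach_notebook(conn, wid, notebook):
    """Store the uploaded notebook JSON for a workflow."""
    conn.execute(
        "INSERT INTO workflow_notebook_source (wid, notebook) VALUES (?, ?)",
        (wid, json.dumps(notebook)),
    )


conn = sqlite3.connect(":memory:")
create_source_schema(conn)
conn.execute("INSERT INTO workflow (wid, name) VALUES (1, 'demo')")
attach_notebook(conn, 1, {"cells": [], "nbformat": 4})  # placeholder notebook

# Because wid is the primary key, a second notebook for the same
# workflow is rejected by the database.
try:
    attach_notebook(conn, 1, {"cells": [{"cell_type": "code"}], "nbformat": 4})
    second_insert_rejected = False
except sqlite3.IntegrityError:
    second_insert_rejected = True

notebook_count = conn.execute(
    "SELECT COUNT(*) FROM workflow_notebook_source WHERE wid = 1"
).fetchone()[0]
```

The rejected second insert is the one-notebook-per-workflow constraint showing up at the schema level.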

2b. workflow_notebook_mapping

This table relates to the existing workflow_version table and uses (wid, vid) as its composite primary key. The mapping relates to the workflow_version table instead of the workflow table because, when the user edits the workflow, we want to generate another mapping between the new workflow and the original notebook. This design allows us to store a mapping for every workflow version.
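A sketch of the per-version mapping table, again with in-memory SQLite as a stand-in. The workflow_version table is reduced to its two key columns, and the mapping column name, the helper functions, and the sample mappings are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Stand-in for the existing workflow_version table (real columns omitted).
conn.execute(
    "CREATE TABLE workflow_version ("
    " wid INTEGER NOT NULL, vid INTEGER NOT NULL, PRIMARY KEY (wid, vid))"
)
conn.execute(
    """
    CREATE TABLE workflow_notebook_mapping (
        wid INTEGER NOT NULL,
        vid INTEGER NOT NULL,
        mapping TEXT NOT NULL,  -- JSON text; assumed binary JSON in the real DB
        PRIMARY KEY (wid, vid),
        FOREIGN KEY (wid, vid) REFERENCES workflow_version (wid, vid)
    )
    """
)


def save_mapping(conn, wid, vid, mapping):
    """Store the mapping generated for one workflow version."""
    conn.execute(
        "INSERT INTO workflow_notebook_mapping (wid, vid, mapping) VALUES (?, ?, ?)",
        (wid, vid, json.dumps(mapping)),
    )


def load_mapping(conn, wid, vid):
    """Return the mapping for (wid, vid), or None if none was stored."""
    row = conn.execute(
        "SELECT mapping FROM workflow_notebook_mapping WHERE wid = ? AND vid = ?",
        (wid, vid),
    ).fetchone()
    return json.loads(row[0]) if row else None


# Two versions of the same workflow, each with its own mapping
# back to the same original notebook.
conn.executemany(
    "INSERT INTO workflow_version (wid, vid) VALUES (?, ?)", [(1, 1), (1, 2)]
)
save_mapping(conn, 1, 1, {"cell-1": ["op-a"]})
save_mapping(conn, 1, 2, {"cell-1": ["op-a", "op-b"]})
```

Each workflow edit produces a new vid, so the regenerated mapping gets its own row and the older mappings stay available.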

2c. Limitations of this design

As mentioned earlier, this design allows only one notebook to ever be associated with a workflow, which effectively assumes that notebooks are immutable after workflow generation. This is problematic: if the user edits the notebook after generation, the stored mappings no longer match it. We also can't store the notebook in the same table as the mapping, because if the notebook changes but the workflow does not, no new (wid, vid) key is created to store it under.
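The key problem can be reproduced in a few lines. This is a simplified sketch with in-memory SQLite as a stand-in and made-up mapping contents: the notebook is edited, the workflow is not, so the regenerated mapping arrives with the same (wid, vid) and has no row of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE workflow_notebook_mapping (
        wid INTEGER NOT NULL,
        vid INTEGER NOT NULL,
        mapping TEXT NOT NULL,
        PRIMARY KEY (wid, vid)
    )
    """
)


def store_mapping(conn, wid, vid, mapping_json):
    """Try to insert a mapping; return False if the key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO workflow_notebook_mapping (wid, vid, mapping)"
                " VALUES (?, ?, ?)",
                (wid, vid, mapping_json),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Mapping generated when the workflow was first created from the notebook.
stored_original = store_mapping(conn, 1, 1, '{"cell-1": ["op-a"]}')

# The user edits the notebook only. The workflow is unchanged, so the
# regenerated mapping still belongs to (wid=1, vid=1) and collides.
stored_after_notebook_edit = store_mapping(
    conn, 1, 1, '{"cell-1": ["op-a"], "cell-2": ["op-a"]}'
)
```

Overwriting the existing row instead of inserting would avoid the error, but it would discard the mapping for the earlier notebook contents, which is the same limitation in a different form.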

3. Possible Solutions/Workarounds

A workaround that keeps the current design is to create a new workflow every time the notebook is modified. This way, the mappings remain valid after the notebook changes. However, this is not desirable, as it adds unnecessary complexity for the user, who now has to manage multiple workflows for the same project.

Other solutions would likely require modifying existing tables or adding new ones to support multiple notebook versions while keeping the mappings valid. I am currently brainstorming better alternative designs.
