
Ensure channel version database exists when adding to community library #5233

Merged
AlexVelezLl merged 8 commits into learningequality:unstable from Jakoma02:ensure-channel-version-database-exists
Sep 22, 2025

Contributor

Jakoma02 commented Jul 31, 2025

Summary

This PR ensures that if an older channel that does not have a versioned database file yet is added to the community library, the versioned database file is created.

References

Solves #5191.

This PR depends on changes from #5228, and must be merged after it. (Done)

Reviewer guidance

After merging #5228, this PR should first be rebased onto the merged changes and only then reviewed and merged. (Done)

mapper.run()


def _possibly_migrate_unversioned_database(
Member


I haven't read it in depth yet 😅. But, a general comment is that we should make this copy when we create the submission, not after it's approved and mapped to the public models.

Mainly for two reasons:

  1. If the user has published more recent versions between submission creation and submission approval, then it will no longer be true that the current channel database is the database for that channel version.
  2. In the future, we'll need to create a way to preview the channel version related to the submission, and for that, we'll need to ensure that the channel-versioned database exists, and this preview would happen before approving the submission.

If there are arguments for doing this copy at export time instead, I'm happy to hear them too 😄

Contributor Author


My idea for doing this at export time was to deal with content databases at a place that was already dealing with content databases, without complicating the viewset logic with something I thought it did not need to care about. I was trying to solve 1 by checking whether the database contains the channel metadata with the given version, but 2 alone is a good reason for actually doing this at submission creation time.

I am thinking that I could create a create_versioned_database_if_needed method inside contentcuration/utils/publish.py and use it from the submission viewset -- or is there a better place for it?

Also, I think that the using_temp_migrated_database helper is fairly useful and makes the export_channel_to_kolibri_public implementation (arguably) more readable. However, my motivation for creating it was to avoid reimplementing its logic in _possibly_migrate_unversioned_database, and that motivation is no longer valid. Should I scratch this, or should I keep this change anyway since it is already done?
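The version check mentioned in this comment (confirming that the current database really belongs to the submitted channel version before copying it) might look roughly like the sketch below. It is only an illustration: the table name `content_channelmetadata`, its columns, and the helper name `database_matches_version` are assumptions, not code from this PR.

```python
import sqlite3


def database_matches_version(connection, channel_id, version):
    """Return True if the channel metadata row carries the expected version.

    Assumption: the content database stores one metadata row per channel in
    a table named content_channelmetadata with id and version columns.
    """
    row = connection.execute(
        "SELECT version FROM content_channelmetadata WHERE id = ?",
        (channel_id,),
    ).fetchone()
    return row is not None and row[0] == version


# Simulate a channel that was republished (version 5) after a submission
# was created for version 4.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE content_channelmetadata (id TEXT PRIMARY KEY, version INTEGER)"
)
connection.execute("INSERT INTO content_channelmetadata VALUES ('abc', 5)")

submitted_version = 4
safe_to_copy = database_matches_version(connection, "abc", submitted_version)
```

In this scenario `safe_to_copy` is false, which is the situation described in reason 1 above: the unversioned database no longer represents the submitted version, so copying it at approval time would produce the wrong file.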

Member


For sure! I agree on not complicating the viewset logic, I think this function can perfectly live in the publish.py module.

> Should I scratch this, or should I keep this change anyway since it is already done?

I also think it is more readable now, and we can reuse this if we ever need it, so I'm fine with keeping this change!

Contributor Author


Implemented in 3f83c80.

Jakoma02 force-pushed the ensure-channel-version-database-exists branch from 30b24d0 to 3f83c80 on August 6, 2025 17:40
Contributor Author

Jakoma02 commented Aug 6, 2025

I have just rebased this PR onto the current community-channels branch.

Member

@AlexVelezLl AlexVelezLl left a comment


Thanks @Jakoma02! The code changes look mostly correct. I just found a small bug in how we are copying the versioned database, and noticed that we should probably run this process as an async task. Apart from that, the code changes look good, and the tests provide a lot of confidence.

)

with storage.open(unversioned_db_storage_path, "rb") as unversioned_db_file:
    with storage.open(versioned_db_storage_path, "wb") as versioned_db_file:
Member


I am getting this error when I try to create a submission for a channel that doesn't have a versioned channel database:

  File "/home/alexvelezll/.pyenv/versions/studio-py3.10/lib/python3.10/site-packages/django_s3_storage/storage.py", line 318, in _open
    raise ValueError("S3 files can only be opened in read-only mode")
ValueError: S3 files can only be opened in read-only mode

So, it seems like a better way to go here is just to save the same database in the new path just like we do in the publish_channel method:

        with storage.open(unversioned_db_storage_path, "rb") as unversioned_db_file:
            storage.save(versioned_db_storage_path, unversioned_db_file)

Contributor Author


You are right, this slipped through. It should be fixed in 054c89d, and I did more thorough manual testing this time.
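The fix discussed in this thread can be sketched in isolation as follows. The `ReadOnlyOpenStorage` class is a hypothetical in-memory stand-in that only mimics the behaviour seen in the traceback (opening for writing is rejected, writing goes through `save`); the helper name `copy_unversioned_to_versioned` and the file paths are likewise assumptions for illustration, not the PR's code.

```python
import io


class ReadOnlyOpenStorage:
    """Hypothetical stand-in for a storage backend whose open() is read-only."""

    def __init__(self):
        self._files = {}

    def exists(self, name):
        return name in self._files

    def open(self, name, mode="rb"):
        # Mirrors the error from the traceback for any non-read mode.
        if mode != "rb":
            raise ValueError("S3 files can only be opened in read-only mode")
        return io.BytesIO(self._files[name])

    def save(self, name, content):
        # Writes consume a readable file object instead of a writable handle.
        self._files[name] = content.read()
        return name


def copy_unversioned_to_versioned(storage, unversioned_path, versioned_path):
    """Copy the database via save(); return False if nothing had to be done."""
    if storage.exists(versioned_path):
        return False
    with storage.open(unversioned_path, "rb") as unversioned_db_file:
        storage.save(versioned_path, unversioned_db_file)
    return True


storage = ReadOnlyOpenStorage()
storage._files["databases/abc.sqlite3"] = b"sqlite-bytes"  # illustrative path
created = copy_unversioned_to_versioned(
    storage, "databases/abc.sqlite3", "databases/abc-4.sqlite3"
)
```

Because the copy only ever opens the source for reading and hands that file object to `save`, it never needs a writable handle, which is what the original nested `open(..., "wb")` relied on.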

# When creating a new submission, ensure the channel has a versioned database
# (it might not have if the channel was published before versioned databases
# were introduced).
ensure_versioned_database_exists(self.channel)
Member


I just realized that this should probably happen in an async task, since downloading the databases may take some time and we should not keep the connection open for that long. Could you please create a new task in contentcuration/tasks.py that just calls the ensure_versioned_database_exists method (so we don't have all this logic in the tasks module) and then enqueue it here? Apologies, I did not catch this earlier.

Contributor Author


Done in 611b641.
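The shape requested in this thread (a thin task that only delegates to the utility, enqueued from the request path) can be sketched as below. This is not the code from 611b641: a `ThreadPoolExecutor` stands in for the project's real task queue, the body of `ensure_versioned_database_exists` is a placeholder, it takes a channel id rather than a channel object, and `create_submission` is a hypothetical handler.

```python
from concurrent.futures import ThreadPoolExecutor

created_for = []


def ensure_versioned_database_exists(channel_id):
    """Placeholder for the utility function; here it only records the call."""
    created_for.append(channel_id)


def ensure_versioned_database_exists_task(channel_id):
    """Thin task wrapper: all real logic stays in the utility function."""
    ensure_versioned_database_exists(channel_id)


# Stand-in for the task queue (assumption; the real queue is not modelled).
executor = ThreadPoolExecutor(max_workers=1)


def create_submission(channel_id):
    """Hypothetical handler: enqueue the copy and return without waiting."""
    future = executor.submit(ensure_versioned_database_exists_task, channel_id)
    return {"channel": channel_id}, future


response, future = create_submission("abc")
future.result()  # Only for this demo; a request handler would not block here.
executor.shutdown()
```

Keeping the wrapper this small means the copying logic can be tested directly through the utility function, while the request returns as soon as the work is queued.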

@rtibbles rtibbles changed the base branch from community-channels to unstable August 28, 2025 00:00
@Jakoma02 Jakoma02 requested a review from AlexVelezLl September 2, 2025 21:22
Member

@AlexVelezLl AlexVelezLl left a comment


Thanks @Jakoma02, the code changes look good! Just a nitpick comment. We will also need to rebase this PR to resolve the conflicts. After that, this is good to go!

mapper.run()
db_storage_path = versioned_db_storage_path

with using_temp_migrated_database(db_storage_path):
Member


I'd rename it to something like using_temp_migrated_content_database to explicitly declare that this is using the content database context.

Contributor Author


Done in b115edb
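The helper's implementation is not shown in this conversation. As a rough illustration of the general pattern its name suggests (work on a temporary copy of a database file and clean it up afterwards), a context manager could be written as in the sketch below. The name `using_temp_copy` is hypothetical, and the sketch deliberately leaves out any migration step or storage download.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def using_temp_copy(source_path):
    """Yield the path of a temporary copy of source_path; delete it on exit."""
    fd, temp_path = tempfile.mkstemp(suffix=".sqlite3")
    os.close(fd)  # Only the path is needed; the copy below reopens the file.
    try:
        shutil.copyfile(source_path, temp_path)
        yield temp_path
    finally:
        os.remove(temp_path)


with tempfile.TemporaryDirectory() as directory:
    source = os.path.join(directory, "channel.sqlite3")
    with open(source, "wb") as source_file:
        source_file.write(b"data")

    with using_temp_copy(source) as temp_path:
        with open(temp_path, "rb") as temp_file:
            copied = temp_file.read()
        existed_inside = os.path.exists(temp_path)

    exists_after = os.path.exists(temp_path)
```

The `try`/`finally` ensures the temporary file is removed even if the body of the `with` block raises, which is the main reason to wrap this logic in a context manager rather than repeating it at each call site.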

Jakoma02 force-pushed the ensure-channel-version-database-exists branch from b115edb to 15d40ff on September 19, 2025 17:55
@Jakoma02
Contributor Author

I rebased the PR on top of the current unstable branch. The rebase was not completely trivial, so it would be great if you could at least briefly double-check the new commits. I am wary of rewriting git history like this: if I make a mistake here and it is discovered later, it will be hard to tell that it was a rebasing error, and it might make the history really hard to understand.

Member

@AlexVelezLl AlexVelezLl left a comment


The new task is working correctly, the code changes look good, and the tests give a lot of reassurance. I went through all the PR commits and didn't spot anything weird. Thanks a lot @Jakoma02!

@AlexVelezLl AlexVelezLl merged commit fc27318 into learningequality:unstable Sep 22, 2025
13 checks passed
Development

Successfully merging this pull request may close these issues.

ESoCC: Ensure that channel version's database exists when a community library submission is created

3 participants