Ensure channel version database exists when adding to community library#5233
Conversation
| mapper.run() | ||
|
|
||
|
|
||
| def _possibly_migrate_unversioned_database( |
There was a problem hiding this comment.
I haven't read it in depth yet 😅. But, a general comment is that we should make this copy when we create the submission, not after it's approved and mapped to the public models.
Mainly for two reasons:
- If the user has published more recent versions between submission creation and submission approval, then it will no longer be true that the current channel database is the database for that channel version.
- In the future, we'll need to create a way to preview the channel version related to the submission, and for that, we'll need to ensure that the channel-versioned database exists, and this preview would happen before approving the submission.
If there are arguments to have this copy at export time instead, Im happy to hear it too 😄
There was a problem hiding this comment.
My idea for doing this at export time was to deal with content databases at a place that was already dealing with content databases, without complicating the viewset logic with something I thought it did not need to care about. I was trying to solve 1 by checking whether the database contains the channel metadata with the given version, but 2 alone is a good reason for actually doing this at submission creation time.
I am thinking that I could create a create_versioned_database_if_needed method inside contentcuration/utils/publish.sh and use it from the submission viewset -- or is there a better place for it?
Also, I think that the using_temp_migrated_database helper is fairly useful and makes the export_channel_to_kolibri_public implementation (arguably) more readable, but my motivation for creating it was to avoid reimplementing its logic in _possibly_migrate_unversioned_database, and it is no longer valid. Should I scratch this, or should I keep this change anyway since it is already done?
There was a problem hiding this comment.
For sure! I agree on not complicating the viewset logic, I think this function can perfectly live in the publish.py module.
Should I scratch this, or should I keep this change anyway since it is already done?
I also think it is more readable now, and we can re-use this if we ever need it, so Im fine with keeping this change!
30b24d0 to
3f83c80
Compare
|
I have rebased this PR onto current community-channels right now. |
AlexVelezLl
left a comment
There was a problem hiding this comment.
Thanks @Jakoma02! Code changes looks mostly correct, I just found a little bug on how we are copying the versioned database, and noticed that we should probably have this process as an async task. Apart from that, code changes looks good, and tests provide a lot of confidence.
| ) | ||
|
|
||
| with storage.open(unversioned_db_storage_path, "rb") as unversioned_db_file: | ||
| with storage.open(versioned_db_storage_path, "wb") as versioned_db_file: |
There was a problem hiding this comment.
I am getting this error when try to create a submission that doesn't have a versioned channel database:
File "/home/alexvelezll/.pyenv/versions/studio-py3.10/lib/python3.10/site-packages/django_s3_storage/storage.py", line 318, in _open
raise ValueError("S3 files can only be opened in read-only mode")
ValueError: S3 files can only be opened in read-only mode
So, it seems like a better way to go here is just to save the same database in the new path just like we do in the publish_channel method:
with storage.open(unversioned_db_storage_path, "rb") as unversioned_db_file:
storage.save(versioned_db_storage_path, unversioned_db_file)There was a problem hiding this comment.
You are right, this slipped through. It should be fixed in 054c89d, and I did more thorough manual testing this time.
| # When creating a new submission, ensure the channel has a versioned database | ||
| # (it might not have if the channel was published before versioned databases | ||
| # were introduced). | ||
| ensure_versioned_database_exists(self.channel) |
There was a problem hiding this comment.
Just realized that this should probably happen in an async task since downloading the databases may take some time, and we should not keep the connection open for that long. So could you please create a new task in contentcuration/tasks.py that just calls the ensure_versioned_database_exists method (so we dont have all this logic in the tasks module) and then enqueue it here? Apologies I did not catch this earlier.
AlexVelezLl
left a comment
There was a problem hiding this comment.
Thanks @Jakoma02, code changes looks good! Just a nitpick comment. We will also need to rebase this PR to solve the conflicts. After that, this is good to go!
| mapper.run() | ||
| db_storage_path = versioned_db_storage_path | ||
|
|
||
| with using_temp_migrated_database(db_storage_path): |
There was a problem hiding this comment.
I'd rename it to something like using_temp_migrated_content_database to explicity declare that this is using the content database context.
b115edb to
15d40ff
Compare
|
I rebased the PR on top of current |
AlexVelezLl
left a comment
There was a problem hiding this comment.
The new task is working correctly, code changes look good, and tests gives a lot of reassurance. Went through all the PR commits and didnt spot anything weird. Thanks a lot @Jakoma02!
Summary
This PR ensures that if an older channel that does not have a versioned database file yet is added to the community library, the versioned database file is created.
References
Solves #5191.
This PR depends on changes from #5228, and must be merged after it.(Done)Reviewer guidance
After merging #5228, this PR should first be rebased onto the merged changes and only then reviewed and merged.(Done)