feat: Add MongoDB Atlas document store#6471
julian-risch
left a comment
@NoahStapp Thank you so much for opening this PR! There are a couple of smaller changes needed to make this PR ready to be merged. Please have a look at my comments below and see whether they make sense to you. 🙂 In addition to these comments, we need to take care of two other things. First, there are no tests added with this PR, but we should add tests similar to the ones we have for other document stores. I assume you tested this PR locally with Python 3.10 on macOS?
Last but not least, we need to add a small release note to this PR: https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md#release-notes
Feel free to reach out if you have any questions or need any help.
Comments make sense, I just have a question regarding tests. We have an existing test suite that requires a connection to a MongoDB Atlas cluster, but no tests that use mocks to avoid this requirement. Is that sufficient for this PR, or do we need to also add a mocked test suite of unit tests? |
|
Haystack 1.x won't change much now that we focus on 2.0 in our development. Mocked unit tests should thus be enough here. For adding integration tests, we would need to adjust the workflow, and running the tests would take even longer. I'd prefer mocked unit tests for that reason. |
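For illustration, a mocked unit test along these lines could look like the sketch below. It is only a minimal example, not the PR's test suite: `count_stored_documents` is a hypothetical helper, and the `MagicMock` stands in for a pymongo collection so that no Atlas cluster is needed.

```python
from unittest.mock import MagicMock


def count_stored_documents(collection, filters=None):
    """Hypothetical helper: count the documents in a collection-like object."""
    return collection.count_documents(filters or {})


def test_count_stored_documents_passes_filters():
    # The mock replaces a real MongoDB collection; no network access is needed.
    collection = MagicMock()
    collection.count_documents.return_value = 3

    assert count_stored_documents(collection, {"meta.lang": "en"}) == 3
    # Verify that the filter was forwarded unchanged to the collection.
    collection.count_documents.assert_called_once_with({"meta.lang": "en"})


test_count_stored_documents_passes_filters()
```

Because the collection is mocked, a test like this can run in the existing unit test workflow without any changes to CI.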
Hi @NoahStapp, in addition to my other comments, I see that the CI steps are still failing because of an issue here: https://github.com/deepset-ai/haystack/actions/runs/7146021601/job/19464075024?pr=6471 and in the code here. That's the `match` statement, which is not supported before Python 3.10. The other failing step is the missing release note. Just let me know if you need any support from my side to fix this so that we can merge your PR soon.
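As a rough sketch of the kind of change that avoids this failure, a `match` statement can be rewritten as an `if`/`elif` chain that also runs on Python versions before 3.10. The function and operator names below are hypothetical and are not taken from the PR's code.

```python
def to_mongo_operator(name: str) -> str:
    """Hypothetical example: map a comparison name to a MongoDB query operator.

    Written with if/elif instead of `match` so it also runs on Python < 3.10.
    """
    if name == "eq":
        return "$eq"
    elif name == "in":
        return "$in"
    elif name == "gt":
        return "$gt"
    # Fail loudly on anything that is not handled above.
    raise ValueError(f"Unsupported operator: {name}")
```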
Whoops, forgot about the second |
|
@julian-risch @masci Is the CLA mandatory to be signed by the PR author, or can it be done separately? Usually, do other companies engage their legal team for review? |
|
@prakul the CLA is usually signed by the single committer; if that works, it's the fastest way. The alternative would be an agreement between Mongo and deepset where you discuss and accept the CLA as an organization, then I can configure the bot to allow-list any contributor for that org. |
@NoahStapp Nice! We're making good progress here. Our goal is to have this in the upcoming Haystack v1.23 release that we are preparing for tomorrow. I fixed the mypy and pylint issues. Could you please have a look at the failing unit tests? For example: https://github.com/deepset-ai/haystack/actions/runs/7180932661/job/19554287643?pr=6471 |
Are the CI checks/environment identical to the local environment set up by following CONTRIBUTING.md? I don't see that test failure locally, but I've added |
|
@julian-risch @masci Thanks for all the quick reviews and guidance. We would love to see this as part of the Haystack v1.23 release, if possible. |
|
Hey @NoahStapp, this is not documented and it doesn't show as a CI failure, but using |
|
With MONGO_ATLAS_USERNAME, MONGO_ATLAS_PASSWORD and MONGO_ATLAS_HOST defined locally, most of the tests pass for me now. The tests that are still failing for me are the following:
|
|
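For anyone reproducing the local run, a small check like the following could confirm that the three environment variables named above are set before the Atlas-backed tests start. This is only a sketch: `missing_atlas_vars` is a hypothetical helper and not part of the PR.

```python
import os

# Variable names as mentioned in the comment above.
REQUIRED_VARS = ("MONGO_ATLAS_USERNAME", "MONGO_ATLAS_PASSWORD", "MONGO_ATLAS_HOST")


def missing_atlas_vars(env=None):
    """Hypothetical helper: return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

With pytest, the result could feed a `pytest.mark.skipif` condition so that the tests are skipped, rather than failing, when the credentials are absent.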
@NoahStapp I took care of the
If we can make them pass and are confident that everything works well, I would merge only the implementation but not the tests and then do the release tomorrow morning. |
@julian-risch Those tests require creating an Atlas Vector Search index named Once that index is created, those tests also pass. |
julian-risch
left a comment
Looks good to me now! 👍 It's ready to be merged. We can merge the tests later after cleaning them up. For now it was important to test it at least locally, which worked.
|
Thanks @julian-risch! |
Proposed Changes:
MongoAtlasDocumentStore is a document store for use with Haystack. It functions in a similar manner to the existing Pinecone, Weaviate, and Qdrant document stores.
How did you test it?
Unit and integration tests. The integration tests that test the actual document store functionality require a MongoDB Atlas cluster, so they are not included in this PR.