Add ObjectStoreRegistry #348
Closed. criccomini wants to merge 44 commits into `apache:main` from `criccomini:347-upstream-object-store-registry`.
Commits (all by criccomini):

- `fce6481` Initial (broken) copy
- `9c72689` Strip `ObjectStoreUrl` and use instead of
- `85dbce4` Remove wasm32
- `ff551ea` Create new \s if not exist on get
- `0bedcac` Add registry tests
- `395f32d` Add \
- `48fd0bf` Fix rustdocs and wasm
- `9b44140` cargo fmt
- `81a05b0` Remove dashmap
- `a09bd7e` Placate clippy
- `658ba5e` Expose parse_url_opts options when calling get
- `9905e92` Revert "Expose parse_url_opts options when calling get"
- `0a61733` Update src/registry.rs
- `fb4c955` Update src/registry.rs
- `e5ec53a` Rename list_urls
- `6a51b0b` Fix doc formatting for fmt
- `005c9f2` Add more tests for file scheme checking
- `bdb8194` Make wasm32 happy
- `5a01bd8` Remove get_store_key and make DefaultObjectStore really dumb
- `b19c26e` Some more docs
- `5e622d7` Add a get_url method as well
- `3ef24cd` Clippy!
- `ac7f6ea` Clarify how the default registry works
- `b667f3e` Test url match behavior
- `72b1875` Add prefix object store registry
- `5cb90ae` Docs and pub
- `a00db85` Fix rustdocs
- `e468e6c` Add tests for PrefixObjectStoreRegistry
- `4ef9923` Clippy
- `fe04c89` Add a more builder-ish pattern for prefix reg
- `e8decc8` Add Parsing object store registry
- `2145159` Add proper test for parser object store
- `39efe63` Misc cleanup
- `970dc13` Clippy!
- `2f87a22` Revert silly clone
- `227dcba` More clippy sigh
- `b7937cc` Fix test_url_http
- `a87ecd2` Remove get_prefix method
- `b9a97b6` Remove prefix and parser object store registries
- `25a156d` Use parse_opts with default object store registry
- `5d16b15` Add a test for map_url_to_key
- `1d530fd` Clean up docs
- `c81f155` Clippy
- `2a076c9` More clippy
New file (`@@ -0,0 +1,305 @@`), the `registry` module:

```rust
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.  See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership.  The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License.  You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied.  See the License for the
// specific language governing permissions and limitations
// under the License.

//! ObjectStoreRegistry holds object stores at runtime with a URL for each store.
//! The registry serves as a cache for object stores to avoid repeated creation.
use crate::{parse_url, Error, ObjectStore};
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use url::Url;

type GetStoreResult = Result<Option<(Arc<dyn ObjectStore>, Url)>, Error>;

/// [`ObjectStoreRegistry`] maps a URL to an [`ObjectStore`] instance. The meaning of
/// a URL mapping depends on the [`ObjectStoreRegistry`] implementation. See the
/// implementation docs for more details.
pub trait ObjectStoreRegistry: Send + Sync + std::fmt::Debug + 'static {
    /// Register a new store for the provided URL.
    ///
    /// ## Returns
    ///
    /// If a store was already registered under the same URL mapping, it is replaced
    /// and returned along with the mapped URL.
    fn register_store(
        &self,
        url: &Url,
        store: Arc<dyn ObjectStore>,
    ) -> Option<(Arc<dyn ObjectStore>, Url)>;

    /// Get a store for the provided URL. The input URL is mapped to an [`ObjectStore`]
    /// instance based on the [`ObjectStoreRegistry`] implementation. See the
    /// implementation docs for more details.
    ///
    /// If no [`ObjectStore`] is found for the `url`, one may be lazily created and
    /// registered. The logic for doing so is left to each [`ObjectStoreRegistry`]
    /// implementation.
    ///
    /// ## Returns
    ///
    /// If a store is found (or created) for the `url`, it is returned along with the
    /// mapped URL.
    ///
    /// If no store is found for the `url` and none is created, `None` is returned.
    ///
    /// ## Errors
    ///
    /// Returns an error if an implementation can't parse a URL or create a store.
    fn get_store(&self, url: &Url) -> GetStoreResult;

    /// List all registered store URLs. These are the URL mappings for all registered stores.
    ///
    /// ## Returns
    ///
    /// A vector of all registered store URLs.
    fn get_store_urls(&self) -> Vec<Url>;
}

/// An [`ObjectStoreRegistry`] implementation that maps URLs to object stores using
/// `scheme://host:port`.
///
/// ## Examples
///
/// Registering a store:
///
/// ```
/// # use std::sync::Arc;
/// # use url::Url;
/// # use object_store::ObjectStore;
/// # use object_store::memory::InMemory;
/// # use object_store::registry::{ObjectStoreRegistry, DefaultObjectStoreRegistry};
/// let registry = DefaultObjectStoreRegistry::new();
/// let url = Url::parse("memory://path/to/store").unwrap();
/// let store = Arc::new(InMemory::new()) as Arc<dyn ObjectStore>;
/// registry.register_store(&url, Arc::clone(&store));
/// let (retrieved_store, mapped_url) = registry.get_store(&url).unwrap().unwrap();
/// assert_eq!(mapped_url.as_str(), "memory://");
/// assert!(Arc::ptr_eq(&retrieved_store, &store));
/// ```
///
/// Dynamically creating a store:
///
/// ```
/// # use std::sync::Arc;
/// # use url::Url;
/// # use object_store::ObjectStore;
/// # use object_store::registry::{ObjectStoreRegistry, DefaultObjectStoreRegistry};
/// let registry = DefaultObjectStoreRegistry::new();
/// let url = Url::parse("memory://path/to/store").unwrap();
/// let (store, mapped_url) = registry.get_store(&url).unwrap().unwrap();
/// assert_eq!(mapped_url.as_str(), "memory://");
/// ```
pub struct DefaultObjectStoreRegistry {
    /// A map from key URL to the object store registered under that key
    object_stores: RwLock<HashMap<Url, Arc<dyn ObjectStore>>>,
}

impl std::fmt::Debug for DefaultObjectStoreRegistry {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let stores = self.object_stores.read().unwrap();
        f.debug_struct("DefaultObjectStoreRegistry")
            .field("urls", &stores.keys().cloned().collect::<Vec<_>>())
            .finish()
    }
}

impl Default for DefaultObjectStoreRegistry {
    fn default() -> Self {
        Self::new()
    }
}

impl DefaultObjectStoreRegistry {
    /// Create a new [`DefaultObjectStoreRegistry`] with no registered stores.
    pub fn new() -> Self {
        let object_stores = RwLock::new(HashMap::new());
        Self { object_stores }
    }

    /// Get the key of a URL for object store registration. Mapping rules are as follows:
    ///
    /// - Any URL with a `file` scheme is mapped to `file:///`
    /// - Any URL with a `memory` scheme is mapped to `memory://`
    /// - All other URLs are mapped to `scheme://host:port`
    ///
    /// ## Returns
    ///
    /// A [`Url`] with the same scheme, host, and port as the input, but without
    /// credentials, path, query, or fragment.
    fn map_url_to_key(url: &Url) -> Url {
        match url.scheme() {
            // Don't include the host for `memory` or `file`; hard-code the key instead,
            // since [`crate::parse::parse_url`] expects these to never have
            // a "host" component.
            "memory" => Url::parse(&format!("{}://", url.scheme())),
            // This handles file://path/to/file as well as file:///path/to/file,
            // even though file://path/to/file is not technically a valid URL.
            "file" => Url::parse(&format!("{}:///", url.scheme())),
            // Keep only host and port; `BeforeHost` skips any `user:pass@` prefix.
            _ => Url::parse(&format!(
                "{}://{}",
                url.scheme(),
                &url[url::Position::BeforeHost..url::Position::AfterPort],
            )),
        }
        .unwrap()
    }
}

impl ObjectStoreRegistry for DefaultObjectStoreRegistry {
    /// Register a new store for the provided URL.
    ///
    /// If a store was already registered under the same key, it is replaced and returned.
    fn register_store(
        &self,
        url: &Url,
        store: Arc<dyn ObjectStore>,
    ) -> Option<(Arc<dyn ObjectStore>, Url)> {
        let key = Self::map_url_to_key(url);
        let mut stores = self.object_stores.write().unwrap();
        stores
            .insert(key.clone(), store)
            .map(|old_store| (old_store, key))
    }

    /// Get the store registered under the key for the provided URL.
    ///
    /// If no store is registered under that key, one is created with [`parse_url`],
    /// registered, and returned, so this implementation never returns `None`.
    ///
    /// ## Errors
    ///
    /// Returns an error if [`parse_url`] cannot create a store for the key.
    fn get_store(&self, url: &Url) -> GetStoreResult {
        let key = Self::map_url_to_key(url);
        // Hold the write lock for the whole lookup so two concurrent callers
        // cannot both create a store for the same key.
        let mut stores = self.object_stores.write().unwrap();
        if let Some(store) = stores.get(&key) {
            Ok(Some((Arc::clone(store), key)))
        } else {
            let (store, _) = parse_url(&key)?;
            let store: Arc<dyn ObjectStore> = store.into();
            stores.insert(key.clone(), Arc::clone(&store));
            Ok(Some((store, key)))
        }
    }

    /// Returns a vector of all registered store URLs.
    fn get_store_urls(&self) -> Vec<Url> {
        let stores = self.object_stores.read().unwrap();
        stores.keys().cloned().collect()
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::memory::InMemory;

    #[test]
    fn test_register_store() {
        let registry = DefaultObjectStoreRegistry::new();
        let url = Url::parse("memory://").unwrap();
        let store = Arc::new(InMemory::new()) as Arc<dyn ObjectStore>;
        let old_store = registry.register_store(&url, Arc::clone(&store));
        assert!(old_store.is_none());
        let new_store = Arc::new(InMemory::new()) as Arc<dyn ObjectStore>;
        let (old_store, mapped_url) = registry
            .register_store(&url, Arc::clone(&new_store))
            .unwrap();
        assert_eq!(mapped_url.as_str(), "memory://");
        assert!(Arc::ptr_eq(&old_store, &store));
        let (retrieved_store, mapped_url) = registry.get_store(&url).unwrap().unwrap();
        assert_eq!(mapped_url.as_str(), "memory://");
        assert!(Arc::ptr_eq(&retrieved_store, &new_store));
    }

    #[tokio::test]
    async fn test_dynamic_register_store() {
        let registry = DefaultObjectStoreRegistry::new();
        let url = Url::parse("memory://").unwrap();
        let (first_store, mapped_url) = registry.get_store(&url).unwrap().unwrap();
        assert_eq!(mapped_url.as_str(), "memory://");
        first_store.put(&"/foo".into(), "bar".into()).await.unwrap();
        let (second_store, mapped_url) = registry.get_store(&url).unwrap().unwrap();
        assert_eq!(mapped_url.as_str(), "memory://");
        assert!(Arc::ptr_eq(&second_store, &first_store));
        let val = second_store
            .get(&"/foo".into())
            .await
            .unwrap()
            .bytes()
            .await
            .unwrap();
        assert_eq!(val.as_ref(), b"bar");
    }

    #[test]
    fn test_list_urls() {
        let registry = DefaultObjectStoreRegistry::new();
        let url = Url::parse("memory://").unwrap();
        let store = Arc::new(InMemory::new()) as Arc<dyn ObjectStore>;
        registry.register_store(&url, store);
        let urls = registry.get_store_urls();
        assert_eq!(urls.len(), 1);
        assert_eq!(urls[0].as_str(), "memory://");
    }

    #[test]
    fn test_get_child_url() {
        let registry = DefaultObjectStoreRegistry::new();
        let base_url = Url::parse("memory://").unwrap();
        let store = Arc::new(InMemory::new()) as Arc<dyn ObjectStore>;
        registry.register_store(&base_url, Arc::clone(&store));
        let subprefix_url = Url::parse("memory://foo/bar").unwrap();
        let (retrieved_store, mapped_url) = registry.get_store(&subprefix_url).unwrap().unwrap();
        assert_eq!(mapped_url.as_str(), "memory://");
        assert!(Arc::ptr_eq(&retrieved_store, &store));
    }

    #[test]
    fn test_map_url_to_key() {
        let test_cases = [
            ("s3://bucket", "s3://bucket"),
            ("s3://bucket/path", "s3://bucket"),
            ("s3://bucket/path?param=value", "s3://bucket"),
            ("memory://", "memory://"),
            ("memory://path", "memory://"),
            ("file:///", "file:///"),
            ("file:///path", "file:///"),
            ("http://host:1234", "http://host:1234"),
            ("http://host:1234/path", "http://host:1234"),
            (
                "http://user:pass@host:1234/path/to/file",
                "http://host:1234",
            ),
        ];

        for (input, expected) in test_cases {
            let input_url = Url::parse(input).unwrap();
            let expected_url = Url::parse(expected).unwrap();
            let result = DefaultObjectStoreRegistry::map_url_to_key(&input_url);

            assert_eq!(
                result.as_str(),
                expected_url.as_str(),
                "Expected '{}' to map to '{}', but got '{}'",
                input,
                expected,
                result
            );
        }
    }
}
```
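To make the key-mapping rules of `map_url_to_key` concrete without the `url` and `object_store` crates, here is a minimal std-only sketch. The function name `registry_key` is hypothetical and it works on plain strings; unlike `url::Url`, it does not validate the URL, normalize case, or drop default ports, so treat it as an illustration of the rules rather than a drop-in replacement.

```rust
/// Hypothetical std-only approximation of the rules documented on
/// `map_url_to_key`; the PR's implementation parses with `url::Url` instead.
/// Returns `None` if the input has no `scheme://` prefix.
fn registry_key(url: &str) -> Option<String> {
    // Split "scheme://rest" into the scheme and everything after "://".
    let (scheme, rest) = url.split_once("://")?;
    match scheme {
        // Every in-memory URL shares a single key.
        "memory" => Some("memory://".to_string()),
        // Every file URL shares a single key, whatever its host or path.
        "file" => Some("file:///".to_string()),
        _ => {
            // The authority ends at the first '/', '?' or '#'.
            let end = rest
                .find(|c: char| c == '/' || c == '?' || c == '#')
                .unwrap_or(rest.len());
            let authority = &rest[..end];
            // Drop any "user:pass@" prefix, keeping only "host[:port]".
            let host_port = authority
                .rsplit_once('@')
                .map_or(authority, |(_, host_port)| host_port);
            Some(format!("{scheme}://{host_port}"))
        }
    }
}
```

For example, `registry_key("http://user:pass@host:1234/path/to/file")` yields `http://host:1234`, mirroring the last case in `test_map_url_to_key`: credentials and path are discarded, so every URL on the same endpoint resolves to one cached store.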
It would be great to add some examples of these two use cases. We don't have to do it in the first PR, but maybe we could do it in a follow-on.
Will do! I think it's fine as part of this PR.
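The two flows shown in the `DefaultObjectStoreRegistry` docs, explicit registration and lazy creation on lookup, can be sketched with std types only. In this hypothetical `Registry<T>` (not part of `object_store`), `register` mirrors `register_store` by returning the replaced value, and `get_or_create` mirrors `get_store`. One assumption differs from the PR: lookups first try a shared read lock and fall back to the write lock, re-checking the map, only when the key is missing, whereas the PR's `get_store` takes the write lock on every call.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical std-only registry with the same shape as
/// `DefaultObjectStoreRegistry`: string keys mapped to shared handles.
struct Registry<T> {
    stores: RwLock<HashMap<String, Arc<T>>>,
}

impl<T> Registry<T> {
    fn new() -> Self {
        Self { stores: RwLock::new(HashMap::new()) }
    }

    /// Explicit registration: returns the previously registered value, if any.
    fn register(&self, key: &str, store: Arc<T>) -> Option<Arc<T>> {
        self.stores.write().unwrap().insert(key.to_string(), store)
    }

    /// Lazy creation: return the cached value, or build, cache, and return a
    /// new one. If `create` fails, nothing is cached and the error is returned.
    fn get_or_create<E, F>(&self, key: &str, create: F) -> Result<Arc<T>, E>
    where
        F: FnOnce() -> Result<T, E>,
    {
        // Fast path: concurrent readers only need the shared lock.
        if let Some(store) = self.stores.read().unwrap().get(key) {
            return Ok(Arc::clone(store));
        }
        // Slow path: take the write lock and re-check, because another thread
        // may have inserted the key between the two lock acquisitions.
        let mut stores = self.stores.write().unwrap();
        if let Some(store) = stores.get(key) {
            return Ok(Arc::clone(store));
        }
        // `create` runs while the write lock is held, as in the PR's `get_store`.
        let store = Arc::new(create()?);
        stores.insert(key.to_string(), Arc::clone(&store));
        Ok(store)
    }
}
```

The re-check under the write lock matters: without it, two threads that both miss on the read path could each create a store, and the second insert would silently replace the first.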