[Core] Add table schema cache for SchemaManager #2939
 * The class is responsible for providing a schemaManager with a concurrent and serializable schema
 * cache.
 */
public class SchemaCache implements Serializable {
You don't need to introduce this class; just merge it into SchemaManager.
}

private void writeObject(ObjectOutputStream out) throws IOException {
    Map<Long, TableSchema> map = new HashMap<>(cache.asMap());
I don't think we need to serialize this cache.
private Map<Long, TableSchema> loadSchemaCache(FileIO fileIO, Path path) {
    Map<Long, TableSchema> schemaCache = new ConcurrentHashMap<>();
    SchemaManager schemaManager = new SchemaManager(fileIO, path);
    for (TableSchema schema : schemaManager.listAll()) {
Why do we need to list all schemas up front?
protected final TableSchema tableSchema;
protected final CatalogEnvironment catalogEnvironment;

protected final Map<Long, TableSchema> schemaCache;
Or just store the SchemaManager here?
private final Map<Long, TableSchema> cache;

public SchemaManager(FileIO fileIO, Path tableRoot) {
    this(fileIO, tableRoot, null);
Pass a new hash map here instead of null, so that the cache is never null?
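A minimal sketch of what this suggestion could look like, assuming the convenience constructor delegates with a fresh map rather than `null`. `NonNullCacheManager` is a hypothetical stand-in for `SchemaManager`, and `String` stands in for `TableSchema`; neither is taken from the PR.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for SchemaManager; String replaces TableSchema.
final class NonNullCacheManager {
    private final Map<Long, String> cache;

    // Convenience constructor: always supplies an empty concurrent map.
    NonNullCacheManager() {
        this(new ConcurrentHashMap<>());
    }

    // Full constructor: rejects null so callers never need a null check.
    NonNullCacheManager(Map<Long, String> cache) {
        this.cache = Objects.requireNonNull(cache, "cache");
    }

    Map<Long, String> cache() {
        return cache;
    }
}
```

With the field guaranteed non-null, lookup code can use the map directly without branching on whether caching is enabled.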
}
this.tableSchema = tableSchema;
this.catalogEnvironment = catalogEnvironment;
tableSchemaManager = new SchemaManager(fileIO, path, new ConcurrentHashMap<>());
Just use the public SchemaManager(FileIO fileIO, Path tableRoot) constructor.
manager.commitChanges(SchemaChange.setOption("ccc", "ddd"));

Map<Long, TableSchema> cachedSchema = manager.getCachedSchema();
assertThat(cachedSchema).hasSize(3);
You don't need getCachedSchema.
You can just verify that the same instances are returned.
See #3021

At present, this solution is relatively obscure. Generally speaking, it is difficult to accept a solution where the cache is serialized and reused by distributed tasks. Since #3021 has already been merged and solves the problem in most cases (those without schema changes), I am considering closing this PR. You can reopen it at any time if you have further needs.
Purpose
When reading a split, the record reader loads the schema of the split from the file system.
This PR adds a cache of TableSchema to SchemaManager to reduce file system access.
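The idea can be illustrated with a small sketch, assuming schemas are keyed by a numeric schema id and are immutable once written. `CachingSchemaLoader` and its `loader` function are hypothetical names for illustration only, and `String` stands in for `TableSchema`; this is not the code from the PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache wrapper: the loader models the file system read.
final class CachingSchemaLoader<S> {
    private final Map<Long, S> cache = new ConcurrentHashMap<>();
    private final Function<Long, S> loader;

    CachingSchemaLoader(Function<Long, S> loader) {
        this.loader = loader;
    }

    // Invokes the loader at most once per schema id; later calls hit the map.
    S schema(long id) {
        return cache.computeIfAbsent(id, loader);
    }
}

class SchemaCacheDemo {
    public static void main(String[] args) {
        AtomicInteger reads = new AtomicInteger();
        CachingSchemaLoader<String> schemas =
                new CachingSchemaLoader<>(id -> {
                    reads.incrementAndGet(); // counts simulated file system reads
                    return "schema-" + id;
                });

        String first = schemas.schema(0L);
        String second = schemas.schema(0L); // served from the cache
        schemas.schema(1L);

        System.out.println(first == second); // true: same cached instance
        System.out.println(reads.get());     // 2: one read per distinct id
    }
}
```

Checking that repeated lookups return the same instance verifies caching without exposing the underlying map.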
Tests
SchemaManagerTest#testCache
API and Format
Documentation