Currently: a fragment is a product of a scan; it is a lazy collection of scan tasks corresponding to a data source which is logically singular (like a single file, a single row group, ...). It would be more useful if instead a fragment were the direct object of a scan; one scans a fragment (or a collection of fragments):
- Remove ScanOptions from Fragment's properties and move it into the parameters of Fragment::Scan.
- Remove ScanOptions from Dataset::GetFragments. We can provide an overload to support predicate pushdown in FileSystemDataset and UnionDataset: Dataset::GetFragments(std::shared_ptr<Expression> predicate).
- Expose a lazy accessor to Fragment::physical_schema().
- Consolidate ScanOptions and ScanContext.
This will lessen the cognitive dissonance between fragments and files since fragments will no longer include references to scan properties.
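To make the proposed shape concrete, here is a minimal, self-contained sketch of how the four changes could fit together. It is not the Arrow implementation: every type below is a simplified stand-in that only borrows the names used in this issue. In particular, the `Predicate` alias (a function over a file path) is a hypothetical substitute for `std::shared_ptr<Expression>`, the `schema_reader` callback and the `ScanTask` fields are invented for illustration, and the single `ScanOptions` struct merely stands for the consolidated ScanOptions/ScanContext.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// NOTE: all types here are illustrative stand-ins, not the Arrow classes.

struct Schema {
  std::vector<std::string> field_names;
};

// One struct standing in for the consolidated ScanOptions + ScanContext.
struct ScanOptions {
  std::vector<std::string> projected_columns;
  int batch_size = 1024;
};

// Hypothetical, simplified unit of scan work.
struct ScanTask {
  std::string source;
  std::vector<std::string> columns;
  int batch_size;
};

// Hypothetical stand-in for std::shared_ptr<Expression>: a filter on the path.
using Predicate = std::function<bool(const std::string& path)>;

class Fragment {
 public:
  Fragment(std::string path, std::function<Schema()> schema_reader)
      : path_(std::move(path)), schema_reader_(std::move(schema_reader)) {
    assert(schema_reader_ != nullptr);
  }

  // Scan options are a parameter of the scan, not a property of the fragment.
  std::vector<ScanTask> Scan(const ScanOptions& options) const {
    return {ScanTask{path_, options.projected_columns, options.batch_size}};
  }

  // Lazy accessor: the schema is read on first use and cached afterwards.
  const Schema& physical_schema() {
    if (!schema_) {
      schema_ = std::make_shared<Schema>(schema_reader_());
    }
    return *schema_;
  }

  const std::string& path() const { return path_; }

 private:
  std::string path_;
  std::function<Schema()> schema_reader_;
  std::shared_ptr<Schema> schema_;
};

class Dataset {
 public:
  explicit Dataset(std::vector<std::shared_ptr<Fragment>> fragments)
      : fragments_(std::move(fragments)) {}

  // No ScanOptions needed to enumerate fragments.
  std::vector<std::shared_ptr<Fragment>> GetFragments() const {
    return fragments_;
  }

  // Overload for predicate pushdown: only matching fragments are returned.
  std::vector<std::shared_ptr<Fragment>> GetFragments(
      const Predicate& predicate) const {
    std::vector<std::shared_ptr<Fragment>> kept;
    for (const auto& fragment : fragments_) {
      if (predicate(fragment->path())) kept.push_back(fragment);
    }
    return kept;
  }

 private:
  std::vector<std::shared_ptr<Fragment>> fragments_;
};
```

With this shape, a caller first selects fragments (optionally through the predicate overload), can inspect `physical_schema()` without committing to a scan, and only then supplies `ScanOptions` to `Scan`. The same fragment can therefore be scanned several times with different options, which is the sense in which the fragment becomes the object of a scan rather than its product.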
Reporter: Francois Saint-Jacques / @fsaintjacques
Assignee: Francois Saint-Jacques / @fsaintjacques
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-8065. Please see the migration documentation for further details.