
[SYSTEMDS-3932] CSV reader for out-of-core streams #2352

Closed
janniklinde wants to merge 3 commits into apache:main from janniklinde:OOCCSVReaderRevisioned

@janniklinde

Contributor

This patch adds an out-of-core CSV reblock instruction. It supports reading a single CSV file or multiple row-partitioned CSV files into dense matrix blocks. Reads are currently performed by a single thread, so performance is comparable to (slightly slower than) non-parallel dense CSV reads, provided that all blocks of one row of blocks can be held in cache. The maximum number of blen x blen matrix blocks constructed in memory at the same time can be specified via MAX_BLOCKS_IN_CACHE.
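The following Java sketch shows how a single-threaded reader can stream CSV rows into blen x blen dense blocks while bounding how many blocks are under construction at once. It is not the code from this patch. The class `OocCsvBlockSketch`, its `read` method, the `Consumer` sink, and the strategy of re-reading the input once per group of column blocks when a row of blocks exceeds the cap are hypothetical assumptions. Only blen and the idea behind MAX_BLOCKS_IN_CACHE (here the `maxBlocksInCache` parameter) come from the description above. The sketch also assumes a header-less, comma-separated file with a known column count.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch, not the SystemDS CSVReblockOOCInstruction. */
public class OocCsvBlockSketch {

    /** A dense block tagged with its row-block and column-block index. */
    public static final class Block {
        public final int rowBlock, colBlock;
        public final double[][] values;
        Block(int rowBlock, int colBlock, double[][] values) {
            this.rowBlock = rowBlock; this.colBlock = colBlock; this.values = values;
        }
    }

    /**
     * Streams a header-less, comma-separated CSV with ncols columns into
     * blen x blen dense blocks. At most maxBlocksInCache blocks are under
     * construction at any time; if a row of blocks has more column blocks
     * than that (assumed strategy), the input is re-read once per group.
     */
    public static void read(Supplier<BufferedReader> open, int ncols, int blen,
                            int maxBlocksInCache, Consumer<Block> sink) {
        int numColBlocks = (ncols + blen - 1) / blen;
        int groupSize = Math.max(1, Math.min(maxBlocksInCache, numColBlocks));
        for (int firstCb = 0; firstCb < numColBlocks; firstCb += groupSize) {
            int endCb = Math.min(firstCb + groupSize, numColBlocks); // exclusive
            try (BufferedReader in = open.get()) {
                double[][][] strip = null; // blocks of the current row of blocks
                int row = 0;
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.isEmpty()) continue; // tolerate blank lines
                    int r = row % blen;
                    if (r == 0) { // start a new row of blocks
                        strip = new double[endCb - firstCb][][];
                        for (int cb = firstCb; cb < endCb; cb++)
                            strip[cb - firstCb] = new double[blen][Math.min(blen, ncols - cb * blen)];
                    }
                    String[] cells = line.split(",", -1);
                    for (int cb = firstCb; cb < endCb; cb++) {
                        double[] dst = strip[cb - firstCb][r];
                        for (int j = 0; j < dst.length; j++)
                            dst[j] = Double.parseDouble(cells[cb * blen + j].trim());
                    }
                    row++;
                    if (row % blen == 0) emit(sink, strip, row / blen - 1, firstCb, blen);
                }
                // Flush a partially filled last row of blocks.
                if (row % blen != 0) emit(sink, strip, row / blen, firstCb, row % blen);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Hands finished blocks downstream, trimming a partial last row of blocks. */
    private static void emit(Consumer<Block> sink, double[][][] strip, int rowBlock,
                             int firstCb, int rows) {
        for (int k = 0; k < strip.length; k++)
            sink.accept(new Block(rowBlock, firstCb + k, Arrays.copyOf(strip[k], rows)));
    }
}
```

When `maxBlocksInCache` is at least the number of column blocks, the sketch makes a single sequential pass over the input, which corresponds to the case the description calls comparable to a non-parallel dense read; smaller caps trade memory for extra passes. Multiple row-partitioned files could be handled, under the same assumptions, by having the supplier return a reader over the parts in row order.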

@codecov

codecov Bot commented Nov 12, 2025

Codecov Report

❌ Patch coverage is 80.06135% with 65 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.32%. Comparing base (c7300f3) to head (c0e87a4).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ime/instructions/ooc/CSVReblockOOCInstruction.java | 80.18% | 39 Missing and 25 partials ⚠️ |
| ...rc/main/java/org/apache/sysds/lops/CSVReBlock.java | 50.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2352      +/-   ##
============================================
+ Coverage     72.29%   72.32%   +0.02%     
- Complexity    46829    46880      +51     
============================================
  Files          1508     1509       +1     
  Lines        177638   177962     +324     
  Branches      34880    34938      +58     
============================================
+ Hits         128430   128708     +278     
- Misses        39511    39535      +24     
- Partials       9697     9719      +22     
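The totals in the Coverage Diff can be cross-checked by hand. Codecov documents its coverage ratio as hits / (hits + misses + partials), and the Lines row equals hits + misses + partials for both base and head. The Java sketch below (`CoverageCheck`, `percent`, and `shown` are illustrative names) recomputes the percentages from the tabulated numbers. The display rounding is an assumption: two decimal places rounded down, which is Codecov's default unless codecov.yml overrides it.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Recomputes the Codecov totals: coverage = hits / (hits + misses + partials). */
public class CoverageCheck {

    /** Coverage in percent, kept at 10 decimal places before display rounding. */
    public static BigDecimal percent(long hits, long misses, long partials) {
        long lines = hits + misses + partials;
        return BigDecimal.valueOf(hits * 100L)
                .divide(BigDecimal.valueOf(lines), 10, RoundingMode.HALF_EVEN);
    }

    /** Display value: two decimal places, rounded down (assumed Codecov default). */
    public static BigDecimal shown(BigDecimal p) {
        return p.setScale(2, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        BigDecimal base = percent(128430, 39511, 9697); // main (c7300f3)
        BigDecimal head = percent(128708, 39535, 9719); // #2352 (c0e87a4)
        System.out.println("base:  " + shown(base) + "%");                  // prints "base:  72.29%"
        System.out.println("head:  " + shown(head) + "%");                  // prints "head:  72.32%"
        System.out.println("delta: +" + shown(head.subtract(base)) + "%");  // prints "delta: +0.02%"
    }
}
```

Under this assumption, the reported +0.02% follows from the unrounded ratios (about 72.2987% and 72.3233%), not from subtracting the displayed 72.29% from 72.32%.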

@mboehm7

mboehm7 commented Nov 16, 2025

Contributor

LGTM - thanks for the patch and the additional simplification @janniklinde. For now we focus on lean code and optimizing the common case in terms of data characteristics.

mboehm7 closed this in 6f3cdb3 on Nov 16, 2025
github-project-automation Bot moved this from In Progress to Done in SystemDS PR Queue on Nov 16, 2025
aperov9 pushed a commit to aperov9/systemds that referenced this pull request on Nov 17, 2025
janniklinde deleted the OOCCSVReaderRevisioned branch on February 20, 2026 08:55

Labels: None yet
Projects: SystemDS PR Queue, Status: Done
2 participants