fix: memory-based reader paths drop records after blank lines (#122)#123
Merged
Conversation
MemorySliceLineSource.TryReadLine and MemoryReaderLineSource.TryReadLine
returned false on blank middle-of-stream lines, terminating enumeration
and silently dropping every record that followed. The TextReader-based
paths don't have this bug because TextReader.ReadLine distinguishes blank
("") from EOF (null), letting the engine's SkipRow predicate skip empties
while continuing to read.
Fix: TryReadLine now returns false only when position >= csv.Length at
entry. Blank middle-of-stream lines surface as empty MemoryText with
return value true, and the engine's existing SkipRow logic skips them
(by default; or surfaces them as empty records if SkipRow is overridden).
This was a pre-existing bug in ReadLineOptimized and ReadFromMemory
preserved verbatim by #118; flagged by Gemini on PR #121 and filed
separately because the fix requires its own correctness analysis.
Adds three cross-path tests pinning the new behavior:
- blank middle-of-stream line skipped by default SkipRow
- blank line surfaces as an empty record when SkipRow is disabled
- trailing blank line terminates cleanly after the last real record
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request updates the CSV parsing engine to handle blank lines more consistently. Key changes include modifying TryReadLine to return true for empty lines rather than terminating the read, and updating test utility methods to verify column existence before access. Three new test cases were added to validate the handling of blank lines in various scenarios, such as in the middle of a stream or at the end. I have no feedback to provide as there were no review comments to evaluate.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
MemorySliceLineSource.TryReadLineandMemoryReaderLineSource.TryReadLinereturnedfalseon blank middle-of-stream lines, terminating enumeration.Read/ReadAsSpan/ReadAsyncdon't have this bug becauseTextReader.ReadLinedistinguishes blank\"\"from EOFnull.TryReadLinenow returnsfalseonly whenposition >= csv.Lengthat entry. Blank lines surface as emptyMemoryTextwithreturn true, and the engine's existingSkipRowlogic handles them (default predicate skips empties; user-overridden predicate may surface them).Csv.Tests/EngineUnificationTests.cs.Resolves #122.
Repro on master
```csharp
var csv = "a,b,c\r\n1,2,3\r\n\r\n4,5,6\r\n";
foreach (var row in CsvReader.ReadFromMemoryOptimized(csv.AsMemory()))
Console.WriteLine(row.Raw);
// Before: just "1,2,3" -- the "4,5,6" record is silently dropped.
// After: "1,2,3" then "4,5,6".
```
Pre-existing
Not a regression from #118 — this behavior was present in the pre-#118 `ReadLineOptimized` and `MemoryText.ReadLine(ref position)` extension. #118 preserved it verbatim per its consolidation scope. Gemini's review on #121 flagged it; filed as #122 because the fix requires its own correctness analysis around `SkipRow` semantics for blank lines.
Test plan
🤖 Generated with Claude Code