Fixes continuation row grouping and add strictness by vethman · Pull Request #10 · CeriosTesting/csv-nested-json

vethman · 2026-04-08T20:44:11Z

This pull request changes the CSV parsing library to make batch and streaming parsing consistent, enforce stricter validation, and improve memory safety. The main changes align the grouping behavior of CsvStreamParser with CsvParser, enforce strict validation of the identifierColumn, clarify the documentation, and add a memory safeguard for continuation groups. Several development dependencies have also been updated.

CSV Parsing Consistency and Validation:

CsvStreamParser now always emits nested grouped output, matching CsvParser continuation-row semantics. The previous nested option is removed to avoid divergence and ensure consistent grouping of continuation rows in both batch and streaming APIs. [1] [2] [3]
Enforced strict identifierColumn validation: if the configured identifier column is missing from the headers, parsing throws a CsvParseError instead of continuing ambiguously. In addition, a continuation row cannot start a group: if the first data row has an empty identifier, parsing throws a CsvParseError. [1] [2]
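The grouping rule and the two validation errors described above can be illustrated with a minimal TypeScript sketch. It is not the library's implementation: `groupContinuationRows` and `CsvParseErrorSketch` are hypothetical names, rows are assumed to be already-parsed `Record<string, string>` objects, and an identifier is treated as empty only when it is the empty string or missing.

```typescript
// Hypothetical stand-in for the library's CsvParseError.
class CsvParseErrorSketch extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CsvParseErrorSketch";
    // Keeps instanceof working when compiling to ES5.
    Object.setPrototypeOf(this, CsvParseErrorSketch.prototype);
  }
}

type Row = Record<string, string>;

// Attaches every row with an empty identifier to the most recent row that had one.
function groupContinuationRows(headers: string[], rows: Row[], identifierColumn: string): Row[][] {
  // Strict check: the identifier column must exist in the headers.
  if (!headers.includes(identifierColumn)) {
    throw new CsvParseErrorSketch(`identifierColumn "${identifierColumn}" not found in headers`);
  }
  const groups: Row[][] = [];
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    const id = row[identifierColumn] ?? "";
    if (id !== "") {
      // A non-empty identifier starts a new group.
      groups.push([row]);
      continue;
    }
    // A continuation row cannot start a group.
    if (groups.length === 0) {
      throw new CsvParseErrorSketch(`data row ${i + 1} has an empty identifier but no group has started`);
    }
    groups[groups.length - 1].push(row);
  }
  return groups;
}
```

For example, rows with identifiers `1`, empty, `2` produce two groups: the first holds two rows, the second holds one.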

Streaming Parser Improvements:

Added a maxContinuationGroupSize option (default: 10,000) to CsvStreamParser to prevent unbounded memory usage when identifier values are missing for long stretches. Exceeding this limit throws a CsvParseError. [1] [2] [3] [4]
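A streaming parser only needs to keep the group it is currently building, so the safeguard amounts to a size check before each continuation row is buffered. The sketch below uses a hypothetical `ContinuationGrouper` class fed one row at a time; it is not the CsvStreamParser API, and it counts the leading row toward the limit, which may differ from how the real `maxContinuationGroupSize` option counts.

```typescript
// Hypothetical stand-in for the library's CsvParseError.
class CsvParseErrorSketch extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CsvParseErrorSketch";
    Object.setPrototypeOf(this, CsvParseErrorSketch.prototype);
  }
}

type Row = Record<string, string>;

// Hypothetical incremental grouper: only the current group is held in memory.
class ContinuationGrouper {
  private current: Row[] = [];
  private readonly identifierColumn: string;
  private readonly maxContinuationGroupSize: number;

  constructor(identifierColumn: string, maxContinuationGroupSize = 10_000) {
    this.identifierColumn = identifierColumn;
    this.maxContinuationGroupSize = maxContinuationGroupSize;
  }

  // Feeds one row; returns the previous group once a new identifier closes it.
  push(row: Row): Row[] | undefined {
    const id = row[this.identifierColumn] ?? "";
    if (id !== "") {
      const finished = this.current.length > 0 ? this.current : undefined;
      this.current = [row];
      return finished;
    }
    if (this.current.length === 0) {
      throw new CsvParseErrorSketch("a continuation row cannot start a group");
    }
    // Safeguard: refuse to grow the buffered group past the limit.
    if (this.current.length >= this.maxContinuationGroupSize) {
      throw new CsvParseErrorSketch(`continuation group exceeded ${this.maxContinuationGroupSize} rows`);
    }
    this.current.push(row);
    return undefined;
  }

  // Emits the last buffered group at end of input.
  flush(): Row[] | undefined {
    const finished = this.current.length > 0 ? this.current : undefined;
    this.current = [];
    return finished;
  }
}
```

Emitting a group only when the next identifier arrives (or at `flush()`) is what lets the output match batch grouping without buffering the whole stream.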

Documentation Updates:

Updated the README to clarify the distinction between CsvParser.parseStream() (buffers the entire stream in memory) and CsvStreamParser (true streaming, memory-efficient, always groups continuation rows). Also clarified options such as includeColumns, excludeColumns, null handling, and the new maxContinuationGroupSize safeguard. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]

Bug Fixes and Internal Improvements:

Improved line splitting in CsvReader.splitLines() to handle custom quote characters and escaped quotes correctly. [1] [2] [3] [4]
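Splitting has to track quotes because a line break inside a quoted field belongs to that field, not to a new record. The sketch below is a hypothetical `splitLinesSketch`, not the actual `CsvReader.splitLines()`; it assumes an escaped quote is written as `escapeChar` followed by `quoteChar`, with `escapeChar` defaulting to the quote character itself (RFC 4180-style doubling).

```typescript
// Splits CSV text into records, keeping line breaks that sit inside quoted fields.
function splitLinesSketch(text: string, quoteChar = '"', escapeChar = quoteChar): string[] {
  const lines: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    // Escaped quote (e.g. "" or \"): copy both characters and stay inside the field.
    if (inQuotes && ch === escapeChar && text[i + 1] === quoteChar) {
      current += ch + text[i + 1];
      i++;
      continue;
    }
    // Unescaped quote character: enter or leave a quoted field.
    if (ch === quoteChar) {
      inQuotes = !inQuotes;
      current += ch;
      continue;
    }
    // Record break outside quotes; \r\n counts as a single break.
    if (!inQuotes && (ch === "\n" || ch === "\r")) {
      if (ch === "\r" && text[i + 1] === "\n") i++;
      lines.push(current);
      current = "";
      continue;
    }
    current += ch;
  }
  if (current !== "") lines.push(current);
  return lines;
}
```

When the escape character equals the quote character, skipping the pair gives the same result as toggling twice. The explicit check matters when a distinct escape character such as `\` is used, because otherwise `\"` would end the quoted field early and the following line break would be split incorrectly.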
Updated dev dependencies to their latest versions for better compatibility and tooling. [1] [2]

These changes make the CSV parsing behavior more predictable, robust, and safe for large-scale streaming workloads.

Resolves critical issues where continuation rows (rows with an empty identifier column) were not properly grouped in CsvStreamParser, which caused streaming and non-streaming parsing to produce different results. Changes CsvStreamParser to buffer continuation groups by default, matching CsvParser behavior. Adds a maxContinuationGroupSize guard (default: 10000) to prevent unbounded memory growth when identifier values are missing. Improves identifierColumn validation to throw early when the configured column does not exist after filtering/transformation; when headerTransformer or columnMapping is used, identifierColumn must therefore name the transformed column. Fixes nested array handling in JsonToCsv so that child continuation values are emitted under their parent array items rather than at the root level. Clarifies in the documentation that parseStream() buffers the full content in memory, while CsvStreamParser provides true incremental processing with continuation grouping. Updates dependencies and fixes quote character handling in splitLines so that escaped quotes are tracked correctly.
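Because validation runs against the processed headers, identifierColumn has to use the post-transformation name. The sketch below is a hypothetical `validateIdentifierColumnSketch` helper with an assumed processing order (headerTransformer, then columnMapping, then excludeColumns); the library's actual order and option shapes may differ.

```typescript
// Hypothetical stand-in for the library's CsvParseError.
class CsvParseErrorSketch extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CsvParseErrorSketch";
    Object.setPrototypeOf(this, CsvParseErrorSketch.prototype);
  }
}

// Hypothetical option shape; names mirror the PR text, semantics are assumed.
interface HeaderOptionsSketch {
  identifierColumn: string;
  headerTransformer?: (header: string) => string;
  columnMapping?: Record<string, string>;
  excludeColumns?: string[];
}

// Processes headers, then fails early if identifierColumn is not among the results.
function validateIdentifierColumnSketch(rawHeaders: string[], options: HeaderOptionsSketch): string[] {
  const mapping = options.columnMapping ?? {};
  const excluded = options.excludeColumns ?? [];
  const headers = rawHeaders
    // 1. Transform each raw header (e.g. lower-casing).
    .map((h) => (options.headerTransformer ? options.headerTransformer(h) : h))
    // 2. Rename via columnMapping; own keys only, so inherited names like "toString" are ignored.
    .map((h) => (Object.prototype.hasOwnProperty.call(mapping, h) ? mapping[h] : h))
    // 3. Drop excluded columns.
    .filter((h) => !excluded.includes(h));
  if (!headers.includes(options.identifierColumn)) {
    throw new CsvParseErrorSketch(
      `identifierColumn "${options.identifierColumn}" not found after header processing; available: ${headers.join(", ")}`,
    );
  }
  return headers;
}
```

Under these assumptions, a raw header `ID` with a lower-casing transformer must be configured as `identifierColumn: "id"`; passing `"ID"` throws.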

vethman self-assigned this Apr 8, 2026

ivo-rws approved these changes Apr 9, 2026


IvoCerios approved these changes Apr 9, 2026


vethman merged commit 82a3a8e into main Apr 9, 2026
4 checks passed

vethman deleted the strictness branch April 9, 2026 08:09

vethman merged 1 commit into main from strictness


3 participants
