Parquet: Skip parquet conversion for blocks with too many labels #7524

Open
siddarth2810 wants to merge 9 commits into cortexproject:master from siddarth2810:add-no-convert-marker

@siddarth2810
Contributor

@siddarth2810 siddarth2810 commented May 18, 2026

What this PR does:
If a TSDB block exceeds a configurable threshold of distinct label names, the converter writes a parquet-no-convert-mark.json marker and skips the block.

  • Added no-convert marker with read/write logic
  • Added parquet-converter.max-block-label-names limit
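
The skip-and-mark flow described above could be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the PR's implementation: the marker file name, the `NoConvertMark` type, and its `Version` and `Reason` fields come from this PR, while the in-memory bucket, the extra marker fields, the marker path under the block ID, the version value, and the "non-positive limit disables the check" rule are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	// File name taken from the PR description.
	noConvertMarkFilename = "parquet-no-convert-mark.json"
	// Assumed version value; the PR only shows that a version constant exists.
	currentNoConvertMarkVersion = 1
	reasonTooManyLabels         = "too_many_labels"
)

// NoConvertMark mirrors the Version and Reason fields visible in the PR.
// LabelNamesCount and MaxBlockLabelNames are assumed fields.
type NoConvertMark struct {
	Version            int    `json:"version"`
	Reason             string `json:"reason"`
	LabelNamesCount    int    `json:"label_names_count"`
	MaxBlockLabelNames int    `json:"max_block_label_names"`
}

// memBucket is a hypothetical in-memory stand-in for an object-store bucket.
type memBucket map[string][]byte

// markPath assumes the marker lives under the block's own prefix.
func markPath(blockID string) string {
	return blockID + "/" + noConvertMarkFilename
}

// shouldSkipConversion reports whether a block exceeds the limit.
// Assumption: a non-positive limit disables the check.
func shouldSkipConversion(labelNamesCount, maxBlockLabelNames int) bool {
	return maxBlockLabelNames > 0 && labelNamesCount > maxBlockLabelNames
}

// writeNoConvertMark serializes the marker as JSON and stores it in the bucket.
func writeNoConvertMark(bkt memBucket, blockID string, labelNamesCount, maxBlockLabelNames int) error {
	data, err := json.Marshal(NoConvertMark{
		Version:            currentNoConvertMarkVersion,
		Reason:             reasonTooManyLabels,
		LabelNamesCount:    labelNamesCount,
		MaxBlockLabelNames: maxBlockLabelNames,
	})
	if err != nil {
		return err
	}
	bkt[markPath(blockID)] = data
	return nil
}

// readNoConvertMark returns the marker and true if one exists for the block.
func readNoConvertMark(bkt memBucket, blockID string) (*NoConvertMark, bool, error) {
	data, ok := bkt[markPath(blockID)]
	if !ok {
		return nil, false, nil
	}
	var mark NoConvertMark
	if err := json.Unmarshal(data, &mark); err != nil {
		return nil, false, err
	}
	return &mark, true, nil
}

func main() {
	bkt := memBucket{}
	const limit = 3
	blocks := map[string]int{"block-a": 2, "block-b": 5} // block ID -> distinct label names

	for id, count := range blocks {
		if shouldSkipConversion(count, limit) {
			if err := writeNoConvertMark(bkt, id, count, limit); err != nil {
				fmt.Println("failed to write marker:", err)
			}
			continue // skip conversion for this block
		}
		fmt.Println("converting", id)
	}

	if mark, ok, err := readNoConvertMark(bkt, "block-b"); err == nil && ok {
		fmt.Println("block-b skipped, reason:", mark.Reason)
	}
}
```

Persisting the marker means a later converter pass can check for it first and avoid recounting label names for a block that was already rejected.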

Which issue(s) this PR fixes:
Fixes #7195

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
- Add max-block-label-names limit, blocks exceeding it get a
  no-convert marker instead of being converted.

…correctly

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
…test

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
@siddarth2810 siddarth2810 marked this pull request as ready for review May 18, 2026 12:15
@dosubot (bot) added the go, storage/blocks, and type/feature labels May 18, 2026
Member

@friedrichg friedrichg left a comment

If the flag is experimental, we need to update docs/configuration/v1-guarantees.md too.

Comment thread pkg/parquetconverter/converter.go Outdated
}
continue
}
level.Info(logger).Log("msg", "skipping parquet conversion for block with too many label names", "block", b.ULID.String(), "label_names", labelNamesCount, "limit", maxBlockLabelNames)
Member

Make this a debug log, and pair it with a metric for skipped blocks, like:
c.metrics.skippedBlocks.WithLabelValues(userID, "too_many_labels").Inc()

Contributor Author

I see. I'll add a new metric for keeping track of skipped blocks. Thanks a lot :)
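
For illustration, the shape of a per-tenant, per-reason skip counter can be sketched with the standard library alone. This is a stand-in, not the PR's code: the real converter would most likely use a Prometheus counter vector, and the `labeledCounter` type, its methods, and the tenant names below are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Reason value taken from the review thread.
const reasonTooManyLabels = "too_many_labels"

// labeledCounter is a hypothetical, stdlib-only stand-in for a counter
// vector keyed by (user, reason).
type labeledCounter struct {
	mu     sync.Mutex
	values map[[2]string]int
}

func newLabeledCounter() *labeledCounter {
	return &labeledCounter{values: make(map[[2]string]int)}
}

// Inc adds one to the series identified by the user and reason labels.
func (c *labeledCounter) Inc(user, reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[[2]string{user, reason}]++
}

// Value returns the current count for the given label pair (0 if unseen).
func (c *labeledCounter) Value(user, reason string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.values[[2]string{user, reason}]
}

func main() {
	skippedBlocks := newLabeledCounter()

	// Two blocks skipped for one tenant, one for another.
	skippedBlocks.Inc("user-1", reasonTooManyLabels)
	skippedBlocks.Inc("user-1", reasonTooManyLabels)
	skippedBlocks.Inc("user-2", reasonTooManyLabels)

	fmt.Println("user-1 skipped:", skippedBlocks.Value("user-1", reasonTooManyLabels))
	fmt.Println("user-2 skipped:", skippedBlocks.Value("user-2", reasonTooManyLabels))
}
```

Keeping the reason as a label lets operators see how many blocks are skipped and why without relying on per-block log lines, which is what makes demoting the log to debug level reasonable.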

func WriteNoConvertMark(ctx context.Context, id ulid.ULID, userBkt objstore.Bucket, labelNamesCount int, maxBlockLabelNames int) error {
noConvertMarker := NoConvertMark{
Version: CurrentNoConvertMarkVersion,
Reason: "too_many_labels",
Member

Make "too_many_labels" a constant and use it everywhere. The string currently appears in a couple of places.

Contributor Author

Got it. I'll change this

- Add a new cortex_parquet_converter_blocks_skipped_total counter with user and reason labels
- Extract "too_many_labels" to a constant to avoid string duplication

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Labels

go, size/L, storage/blocks, type/feature

Development

Successfully merging this pull request may close these issues.

[Parquet] Stop converting TSDB block to parquet if it has too many labels