
[fix](be) Apply scanner v2 load counter fixes #64871

Merged
Gabriel39 merged 1 commit into apache:refact_reader_branch from Gabriel39:fix_0626 on Jun 26, 2026

@Gabriel39

Contributor

What problem does this PR solve?

Issue Number: None

Related PR: #63781, #64671

Problem Summary: File scanner v2 was missing two fixes already applied to the existing file scanner path. First, rows filtered by predicates inside v2 file readers were still reported through the scanner load counters unless the scanner was a real load source. Second, empty physical lines in Hive TEXTFILE files were still skipped unless read_csv_empty_line_as_null was enabled. This change gates v2 load counter reporting with the same FILE_STREAM exception used by FileScanner, and adds a delimited-text hook so that Hive Text v2 treats empty physical lines as records while CSV keeps its previous default behavior.
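The empty-line part of the change reduces to one per-format decision: does an empty physical line produce a record? The C++ sketch below illustrates such a hook under stated assumptions. All class and method names (DelimitedTextReaderSketch, treat_empty_line_as_record, and so on) are hypothetical and are not the Doris BE API. The sketch only models whether a record is produced, not how its fields are materialized (for example, as NULLs for CSV with read_csv_empty_line_as_null enabled).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch; names are illustrative, not the Doris BE reader classes.
class DelimitedTextReaderSketch {
public:
    explicit DelimitedTextReaderSketch(char field_delim) : _field_delim(field_delim) {}
    virtual ~DelimitedTextReaderSketch() = default;

    // Splits `data` into physical lines on '\n' and turns each line into a record.
    // A trailing '\n' terminates the last line; it does not add an extra line.
    std::vector<std::vector<std::string>> read_all(const std::string& data) const {
        std::vector<std::vector<std::string>> records;
        std::size_t start = 0;
        while (start < data.size()) {
            std::size_t end = data.find('\n', start);
            if (end == std::string::npos) end = data.size();
            std::string line = data.substr(start, end - start);
            start = end + 1;
            if (line.empty() && !treat_empty_line_as_record()) {
                continue;  // format chose to skip empty physical lines
            }
            records.push_back(split(line));
        }
        return records;
    }

protected:
    // Hook: whether an empty physical line yields a record (assumed name).
    virtual bool treat_empty_line_as_record() const = 0;

private:
    // Splits one line on the field delimiter; "" yields a single empty field.
    std::vector<std::string> split(const std::string& line) const {
        std::vector<std::string> fields;
        std::size_t start = 0;
        while (true) {
            std::size_t pos = line.find(_field_delim, start);
            if (pos == std::string::npos) {
                fields.push_back(line.substr(start));
                return fields;
            }
            fields.push_back(line.substr(start, pos - start));
            start = pos + 1;
        }
    }

    char _field_delim;
};

// CSV keeps the old default: empty lines are skipped unless the flag is set.
class CsvTextReaderSketch : public DelimitedTextReaderSketch {
public:
    CsvTextReaderSketch(char delim, bool read_csv_empty_line_as_null)
            : DelimitedTextReaderSketch(delim), _empty_line_as_null(read_csv_empty_line_as_null) {}

protected:
    bool treat_empty_line_as_record() const override { return _empty_line_as_null; }

private:
    bool _empty_line_as_null;
};

// Hive TEXTFILE: every physical line, including an empty one, is a record.
// '\001' (Ctrl-A) is Hive's default field delimiter.
class HiveTextReaderSketch : public DelimitedTextReaderSketch {
public:
    explicit HiveTextReaderSketch(char delim = '\001') : DelimitedTextReaderSketch(delim) {}

protected:
    bool treat_empty_line_as_record() const override { return true; }
};
```

In the sketch, a row count is simply `read_all(...).size()`, so any COUNT shortcut would have to consult the same hook to stay consistent with a full scan. For a single-column Hive table, the empty line surfaces as one empty-string field.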

Release note

Fix file scanner v2 load counter reporting and Hive TEXTFILE empty-line handling.

Check List (For Author)

  • Test: Unit Test / Manual test
    • Added TextV2ReaderTest coverage for Hive TEXTFILE empty-line records, single-column empty-string fields, and COUNT pushdown.
    • Ran git diff --check.
    • Ran clang-format v16 through build-support/run_clang_format.py for changed files.
    • Attempted ./run-be-ut.sh --run --filter='TextV2ReaderTest.*:FileScannerV2Test.*', but the local run was blocked: the script needed to update/download datasketches-cpp, network access was unavailable, and no prebuilt BE UT binary was available.
    • Attempted clang-tidy with the available compile_commands.json, but it pointed at a stale /mnt/disk3/gabriel path; the project clang-tidy wrapper also requires bash 4+, while only the system bash was available.
  • Behavior changed: Yes. File scanner v2 now matches v1 load counter gating and Hive TEXTFILE empty-line semantics.
  • Does this need documentation: No

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Gabriel39 merged commit 86ab3d9 into apache:refact_reader_branch on Jun 26, 2026
10 of 14 checks passed