
[AMORO-4216] Refactor dangling-delete-files-cleaning via ProcessFactory plugin#4214

Merged
zhoujinsong merged 7 commits into
apache:masterfrom
zhangwl9:AMORO-DanglingDelete-optimize-dev
May 19, 2026

Conversation

@zhangwl9

@zhangwl9 zhangwl9 commented May 14, 2026

Contributor

Why are the changes needed?

Close #4216.

Brief change log

  • Replaced the old DanglingDeleteFilesCleaningExecutor with DanglingDeleteFilesCleaningProcess, a process-based implementation using TableMaintainerFactory.

  • Moved config from AmoroManagementConf/config.yaml to process-factories.yaml/execute-engines.yaml, and removed the DANGLING_DELETE_FILES_CLEANING enum value from CleanupOperation along with related cleanup methods in DefaultTableRuntime.

  • Deleted old scheduler tests and invalid configuration.

  • Updated deployment.md with the new configuration.

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions Bot added type:docs Improvements or additions to documentation module:ams-server Ams server module type:build module:common labels May 14, 2026
@zhangwl9 zhangwl9 changed the title [AMORO-4212] Refactor dangling-delete-files-cleaning via ProcessFactory plugin [AMORO-4216] Refactor dangling-delete-files-cleaning via ProcessFactory plugin May 14, 2026
张文领 added 2 commits May 15, 2026 09:12
# Conflicts:
#	amoro-ams/src/main/java/org/apache/amoro/server/AmoroServiceContainer.java
#	amoro-ams/src/main/java/org/apache/amoro/server/process/iceberg/IcebergProcessFactory.java
#	amoro-ams/src/main/java/org/apache/amoro/server/scheduler/inline/InlineTableExecutors.java
@zhangwl9 zhangwl9 force-pushed the AMORO-DanglingDelete-optimize-dev branch from 1a78771 to 88b9736 Compare May 15, 2026 01:12
@zhangwl9

Contributor Author

@zhoujinsong cc

tableRuntime.updateState(
DefaultTableRuntime.CLEANUP_STATE_KEY,
cleanUp -> cleanUp.setLastDanglingDeleteFilesCleanTime(System.currentTimeMillis()));
} catch (Throwable t) {

Contributor

LocalExecutionEngine tracks process status via CompletableFuture.whenComplete: if run() throws, the process is marked FAILED; if it returns normally, it is marked SUCCESS.

By catching Throwable and swallowing it here, exceptions never propagate to the CompletableFuture, so the process status is always reported as SUCCESS even when the actual operation failed. This makes failure invisible to the framework and to operators.

The fix is to let exceptions propagate out of run(). The same issue exists in OrphanFilesCleaningProcess and should be fixed there as well.

@Override
public void run() {
  AmoroTable<?> amoroTable = tableRuntime.loadTable();
  TableMaintainer tableMaintainer = TableMaintainerFactory.create(amoroTable, tableRuntime);
  tableMaintainer.cleanDanglingDeleteFiles();
  tableRuntime.updateState(
      DefaultTableRuntime.CLEANUP_STATE_KEY,
      cleanUp -> cleanUp.setLastDanglingDeleteFilesCleanTime(System.currentTimeMillis()));
}

If a caught-and-logged pattern is intentional (e.g., to avoid flooding failure metrics for transient errors), consider wrapping in a RuntimeException so the framework still sees a failure:

} catch (Throwable t) {
  LOG.error("Unexpected dangling delete files cleaning error for table {}",
      tableRuntime.getTableIdentifier(), t);
  throw new RuntimeException(t);
}
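
The status problem described above can be reproduced without Amoro, using only the JDK. The following is a minimal sketch, not the actual LocalExecutionEngine code: `SwallowedExceptionDemo`, `statusOf`, `failingCleanup`, and the `Status` enum are hypothetical names introduced for illustration, and the only assumption carried over from the review is that status is derived from a `CompletableFuture.whenComplete` callback.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Stand-alone illustration; none of these names come from the Amoro code base.
public class SwallowedExceptionDemo {

  enum Status { SUCCESS, FAILED }

  // Runs the task asynchronously and derives a status in a whenComplete callback:
  // a null error means SUCCESS, anything else means FAILED.
  static Status statusOf(Runnable task) {
    AtomicReference<Status> status = new AtomicReference<>();
    CompletableFuture<Void> tracked =
        CompletableFuture.runAsync(task)
            .whenComplete(
                (ignored, error) ->
                    status.set(error == null ? Status.SUCCESS : Status.FAILED));
    // handle() maps an exceptional completion to a normal one, so join() only waits.
    tracked.handle((ignored, error) -> null).join();
    return status.get();
  }

  // Simulated maintenance step that always fails.
  static void failingCleanup() {
    throw new IllegalStateException("simulated cleanup failure");
  }

  // Catches and logs only: the future completes normally.
  static final Runnable SWALLOWING =
      () -> {
        try {
          failingCleanup();
        } catch (Throwable t) {
          System.err.println("logged only: " + t.getMessage());
        }
      };

  // Catches, logs, and rethrows: the future completes exceptionally.
  static final Runnable RETHROWING =
      () -> {
        try {
          failingCleanup();
        } catch (Throwable t) {
          System.err.println("logged and rethrown: " + t.getMessage());
          throw new RuntimeException(t);
        }
      };

  public static void main(String[] args) {
    System.out.println("swallowing: " + statusOf(SWALLOWING)); // prints "swallowing: SUCCESS"
    System.out.println("rethrowing: " + statusOf(RETHROWING)); // prints "rethrowing: FAILED"
  }
}
```

Both tasks hit the same failure; only the one that rethrows is visible to the callback, which is why keeping the log line and adding a rethrow is enough to restore correct status reporting.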

Contributor Author

fixup

…ing updated to FAILED

Problem Analysis:
- In DanglingDeleteFilesCleaningProcess, SnapshotsExpiringProcess, and OrphanFilesCleaningProcess, exceptions in the run() method were caught but not re-thrown
- This caused LocalExecutionEngine.ProcessHolder.onComplete() to never detect failures
- Process status was always set to SUCCESS even when execution actually failed

Fix:
- Add a throw statement in the catch block to re-throw exceptions
- Keep existing logging for troubleshooting
- Ensure exceptions properly propagate to ProcessHolder so status is correctly updated to FAILED

Modified Files:
- amoro-ams/src/main/java/.../DanglingDeleteFilesCleaningProcess.java
- amoro-ams/src/main/java/.../SnapshotsExpiringProcess.java
- amoro-ams/src/main/java/.../OrphanFilesCleaningProcess.java
@zhangwl9 zhangwl9 force-pushed the AMORO-DanglingDelete-optimize-dev branch from 49a116a to 1996e60 Compare May 18, 2026 09:11
@codecov-commenter

codecov-commenter commented May 18, 2026


Codecov Report

❌ Patch coverage is 52.17391% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 29.86%. Comparing base (75dd657) to head (38a2ae1).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ss/iceberg/DanglingDeleteFilesCleaningProcess.java | 21.05% | 15 Missing ⚠️ |
| .../server/process/iceberg/IcebergProcessFactory.java | 86.36% | 0 Missing and 3 partials ⚠️ |
| ...rver/process/iceberg/SnapshotsExpiringProcess.java | 0.00% | 2 Missing ⚠️ |
| ...er/process/iceberg/OrphanFilesCleaningProcess.java | 0.00% | 1 Missing ⚠️ |
| ...src/main/java/org/apache/amoro/IcebergActions.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #4214      +/-   ##
============================================
+ Coverage     29.80%   29.86%   +0.06%     
- Complexity     4284     4286       +2     
============================================
  Files           677      679       +2     
  Lines         54978    55054      +76     
  Branches       7013     7037      +24     
============================================
+ Hits          16385    16441      +56     
- Misses        37370    37386      +16     
- Partials       1223     1227       +4     
Flag Coverage Δ
core 29.86% <52.17%> (+0.06%) ⬆️


@zhangwl9 zhangwl9 requested a review from zhoujinsong May 19, 2026 01:11

@zhoujinsong zhoujinsong left a comment

Contributor

LGTM.

@zhoujinsong zhoujinsong merged commit 0c5d3d8 into apache:master May 19, 2026
6 checks passed
czy006 pushed a commit that referenced this pull request May 20, 2026
…ry plugin (#4214)

* Refactor dangling-delete-files-cleaning via ProcessFactory plugin

# Conflicts:
#	amoro-ams/src/main/java/org/apache/amoro/server/AmoroServiceContainer.java
#	amoro-ams/src/main/java/org/apache/amoro/server/process/iceberg/IcebergProcessFactory.java
#	amoro-ams/src/main/java/org/apache/amoro/server/scheduler/inline/InlineTableExecutors.java

* Fix IcebergActions init failure by increasing MAX_NAME_LENGTH to 32

* rename action to clean-dangling-delete-files to fix pool tag mismatch

* Fix: LocalProcess exceptions were swallowed, preventing state from being updated to FAILED

Problem Analysis:
- In DanglingDeleteFilesCleaningProcess, SnapshotsExpiringProcess, and OrphanFilesCleaningProcess, exceptions in the run() method were caught but not re-thrown
- This caused LocalExecutionEngine.ProcessHolder.onComplete() to never detect failures
- Process status was always set to SUCCESS even when execution actually failed

Fix:
- Add a throw statement in the catch block to re-throw exceptions
- Keep existing logging for troubleshooting
- Ensure exceptions properly propagate to ProcessHolder so status is correctly updated to FAILED

Modified Files:
- amoro-ams/src/main/java/.../DanglingDeleteFilesCleaningProcess.java
- amoro-ams/src/main/java/.../SnapshotsExpiringProcess.java
- amoro-ams/src/main/java/.../OrphanFilesCleaningProcess.java

* Fix logging message for dangling delete files error

* Fix formatting of class documentation comment

* fixup style

---------

Co-authored-by: 张文领 <zhangwl9@chinatelecom.cn>
(cherry picked from commit 0c5d3d8)
@zhangwl9 zhangwl9 deleted the AMORO-DanglingDelete-optimize-dev branch May 22, 2026 03:16
j1wonpark pushed a commit to j1wonpark/amoro that referenced this pull request Jun 4, 2026
Sync 11 upstream commits: AMS startup crash fix (apache#4224 AMORO-4223),
snapshot/data/orphan/dangling cleaning ProcessFactory refactor (apache#4226/apache#4218/apache#4209/apache#4214),
JUnit5 migration (apache#4199/apache#4204), and others. Preserve the company-internal JP CI (ts-ci-jp.yml: JDK11, tag-base).

Labels

module:ams-server Ams server module module:common type:build type:docs Improvements or additions to documentation

Development

Successfully merging this pull request may close these issues.

[Subtask]: Refactor dangling-delete-files-cleaning via ProcessFactory plugin

3 participants