Skip to content

Flink: Maintenance - TriggerManager#10484

Merged
pvary merged 14 commits into
apache:mainfrom
pvary:trigger
Aug 22, 2024
Merged

Flink: Maintenance - TriggerManager#10484
pvary merged 14 commits into
apache:mainfrom
pvary:trigger

Conversation

@pvary
Copy link
Copy Markdown
Contributor

@pvary pvary commented Jun 11, 2024

The responsibility of the Trigger Manager is to start the Maintenance Tasks based on the incoming Table Change messages and prevent overlapping Maintenance Task runs. The event time of the Trigger messages should be monotonically increasing, as this will be used as a watermark to separate Maintenance Task runs.

Implementation for #10301

This will be used for Maintenance Trigger Manager as described in the design doc. The Trigger Manager paragraph describes the requirements in more detail.

Comment thread core/src/main/java/org/apache/iceberg/SerializableTable.java Outdated
@pvary
Copy link
Copy Markdown
Contributor Author

pvary commented Jul 29, 2024

Hi @stevenzwu,

I would like to move forward with this PR.
I see the following open questions:

  1. Lock implementation - I still think it is important to provide a working implementation without adding a requirement for external systems. For this the Tag based solution is working for every Iceberg deployment and there were no objections or better suggestions for it during the discussion on the mailing list (https://lists.apache.org/thread/vjf8m5wg840o58yz4y3q35k2mfhbm49l). Also tried to ask help from the Flink mailing list for alternative solutions, but I got a single one offline which turned out a dead end (https://lists.apache.org/thread/3ss83w1hp8wpocx13kcqr7qyjnghvnc8). - I propose to remove the JVM based lock, and move forward with the Tag based lock
  2. TableChange parameters - I am open to discussing the list of parameters which will drive the triggers. It will touch a big part of the previous PR as well, and will make this change even harder to review/read. - I propose to do it in the next PR
  3. There were some questions about the overall architecture. This is described in the original proposal doc here. - I'm open to discuss the proposal offline too, if that is needed.

WDYT @stevenzwu? Could we move forward along these lines, or do you have another proposal? Do you have any foundational concerns about the PR?

Thanks,
Peter

@github-actions github-actions Bot added the build label Aug 6, 2024
Comment on lines +34 to +62
interface Lock {
/**
* Tries to acquire a lock with a given key. Anyone already holding a lock would prevent
* acquiring this lock. Not reentrant.
*
* <p>Called by {@link TriggerManager}. Implementations could assume that are no concurrent
* calls for this method.
*
* @return <code>true</code> if the lock is acquired by this job, <code>false</code> if the lock
* is already held by someone
*/
boolean tryLock();

/**
* Checks if the lock is already taken.
*
* @return <code>true</code> if the lock is held by someone
*/
boolean isHeld();

// TODO: Fix the link to the LockRemover when we have a final name and implementation
/**
* Releases the lock. Should not fail if the lock is not held by anyone.
*
* <p>Called by LockRemover. Implementations could assume that are no concurrent calls for this
* method.
*/
void unlock();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are creating new interfaces rather than enhancing the existing ones ?
https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/LockManager.java

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice the different requirements for the methods. This lock is not reentrant, and the unlock removes locks held by others.

@pvary pvary merged commit b2cd6f3 into apache:main Aug 22, 2024
@pvary pvary deleted the trigger branch August 22, 2024 14:19
@pvary
Copy link
Copy Markdown
Contributor Author

pvary commented Aug 22, 2024

Merged to master.
Thanks for the review @stevenzwu, @singhpk234 and @gyfora!

pvary pushed a commit to pvary/iceberg that referenced this pull request Aug 22, 2024
pvary added a commit that referenced this pull request Aug 23, 2024
pvary pushed a commit to pvary/iceberg that referenced this pull request Sep 12, 2024
pvary added a commit that referenced this pull request Sep 12, 2024
throw new UncheckedInterruptedException(e, "Interrupted during unlock");
} catch (UncheckedSQLException e) {
throw e;
} catch (SQLException e) {
Copy link
Copy Markdown

@tedyu tedyu Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With SQLException caught on line 257, it seems this clause is not reachable.
For instanceId(), SQLException is caught inside the method as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pool.run(..) method could throw SQLException. For example on reconnect.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the reconnect method in core/src/main/java/org/apache/iceberg/jdbc/JdbcClientPool.java.

It seems SQLException is caught there as well.

BTW thanks for the quick response.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JdbcClientPool extends ClientPoolImpl<Connection, SQLException>. The ClientPoolImpl defines run like this:

  @Override
  public <R> R run(Action<R, C, E> action) throws E, InterruptedException {
    return run(action, retryByDefault);
  }

So we either need to declare SQLException in throws, or we need to handle the exception.

List<TriggerEvaluator> evaluators,
long minFireDelayMs,
long lockCheckDelayMs) {
Preconditions.checkNotNull(tableLoader, "Table loader should no be null");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no be null -> not be null

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025
czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025
czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants