Skip to content

[persistence] improve time mock for quartz scheduler#1787

Merged
cyrilou242 merged 2 commits intomasterfrom
te-xx-improve-time-mock
Feb 7, 2025
Merged

[persistence] improve time mock for quartz scheduler#1787
cyrilou242 merged 2 commits intomasterfrom
te-xx-improve-time-mock

Conversation

@cyrilou242
Copy link
Contributor

@cyrilou242 cyrilou242 commented Feb 6, 2025

This PR makes the behavior of the QuartzScheduler simpler to understand when the time is mocked in e2e tests.

Problem

When the time is mocked, time can jump and the quartz scheduler can miss cron triggers.
See

Previously the logic applied for the cron trigger schedules created here:
was MISFIRE_INSTRUCTION_SMART_POLICY, which resulted in MISFIRE_INSTRUCTION_FIRE_ONCE_NOW in CronTriggerImpl.updateAfterMisfire

@Override
public void updateAfterMisfire(org.quartz.Calendar cal) {
    int instr = getMisfireInstruction();

    if(instr == Trigger.MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY)
        return;

    if (instr == MISFIRE_INSTRUCTION_SMART_POLICY) {
        instr = MISFIRE_INSTRUCTION_FIRE_ONCE_NOW;
    }

    if (instr == MISFIRE_INSTRUCTION_DO_NOTHING) {
        Date newFireTime = getFireTimeAfter(new Date());
        while (newFireTime != null && cal != null
                && !cal.isTimeIncluded(newFireTime.getTime())) {
            newFireTime = getFireTimeAfter(newFireTime);
        }
        setNextFireTime(newFireTime);
    } else if (instr == MISFIRE_INSTRUCTION_FIRE_ONCE_NOW) {
        setNextFireTime(new Date());
    }
}

This means jobs were created with a scheduled fire time equal to the mocked time, which would not necessarily correspond to a correct cron trigger time.
This caused issue in the simulation because when a task is created, the endTime is obtained from the scheduled fire time and it is assumed this time is a valid cron time.
See

final long endTime = ctx.getScheduledFireTime().getTime();
.

Change

The behavior of the function above updateAfterMisfire is overridden with aspectJ to behave like MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY.
The other implementation option was to introduce a configuration knob in the server and add .withMisfireHandlingInstructionIgnoreMisfires() to the code here


but introducing some hard to understand logic in the public config API did not seem like a good idea.
This may be changed in the future if need be, changing from one solution to the other is simple.

The behavior of the QuartzScheduler is now simpler to understand when time is mocked:

  • if a cron is supposed to run everyday, and the time is jumped by 3 days, then 3 triggers will happen, and the triggers schedule time will correspond to the 3 expected cron times.
  • note that in ThirdEye context, this does not mean 3 DETECTION/NOTIFICATION tasks will be created --> in the quartz job, if a task is already created and in waiting /running state, backpressure is applied and task creation is skipped. See example here:
    Because the triggers will run at roughly the same time, this is very likely to happen, so if the output of tasks or the number of tasks is important, best is to increase time 1 cron trigger at a time.

The AnomalyResolutionTest are fixed: they were incorrect, flaky and exploiting the issue described above.

Time mock changes:

  • thirdeye-persistence/src/test/resources/META-INF/aop.xml
  • thirdeye-persistence/src/test/java/ai/startree/thirdeye/aspect/CronTriggerImplAspect.java

other changes are test fixes and improvements

@vercel
Copy link

vercel bot commented Feb 6, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
thirdeye ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 7, 2025 10:16am

@cyrilou242 cyrilou242 merged commit 8b7d79c into master Feb 7, 2025
15 of 19 checks passed
@cyrilou242 cyrilou242 deleted the te-xx-improve-time-mock branch February 7, 2025 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant