[persistence] improve time mock for quartz scheduler by cyrilou242 · Pull Request #1787 · startreedata/thirdeye

cyrilou242 · 2025-02-06T12:33:45Z

This PR makes the behavior of the QuartzScheduler simpler to understand when the time is mocked in e2e tests.

Problem

When time is mocked, it can jump forward, and the Quartz scheduler can then miss cron triggers (a misfire).
See

https://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/tutorial-lesson-04.html#:~:text=A%20misfire%20occurs%20if%20a,pool%20for%20executing%20the%20job.
https://nurkiewicz.com/2012/04/quartz-scheduler-misfire-instructions.html and
read the code of org.quartz.simpl.RamJobStore, org.quartz.core.QuartzSchedulerThread and org.quartz.impl.triggers.CronTriggerImpl

Previously, the misfire instruction applied to the cron trigger schedules created here:
was MISFIRE_INSTRUCTION_SMART_POLICY, which resolves to MISFIRE_INSTRUCTION_FIRE_ONCE_NOW in CronTriggerImpl.updateAfterMisfire:

@Override
public void updateAfterMisfire(org.quartz.Calendar cal) {
    int instr = getMisfireInstruction();

    if(instr == Trigger.MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY)
        return;

    if (instr == MISFIRE_INSTRUCTION_SMART_POLICY) {
        instr = MISFIRE_INSTRUCTION_FIRE_ONCE_NOW;
    }

    if (instr == MISFIRE_INSTRUCTION_DO_NOTHING) {
        Date newFireTime = getFireTimeAfter(new Date());
        while (newFireTime != null && cal != null
                && !cal.isTimeIncluded(newFireTime.getTime())) {
            newFireTime = getFireTimeAfter(newFireTime);
        }
        setNextFireTime(newFireTime);
    } else if (instr == MISFIRE_INSTRUCTION_FIRE_ONCE_NOW) {
        setNextFireTime(new Date());
    }
}

This means jobs were created with a scheduled fire time equal to the mocked time, which does not necessarily correspond to a valid cron trigger time.
This caused issues in the simulation: when a task is created, its endTime is taken from the scheduled fire time, and the code assumes this time is a valid cron time.
See

thirdeye/thirdeye-scheduler/src/main/java/ai/startree/thirdeye/scheduler/job/DetectionPipelineJob.java

Line 74 in 0756d06

final long endTime = ctx.getScheduledFireTime().getTime();

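The mismatch can be sketched in plain Java without Quartz. This is an illustrative model only: the class and method names are hypothetical, and the cron is assumed to fire every day at 00:00 UTC. It shows that the FIRE_ONCE_NOW branch produces a scheduled fire time that is not on a cron boundary, so an endTime derived from it is not a valid cron time either.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Illustrative model only, not Quartz code. All names here are hypothetical.
final class FireOnceNowSketch {

    // Assumed cron: every day at 00:00 UTC. A time is a valid cron time
    // if it falls exactly on a day boundary.
    static boolean isValidCronTime(Instant t) {
        return t.equals(t.truncatedTo(ChronoUnit.DAYS));
    }

    // Models the FIRE_ONCE_NOW branch: the next fire time is set to "now".
    static Instant fireOnceNow(Instant mockedNow) {
        return mockedNow;
    }

    // Models how the task end time is derived from the scheduled fire time.
    static long endTimeMillis(Instant scheduledFireTime) {
        return scheduledFireTime.toEpochMilli();
    }

    public static void main(String[] args) {
        // The mocked clock jumped to the middle of a day.
        Instant mockedNow = Instant.parse("2025-01-04T10:30:00Z");
        Instant scheduled = fireOnceNow(mockedNow);

        System.out.println("scheduled fire time: " + scheduled);
        System.out.println("valid cron time? " + isValidCronTime(scheduled));
        System.out.println("endTime (ms): " + endTimeMillis(scheduled));
    }
}
```

In this model the scheduled fire time is 10:30, while the only valid cron times are at midnight.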

Change

The behavior of the updateAfterMisfire function shown above is overridden with AspectJ to behave like MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY.
The other implementation option was to introduce a configuration knob in the server and add .withMisfireHandlingInstructionIgnoreMisfires() to the code here

thirdeye/thirdeye-scheduler/src/main/java/ai/startree/thirdeye/scheduler/TaskCronSchedulerRunnable.java

Line 270 in 713d3b4

.cronSchedule(cron)

but introducing hard-to-understand logic in the public config API did not seem like a good idea.
This may be changed in the future if need be; switching from one solution to the other is simple.

The behavior of the QuartzScheduler is now simpler to understand when time is mocked:

If a cron is supposed to run every day and the time jumps forward by 3 days, then 3 triggers fire, and their scheduled fire times correspond to the 3 expected cron times.
Note that in the ThirdEye context, this does not mean 3 DETECTION/NOTIFICATION tasks will be created: in the Quartz job, if a task is already created and in waiting/running state, backpressure is applied and task creation is skipped. See example here:

thirdeye/thirdeye-scheduler/src/main/java/ai/startree/thirdeye/scheduler/job/NotificationPipelineJob.java

Line 65 in 0756d06

if (taskManager.isAlreadyInQueue(jobName)) {

Because the triggers run at roughly the same time, this is very likely to happen. If the output of tasks or the number of tasks matters, it is best to advance time one cron trigger at a time.
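The interaction between the missed triggers and the backpressure check can be sketched in plain Java. This is a simplified model under stated assumptions, not the ThirdEye or Quartz implementation: the cron is assumed to fire daily at 00:00 UTC, the queue is a hypothetical in-memory set, and only the name isAlreadyInQueue mirrors the snippet above.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only. Names other than isAlreadyInQueue are hypothetical.
final class BackpressureSketch {

    // Hypothetical stand-in for tasks in waiting/running state.
    private final Set<String> waitingOrRunning = new HashSet<>();
    // End times of the tasks that were actually created.
    final List<Instant> createdTaskEndTimes = new ArrayList<>();

    // Models ignoring misfires for a daily 00:00 UTC cron: every cron time
    // after lastFire and up to mockedNow is fired, in order.
    static List<Instant> missedDailyFireTimes(Instant lastFire, Instant mockedNow) {
        List<Instant> fires = new ArrayList<>();
        Instant next = lastFire.truncatedTo(ChronoUnit.DAYS).plus(1, ChronoUnit.DAYS);
        while (!next.isAfter(mockedNow)) {
            fires.add(next);
            next = next.plus(1, ChronoUnit.DAYS);
        }
        return fires;
    }

    boolean isAlreadyInQueue(String jobName) {
        return waitingOrRunning.contains(jobName);
    }

    // One trigger firing: create a task unless one is already queued.
    void onTrigger(String jobName, Instant scheduledFireTime) {
        if (isAlreadyInQueue(jobName)) {
            return; // backpressure: task creation is skipped
        }
        waitingOrRunning.add(jobName);
        createdTaskEndTimes.add(scheduledFireTime);
    }

    void completeTask(String jobName) {
        waitingOrRunning.remove(jobName);
    }

    public static void main(String[] args) {
        Instant lastFire = Instant.parse("2025-01-01T00:00:00Z");
        Instant mockedNow = Instant.parse("2025-01-04T10:30:00Z");
        List<Instant> fires = missedDailyFireTimes(lastFire, mockedNow);

        // Jump 3 days at once: the triggers fire back to back.
        BackpressureSketch jumpAtOnce = new BackpressureSketch();
        for (Instant t : fires) {
            jumpAtOnce.onTrigger("alert-1", t);
        }
        System.out.println("triggers: " + fires.size()
            + ", tasks created: " + jumpAtOnce.createdTaskEndTimes.size());

        // Advance one cron trigger at a time, letting each task finish.
        BackpressureSketch stepByStep = new BackpressureSketch();
        for (Instant t : fires) {
            stepByStep.onTrigger("alert-1", t);
            stepByStep.completeTask("alert-1");
        }
        System.out.println("tasks created step by step: "
            + stepByStep.createdTaskEndTimes.size());
    }
}
```

In this model, jumping 3 days at once yields 3 triggers but a single task, while stepping one trigger at a time and letting each task complete yields one task per trigger.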

The AnomalyResolutionTest tests are fixed: they were incorrect and flaky, and they relied on the issue described above.

Time mock changes:

thirdeye-persistence/src/test/resources/META-INF/aop.xml
thirdeye-persistence/src/test/java/ai/startree/thirdeye/aspect/CronTriggerImplAspect.java

other changes are test fixes and improvements

[persistence] improve time mock for quartz scheduler

6dc6365

fix flaky anomaly resolution test

5c7599f

cyrilou242 requested a review from anshul98ks123 February 7, 2025 10:43

cyrilou242 merged commit 8b7d79c into master Feb 7, 2025
15 of 19 checks passed

cyrilou242 deleted the te-xx-improve-time-mock branch February 7, 2025 10:53

cyrilou242 merged 2 commits into master from te-xx-improve-time-mock
