
[worker] Fix infinite requeuing of messages permanently rejected with 4xx errors (#14644) #14687

Merged
xfournet merged 2 commits into master from
claude/fix-worker-requeuing-issue
Mar 3, 2026

Conversation

@Claude
Contributor

@Claude Claude AI commented Mar 1, 2026

The worker indefinitely requeued messages rejected by remote servers with HTTP 400 errors (e.g., duplicate/obsolete STIX bundles), clogging the queue and causing repeated rejected API calls.

Changes

  • Differentiate permanent from temporary failures: 4xx client errors now result in nack (permanent rejection without requeue), while 5xx server errors continue to requeue
  • Add granular logging: Separate log messages for permanent rejections vs. temporary failures with status codes and response text
  • Remove blanket exception handling: Replace raise RequestException for all non-200/202 responses with status code range checking
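The logging split described in the list above might look roughly like the following. This is a minimal sketch, not the merged code: the helper name `log_rejection`, the logger name, the log levels, and the message wording are all hypothetical.

```python
import logging

logger = logging.getLogger("worker-sketch")  # hypothetical logger name


def log_rejection(status_code: int, response_text: str) -> str:
    """Log a non-success response and return the name of the level used."""
    if 400 <= status_code < 500:
        # Permanent rejection: resending the same payload will not help.
        logger.error(
            "Message permanently rejected (status=%s): %s",
            status_code,
            response_text,
        )
        return "error"
    # Any other non-success status is treated as temporary and retryable.
    logger.warning(
        "Temporary failure, message will be requeued (status=%s): %s",
        status_code,
        response_text,
    )
    return "warning"
```

Keeping the status code and response text in both messages makes it possible to tell from the logs alone why a message was dropped or retried.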

Implementation

# Before: all non-success responses requeued
if response.status_code != 200 and response.status_code != 202:
    raise RequestException(response.status_code, response.text)

# After: distinguish permanent (4xx) from temporary (5xx+) errors
if response.status_code not in (200, 202):
    if 400 <= response.status_code < 500:
        # Permanent error: don't requeue
        return "nack"
    else:
        # Temporary error: requeue with backoff
        return "requeue"

Network errors and timeouts continue to requeue as expected for transient infrastructure issues.
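Combining the status check with the network-error path, the overall decision could be sketched as below. It assumes a hypothetical `send` callable that returns a `(status_code, text)` pair, and it uses Python's built-in `ConnectionError` and `TimeoutError` as stand-ins for the `requests` exceptions the worker catches. The jittered sleep before requeuing is left out so the sketch runs instantly.

```python
from typing import Callable, Tuple


def handle_message(send: Callable[[], Tuple[int, str]]) -> str:
    """Return "ack", "nack", or "requeue" for one delivery attempt."""
    try:
        status_code, _text = send()
    except (ConnectionError, TimeoutError):
        # Transient infrastructure problem: retry later.
        return "requeue"
    if status_code in (200, 202):
        return "ack"
    if 400 <= status_code < 500:
        # Permanent client error: reject without requeue.
        return "nack"
    # 5xx and any other unexpected status: treat as temporary.
    return "requeue"
```

For example, `handle_message(lambda: (400, "obsolete"))` yields `"nack"`, while a callable that raises `TimeoutError` yields `"requeue"`.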

Original prompt

This section details the original issue you should resolve

<issue_title>Worker never gives up requeuing work items that are being rejected by the remote server</issue_title>
<issue_description>## Description

OpenAEV might reject a STIX bundle based on certain criteria, such as the modified attribute. This is a permanent error, and OAEV returns HTTP 400 to signal it.

However, the OCTI worker insists on requeuing the already rejected message, only for it to be rejected again.

    if response.status_code != 200 and response.status_code != 202:
        raise RequestException(response.status_code, response.text)
    return "ack"
except (RequestException, Timeout):
    self.logger.error(
        "Error executing listen handling, a connection error or timeout occurred"
    )
    # Platform is under heavy load: wait for unlock & retry almost indefinitely.
    sleep_jitter = round(random.uniform(10, 30), 2)
    time.sleep(sleep_jitter)
    return "requeue"

Environment

OCTI master branch

Reproducible Steps

Steps to create the smallest reproducible scenario:

  1. Setup OCTIxOAEV interconnection (OAEV Coverage connector)
  2. Create a report in OCTI and add a Security Coverage (automated, connector)
  3. The scenario is created in OAEV and simulation might start
  4. Open the Enrichment menu (three-dot menu next to the upper-right Update button)
  5. Trigger the connector a second time, which will resend the identical STIX bundle to OAEV

Expected Output

OAEV rejects the identical bundle (a guard in OAEV prevents reprocessing a Security Coverage, based on the modified attribute or an md5 digest) and the worker acknowledges it, setting the work item as errored with a reason.

Actual Output

Work item is permanently in the queue, and is also clogging the queue, preventing the normal course of operations in OCTI.
OAEV is constantly called with the rejected STIX bundle and keeps rejecting it.

2026-02-26T12:10:30.145+01:00 ERROR 2267235 --- [OpenAEV API] [0.0-8080-exec-6] io.openaev.api.stix_process.StixApi      : Parsing error while processing STIX bundle 
Error: The STIX package is obsolete because a newer version has already been computed.

</issue_description>

Comments on the Issue (you are @claude[agent] in this section)

@filigran-cla-bot

filigran-cla-bot bot commented Mar 1, 2026

Contributor License Agreement

CLA signed 💚

Thank you @claude for signing the Contributor License Agreement! Your pull request can now be reviewed and merged.

We appreciate your contribution to Filigran's open source projects! ❤️

This is an automated message from the Filigran CLA Bot.

@SamuelHassine
Member

/cla recheck

@filigran-cla-bot filigran-cla-bot bot removed the cla:pending CLA signature required label Mar 1, 2026
@filigran-cla-bot

CLA recheck passed. @claude is exempted from the CLA (exemption list).

@Claude Claude AI changed the title from "[WIP] Fix worker requeuing rejected work items from remote server" to "[worker] Fix infinite requeuing of messages permanently rejected with 4xx errors" Mar 1, 2026
@SamuelHassine SamuelHassine marked this pull request as ready for review March 1, 2026 06:11
@SamuelHassine SamuelHassine requested a review from xfournet March 1, 2026 06:11
@SamuelHassine SamuelHassine changed the title from "[worker] Fix infinite requeuing of messages permanently rejected with 4xx errors" to "[worker] Fix infinite requeuing of messages permanently rejected with 4xx errors (#14644)" Mar 1, 2026
@github-actions

github-actions bot commented Mar 1, 2026

Thank you for your contribution, but we need you to sign your commits. Please see https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits

@codecov

codecov bot commented Mar 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 32.37%. Comparing base (1e2dad1) to head (93dcc9a).
⚠️ Report is 9 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #14687   +/-   ##
=======================================
  Coverage   32.37%   32.37%           
=======================================
  Files        3098     3098           
  Lines      211019   211019           
  Branches    38241    38241           
=======================================
  Hits        68310    68310           
  Misses     142709   142709           
Flag Coverage Δ
opencti-client-python 45.48% <ø> (ø)
opencti-front 2.83% <ø> (ø)
opencti-graphql 67.74% <ø> (ø)

@SamuelHassine SamuelHassine force-pushed the claude/fix-worker-requeuing-issue branch from f1baae4 to 0f67105 Compare March 2, 2026 11:49
@github-actions

github-actions bot commented Mar 2, 2026

Please attach at least one issue to your Pull Request

Claude AI and others added 2 commits March 2, 2026 09:48
Co-authored-by: SamuelHassine <1334279+SamuelHassine@users.noreply.github.com>
@SamuelHassine SamuelHassine force-pushed the claude/fix-worker-requeuing-issue branch from 0f67105 to 93dcc9a Compare March 2, 2026 14:48
@xfournet xfournet merged commit f7e084c into master Mar 3, 2026
36 checks passed
@xfournet xfournet deleted the claude/fix-worker-requeuing-issue branch March 3, 2026 15:19

Development

Successfully merging this pull request may close these issues.

Worker never gives up requeuing work items that are being rejected by the remote server

4 participants