feat: Add retry mechanism with exponential backoff for transient GitHub API failures #1070

Description

@myakove

Summary

Add a retry mechanism with exponential backoff for transient GitHub API failures (HTTP 500/502/503) to prevent webhook processing failures during temporary GitHub outages.

Problem / Motivation

The webhook server currently has no retry logic for transient GitHub API failures. When GitHub's API has temporary issues, every API call fails as soon as urllib3's built-in retries are exhausted, which happens very quickly: only ~3 retries with no meaningful backoff.

This was observed in production on the Red Hat webhook server for the mtv-api-tests repository, where multiple check_run and pull_request webhooks failed with:

HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /repos/RedHatQE/mtv-api-tests/pulls/420/requested_reviewers (Caused by ResponseError('too many 500 error responses'))

Multiple operations failed in a cascade: label management, assignee assignment, reviewer requests, and Compare API calls.

Requirements

  1. Add tenacity library as a dependency for retry logic
  2. Create a utility wrapper function (e.g., github_api_call()) that wraps asyncio.to_thread() calls with retry + exponential backoff
  3. Retry ONLY on transient errors: HTTP 500, 502, 503, ConnectionError, MaxRetryError, ResponseError
  4. Do NOT retry on permanent errors: 401 (Unauthorized), 403 (Forbidden), 404 (Not Found), 422 (Validation)
  5. Use exponential backoff: e.g., 2s → 4s → 8s → 16s (max ~4 retries, ~30s total)
  6. Log each retry attempt with warning level
  7. Replace raw asyncio.to_thread() calls across handlers with the new retry wrapper
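
As a rough illustration of requirements 2 through 6, here is a minimal standard-library sketch of what github_api_call() could look like. It is not the proposed implementation: the issue asks for tenacity, and the hand-rolled loop below only stands in for what tenacity's wait_exponential, stop_after_attempt, retry_if_exception and before_sleep_log would provide. The sketch assumes that HTTP errors expose an integer `status` attribute (as PyGithub's GithubException does) and matches ConnectionError, MaxRetryError and ResponseError by class name so that it needs no third-party imports. The names is_transient, max_retries, base_delay and sleep are hypothetical.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("github_retry")  # hypothetical logger name

# Transient conditions listed in requirement 3.
TRANSIENT_STATUSES = frozenset({500, 502, 503})
TRANSIENT_EXCEPTION_NAMES = frozenset({"ConnectionError", "MaxRetryError", "ResponseError"})


def is_transient(exc: BaseException) -> bool:
    """Return True only for errors the requirements list as retryable."""
    # Assumption: HTTP errors carry an integer .status attribute.
    status = getattr(exc, "status", None)
    if isinstance(status, int):
        # 401/403/404/422 and any other status fall through to "do not retry".
        return status in TRANSIENT_STATUSES
    # Match by class name anywhere in the MRO so subclasses are covered too.
    return any(cls.__name__ in TRANSIENT_EXCEPTION_NAMES for cls in type(exc).__mro__)


async def github_api_call(
    func: Callable[..., T],
    *args: Any,
    max_retries: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], Awaitable[Any]] = asyncio.sleep,
    **kwargs: Any,
) -> T:
    """Run a blocking call in a thread, retrying transient failures with backoff."""
    for attempt in range(max_retries + 1):
        try:
            return await asyncio.to_thread(func, *args, **kwargs)
        except Exception as exc:
            # Give up on permanent errors or once the retry budget is spent.
            if attempt == max_retries or not is_transient(exc):
                raise
            delay = base_delay * 2**attempt  # 2s, 4s, 8s, 16s with the defaults
            logger.warning(
                "Transient GitHub API failure (retry %d/%d in %.0fs): %s",
                attempt + 1,
                max_retries,
                delay,
                exc,
            )
            await sleep(delay)
    raise AssertionError("unreachable")  # keeps type checkers satisfied
```

With this shape, a call site such as `await asyncio.to_thread(pull_request.add_to_labels, label)` would become `await github_api_call(pull_request.add_to_labels, label)`; the object and method names in that example are placeholders. The injectable `sleep` parameter exists only so unit tests can record delays without actually waiting.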

Deliverables

  • Add tenacity to pyproject.toml dependencies
  • Create retry utility in webhook_server/utils/github_retry.py
  • Replace asyncio.to_thread() calls in webhook_server/libs/github_api.py with retry wrapper
  • Replace asyncio.to_thread() calls in handler files (labels_handler.py, pull_request_handler.py, issue_comment_handler.py, owners_files_handler.py, check_run_handler.py, pull_request_review_handler.py, runner_handler.py) with retry wrapper
  • Add unit tests for the retry utility
  • Ensure all existing tests pass
  • Verify mypy type checking passes
