
TDL-22921 Fix the api limit error#190

Merged
sgandhi1311 merged 7 commits into master from bugfix-api-limit-error on May 15, 2023

@sgandhi1311
Member

Description of change

GitHub limits API usage to 5000 requests per hour. Once this rate limit is reached, the tap throws the following exception:

2023-05-11 10:14:56,025Z    tap - CRITICAL API rate limit exceeded, please try after 1854 seconds.
2023-05-11 10:14:56,026Z    tap - Traceback (most recent call last):
2023-05-11 10:14:56,026Z    tap -   File "tap-env/bin/tap-github", line 33, in <module>
2023-05-11 10:14:56,026Z    tap -     sys.exit(load_entry_point('tap-github==2.0.1', 'console_scripts', 'tap-github')())
2023-05-11 10:14:56,026Z    tap -   File "/code/orchestrator/tap-env/lib/python3.5/site-packages/singer/utils.py", line 229, in wrapped
2023-05-11 10:14:56,026Z    tap -     return fnc(*args, **kwargs)
2023-05-11 10:14:56,026Z    tap -   File "/code/orchestrator/tap-env/lib/python3.5/site-packages/tap_github/__init__.py", line 39, in main
2023-05-11 10:14:56,026Z    tap -     _sync(client, config, state, catalog)
2023-05-11 10:14:56,026Z    tap -   File "/code/orchestrator/tap-env/lib/python3.5/site-packages/tap_github/sync.py", line 204, in sync
2023-05-11 10:14:56,026Z    tap -     do_sync(catalog, streams_to_sync_for_repos, selected_stream_ids, client, start_date, state, repo)
2023-05-11 10:14:56,026Z    tap -   File "/code/orchestrator/tap-env/lib/python3.5/site-packages/tap_github/sync.py", line 232, in do_sync
2023-05-11 10:14:56,026Z    tap -     stream_to_sync = streams_to_sync
2023-05-11 10:14:56,026Z    tap -   File "/code/orchestrator/tap-env/lib/python3.5/site-packages/tap_github/streams.py", line 434, in sync_endpoint
2023-05-11 10:14:56,026Z    tap -     parent_record = record)
2023-05-11 10:14:56,026Z    tap -   File "/code/orchestrator/tap-env/lib/python3.5/site-packages/tap_github/streams.py", line 159, in get_child_records
2023-05-11 10:14:56,026Z    tap -     stream = child_object.tap_stream_id
2023-05-11 10:14:56,026Z    tap -   File "/code/orchestrator/tap-env/lib/python3.5/site-packages/tap_github/client.py", line 220, in authed_get_all_pages
2023-05-11 10:14:56,027Z    tap -     r = self.authed_get(source, url, headers, stream, should_skip_404)
2023-05-11 10:14:56,027Z    tap -   File "/code/orchestrator/tap-env/lib/python3.5/site-packages/backoff/_sync.py", line 94, in retry
2023-05-11 10:14:56,027Z    tap -     ret = target(*args, **kwargs)
2023-05-11 10:14:56,027Z    tap -   File "/code/orchestrator/tap-env/lib/python3.5/site-packages/tap_github/client.py", line 205, in authed_get
2023-05-11 10:14:56,027Z    tap -     rate_throttling(resp, self.max_sleep_seconds)
2023-05-11 10:14:56,027Z    tap -   File "/code/orchestrator/tap-env/lib/python3.5/site-packages/tap_github/client.py", line 149, in rate_throttling
2023-05-11 10:14:56,027Z    tap -     raise RateLimitExceeded(message) from None
2023-05-11 10:14:56,027Z    tap - tap_github.client.RateLimitExceeded: API rate limit exceeded, please try after 1854 seconds.
2023-05-11 10:14:56,066Z target - INFO Serializing batch with 27 messages for table pull_requests
2023-05-11 10:14:56,071Z target - INFO Sending batch of 500162 bytes to https://api.stitchdata.com/v2/import/batch
2023-05-11 10:14:56,073Z target - INFO replicated 27 records from "pull_requests" endpoint
2023-05-11 10:14:56,073Z target - INFO Serializing batch with 58 messages for table reviews
2023-05-11 10:14:56,075Z target - INFO Sending batch of 109970 bytes to https://api.stitchdata.com/v2/import/batch
2023-05-11 10:14:56,076Z target - INFO replicated 58 records from "reviews" endpoint
2023-05-11 10:14:56,076Z target - INFO Serializing batch with 545 messages for table pr_commits
2023-05-11 10:14:56,100Z target - INFO Sending batch of 2463071 bytes to https://api.stitchdata.com/v2/import/batch
2023-05-11 10:14:56,105Z target - INFO replicated 545 records from "pr_commits" endpoint
2023-05-11 10:14:56,730Z target - INFO Requests complete, stopping loop
2023-05-11 10:14:56,774Z   main - INFO Target exited normally with status 0
2023-05-11 10:14:56,777Z   main - INFO No tunnel subprocess to tear down
2023-05-11 10:14:56,777Z   main - INFO Exit status is: Discovery succeeded. Tap failed with code 1 and error message: "API rate limit exceeded, please try after 1854 seconds.". Target succeeded.

To resolve the issue, make the tap sleep until the time given in the X-RateLimit-Reset header (a UTC epoch timestamp) plus a 2-second buffer, then recursively call the authed_get function to resume execution.
Remove the max_sleep_seconds config property, as it no longer serves any purpose in the code. (This property is already missing from the integration properties table of the connection service platform.)
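The sleep-and-retry behavior described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the tap's actual code: the names seconds_until_reset, authed_get_sketch, fetch, sleep, and clock are hypothetical, and a response is modeled as a plain (status, headers) tuple. It relies on GitHub's X-RateLimit-Remaining header (requests left in the current window) and X-RateLimit-Reset header (reset time in UTC epoch seconds). Note that the sketch has no maximum-wait check, which illustrates why a setting like max_sleep_seconds becomes unnecessary once the tap always waits out the window.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical response shape: (status_code, headers). Not the tap's real type.
Response = Tuple[int, Dict[str, str]]

BUFFER_SECONDS = 2  # extra wait so the retry lands after the window resets


def seconds_until_reset(reset_epoch: int, now: float, buffer: int = BUFFER_SECONDS) -> int:
    """Seconds to wait until the X-RateLimit-Reset epoch, plus a small buffer."""
    # X-RateLimit-Reset is an absolute UTC epoch timestamp, not a duration,
    # so the wait is its distance from the current time (never negative).
    return max(0, int(round(reset_epoch - now)) + buffer)


def authed_get_sketch(
    fetch: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> Response:
    """Hypothetical stand-in for authed_get: sleep out the limit, then retry."""
    status, headers = fetch()
    if int(headers.get("X-RateLimit-Remaining", "1")) == 0:
        wait = seconds_until_reset(int(headers["X-RateLimit-Reset"]), clock())
        sleep(wait)
        # Re-issue the same request by calling ourselves once the window
        # has reset, instead of raising a RateLimitExceeded error.
        return authed_get_sketch(fetch, sleep, clock)
    return status, headers
```

Under this sketch, a reset 1854 seconds away (as in the log above) produces a 1856-second wait. The recursive form mirrors the description; a loop or a retry cap would avoid unbounded recursion if the API kept reporting zero remaining requests.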

Manual QA steps

  • Reach the maximum API rate limit and observe the tap's sleep time.
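The same observation can also be approximated offline by faking the rate-limit headers instead of exhausting a real quota. The sketch below uses unittest.mock to patch time.time and time.sleep so no real waiting occurs. The function rate_throttling_sketch is a hypothetical stand-in written to mirror the behavior described in this PR (sleep for X-RateLimit-Reset minus the current time, plus a 2-second buffer); it is not the tap's real rate_throttling function, and the header values are made up for illustration.

```python
import time
import unittest
from unittest import mock


def rate_throttling_sketch(headers):
    """Hypothetical stand-in: sleep until the rate-limit window resets.

    Returns True if it slept (the caller should retry), False otherwise.
    """
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return False
    # X-RateLimit-Reset is a UTC epoch timestamp; add a 2-second buffer.
    wait = max(0, int(headers["X-RateLimit-Reset"]) - int(time.time()) + 2)
    time.sleep(wait)
    return True


class RateLimitSleepTest(unittest.TestCase):
    @mock.patch("time.sleep")
    @mock.patch("time.time", return_value=1_000.0)
    def test_sleeps_until_reset_plus_buffer(self, mock_time, mock_sleep):
        headers = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "2854"}
        self.assertTrue(rate_throttling_sketch(headers))
        # 2854 - 1000 = 1854 seconds until reset, plus the 2-second buffer.
        mock_sleep.assert_called_once_with(1856)

    @mock.patch("time.sleep")
    def test_no_sleep_when_requests_remain(self, mock_sleep):
        headers = {"X-RateLimit-Remaining": "42", "X-RateLimit-Reset": "2854"}
        self.assertFalse(rate_throttling_sketch(headers))
        mock_sleep.assert_not_called()
```

Saved to a file, this can be run with python -m unittest. Patching time.sleep keeps the check instant while still asserting the exact sleep duration the tap would have used.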

Risks

Rollback steps

  • revert this branch

sgandhi1311 merged commit 7013274 into master on May 15, 2023
AJWurts pushed a commit to villagelabsco/tap-github that referenced this pull request on Oct 24, 2024
* to avoid the API rate limit error, the tap will sleep for the seconds mentioned in the header X-RateLimit-Remaining

* recursively call the function afterwards if the tap is paused for some time.

* fix the existing unit tests

* fixed pylint issue

* update comments

* setup and changelog
