Skip to content

Where is the best layer to add throttling to avoid being marked for abuse? #431

@dmitshur

Description

@dmitshur

GitHub API has some guidelines to follow when accessing their API to avoid hitting abuse rate limit:

https://developer.github.com/guides/best-practices-for-integrators/#dealing-with-abuse-rate-limits

An application should follow those rules. Some of them can be enforced in a general/automated way via throttling, I think, and so they can be factored out into a library (rather than having to be done manually inside the application layer). Specifically:

  1. Make requests for a single user or client ID serially. Do not make requests for a single user or client ID concurrently.

    If we can assume a single *github.Client maps to a single user or client ID, then this can be done by using a mutex or semaphore or similar.

  2. If you're making a large number of POST, PATCH, PUT, or DELETE requests for a single user or client ID, wait at least one second between each request.

    Again, if we can assume a single *github.Client maps to a single user or client ID, then this can be done by keeping track of the last POST, PATCH, PUT, or DELETE request time, and sleeping if neccessary to make the next one happen at least one second later.

  3. When you have been limited, wait the number of seconds specified in the Retry-After response header.

    We can detect abuse rate limit response (it's documented here), read the number of seconds specified in the Retry-After response header, and not allow making network requests until that time has passed (probably returning *RateLimitError error, or an abuse-specific version thereof, right away for all requests).

I see two places where this can be done:

  1. Possibly inside go-github library. Inside Client itself. Probably as a configurable option.
  2. Or outside of this library. As a higher-level wrapper layer around go-github library.

So far, go-github has been a relatively "low-level" API client, meaning it helps you out with type safety and provides structured output, but aside from that, it maps relatively closely to 1:1 to making HTTP requests to GitHub API.

It has taken some steps to be a smarter/higher-level client where it could be done so transparently, without blocking. For example, after #277, RateLimitError type was added, and after #347, go-github is smart enough to track rate limit reset time and not make network calls when the rate limit is known to still be exceeded, returning RateLimitError right away.

So far it has also been a "return-right-away" API, where none of the calls are artificially throttled on the client side, they just return an error if there's a rate limit problem. So it seems adding throttling to the go-github client itself may be out of place, and would belong in a higher-level wrapper around go-github.

On the other hand, if done as an option that can be controlled, perhaps this is the best place to do these things, since it means more people can use the GitHub API correctly with less work from their side, by default. The user of the github client could make calls concurrently, but they would block and get serialized by the client as to follow the GH API guidelines. It would effectively shift some of the throttling that would otherwise be done on GitHub server side, to the go-github library client side.

@willnorris, @gmlewis, what are your thoughts on this? Where is the best place for this functionality?

Related issues: #152, #153, #277, #347, #304.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions