Composer activity from Labs hosts can be rate limited by GitHub
Open, Low, Public

Description

Issues like T106339 are caused by GitHub's API rate limiting. The limit for anonymous access to the GitHub API is 60 requests per IP address per hour.

The current rate limit can be checked with curl:

$ curl -sD - https://api.github.com/rate_limit | grep '^X-RateLimit'
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 60
X-RateLimit-Reset: 1437527145
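
Depending on the curl version and HTTP protocol negotiated, header names may come back in lowercase, in which case the case-sensitive grep above would match nothing. A minimal sketch of a case-insensitive extraction follows; the helper name ratelimit_remaining is hypothetical and not part of curl or any GitHub tooling:

```shell
#!/bin/sh
# Hypothetical helper: print the X-RateLimit-Remaining value from
# HTTP response headers supplied on stdin, ignoring header-name case.
ratelimit_remaining() {
    # Strip carriage returns, then split each header on ": ".
    tr -d '\r' | awk -F': *' 'tolower($1) == "x-ratelimit-remaining" { print $2 }'
}

# Demonstration using the header values shown in the description.
printf 'X-RateLimit-Limit: 60\r\nX-RateLimit-Remaining: 60\r\nX-RateLimit-Reset: 1437527145\r\n' \
    | ratelimit_remaining
```

Against the live API this could be fed with something like curl -sD - -o /dev/null https://api.github.com/rate_limit | ratelimit_remaining, where -o /dev/null discards the JSON body so only the headers reach the pipe.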

The generic solution to this issue would be to document the process of creating a GitHub auth token and configuring Composer to use it. A possibly nicer solution would be to talk to GitHub and see if we can get a higher limit for the IP address range that is seen by GitHub as the origin of Labs Composer requests.

Event Timeline

bd808 raised the priority of this task to Low.
bd808 updated the task description.
Restricted Application added a project: Cloud-Services. Jul 22 2015, 12:09 AM
Restricted Application added a subscriber: Aklapper.

We clearly need someone with the word 'Manager' in their title to ask GitHub.

Jdlrobson added a comment. Edited Jul 22 2015, 1:00 AM

@dr0ptp4kt this would be a great use of twitter :)

Jdlrobson set Security to None. Jul 22 2015, 1:01 AM
Jdlrobson added a subscriber: dr0ptp4kt.

What are the CIDRs for our outbound IPv4 traffic? I can certainly send an email or two.

Here's an example of the rate limit being exhausted:

$ date; curl -sD - https://api.github.com/rate_limit | grep '^X-RateLimit'
Wed Jul 22 15:52:55 UTC 2015
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1437580844

$ date -d '@1437580844'
Wed Jul 22 16:00:44 UTC 2015

$ curl https://api.ipify.org
208.80.155.255
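
The wait until the limit resets can also be computed directly from the X-RateLimit-Reset epoch rather than read off by eye. A rough sketch, assuming a hypothetical helper named seconds_until_reset that takes the reset epoch and, optionally, the current epoch:

```shell
#!/bin/sh
# Hypothetical helper: seconds left until the X-RateLimit-Reset epoch.
# Usage: seconds_until_reset RESET_EPOCH [NOW_EPOCH]
seconds_until_reset() {
    reset=$1
    # Default to the current time when no second argument is given.
    now=${2:-$(date +%s)}
    remaining=$((reset - now))
    # Never report a negative wait once the window has already reset.
    if [ "$remaining" -lt 0 ]; then
        remaining=0
    fi
    echo "$remaining"
}

# Values from the transcript above: 15:52:55 UTC (epoch 1437580375)
# is 7 minutes 49 seconds before the 16:00:44 UTC reset.
seconds_until_reset 1437580844 1437580375   # prints 469
```

A script that wraps Composer could sleep for that many seconds before retrying, though whether waiting is preferable to using a token is a separate question.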

@Jdlrobson, got some proposed text?

@bd808 208.80.155.128/25 is the labs CIDR
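
As a sanity check that the address reported by ipify really falls inside that block, here is a small POSIX shell sketch. The function names ip_to_int and ip_in_cidr are made up for illustration, and it assumes well-formed IPv4 input without leading zeros:

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    # Split the address on dots into the positional parameters.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: succeed if address $1 lies inside CIDR block $2.
ip_in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")   # part before the slash
    bits=${2#*/}                 # prefix length after the slash
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# The outbound address seen earlier in this task, checked against the Labs CIDR.
if ip_in_cidr 208.80.155.255 208.80.155.128/25; then
    echo "inside Labs range"
else
    echo "outside Labs range"
fi
```

A /25 keeps the top 25 bits fixed, so 208.80.155.128/25 covers 208.80.155.128 through 208.80.155.255.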

I sent this to GitHub:

Before I get to whining and asking for favors, thanks for making such a great platform for the FLOSS community.
The Wikimedia Foundation [0] runs a hosting platform for volunteer created projects related to Wikipedia and the rest of the Wikimedia movement projects [1] called Wikimedia Labs [2]. Many of the projects within this hosting environment are PHP applications that make use of Composer. As use of Composer has increased we have had more and more reports of people hitting the GitHub anonymous API rate limit of 60 requests per IP per hour [3].
The outbound addresses of these virtual machines map to the 208.80.155.128/25 CIDR block. Before we try to do something more heroic in the way of end-user education and workarounds, I thought it would be worthwhile to see if it is possible for GitHub to raise the anon API rate limit [4] for requests originating in this particular IP range.
Here's an example of the rate limit being exhausted:

$ date; curl -sD - https://api.github.com/rate_limit | grep '^X-RateLimit'
Wed Jul 22 15:52:55 UTC 2015
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1437580844
$ date -d '@1437580844'
Wed Jul 22 16:00:44 UTC 2015
$ curl https://api.ipify.org
208.80.155.255

[0]: https://wikimediafoundation.org/wiki/Home
[1]: https://wikimediafoundation.org/wiki/Our_projects
[2]: https://wikitech.wikimedia.org/wiki/Help:FAQ
[3]: https://phabricator.wikimedia.org/T106339
[4]: https://developer.github.com/v3/#rate-limiting
Thanks,
Bryan

I'll report back when I hear from them.

bd808 removed bd808 as the assignee of this task. Jul 24 2015, 3:23 PM

GitHub does not have a whitelist capability for anonymous API requests. They suggest that we use authenticated requests to bypass the limitation.

Hi Bryan,
Thanks for reaching out and for the kind words. <3
Currently, we can't offer permanent rate limit increases, especially for anonymous requests, even for a single IP address. The current rate limits help us keep the API fast and reliable for all our users.
However, as you noticed, there's an easy way to get around that 60 reqs/hour limit (which is per IP address) and get a much better 5000 reqs/hour limit (which is per user), and that is by making authenticated requests.
https://developer.github.com/v3/#rate-limiting
Anonymous (unauthenticated) requests are good for drive-by testing but shouldn't be used in production-type environments because you'll hit the rate limit quickly. Most tools, including Composer, which depend on GitHub allow you to pass it a token (which every user can create at https://github.com/settings/tokens/new) which the tool can then use to make authenticated requests. I recommend you consider the same approach for those projects. GitHub accounts are free and easy to create, so there's no roadblock there if you want a better rate limit.
There are some other ways to optimize rate limit usage, such as making conditional requests [1] and using webhooks instead of polling the API [2], so that's something to look into as well.
I hope these notes are helpful and let me know if you have any other questions.
Cheers,
Ivan
[1] https://developer.github.com/v3/#conditional-requests
[2] https://developer.github.com/webhooks/

One thing we could do to help with this would be to add some MediaWiki-Vagrant plugin tooling that prompts the user for a GitHub OAuth token, similar to the prompt we give for Gerrit credentials, and then stores it in the VM's Composer config file: composer config -g github-oauth.github.com <oauthtoken>
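
A rough sketch of what the storing step might look like as a shell function run inside the VM. This is not existing MediaWiki-Vagrant code; the function name store_github_token and the COMPOSER_BIN override are assumptions made for illustration, the latter only so the function can be dry-run without Composer installed:

```shell
#!/bin/sh
# Hypothetical helper: save a GitHub token in Composer's global config.
# COMPOSER_BIN is an assumed override (defaults to "composer") so the
# command can be swapped out, e.g. for "echo" during a dry run.
store_github_token() {
    token=$1
    # Refuse to write an empty token into the config.
    if [ -z "$token" ]; then
        echo "usage: store_github_token TOKEN" >&2
        return 1
    fi
    "${COMPOSER_BIN:-composer}" config -g github-oauth.github.com "$token"
}
```

Setting COMPOSER_BIN=echo and calling store_github_token with a placeholder value prints the arguments that would be passed to Composer instead of running it. The interactive prompt itself is left out here, since the plugin would presumably collect the token on the host side.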