
[Bug] github.com is 403ing downloads from Wikimedia CI during composer update
Closed, Resolved, Public

Description

github.com is 403ing downloads from Wikimedia CI during composer update

Example from https://integration.wikimedia.org/ci/job/mwext-WikibaseQualityConstraints-repo-tests-sqlite-hhvm/123/console :

  • Installing wikibase/data-model-serialization (1.6.0)

13:21:06 Downloading https://api.github.com/repos/wmde/WikibaseDataModelSerialization/zipball/1b6df155e1a0a6565789e2258ccee8557ec9a803
13:21:06 Downloading: Connecting...
13:21:06 Failed: [Composer\Downloader\TransportException] 403: The "https://api.github.com/repos/wmde/WikibaseDataModelSerialization/zipball/1b6df155e1a0a6565789e2258ccee8557ec9a803" file could not be downloaded (HTTP/1.1 403 Forbidden)

Reproduction steps

git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/WikibaseQualityConstraints.git
cd WikibaseQualityConstraints
rm -fR ~/.cache/composer/files/wikibase/data-model-services
composer install -vvv
...
  - Installing wikibase/data-model-services (3.6.0)
Downloading https://api.github.com/repos/wmde/WikibaseDataModelServices/zipball/7892031adcaa657bf8fa05aad2638ed64f548682
Writing ~/.cache/composer/files/wikibase/data-model-services/56f2536761476fcaf363ef59bffeb37e77bb4299.zip
   into cache from ~/projects/mediawiki/extensions/WikibaseQualityConstraints/vendor/wikibase/data-model-services/bf7e3d736405142f63289df2353cd070
...

Event Timeline

JanZerebecki raised the priority of this task from to Needs Triage.
JanZerebecki updated the task description.

Off the top of my head, it looks like our CI infrastructure will need an OAuth token for the API requests; I am guessing the 403 is due to rate limits:

https://developer.github.com/v3/#rate-limiting
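
For reference, GitHub exposes the current quota over the same API, so a CI host can check how much it has left without authenticating (per the docs, this check does not count against the quota):

curl -s https://api.github.com/rate_limit
# The JSON response includes rate.limit and rate.remaining; unauthenticated
# requests share a small per-IP quota, so a busy shared CI address can
# exhaust it quickly, which would explain the 403s above.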

We had a similar problem with the WikidataBuilder
https://github.com/wmde/WikidataBuilder/commit/818292769f1c008a6211c2c22e7baecc0e934e13

This is also fixed in WikidataBuildResources
https://github.com/wmde/WikidataBuildResources/commit/0bf0ac3269dfe976d518d7d749599c62abf1f807

This token probably shouldn't be in the composer.json for the Quality extensions though...

https://getcomposer.org/doc/articles/troubleshooting.md#api-rate-limit-and-oauth-tokens

We need something like this:

composer config -g github-oauth.github.com <oauthtoken>
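
That command just writes the token into Composer's global auth.json, so when provisioning CI slaves it would be equally valid to drop the file in place directly; a sketch, with the token left as a placeholder:

# Composer reads GitHub tokens from $COMPOSER_HOME/auth.json
# (typically ~/.composer/auth.json).
cat > ~/.composer/auth.json <<'EOF'
{
    "github-oauth": {
        "github.com": "<oauthtoken>"
    }
}
EOF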

We should be using Satis ( https://getcomposer.org/doc/articles/handling-private-packages-with-satis.md ) to maintain a local mirror of the dependencies we need. That has the benefit that our infrastructure would be independent from GitHub and Packagist (except for pulling new versions when they are added in a GitHub-maintained repository).
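
A minimal sketch of what that Satis setup could look like; the mirror name and homepage are placeholders, and the archive block is what makes Satis actually mirror the dist zips instead of only the metadata:

# satis.json -- hypothetical mirror definition; names and URLs are examples.
cat > satis.json <<'EOF'
{
    "name": "wikimedia/composer-mirror",
    "homepage": "https://composer-mirror.example.org",
    "repositories": [
        { "type": "composer", "url": "https://packagist.org" }
    ],
    "require": {
        "wikibase/data-model-serialization": "^1.6"
    },
    "archive": {
        "directory": "dist",
        "format": "zip"
    }
}
EOF

# Build the static repository (assumes Satis itself was installed via composer):
php vendor/bin/satis build satis.json web/

Projects would then list the mirror URL as a composer repository ahead of Packagist.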

Just took another look at this, and it would seem there are other URLs that composer could use to download the zips that are not part of the API (or don't appear to be) and thus would not be rate limited (or might not be).

3:57 PM <addshore> The download zip link on the site, for example for master, can have master replaced with a hash
3:57 PM <addshore> https://github.com/wmde/WikibaseDataModelSerialization/archive/1b6df155e1a0a6565789e2258ccee8557ec9a803.zip
3:57 PM <addshore> which in turn redirects to something like https://codeload.github.com/wmde/WikibaseDataModelSerialization/zip/1b6df155e1a0a6565789e2258ccee8557ec9a803
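
This is easy to check from a shell; -I asks for headers only, so nothing is actually downloaded:

curl -sI https://github.com/wmde/WikibaseDataModelSerialization/archive/1b6df155e1a0a6565789e2258ccee8557ec9a803.zip
# Expect a 302 with a Location: header pointing at codeload.github.com,
# i.e. the zip is served from outside api.github.com.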

We will still need a local cache like Satis, but changing composer to not hit the GitHub API where possible is a good idea anyway.
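
One knob already exists: composer can be told to fetch packages from git instead of downloading API zipballs, which avoids api.github.com entirely at the cost of full clones. A sketch:

# Clone sources via git rather than fetching zipballs from the GitHub API.
composer install --prefer-source

# Individual VCS repositories in composer.json can also opt out of the
# GitHub API driver with: { "type": "vcs", "url": "...", "no-api": true }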

Maybe we can route all requests through a shared web proxy that would cache the packages? That would benefit other package systems (npm, gem, pip...), though with HTTPS I am not sure the material can be cached :-\

Yes, unless we add a MITM proxy they cannot be cached by a normal proxy.
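
For what it is worth, wiring composer (and most other package managers) to such a proxy is only configuration, since they honour the standard proxy environment variables; the proxy host below is a placeholder:

# Hypothetical shared caching proxy.
export http_proxy=http://webproxy.example.org:8080
export https_proxy=http://webproxy.example.org:8080
composer install
# Plain HTTP responses can be cached; HTTPS only passes through as an
# opaque CONNECT tunnel, which is exactly the limitation noted above.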

Jonas renamed this task from github.com is 403ing downloads from Wikimedia CI during composer update to [Bug] github.com is 403ing downloads from Wikimedia CI during composer update. Aug 14 2015, 4:49 PM
Jonas set Security to None.

That is quite an old task; I am not sure whether it is still happening or whether it got worked around somehow.

A potential solution would be to have the package managers use a central proxy (T147635: Investigate again a central cache for package managers) which would hold a cache of the material and thus stop hitting the upstream repo (e.g. Packagist).

If someone lurking has a reproduction test case that ends up trying to download a tarball from GitHub, we can then try to hit the proxy and figure out a way to have it cache the tarball.

If that central cache (T147635) works, I guess we can decline the subtasks, since we solely rely on packagist.org to host the composer packages. E.g. decline:

  • T106548 [Task] create mirror for our composer dependencies
  • T107840 [Task] try out satis


And since that is over HTTPS we would need some kind of SSL man in the middle proxy bah :(

composer was/is easy to MITM, @csteipp even had/has a proof of concept to do it...

Not sure there is a real issue to solve here. Yes, GitHub does not have a 100% success rate for providing zips; travis_retry gets around that nicely. It works fine for all my projects, and I'm not aware of other people having problems with it. So why would WMF need its own crazy solution?
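
For context, travis_retry is a shell function that Travis CI defines in its build environment; it re-runs a failing command up to three times before giving up:

# In a .travis.yml install step (travis_retry is provided by Travis CI):
travis_retry composer install --no-interaction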

hashar claimed this task.

This is mostly fixed nowadays. We have caches, and/or GitHub raised their throttle limits.