
CI depending on GitHub results in numerous failures outside our control
Closed, Declined · Public

Description

As per T362425: ForeignResourceStructureTest flaky in CI due to "Failed to download resource at https://codeload.github.com" and T362095: "composer install" flaky in CI due to "Failed to connect to github.com port 443: Connection timed out", our CI, especially for MediaWiki, depends heavily on GitHub. This is partly due to Composer and partly due to more specific dependencies in npm-related packages (validated as part of foreign-resources.yaml), which in turn often have their canonical repositories on GitHub, meaning we load things from there.

Numerous jobs repeatedly failing is not helpful to Developer Productivity!

While fixes like rCICFe0982e0c4573: dockerfiles: [composer-php70] Inject GitHub OAuth token into composer runtime (for T106452: Composer activity from Cloud VPS hosts can be rate limited by GitHub and T248387: Could not authenticate against github.com) have helped, we still regularly experience issues that are very much outside our control and that do not even correlate with GitHub service outages.
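For illustration only, here is a minimal sketch of one way a GitHub OAuth token can be passed to Composer at runtime, using Composer's `COMPOSER_AUTH` environment variable. This is not necessarily how the referenced commit does it. The function names and the way the token is obtained are hypothetical.

```python
import json
import os
import subprocess


def composer_env_with_github_token(token, base_env=None):
    """Return a copy of the environment with COMPOSER_AUTH set for github.com.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    env = dict(os.environ if base_env is None else base_env)
    # COMPOSER_AUTH takes the same JSON structure as Composer's auth.json.
    env["COMPOSER_AUTH"] = json.dumps({"github-oauth": {"github.com": token}})
    return env


def composer_install(token, cwd="."):
    """Run `composer install` with the token visible only to that process."""
    return subprocess.run(
        ["composer", "install", "--no-interaction"],
        env=composer_env_with_github_token(token),
        cwd=cwd,
        check=True,  # raise CalledProcessError if composer exits non-zero
    )
```

Passing the token through the environment of the child process keeps it out of any `auth.json` written into the workspace. An authenticated request gets a higher GitHub rate limit, but this does nothing about connection timeouts.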

Event Timeline

Reedy triaged this task as High priority. Apr 12 2024, 3:54 PM

While this is more common on MediaWiki and related skins, extensions, etc., which account for a large share of our CI traffic, it does affect the wider ecosystem, as seen in tasks like T362404: Rust buildservice failed to clone a repository from GitHub.

hashar subscribed.

I feel this task was filed because GitHub had some transient issue, and I don't think there is much we can do on the CI infra side.

There is a single task, T362425, about MediaWiki core's ForeignResourceStructureTest hitting GitHub, and that one should probably be untied from GitHub somehow, but I don't think that is specific to the CI infra.

Given the timing, the issue was most probably T368550: we had been running CI without any cache since April 5th, which triggered some rate limit on GitHub's side.