mobileapps-periodic-test failing since 2019-12-04 01:00 UTC due to failing git fetches
Closed, Resolved · Public

Description

Since 2019-12-04 01:00 UTC, the mobileapps-periodic-test CI job has been failing because the server returns an HTTP 500 error when the job attempts to fetch the mobileapps repo from Phabricator.

20:03:04 + git fetch --quiet --depth 2 https://phabricator.wikimedia.org/diffusion/GMOA master
20:03:04 fatal: unable to access 'https://phabricator.wikimedia.org/diffusion/GMOA/': The requested URL returned error: 500

The timing of the first of these failures corresponds with the switchover described in T238956: switch prod Phabricator from phab1003 to phab1001.

Event Timeline

Mholloway created this task. · Dec 4 2019, 1:51 PM
Restricted Application added a subscriber: Aklapper. · Dec 4 2019, 1:51 PM
mmodell added a subscriber: mmodell. · Dec 4 2019, 6:19 PM

Something is wrong with this repo. I spent a couple of hours trying to figure this out, and the issue comes down to this: a shallow clone of the repo fails, while a normal clone works fine.

Is there a reason why this is even cloning from Phabricator at all? The CI job should probably be set up to clone from gerrit-mirror.
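The shallow-versus-full comparison can be reproduced with something like the sketch below. It assumes git and mktemp are available; `try_fetch` is a hypothetical helper written for illustration and is not part of the CI job.

```shell
#!/bin/sh
# try_fetch URL [DEPTH]
# Fetch the master branch of URL into a throwaway repository.
# With DEPTH the fetch is shallow (--depth DEPTH); without it, a full fetch.
# Returns git's exit status, so 0 means the fetch succeeded.
try_fetch() {
    url=$1
    depth=${2:-}
    scratch=$(mktemp -d) || return 1
    git init --quiet "$scratch" || { rm -rf "$scratch"; return 1; }
    if [ -n "$depth" ]; then
        git -C "$scratch" fetch --quiet --depth "$depth" "$url" master
    else
        git -C "$scratch" fetch --quiet "$url" master
    fi
    status=$?
    rm -rf "$scratch"
    return "$status"
}

# Example invocations (commented out because they need network access):
#   try_fetch https://phabricator.wikimedia.org/diffusion/GMOA 2 || echo "shallow fetch failed"
#   try_fetch https://phabricator.wikimedia.org/diffusion/GMOA   || echo "full fetch failed"
```

If the first call fails while the second succeeds, that matches the behaviour reported here; pointing the same helper at another mirror of the repo shows whether the problem is specific to one host.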

Thanks for investigating, @mmodell. That's strange; a shallow clone from Gerrit works fine. In any case, I don't know of any specific reason this was set up to clone from Phabricator.

If the fix is any more involved than changing a clone URI, maybe it's best to wait until we've had a chance to reassess the usefulness of this CI job before sinking any more time into it.

It should be as simple as switching the URL to Gerrit. Generally, I think it's a bad idea to have CI rely on Phabricator: it is sometimes out of date due to replication lag, and its Git hosting is essentially maintained as a best-effort service rather than a critical service that must have ~100% uptime.

I obviously can't comment on the value of this CI job.

tl;dr:

There is a real issue with Phabricator that I am still working on; however, the fastest route to getting this job running again is to switch the URL to Gerrit.

I can help with redirecting.

Change 554595 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/config@master] jjb: [mobileapps-periodic-test] Point at gerrit, not diffusion

https://gerrit.wikimedia.org/r/554595

Change 554595 merged by jenkins-bot:
[integration/config@master] jjb: [mobileapps-periodic-test] Point at gerrit, not diffusion

https://gerrit.wikimedia.org/r/554595

I believe this is now fixed; please confirm.

Mholloway closed this task as Resolved. · Dec 4 2019, 8:01 PM