Diffusion mirrors of GitLab repos failing to be created by Striker
Open, High | Public | BUG REPORT

Description

Traceback (most recent call last):
  File "/srv/app/striker/tools/views/repo.py", line 74, in create
    mirror = phab.create_repository("tool-{}".format(name), [])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/app/striker/phabricator.py", line 211, in create_repository
    r = self.post(
        ^^^^^^^^^^
  File "/srv/app/striker/phabricator.py", line 76, in post
    raise APIError(
striker.phabricator.APIError: This install is configured in cluster mode, but all available repository cluster services are closed to new allocations. At least one service must be open to allow new allocations to take place. (ERR-CONDUIT-CORE)

This is the diffusion.repository.edit Conduit endpoint returning a failure when attempting to create a new Diffusion repo to mirror a GitLab repo.

The application code logs this error and moves on without notifying the user, so this looks to have been happening for quite a while without being noticed. https://logstash.wikimedia.org/goto/2d6b87add607809037c29f04b6671654 shows the first occurrence I can find logged at Jan 22, 2024 @ 23:29:04.767.
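One possible way to avoid this kind of silent failure is sketched below. This is not Striker's actual code: `create_mirror`, the stand-in `APIError` class, and `ClosedClusterPhab` are hypothetical names for illustration. Only the `create_repository("tool-{}".format(name), [])` call is taken from the traceback above. The idea is to keep logging the error but also hand a message back to the caller so it can be shown to the user.

```python
import logging

logger = logging.getLogger(__name__)


class APIError(Exception):
    """Hypothetical stand-in for striker.phabricator.APIError."""


def create_mirror(phab, name):
    """Try to create the Diffusion mirror for a tool repo.

    Returns (mirror, user_message). user_message is None on success and a
    human-readable warning on failure, so the caller can surface it instead
    of only logging it.
    """
    try:
        # Same call shape as in the traceback; the meaning of the second
        # argument is not shown in the report, so it is passed through as-is.
        mirror = phab.create_repository("tool-{}".format(name), [])
    except APIError as e:
        logger.exception("Failed to create Diffusion mirror for %s", name)
        message = "The Diffusion mirror for {} could not be created: {}".format(
            name, e
        )
        return None, message
    return mirror, None


class ClosedClusterPhab:
    """Hypothetical fake client that always fails like the reported error."""

    def create_repository(self, name, second_arg):
        raise APIError(
            "all available repository cluster services are closed "
            "to new allocations"
        )


mirror, message = create_mirror(ClosedClusterPhab(), "demo")
```

With the fake client above, `mirror` is `None` and `message` carries the Conduit error text, which a view could then pass to whatever user notification mechanism the application already uses.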

Event Timeline

I was able to create R3385 bd808-testing-T362909 via the Phabricator web UI. Is it only the Conduit API that is failing? Are the failures intermittent?

@bd808: Does this error still appear after 2024-04-16 15:30 UTC? If not, then this is likely a duplicate of T355644 (which reverted T352530, a change I ideally would never have merged, but lots of undocumented custom changes make it hard to foresee unwanted breakage).

The most recent failure seems to have been at 2024-04-16 00:25 UTC. The test repo I just now made at https://gitlab.wikimedia.org/toolforge-repos/bd808-test-T362909 has been properly mirrored as https://phabricator.wikimedia.org/source/tool-bd808-test-T362909/. I'd say this combination makes it very likely that this had the same root cause as T355644: Enable git repo creation in Diffusion again, namely T352530: Remove unused code in Diffusion to handle closed almanac cluster services, and is now fixed.

I will make time to find and fix the mirrors that are missing from the 2024-01-22 through 2024-04-16 period, during which the bug appears to have been in effect for Striker's users.
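A rough sketch of how the missing mirrors could be identified, assuming two name lists have already been gathered by some other means: tool repo names on the GitLab side and repository names in Diffusion. The function `missing_mirrors` and the sample names are hypothetical. Only the `tool-{name}` naming pattern comes from the traceback in the description.

```python
def missing_mirrors(tool_repo_names, diffusion_names):
    """Return the tool repo names that have no "tool-<name>" Diffusion mirror.

    Both arguments are plain iterables of strings. How they are fetched
    (GitLab API, Conduit, database query) is outside this sketch.
    """
    existing = set(diffusion_names)
    return sorted(
        name
        for name in tool_repo_names
        if "tool-{}".format(name) not in existing
    )


# Made-up sample data for illustration only.
todo = missing_mirrors(
    ["alpha", "beta", "gamma"],
    ["tool-alpha", "tool-gamma"],
)
```

Restricting the GitLab-side list to repos created within the affected date range would narrow the result to repos likely hit by this bug.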

Aklapper moved this task from To Triage to Misc on the Phabricator board.