
Understand which repositories we mirror, observe, host in Diffusion (and fix some findings)
Closed, Resolved, Public

Description

Bigger picture: T330347: A tool for displaying the canonical location of all Wikimedia repositories and any known mirrors

Docs: https://we.phorge.it/book/phorge/article/diffusion_uris/#reference-i-o-types

URI stats (note that the number of repos is lower than these counts because a single repo can have several URIs, and that this query ignores the active/disabled state of the repo itself):

mysql:phstats@m3-slave.eqiad.wmnet [phabricator_maniphest]> SELECT COUNT(u.id), u.ioType FROM phabricator_repository.repository_uri u WHERE u.isDisabled = 0 GROUP BY u.ioType;
+-------------+-----------+
| COUNT(u.id) | ioType    |
+-------------+-----------+
|        1970 | default   |
|          74 | mirror    |
|         366 | none      |
|        2846 | observe   |
|       18026 | read      |
|           8 | readwrite |
+-------------+-----------+
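To make these counts easier to read, here is a small Python sketch that pairs each ioType with a one-line meaning and splits the enabled URIs into those that copy data to or from a remote (observe, mirror) and everything else. The meanings are paraphrased from the Phorge reference linked above and may not match its exact wording. The counts are copied from the query output, and the function name split_counts is hypothetical.

```python
# Paraphrased meanings of Diffusion URI I/O types (see the Phorge docs
# linked above for the authoritative wording).
IO_TYPES = {
    "default": "use the default behavior for this kind of URI",
    "observe": "pull: copy changes from this remote into Diffusion",
    "mirror": "push: copy changes from Diffusion out to this remote",
    "read": "users may clone/fetch through this URI",
    "readwrite": "users may clone/fetch and push through this URI",
    "none": "no I/O through this URI",
}

# Enabled URI counts per ioType, copied from the query output above.
COUNTS = {
    "default": 1970,
    "mirror": 74,
    "none": 366,
    "observe": 2846,
    "read": 18026,
    "readwrite": 8,
}


def split_counts(counts):
    """Split enabled URIs into remote-syncing ones (observe/mirror) and the rest."""
    remote_sync = counts.get("observe", 0) + counts.get("mirror", 0)
    total = sum(counts.values())
    return {"total": total, "remote_sync": remote_sync, "other": total - remote_sync}


if __name__ == "__main__":
    # Print the types from most to least common, with their meaning.
    for io_type, n in sorted(COUNTS.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{io_type:>9} {n:>6}  {IO_TYPES[io_type]}")
    print(split_counts(COUNTS))
```

For these numbers the split is 2920 remote-syncing URIs out of 23290 enabled ones; the large "read" bucket is plain clone/fetch access and involves no copying to or from another host.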

The following query lists the enabled mirror URIs:

SELECT CONCAT("https://phabricator.wikimedia.org/diffusion/", r.id, "/manage/uris/"), r.name, u.uri FROM phabricator_repository.repository_uri u INNER JOIN phabricator_repository.repository r ON r.phid = u.repositoryPHID WHERE u.ioType = "mirror" AND u.isDisabled = 0 AND r.details NOT LIKE "%\"importing\":false%" ORDER BY u.uri;

Per its results, we push mirrors from Diffusion to other places via (up to) 74 URIs:

  • 3 to Gerrit (repos 1913, 1334, 2341) using credential K18,
  • 4 to Gerrit via ssh://phab@gerrit.wikimedia.org:29418 (repos 2668, 1914, 2047, 1912) using credential K19,
  • 1 to a personal GitLab namespace (see below),
  • 4 to personal GitHub accounts (repos 2311, 2310, 2761, 2579) (see below),
  • the rest all go to GitHub under the toolforge or wikimedia organizations.
    • Some of these GitHub repos might be read-only, so mirroring to them should fail (e.g. malu in 1876, likely more).
    • Some of these GitHub repos seem to have been deleted, so mirroring to them should fail (e.g. 1972, likely more).
    • Some repos like 1876 are already "disabled" in Diffusion, so does Diffusion still try to push a mirror for them? Do I need to fix the SQL queries above to take that into account by adding AND r.details NOT LIKE "%\"importing\":false%"? This reduces the number of mirrors from 74 down to 30: 29 mirrored to github.com/wikimedia/* and 1 being https://phabricator.wikimedia.org/diffusion/PHES/manage/uris/
    • https://phabricator.wikimedia.org/diffusion/PHES/manage/uris/ states that it mirrors to Gerrit, but that repo is marked as inactive. Where can that information be found in the Phabricator database?
Incomplete list of some action items:


Event Timeline

Aklapper created this task.

I disabled the GitHub pushing URIs, those are legacy items that are no longer needed.

@Aklapper The whole repo can be deleted. Tool isn't in use anymore.

@Mbch331 Thanks! As Phabricator observes https://gitlab.wikimedia.org/toolforge-repos/ddescriptions , I'd say the place to delete it is on GitLab. I wonder whether you can access https://gitlab.wikimedia.org/toolforge-repos/ddescriptions/edit/ (under "Advanced" there should be a Delete option), because I cannot...

Aklapper edited subscribers, added: Dibya; removed: Mbch331, Urbanecm.
Aklapper raised the priority of this task from Low to High (Oct 18 2023, 5:42 PM).
Aklapper removed a subscriber: Dibya.
Aklapper updated the task description.
Aklapper added a subscriber: Dibya.
Aklapper removed a subscriber: Dibya.

This comment is about repository hosting only (i.e. writing to a repository in Diffusion by pushing to Diffusion itself). It overlaps with T321380.

We still had 8 URIs in 5 repositories with read/write set. See P53043 for details.
Done; we have 0 now (though the ancient, unused https://phabricator.wikimedia.org/diffusion/PHES/ still seems to have its canonical location in Diffusion).

Sharing notes while trying to understand the current state of things (and some ambiguous terminology):

In the Basics section of a repository's settings, the State section lists:

  • "Active": defined by "tracking-enabled":"active" or "tracking-enabled":"inactive" in the JSON blob in the DB column phabricator_repository.repository.details
  • "Publishing": defined by "herald-disabled":0 (publishing) or "herald-disabled":1 (not publishing) in the JSON blob in the DB column phabricator_repository.repository.details
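A minimal Python sketch of reading both flags from a details blob, assuming exactly the key names and values listed above. Only the literal string "active" is treated as active; other or missing values (which older rows might contain) are treated as inactive, which is an assumption of this sketch rather than Phorge's own logic. The effectively_publishing field encodes the observation below that deactivating a repo seems to also stop publishing. The names repository_state and RepoState are hypothetical.

```python
import json
from typing import NamedTuple


class RepoState(NamedTuple):
    active: bool  # from "tracking-enabled"
    publishing_flag: bool  # raw setting derived from "herald-disabled"
    effectively_publishing: bool  # assumes an inactive repo never publishes


def repository_state(details_blob: str) -> RepoState:
    """Read the Active/Publishing state from a repository.details JSON blob."""
    details = json.loads(details_blob)
    # Only the literal "active" counts; anything else (or a missing key) is
    # treated as inactive in this sketch.
    active = details.get("tracking-enabled") == "active"
    # "herald-disabled" is 1 when publishing is off; a missing key is read as 0.
    publishing_flag = not bool(details.get("herald-disabled", 0))
    return RepoState(active, publishing_flag, active and publishing_flag)


if __name__ == "__main__":
    print(repository_state('{"tracking-enabled":"active","herald-disabled":0}'))
    print(repository_state('{"tracking-enabled": "inactive", "herald-disabled": 0}'))
```

Parsing the JSON also tolerates whitespace such as "tracking-enabled": "active", which a LIKE "%\"tracking-enabled\":\"active\"%" pattern would miss if it ever occurred in the stored blob.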

Quoting the docs:

Deactivating a repository has these effects:

  • the repository will no longer be updated;
  • users will no longer be able to clone/fetch/checkout the repository;
  • users will no longer be able to push to the repository; and
  • the repository will be hidden from view in default queries.

It seems that deactivating a repo also deactivates publishing.

Getting back to the task title: Understand which Diffusion repositories we mirror, observe, host.

From my POV this task ("understand") is resolved.


  • [1] Active repos with an enabled Observe URI (pulled from elsewhere): SELECT DISTINCT CONCAT("https://phabricator.wikimedia.org/diffusion/", r.id, "/manage/uris/"), r.name FROM phabricator_repository.repository_uri u INNER JOIN phabricator_repository.repository r ON r.phid = u.repositoryPHID WHERE u.ioType = "observe" AND u.isDisabled = 0 AND r.details LIKE "%\"tracking-enabled\":\"active\"%";
  • [2] Active repos with enabled Mirror URIs (pushed elsewhere): SELECT CONCAT("https://phabricator.wikimedia.org/diffusion/", r.id, "/manage/uris/") AS repoURI, r.name, u.uri AS MirroredToUri FROM phabricator_repository.repository_uri u INNER JOIN phabricator_repository.repository r ON r.phid = u.repositoryPHID WHERE r.details LIKE "%\"tracking-enabled\":\"active\"%" AND u.ioType = "mirror" AND u.isDisabled = 0;
  • [3] Like [2], but excluding repos that also have an enabled Observe URI (hosted in Diffusion and pushed elsewhere): SELECT CONCAT("https://phabricator.wikimedia.org/diffusion/", r.id, "/manage/uris/") AS repoURI, r.name, u.uri AS MirroredToUri FROM phabricator_repository.repository_uri u INNER JOIN phabricator_repository.repository r ON r.phid = u.repositoryPHID WHERE r.details LIKE "%\"tracking-enabled\":\"active\"%" AND u.ioType = "mirror" AND u.isDisabled = 0 AND r.id NOT IN (SELECT r.id FROM phabricator_repository.repository_uri u INNER JOIN phabricator_repository.repository r ON r.phid = u.repositoryPHID WHERE r.details LIKE "%\"tracking-enabled\":\"active\"%" AND u.ioType = "observe" AND u.isDisabled = 0);
  • [4] Active repos without any Observe URI, enabled or not (candidates for being hosted in Diffusion): SELECT DISTINCT CONCAT("https://phabricator.wikimedia.org/diffusion/", r.id, "/manage/uris/"), r.name, r.phid FROM phabricator_repository.repository_uri u INNER JOIN phabricator_repository.repository r ON r.phid = u.repositoryPHID WHERE r.phid NOT IN (SELECT u2.repositoryPHID FROM phabricator_repository.repository_uri u2 WHERE u2.ioType = "observe") AND r.details LIKE "%\"tracking-enabled\":\"active\"%";
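To sanity-check the set logic of footnotes [3] and [4] without touching production, the sketch below replays simplified versions of them against an in-memory SQLite database. The schema keeps only the columns the queries use, all rows are made-up toy data (the example.org/example URLs and PHIDs are hypothetical), and SQLite quotes strings differently from MySQL, so this illustrates the logic and is not a drop-in replacement.

```python
import sqlite3

SCHEMA = """
CREATE TABLE repository (id INTEGER PRIMARY KEY, phid TEXT, name TEXT, details TEXT);
CREATE TABLE repository_uri (
    id INTEGER PRIMARY KEY, repositoryPHID TEXT, uri TEXT,
    ioType TEXT, isDisabled INTEGER
);
"""

# Hypothetical toy rows; not real Wikimedia repositories.
REPOS = [
    (1, "PHID-REPO-a", "hosted-and-mirrored", '{"tracking-enabled":"active"}'),
    (2, "PHID-REPO-b", "observed-and-mirrored", '{"tracking-enabled":"active"}'),
    (3, "PHID-REPO-c", "inactive-mirror", '{"tracking-enabled":"inactive"}'),
    (4, "PHID-REPO-d", "hosted-only", '{"tracking-enabled":"active"}'),
]
URIS = [
    (1, "PHID-REPO-a", "https://github.com/example/a.git", "mirror", 0),
    (2, "PHID-REPO-b", "https://gerrit.example.org/r/b", "observe", 0),
    (3, "PHID-REPO-b", "https://github.com/example/b.git", "mirror", 0),
    (4, "PHID-REPO-c", "https://github.com/example/c.git", "mirror", 0),
    (5, "PHID-REPO-d", "https://diffusion.example.org/source/d.git", "read", 0),
]

# Same JSON-substring test as the MySQL queries, written with SQLite quoting.
ACTIVE = """r.details LIKE '%"tracking-enabled":"active"%'"""

# Simplified footnote [3]: active repos pushing a mirror but observing nothing.
# The subquery reuses the aliases u and r, which shadow the outer ones.
MIRROR_WITHOUT_OBSERVE = f"""
SELECT r.id, u.uri FROM repository_uri u
JOIN repository r ON r.phid = u.repositoryPHID
WHERE {ACTIVE} AND u.ioType = 'mirror' AND u.isDisabled = 0
  AND r.id NOT IN (
    SELECT r.id FROM repository_uri u
    JOIN repository r ON r.phid = u.repositoryPHID
    WHERE {ACTIVE} AND u.ioType = 'observe' AND u.isDisabled = 0)
ORDER BY r.id
"""

# Simplified footnote [4]: active repos with no observe URI at all.
NO_OBSERVE_URI = f"""
SELECT DISTINCT r.id FROM repository_uri u
JOIN repository r ON r.phid = u.repositoryPHID
WHERE r.phid NOT IN (
    SELECT u2.repositoryPHID FROM repository_uri u2 WHERE u2.ioType = 'observe')
  AND {ACTIVE}
ORDER BY r.id
"""


def run(query):
    """Load the toy data into a fresh in-memory database and run one query."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO repository VALUES (?, ?, ?, ?)", REPOS)
        con.executemany("INSERT INTO repository_uri VALUES (?, ?, ?, ?, ?)", URIS)
        return con.execute(query).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(run(MIRROR_WITHOUT_OBSERVE))
    print(run(NO_OBSERVE_URI))
```

With this toy data, [3] keeps only repo 1 (repo 2 also observes, repo 3 is inactive) and [4] returns repos 1 and 4. One caveat on the design: NOT IN yields no rows at all if its subquery returns a NULL, so if repositoryPHID could ever be NULL, a NOT EXISTS variant would be safer.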

I posted a list of Diffusion repositories that both pull from one place and push to another place in non-public T341971#9352094 for investigation.

Note to myself: Diffusion repositories that get mirrored to GitHub display an additional, custom "Download Archive" (from GitHub) button due to the custom commits rPHEXf0b35bebcaf4d0927f454b681180868f81b6f6da, rPHABf85036a1f2f6805258160e95930a910e1b96ed81 and rPHABe7135d411890698c87d5b01cb85eb6015cd2f29d.
If we ever decide to stop mirroring to GitHub entirely, this custom code should be removed.

For the record, from #wikimedia-gitlab irc:

<bd808> andre: I stumbled upon https://github.com/grdl/gitlab-mirror-maker today. And it corrected my faulty memory. GitLab CE can push mirror to other git hosts; it cannot pull mirror from other git hosts as that is an EE feature.
<bd808> gitlab-mirror-maker is designed to mirror the repos that a single user owns, so we would likely need to fork or do some solid work upstream if we decided to use it.

FYI: Below is a list of active repos mirrored from Diffusion to GitHub. Per T341971, this is done using credentials K32 and K35.

mysql:phstats@m3-slave.eqiad.wmnet [phabricator_maniphest]> SELECT CONCAT("https://phabricator.wikimedia.org/diffusion/", r.id, "/manage/uris/") AS repoURI, r.name, u.uri AS MirroredToUri FROM phabricator_repository.repository_uri u INNER JOIN phabricator_repository.repository r ON r.phid = u.repositoryPHID WHERE r.details LIKE "%\"tracking-enabled\":\"active\"%" AND u.ioType = "mirror" AND u.isDisabled = 0 AND u.uri LIKE "%github%";
+---------------------------------------------------------------+-------------------------------------+----------------------------------------------------------------------+
| repoURI                                                       | name                                | MirroredToUri                                                        |
+---------------------------------------------------------------+-------------------------------------+----------------------------------------------------------------------+
| https://phabricator.wikimedia.org/diffusion/1908/manage/uris/ | OOjs Router                         | https://github.com/wikimedia/oojs-router.git                         |
| https://phabricator.wikimedia.org/diffusion/1921/manage/uris/ | tool-gridengine-status              | https://github.com/wikimedia/tool-gridengine-status                  |
| https://phabricator.wikimedia.org/diffusion/1921/manage/uris/ | tool-gridengine-status              | https://github.com/toolforge/gridengine-status.git                   |
| https://phabricator.wikimedia.org/diffusion/1922/manage/uris/ | tool-admin-web                      | https://github.com/toolforge/admin.git                               |
| https://phabricator.wikimedia.org/diffusion/1944/manage/uris/ | tool-replag                         | https://github.com/toolforge/replag.git                              |
| https://phabricator.wikimedia.org/diffusion/1956/manage/uris/ | extension-CookieWarning             | https://github.com/wikimedia/mediawiki-extensions-CookieWarning      |
| https://phabricator.wikimedia.org/diffusion/1958/manage/uris/ | tool-versions                       | https://github.com/toolforge/versions.git                            |
| https://phabricator.wikimedia.org/diffusion/2043/manage/uris/ | tool-precise-tools                  | https://github.com/toolforge/precise-tools.git                       |
| https://phabricator.wikimedia.org/diffusion/2048/manage/uris/ | tool-my-first-flask-oauth-tool      | https://github.com/wikimedia/tool-my-first-flask-oauth-tool          |
| https://phabricator.wikimedia.org/diffusion/2048/manage/uris/ | tool-my-first-flask-oauth-tool      | https://github.com/toolforge/my-first-flask-oauth-tool.git           |
| https://phabricator.wikimedia.org/diffusion/2073/manage/uris/ | tool-keystone-browser               | https://github.com/wikimedia/tool-keystone-browser.git               |
| https://phabricator.wikimedia.org/diffusion/2073/manage/uris/ | tool-keystone-browser               | https://github.com/toolforge/openstack-browser.git                   |
| https://phabricator.wikimedia.org/diffusion/2080/manage/uris/ | tool-grid-jobs                      | https://github.com/toolforge/grid-jobs.git                           |
| https://phabricator.wikimedia.org/diffusion/2083/manage/uris/ | operations-dumps-import-tools       | https://github.com/wikimedia/operations-dumps-import-tools           |
| https://phabricator.wikimedia.org/diffusion/2093/manage/uris/ | operations-software-wmfmariadbpy    | https://github.com/wikimedia/operations-software-wmfmariadbpy        |
| https://phabricator.wikimedia.org/diffusion/2117/manage/uris/ | tool-mysql-php-session-test         | https://github.com/toolforge/mysql-php-session-test.git              |
| https://phabricator.wikimedia.org/diffusion/2127/manage/uris/ | extension-ArticleToCategory2        | https://github.com/wikimedia/mediawiki-extensions-ArticleToCategory2 |
| https://phabricator.wikimedia.org/diffusion/2155/manage/uris/ | tool-tool-db-usage                  | https://github.com/toolforge/tool-db-usage                           |
| https://phabricator.wikimedia.org/diffusion/2376/manage/uris/ | Phabricator Antivandalism Extension | https://github.com/wikimedia/phabricator-antivandalism.git           |
| https://phabricator.wikimedia.org/diffusion/2458/manage/uris/ | Keyholder                           | https://github.com/wikimedia/keyholder.git                           |
| https://phabricator.wikimedia.org/diffusion/2610/manage/uris/ | tool-spacemedia                     | https://github.com/toolforge/tool-spacemedia.git                     |
| https://phabricator.wikimedia.org/diffusion/2715/manage/uris/ | tool-k8s-status                     | https://github.com/toolforge/tool-k8s-status.git                     |
| https://phabricator.wikimedia.org/diffusion/2846/manage/uris/ | operations-software-wmfbackups      | https://github.com/wikimedia/operations-software-wmfbackups.git      |
| https://phabricator.wikimedia.org/diffusion/2957/manage/uris/ | mediawiki-libs-metrics-platform     | https://github.com/wikimedia/mediawiki-libs-metrics-platform         |
| https://phabricator.wikimedia.org/diffusion/3061/manage/uris/ | operations-software-wmfdb           | https://github.com/wikimedia/operations-software-wmfdb               |
| https://phabricator.wikimedia.org/diffusion/3237/manage/uris/ | tool-python-toolforge               | https://github.com/toolforge/python-toolforge.git                    |
+---------------------------------------------------------------+-------------------------------------+----------------------------------------------------------------------+
26 rows in set (0.049 sec)
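The table above can be summarized per GitHub organization with a short Python sketch. The (repository ID, URI) pairs are copied from the query output, and the helper names github_org and summarize_mirrors are hypothetical. It also lists the Diffusion repos that push to more than one GitHub location, which matters if these mirrors (or the custom GitHub "Download Archive" code) are ever retired.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# (Diffusion repository ID, mirror URI), copied from the query output above.
MIRRORS = [
    (1908, "https://github.com/wikimedia/oojs-router.git"),
    (1921, "https://github.com/wikimedia/tool-gridengine-status"),
    (1921, "https://github.com/toolforge/gridengine-status.git"),
    (1922, "https://github.com/toolforge/admin.git"),
    (1944, "https://github.com/toolforge/replag.git"),
    (1956, "https://github.com/wikimedia/mediawiki-extensions-CookieWarning"),
    (1958, "https://github.com/toolforge/versions.git"),
    (2043, "https://github.com/toolforge/precise-tools.git"),
    (2048, "https://github.com/wikimedia/tool-my-first-flask-oauth-tool"),
    (2048, "https://github.com/toolforge/my-first-flask-oauth-tool.git"),
    (2073, "https://github.com/wikimedia/tool-keystone-browser.git"),
    (2073, "https://github.com/toolforge/openstack-browser.git"),
    (2080, "https://github.com/toolforge/grid-jobs.git"),
    (2083, "https://github.com/wikimedia/operations-dumps-import-tools"),
    (2093, "https://github.com/wikimedia/operations-software-wmfmariadbpy"),
    (2117, "https://github.com/toolforge/mysql-php-session-test.git"),
    (2127, "https://github.com/wikimedia/mediawiki-extensions-ArticleToCategory2"),
    (2155, "https://github.com/toolforge/tool-db-usage"),
    (2376, "https://github.com/wikimedia/phabricator-antivandalism.git"),
    (2458, "https://github.com/wikimedia/keyholder.git"),
    (2610, "https://github.com/toolforge/tool-spacemedia.git"),
    (2715, "https://github.com/toolforge/tool-k8s-status.git"),
    (2846, "https://github.com/wikimedia/operations-software-wmfbackups.git"),
    (2957, "https://github.com/wikimedia/mediawiki-libs-metrics-platform"),
    (3061, "https://github.com/wikimedia/operations-software-wmfdb"),
    (3237, "https://github.com/toolforge/python-toolforge.git"),
]


def github_org(uri):
    """Return the first path segment of a GitHub URI: the organization or user."""
    return urlparse(uri).path.strip("/").split("/")[0]


def summarize_mirrors(mirrors):
    """Count mirror URIs per GitHub org and find repos with several targets."""
    per_org = Counter(github_org(uri) for _, uri in mirrors)
    targets = defaultdict(list)
    for repo_id, uri in mirrors:
        targets[repo_id].append(uri)
    multi = sorted(repo_id for repo_id, uris in targets.items() if len(uris) > 1)
    return per_org, len(targets), multi


if __name__ == "__main__":
    per_org, n_repos, multi = summarize_mirrors(MIRRORS)
    print(dict(per_org), n_repos, multi)
```

For the 26 rows above this gives 13 URIs under wikimedia and 13 under toolforge, spread over 23 distinct Diffusion repos; 1921, 2048 and 2073 each push to both organizations.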