
[EPIC][Infra] Move Wikibase and WikibaseLexeme Git submodules to suitable Git host
Open, High, Public

Description

Context

The Wikibase and WikibaseLexeme extension Git repositories have several submodules: some libraries and the new (mobile) termbox in Wikibase, and the new Special:NewLexeme page in WikibaseLexeme. Git repository options for them include:

  • GitHub: Currently the “canonical” repository for several of them (i.e. where actual development happens, if it happens); however, due to production network restrictions, GitHub cannot be used as the upstream URL of the submodules (T301273, I7b1e3cbafc)
  • Wikimedia Phabricator (Diffusion): Currently used as a mirror for the four repositories that live on GitHub, and registered in .gitmodules for them to work around the network restrictions mentioned above; however, access to Phabricator from cloud networks is severely rate-limited, so it is effectively not possible to clone these repositories from within GitHub Actions
  • Wikimedia Gerrit: Currently the “canonical” repository and .gitmodules URL for the three remaining submodules, but enforces a different development model
  • Wikimedia GitLab: Currently not used for any submodules, but allows a mostly GitHub-like development model without the network restrictions affecting GitHub and Phabricator

Main Objectives

  • Move the submodules for Wikibase and WikibaseLexeme from Phabricator to GitLab. This includes:
    • [Wikibase] view/lib/wikibase-serialization
    • [Wikibase] view/lib/wikibase-data-values
    • [Wikibase] view/lib/wikibase-data-model
    • [WikibaseLexeme] resources/special/new-lexeme
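
A minimal sketch of what the .gitmodules change for these paths could look like, written as a small Python helper. The GitLab host, the repos/wikidata namespace, and the repository names in NEW_URLS are assumptions for illustration only (the namespace is still an open question in this task), and the sketch assumes each submodule section is named after its path.

```python
import re

# Hypothetical target URLs: host, namespace and repository names are
# assumptions for illustration, not decided locations.
NEW_URLS = {
    "view/lib/wikibase-serialization":
        "https://gitlab.wikimedia.org/repos/wikidata/wikibase-serialization.git",
    "view/lib/wikibase-data-values":
        "https://gitlab.wikimedia.org/repos/wikidata/wikibase-data-values.git",
    "view/lib/wikibase-data-model":
        "https://gitlab.wikimedia.org/repos/wikidata/wikibase-data-model.git",
}

SECTION_RE = re.compile(r'^\s*\[submodule "(?P<name>[^"]+)"\]\s*$')
URL_RE = re.compile(r'^(?P<indent>\s*)url\s*=\s*.*$')


def rewrite_gitmodules(text, new_urls):
    """Return .gitmodules text with the url line replaced for listed submodules."""
    current = None  # name of the submodule section we are currently inside
    out = []
    for line in text.splitlines():
        section = SECTION_RE.match(line)
        if section:
            current = section.group("name")
        elif line.lstrip().startswith("["):
            current = None  # some other kind of section
        else:
            url = URL_RE.match(line)
            if url and current in new_urls:
                # Keep the original indentation, swap only the URL.
                line = f"{url.group('indent')}url = {new_urls[current]}"
        out.append(line)
    return "\n".join(out) + "\n"
```

After a change like this is committed, existing clones would also need `git submodule sync` so that their local configuration picks up the new URLs.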

Mitigated Risks

  • Development slowdowns and stalls due to Diffusion's rate limiting
  • High context switching due to different tool use

Potential Tasks

...

Notes

We have several options to achieve our objective:

  1. Create the GitLab repositories and manually keep them up to date (i.e. push to them whenever we push to the original GitHub repositories, or at least whenever we want to update the submodule commit in Wikibase / WikibaseLexeme) while keeping development on GitHub
  2. Have the GitLab repositories automatically mirror the GitHub repositories, by having GitHub push to them from GitHub Actions (having GitLab pull from GitHub automatically is sadly no longer supported – moved to GitLab Premium in 13.9)
  3. Actually move development of the repositories to GitLab and treat GitLab as the “canonical” source

I think these options are not mutually exclusive – in fact, it’s probably easiest to start with 1 and then optionally upgrade to 2 and/or 3 later if we want to.
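
As a rough illustration of option 1, the manual sync amounts to adding the GitLab repository as a second remote and pushing to it. The sketch below only builds the git commands; the remote name, the branch name, and the URL in the usage note are assumptions, not settled values.

```python
import shlex


def manual_mirror_commands(gitlab_url, branch="main", remote="gitlab"):
    """Build the git commands for a one-off manual sync to a second remote.

    The default remote and branch names are assumptions; adjust them to
    whatever the repository actually uses.
    """
    return [
        # One-time setup: register the GitLab repository as an extra remote.
        ["git", "remote", "add", remote, gitlab_url],
        # Repeat after each push to GitHub (or before bumping the submodule).
        ["git", "push", remote, branch],
        ["git", "push", remote, "--tags"],
    ]


def format_commands(commands):
    """Render the commands as shell lines for copy and paste."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

For example, `print(format_commands(manual_mirror_commands("https://gitlab.example/group/repo.git")))` prints the three commands with a placeholder URL. Option 2 would run the same two push commands from a GitHub Actions workflow instead of by hand.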

Open question: Which GitLab namespace should we use? CC @thcipriani or @LSobanski from our earlier discussion – is there a recommended namespace for GitLab repositories that will be used in the production MediaWiki environment?

Event Timeline

One case that needs testing is whether GitHub pulling from GitLab could be influenced by the work done in T366882: implement anti-abuse features for GitLab (Move GitLab behind the CDN). Do we know how many concurrent connections GitHub Actions create?

I don’t think we have any specific access numbers. If the limits are reasonably high, I think we’ll just have to hope for the best – it doesn’t sound like we have other options anyway.

ItamarWMDE renamed this task from Move Wikibase and WikibaseLexeme Git submodules to suitable Git host to [Infra][EPIC] Move Wikibase and WikibaseLexeme Git submodules to suitable Git host. Sep 26 2024, 1:28 PM
ItamarWMDE updated the task description.
ItamarWMDE added subscribers: WMDE-leszek, ItamarWMDE.

@Lucas_Werkmeister_WMDE I've rewritten this ticket to match our EPIC format. Please feel free to add any missing information or correct any mistakes.

@WMDE-leszek I saw you sent a message regarding this topic. Are your plans impacted by this, and are they covered by this epic?

ItamarWMDE renamed this task from [Infra][EPIC] Move Wikibase and WikibaseLexeme Git submodules to suitable Git host to [EPIC][Infra] Move Wikibase and WikibaseLexeme Git submodules to suitable Git host. Sep 26 2024, 1:31 PM
ItamarWMDE moved this task from Incoming to [DOT] Prioritized Epics on the wmde-wikidata-tech board.

Epic Review:

  • The Wikidata.org team decided that our first priority in this epic would be to move WikibaseLexeme to GitLab
  • The rest of the Wikibase libraries are of lesser priority. If we find ourselves making more contributions to these repositories, we will move them as well.
  • Otherwise, as the Wikibase Reuse Team appears to be responsible for these libraries, we will leave it to them to decide if, when and how to do this migration.

Open question: Which GitLab namespace should we use? CC @thcipriani or @LSobanski from our earlier discussion – is there a recommended namespace for GitLab repositories that will be used in the production MediaWiki environment?

We use the large repos namespace for everything. There are services which live there that currently create images for production. Submodules destined for production also make sense there (as a top-level namespace).

We've adopted the pattern of a second-level namespace to limit access (e.g., repos/releng for releng stuff).

I note there is a repos/wikidata already and some WMDE folks are group members.

Would that make sense to use?

Epic Review:

  • The Wikidata.org team decided that our first priority in this epic would be to move WikibaseLexeme to GitLab
  • The rest of the Wikibase libraries are of lesser priority. If we find ourselves making more contributions to these repositories, we will move them as well.
  • Otherwise, as the Wikibase Reuse Team appears to be responsible for these libraries, we will leave it to them to decide if, when and how to do this migration.

Added the Wikibase Reuse Team to this task for review. Has there been any internal discussion to update this task, or are there any questions we can help answer?

Thanks for returning to this @thcipriani , and bumping it in my inbox.
I believe the GitLab question is clear, though I am not sure we'll end up moving all of it to GitLab in the end. WMDE will look into this issue, but it is not a high priority.

For the WMDE aspect of this, I have no opinion about resources/special/new-lexeme, but I wonder whether those view/lib pseudo-libraries should really be moved to GitLab.
I could see that, per ADR 14, it might instead be preferable to "simply" put those into the Wikibase repository. Those pseudo-libraries could then still be extracted (git-repo-filtered) to a GitHub repository if we really intend to keep them as separate npm packages too.
This way the number of git pulls per CI job etc. would drop, and there would be no need for the Diffusion repos, which only exist to be a non-third-party location for code originating from GitHub.

Thanks a lot for reviving this topic. Hitting the rate limit when checking out wikibase was also an issue for third party users as mentioned here and here.

"simply" put those into wikibase repository

I like that, keep it simple. 😗

Adding Suite tag to follow along.

@WMDE: Recent thoughts in T359549 on potentially restricting connections to Diffusion due to aggressive crawlers brought parent task T349921 (integration-agent-docker machines excessively pull some Wikibase related Git repos in Diffusion) to my mind. Is there anything we can do to make moving off GitHub easier for you? Setting up repositories or anything?

Let us confirm how problematic it would be for us to move those repositories into Gerrit. I'll be back (with some details).

Let us confirm how problematic it would be for us to move those repositories into Gerrit. I'll be back (with some details).

I was under the impression that everyone is forced to move to GitLab (https://www.mediawiki.org/wiki/GitLab/Roadmap) :-( Am I wrong?

I think in the long run, it would be best to have something like https://radicle.xyz/ and let the protocol figure out where to fetch from.

I was under the impression that everyone is forced to move to GitLab (https://www.mediawiki.org/wiki/GitLab/Roadmap)

No longer. However, using GitLab is still an option, and if the projects are currently on GitHub, moving to GitLab is a smaller change, which may be better. (On the other hand, since Wikibase itself is on Gerrit, moving to Gerrit could allow WMDE developers to maintain fewer mental models.)

This sounds like swarm intelligence ;-) Gerrit seems to be rate limited somehow as well. At least, when we tried to use the GitHub runner to build MW images from Gerrit instead of the GitHub mirror, builds were unstable and failed from time to time. Now that the transition to PHP 8 is complete, we might no longer need to build custom images anyhow, so this is not a strong argument.

PS: https://github.com/wmde/wikibase-release-pipeline is on neither Gerrit nor GitLab, so one needs to use GitHub anyhow.

@Physikerwelt @Tacsipacsi note that the option I am trying to understand here at WMDE is "consolidating" those GitHub repositories into the main Wikibase repository, as, among other things, those "libraries" have no other use outside of Wikibase AFAICT. Therefore the discussion of different "code hosting" options (WMF GitLab, GitHub, etc.) is not really relevant for the time being.

Makes sense. However, the task description doesn’t reflect this, so please make sure to update it at the end of your exploration (or earlier if you have spare time for it).

@Physikerwelt @Tacsipacsi note that the option I am trying to understand here at WMDE is "consolidating" those GitHub repositories into the main Wikibase repository, as, among other things, those "libraries" have no other use outside of Wikibase AFAICT. Therefore the discussion of different "code hosting" options (WMF GitLab, GitHub, etc.) is not really relevant for the time being.

git subtree would probably be an easy option for that. If that's desirable, I'm happy to help run the commands to capture all history from those repositories and merge it into Wikibase. What do you think?
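
A hedged sketch of what the git subtree route might involve, untested against the real repositories. It only builds the command sequence per library: drop the submodule, commit (git subtree add needs a clean working tree and a prefix that does not exist yet), then import the library with its history. The source URLs and the ref are placeholders supplied by the caller, and the commit message is made up for illustration.

```python
def subtree_import_commands(sources, ref="master"):
    """Build git commands that replace submodules with subtree imports.

    `sources` maps a submodule path (used as the subtree prefix) to the URL
    of the repository to import. URLs and `ref` are caller-supplied
    assumptions; nothing here is a confirmed location or branch name.
    """
    commands = []
    for prefix, url in sources.items():
        # Remove the submodule entry so the prefix no longer exists.
        commands.append(["git", "rm", prefix])
        commands.append(["git", "commit", "-m", f"Remove submodule {prefix}"])
        # Import the library at the same path; without --squash the
        # library's full history is merged in.
        commands.append(["git", "subtree", "add", f"--prefix={prefix}", url, ref])
    return commands
```

Calling it with, for example, `{"view/lib/wikibase-serialization": "https://example.org/wikibase-serialization.git"}` yields three commands for that one path; the URL there is a placeholder.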

Reedy triaged this task as High priority. Nov 7 2025, 4:44 PM

Makes sense. However, the task description doesn’t reflect this, so please make sure to update it at the end of your exploration (or earlier if you have spare time for it).

Maybe subtasks can help? Something like:

  • Agree on a timeline and make contributors aware
  • Run git subtree
  • Fix side effects

I currently can’t recommend any decentralised way to resolve the related problems of speed bumps when fetching sub repos.

Apologies for the slowness. The impact/urgency of this has been somewhat unclear to me, hence kicking it down the road for longer than maybe necessary.
I expect WMDE will still make the plan for this before the year ends. Subtasks would follow once the approach is known.