Drop our mirroring of code to Diffusion and empty the repos
Open, Low, Public

Description

  • Diffusion is not the canonical source of anything
    • → this is a feature. Diffusion is used as mirror to improve integration with Phabricator features, like:
      • mentioning commits only with their hash name
      • having commits shown under your Phabricator user page
      • being able to close a task by "Closes T123" in the commit message, without the need of any bot
  • Having it causes confusion
    • → what kind of confusion?
  • Having it caused disruption
    • → what kind of disruption?
  • Having it wastes time
    • → Probably true for Phabricator administrators. But there are still benefits.
    • → Unclear what kind of waste of time for other people.
  • There are a number of bugs
    • Which bug in particular?
  • These bugs are getting fixed by anyone.
  • There's now at least one bug which interferes with actually getting things done: T358940: GerritBot comments for 7-digit Gerrit changes conflict with Diffusion commit hashes
    • → The problem was fixed by the kind matmarex on Mar 21 2024, 19:18

Event Timeline

Uh? 🤔

Just to clarify: is the proposal here to close this kind of repository?

https://phabricator.wikimedia.org/source/mediawiki/

Please at least evaluate all the features we are going to shut down with the above idea:

Aklapper changed the task status from Open to Stalled. Apr 5 2024, 9:05 AM
Aklapper triaged this task as Low priority.

I have disagreed each time this general idea has been promoted in the past and I will continue to disagree until there is a material difference in the arguments made by either side.


[15:56]  <    bd808> brennen, thcipriani: I am now wondering if Striker should setup a Diffusion mirror for each gitlab repo it creates. This happened as a side effect for the repos I just migrated so there will be a feature gap growing for each new repo going forward. Github gives a much better (IMO) code browser than gitiles, but It can be really nice to be able to link to a git hash in Phabricator. Thoughts?
[15:56]  <    bd808> Deciding on this would change my work for T317272
[15:56]  < stashbot> T317272: Remove legacy Diffusion related code from Striker - https://phabricator.wikimedia.org/T317272
[16:14]  <thcipriani> this was always mukundas argument for why differential should mirror everything. It is a really nice feature. I think having a mirror would make sense.
[16:29]  <  brennen> thcipriani, bd808: concur with tyler.
[16:31]  <    bd808> cool. that gives me a direction at least. It also makes me feel that I may have purged some data prematurely... but I can recreate it I think.

Do we want to sync code into Diffusion? It's a massive point of confusion for no added value. I'd rather this task was "Ensure GitLab repos are not mirrored in Diffusion".

It's a massive point of confusion for no added value.

I find it useful for the ability to embed nice links like R3254:03cff13258db: dal: Refactor and update SQL queries into tasks. The syntax needed for that is {git-hash-here} when the repo is mirrored by Diffusion. When not mirrored one can still provide explicit links like https://gitlab.wikimedia.org/toolforge-repos/ifttt/-/commit/03cff13258dbb76740e19d49a4c7d25e7e75cdb8, but there is more work required to turn that into a more informative link for the casual reader like dal: Refactor and update SQL queries. As part of T296893: Replace Diffusion integration with Gitlab integration in Striker (toolsadmin) I explicitly added T317345: Mirror Striker managed GitLab repos in Diffusion for this reason.
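To illustrate the difference between the two link styles, here is a minimal sketch that turns an explicit GitLab commit URL into the `{hash}` reference described above. The URL pattern is inferred from the single example link in this comment, and `to_remarkup` is a hypothetical helper name, not part of Phabricator or Striker. The resulting reference would only render as a rich link if the repository is mirrored in Diffusion.

```python
import re

# Assumed URL shape, inferred from the example link in the comment above:
#   https://<gitlab-host>/<namespace>/<project>/-/commit/<sha>
COMMIT_URL = re.compile(
    r"^https://(?P<host>[^/]+)/(?P<project>.+?)/-/commit/(?P<sha>[0-9a-f]{7,40})/?$"
)


def to_remarkup(url: str) -> str:
    """Return a `{hash}` reference for a GitLab commit URL (hypothetical helper)."""
    match = COMMIT_URL.match(url)
    if match is None:
        raise ValueError(f"not a recognized GitLab commit URL: {url}")
    # Wrap the commit hash in braces, the explicit-link syntax mentioned above.
    return "{" + match.group("sha") + "}"


if __name__ == "__main__":
    print(to_remarkup(
        "https://gitlab.wikimedia.org/toolforge-repos/ifttt/-/commit/"
        "03cff13258dbb76740e19d49a4c7d25e7e75cdb8"
    ))
```

The helper only rewrites the text; looking up the commit title for a "more informative link" without a mirror would still need a separate request to the GitLab host.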


Would love to preserve nice embedded links.

If we can link those to the canonical host, that'd be cool. Otherwise, having Diffusion monitor the canonical hosts (as now) is an option (or have canonical hosts mirror, whichever).


We could also just stop mirroring and empty the Diffusion repos whose only benefit is the auto-linking.

I prefer the ability to explicitly link commits using the {hash} and {diffusion repo:hash} methods over autolinking, which can be annoying for Diffusion commits and other things. Auto-linked references are not, however, the only use case for Diffusion today.

Diffusion is providing a helper service for mirroring repositories to other codeforges. The open-core GitLab "Community Edition" product does not support pull mirroring, but Diffusion does. This means Diffusion can be used to pull from Gerrit or another origin and then push into GitLab. GitLab CE's push mirroring is also deficient in that it requires each origin repo to individually configure credentials for the downstream. These credentials cannot be hidden from the GitLab repo owner, nor can they be shared across repos. This makes GitLab CE's push mirroring support insufficient to implement bulk mirroring from our GitLab to other codeforges such as GitHub.
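For readers unfamiliar with the pull-then-push pattern, it can be sketched with plain git commands. This is not how Diffusion implements mirroring internally; it is only an illustration of the general technique, and the function names, URLs, and working directory below are hypothetical placeholders.

```python
import subprocess
from typing import List


def mirror_commands(
    origin_url: str, downstream_url: str, workdir: str, already_cloned: bool
) -> List[List[str]]:
    """Build the git commands for one pull-then-push mirroring pass (illustrative only)."""
    commands: List[List[str]] = []
    if not already_cloned:
        # First pass: create a bare mirror clone of the origin.
        commands.append(["git", "clone", "--mirror", origin_url, workdir])
    else:
        # Later passes: pull new refs from the origin and drop deleted ones.
        commands.append(["git", "-C", workdir, "remote", "update", "--prune"])
    # Push every ref to the downstream host.
    commands.append(["git", "-C", workdir, "push", "--mirror", downstream_url])
    return commands


def run_mirror(
    origin_url: str, downstream_url: str, workdir: str, already_cloned: bool
) -> None:
    """Execute the commands, stopping at the first failure."""
    for command in mirror_commands(origin_url, downstream_url, workdir, already_cloned):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder URLs; nothing is executed here, the commands are only printed.
    for command in mirror_commands(
        "https://origin.example/repo.git",
        "https://downstream.example/repo.git",
        "/tmp/repo-mirror.git",
        already_cloned=False,
    ):
        print(" ".join(command))
```

In an arrangement like this, the downstream credentials live with whoever runs the mirroring job rather than in each origin repository's settings, which is the property the paragraph above finds missing in GitLab CE's push mirroring.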

In my personal opinion, Diffusion provides a much nicer code and commit history browsing experience over Gitiles (the code browser currently used with the Wikimedia Gerrit deployment). You can compare and decide for yourself if you concur, but if we rip out Diffusion then there will not be a choice.

Aklapper closed subtask Restricted Task as Resolved. Jun 10 2024, 8:00 PM

Thanks for reporting. I have carefully examined every single point highlighted, and I think the situation is under control.

Hoping to do a good thing, I put my details directly in the original description.

Please feel free to increase the details and reopen it 🌈 And thank you for any upstream bug reports.

To me this sounds like a valid request not to mirror repositories between numerous systems and not to spend time trying to keep things updated and in sync. If at all, this task may get declined at some point but it does not sound invalid.

While I agree with @Aklapper that this task is not “invalid,” I also think that dropping Diffusion mirrors would be a big loss, and in the end users would spend more time working around it than administrators would save:

  • The handy links in tasks wouldn’t work anymore (including already-inserted ones!), let that be a commit link (bb1d0aca814d), or a link to a certain file (rMW includes/recentchanges/ChangesFeed.php) or a certain line of a file (rMW includes/recentchanges/ChangesFeed.php:37 (at bb1d0aca814d)). This means having to resort to less useful links in future comments/descriptions, for which readers need more time to figure out what they mean; and even more time trying to figure out what the broken links to Diffusion (which probably won’t even be rendered as links) mean.
  • The powerful repository browser. While basic code viewing functionality is present in both Gitiles and on GitHub, Gitiles cannot search for files or code at all, and GitHub doesn’t allow searching for code without logging in, which means having to have a GitHub account and potentially also means going through 2FA (searching for files is allowed on GitHub without logging in). Code navigation by clicking on an identifier is a useful feature of GitHub, but again only for logged-in users – for logged-out users, it works only within a file, which is hardly better than Ctrl+F, especially in languages as dynamic as PHP and JavaScript. (And remember: I’m not arguing for removing the GitHub mirror – the discoverability of GitHub for newbies is unbeatable –, only against removing the Diffusion one.)
  • The diff views that are integrated with the rest of Phabricator:
    • Neither Gitiles nor GitHub render Phabricator task numbers as links (the Gerrit code review interface does, but Gitiles doesn’t; the workaround is clicking on the Change-Id – which is rendered as a link in Gitiles, though not on GitHub –, and clicking once again to get to Phabricator; or copying and pasting the task ID), while Diffusion of course renders them.
    • Similarly, the authorship information is more integrated into Wikimedia: if the author/committer email address is connected to a Phabricator account, it links to that account, from which one can not only find Phabricator comments of that person, but also their Wikimedia (CentralAuth/mediawiki.org) account, so it’s possible to write to the talk page of the author instead of opening a task on Phabricator if that’s more appropriate; in contrast, Gitiles shows whatever one entered in user.name (which may be a fully different name, e.g. a made-up user name used on Wikimedia but full name used in Git) and generates no links, while GitHub links to the GitHub profile if there’s any, and nowhere if there isn’t (Diffusion also links nowhere if there’s no Phabricator account, but that’s less likely).