Related Objects
- Mentioned In
  - T296980: Migrate extdist to production and a better architecture
  - T249949: Stop using integration/composer and then archive the repo
- Mentioned Here
  - T296980: Migrate extdist to production and a better architecture
  - T143969: Unable to mirror repository from git.legoktm.com into diffusion
  - T70122: Use new labs project (extdist.wmflabs.org) for ExtensionDistributor
  - T249949: Stop using integration/composer and then archive the repo
Event Timeline
And switch to packaged composer per T249949: Stop using integration/composer and then archive the repo
And maybe in production? (Ganeti VM might be an easier start)
It looks like we used to use Gerrit/Gitweb to generate the tarballs. When Gitweb was replaced by Gitiles, the easy-to-abuse tar feature was no longer exposed, so we (temporarily) switched to GitHub, and then in 2014 moved to a wmflabs.org VM per T70122. But given that we link to these tarballs from mediawiki.org (example) and that people are expected to execute code based on them, it doesn't seem great to run in wmflabs long-term. Proxying would avoid the PII issue but not the contents issue.
What would it take to run this in prod instead?
Short answer:
- figuring out whether it's acceptable to run composer in a prod VM
- figuring out whether we're OK cloning submodules from GitHub (or any other Git host I suppose), see comments starting with T143969#2647761
Long answer:
ExtensionDistributor's key functions are: it includes submodules (necessary for VisualEditor primarily, but also some others), installs composer dependencies, and generates gitinfo.json files for Special:Version.
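For anyone who hasn't looked at what that involves, here is a rough sketch of those three steps in Python. This is not the actual extdist code: the function names, the `--no-dev` flag choice, and the gitinfo.json key names are my assumptions for illustration and should be checked against the real implementation and MediaWiki's GitInfo class before relying on them.

```python
import json
from typing import List


def plan_tarball_steps(repo_url: str, branch: str, workdir: str) -> List[List[str]]:
    """Return the commands (as argv lists) a tarball build might run, in order.

    Hypothetical helper: it only builds the command lists, it runs nothing.
    """
    return [
        # Clone the requested branch and pull in submodules in one go
        # (submodules may live on other Git hosts, which is one of the
        # open questions for running this in prod).
        ["git", "clone", "--branch", branch, "--recurse-submodules", repo_url, workdir],
        # Install composer dependencies into the checkout.
        # Assumption: production-only dependencies are wanted, hence --no-dev.
        ["composer", "install", "--no-dev", "--working-dir", workdir],
    ]


def build_gitinfo(head_sha1: str, commit_date: int, branch: str, remote_url: str) -> str:
    """Serialize version metadata for Special:Version as JSON.

    Assumption: these key names approximate what MediaWiki reads from
    gitinfo.json; verify them before use.
    """
    info = {
        "head": head_sha1,
        "headSHA1": head_sha1,
        "headCommitDate": commit_date,
        "branch": branch,
        "remoteURL": remote_url,
    }
    return json.dumps(info, indent=2, sort_keys=True)
```

The composer step is the part that executes third-party code, and the clone step is the part that may reach out to GitHub for submodules, so these map onto the two bullets in the short answer.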
The architecture is pretty bad: both the MW extension and extdist.wmflabs.org independently talk to Gerrit, so when a new commit is pushed, the two can get out of sync. We also don't provide long-term stable URLs for curl ..., which people regularly ask for.
I wrote up a better proposal with Yuvi's help in Feb 2015 at https://www.mediawiki.org/wiki/Extension:ExtensionDistributor/tardist but it stalled for reasons I don't remember. Today, if we wanted to deploy an API like this, it would go in k8s, except k8s doesn't have proper disk storage, so we'd use swift instead, and it just becomes much more complicated than what I have time to develop/maintain.
If the two bullets are acceptable risks, then I'd be down to move the current setup into a Ganeti VM; it's already all puppetized anyways.
We're going directly to bullseye instead. I have copied the useful part of this discussion to T296980: Migrate extdist to production and a better architecture.