Page MenuHomePhabricator

Scap reliance on extension-list in turn requires extensions to be deployed to WMF production before being deployed to the Beta Cluster
Open, MediumPublicBUG REPORT

Description

The current process requires deployment to beta after requesting reviews.

But deploying to beta requires us to branch the code for the production deployment for two weeks so it can be added to extension-list so it can be deployed to beta, which strikes me as bad.

It looks like we could use --extension-dir for mergeMessageFileList.php in scap for production. For beta, that would result in a much larger l10n directory for no benefit. Creating a beta-specific file in wmf-config (akin to -labs.php files) seems much nicer than creating one file for both beta and production.


(was: Scap should not rely on extension-list, instead pass --extension-dir to mergeMessageFileList.php)

@hashar wrote:

I am pretty sure the only use case is maintenance/mergeMessageFileList.php which apparently already has some kind of detection mechanism built-in:

$possibilities = [
    "$extdir/$extname/extension.json",
    "$extdir/$extname/skin.json",
    "$extdir/$extname/$extname.php"
];
$found = false;
foreach ( $possibilities as $extfile ) {
    if ( file_exists( $extfile ) ) {

That is enabled by passing it --extensions-dir and it "nicely" errors out whenever an extension lack one. The scap task still rely on the extension-list file

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
mmodell added a revision: Restricted Differential Revision.Feb 3 2016, 11:20 PM

We should just get rid of extension-list...

@Legoktm: I agree, it seems unnecessary? But update-l10n breaks without it, currently... I'll dig deeper into it.

It looks like mergeMessageFileList.php can scan the extensions directory, as an alternative to passing an extension-list file, however, we need to scan extensions/* and skins/*

Should I make a way that we can pass multiple directories to the script or perhaps an option to recursively search for everything starting from the root of the mediawiki/core branch checkout?

Anyone have an opinion?

Add support for passing multiple paths to extension-dir arg in mergeMessageFileList.php

https://gerrit.wikimedia.org/r/#/c/268337/

Change 268337 had a related patch set uploaded (by Legoktm):
Support multiple extension-dir paths to be passed to mergeMessageFileList

https://gerrit.wikimedia.org/r/268337

Final step is generating the extension list when the branch is cut.

mmodell lowered the priority of this task from High to Medium.Mar 29 2016, 10:07 PM

From code review of https://gerrit.wikimedia.org/r/#/c/287565/ @hashar wrote:

Do we still need the extension-list anyway? IIRC it is only used for merging l10n messages and the logic could be added to that maintenance script instead.

For CI purposes, we have moved most if not all extensions to use ExtName/ExtName.php as the entry point, and some are migrated to extension.json / skin.json. So the lookup should be quite trivial and we could get rid of extension-list entirely.

@mmodell replied:
It would be great if we could remove the need for this, but somewhere it has to be either generated or computed on demand. My main goal was to remove the chicken-or-egg problem when adding or removing an extension during branch cutting.

I am pretty sure the only use case is maintenance/mergeMessageFileList.php which apparently already has some kind of detection mechanism built-in:
...

$possibilities = [
    "$extdir/$extname/extension.json",
    "$extdir/$extname/skin.json",
    "$extdir/$extname/$extname.php"
];
$found = false;
foreach ( $possibilities as $extfile ) {
    if ( file_exists( $extfile ) ) {

...
That is enabled by passing it --extensions-dir and it "nicely" errors out whenever an extension lack one. The scap task still rely on the extension-list file ... Worth giving that a try?


D288 would move scap to use --extensions-dir and we would no more need the extension-list file.

NOTE: --extensions-dir has never been used AFAIK. So we will want to properly test it, maybe by switching beta to it first.

@hashar is right, this isn't necessary if mergeMessageFileList.php can handle things without the extension list.

mmodell renamed this task from extension-list should live in the mediawiki branch rather than mediawiki-config to SCAP should not rely on extension-list, instead pass --extension-dir to mergeMessageFileList.php.Jul 18 2016, 7:08 PM
mmodell updated the task description. (Show Details)
mmodell set Security to None.
mmodell updated the task description. (Show Details)

Below is a draft comment I wrote at some point. committing it for the record

@mmodell replied:
It would be great if we could remove the need for this, but somewhere it has to be either generated or computed on demand. My main goal was to remove the chicken-or-egg problem when adding or removing an extension during branch cutting.

I am pretty sure the only use case is maintenance/mergeMessageFileList.php which apparently already has some kind of detection mechanism built-in:
...

$possibilities = [
    "$extdir/$extname/extension.json",
    "$extdir/$extname/skin.json",
    "$extdir/$extname/$extname.php"
];
$found = false;
foreach ( $possibilities as $extfile ) {
    if ( file_exists( $extfile ) ) {

...
That is enabled by passing it --extensions-dir and it "nicely" errors out whenever an extension lack one. The scap task still rely on the extension-list file ... Worth giving that a try?


D288 would move scap to use --extensions-dir and we would no more need the extension-list file.

NOTE: --extensions-dir has never been used AFAIK. So we will want to properly test it, maybe by switching beta to it first.

The TLDR is that I wrote the --extensions-dir feature years ago but it never got used afaik.

I'm not afraid of using it, I'm mostly concerned of how we'll handle beta. We currently have the extension meta repo symlink'd for easy installation. Sadly, this would make l10n regeneration impossibly slow if we only used --extensions-dir

beta uses a clone of mediawiki/extensions but surely we could instead use the list of repos for which we cut a branch?

Possibly. But we also deploy some extensions to beta that we don't branch for prod yet. I'd rather just stop using that meta repo entirely--can we think up another solution?

Could we add a "beta-only" section in make-wmf-branch's config.json?

Could we add a "beta-only" section in make-wmf-branch's config.json?

Maybeeeee..... but beta doesn't rely on make-wmf-branch so it'd be a weird dependency. Really all this crud should move to wmf-config as scap plugins.

Krinkle renamed this task from SCAP should not rely on extension-list, instead pass --extension-dir to mergeMessageFileList.php to Scap should not rely on extension-list, instead pass --extension-dir to mergeMessageFileList.php.May 31 2019, 8:58 PM

The subtask is now closed and all of extension-list is now on extension.json but I want to point out that Wikibase doesn't have extension.json, it has extension-repo.json and extension-client.json

@hashar: Could you please answer the last comment? Thanks in advance!

We still rely on operations/mediawiki-config having the list of all extension.json / skin.json in wmf-config/extension-list. The related code is:

scap/tasks.py
# attempt to read extension-list from the branch instead of wmf-config
ext_list = os.path.join(cfg["stage_dir"], "php-%s" % version, "extension-list")

if not os.path.isfile(ext_list):
    # fall back to the old location in wmf-config
    ext_list = "%s/wmf-config/extension-list" % cfg["stage_dir"]

utils.sudo_check_call(
    "www-data",
    "/usr/local/bin/mwscript mergeMessageFileList.php "
    '--wiki="%s" --list-file="%s" '
    '--output="%s"' % (wikidb, ext_list, new_extension_messages),
)

A patch I wrote ten years ago (ad1609059f1) made mergeMessageFileList.php to auto discovers extensions and skins by passing --extension-dir. The idea was to save us the manual maintenance of wmf-config/extension-list and instead rely on crawling the extensions and skins registered as submodule of the mediawiki/core wmf branches.

Would it work? Well maybe. I am not sure what will happens to Beta-Cluster-Infrastructure which has all the extensions and skins hosted on Gerrit registered as submodules which would potentially cause very large l10n files and cache to be generated.

This task isn't resolved. It would be nice to get it resolved as it would simplify deploying to beta as a step of the preparing an extension for deployment checklist.

The current process requires deployment to beta after requesting reviews.

But deploying to beta requires us to branch the code for the production deployment for two weeks so it can be added to extension-list so it can be deployed to beta, which strikes me as bad.

It looks like we could use --extension-dir for mergeMessageFileList.php in scap for production. For beta, that would result in a much larger l10n directory for no benefit. Creating a beta-specific file in wmf-config (akin to -labs.php files) seems much nicer than creating one file for both beta and production.

I note here that extension-list-labs existed and was removed almost exactly 6 years ago with commit message "Last extension-list-labs. Don't ever bring it back." But I don't think I understand why, exactly. Maybe someone with a long memory can clue me in?

My thinking is: in production, we need to find every extension.json on disk and we can do that without a list. But in Beta, we need a seemingly random subset of extensions on disk, and a list is an easy way to solve that (we're using a list now, but it's the list for production).

My thinking is: in production, we need to find every extension.json on disk and we can do that without a list.

There is also the security aspect. Auto-discovery can open a lot of RCE holes.

FWIW, Tyler's comments relate to an issue Tyler and me were discussing last week: Currently, it is technically impossible to deploy (as in enable) a new extension to Beta without first deploying it to production (which makes no sense at all).

Explanation of why is that the case: For the extension's code to be available within beta, it needs to be registered as a Git submodule of mediawiki/extensions. In theory, that should be enough to enable the code in Beta (by making changes to CS-labs.php and IS-labs.php in the config repo). However, for the extension to be meaningfully disabled, we need to be building i18n cache for it, which requires it. being in extension-list, which requires it to be branched for 2+ weeks prior the deployment, which means the extension has to be deployed to production before beta. This causes confusion regarding Security review, etc., as we're technically doing a production deployment, but not really.

My thinking is: in production, we need to find every extension.json on disk and we can do that without a list.

There is also the security aspect. Auto-discovery can open a lot of RCE holes.

What if we used autodiscovery for beta only? That'd resolve the issue described above, and it also wouldn't change anything in production. We would have a lot of additional room for RCE in Beta, but that's a significantly smaller problem (perhaps insignificant enough to ignore), especially given the amount of people with deployment-prep access (and given any of them has full root, thus giving them more than enough opportunities to create as many RCE holes as they want to).

As long as it's not touching production, I'm happy ^_^

@Ladsgroup wrote:

There is also the security aspect. Auto-discovery can open a lot of RCE holes.
As long as it's not touching production, I'm happy ^_^

I originally wrote the patch to save us from having to manually maintain wmf-config/extension-list and remove one extra step when adding a new extension. It still needs to be added to the branch script in order to make its way to production and enabling it also requires an extra wfLoadExtension(). I don't see what kind of remote code execution that would enables.

Anyway, that was from back in 2012 when we created much more extensions and this task T125678 is a follow up of a discussion we had among release engineering to eventually simplify the addition of an extension and the deployment (by removing yet another "mysterious" file).

Else, I guess we can decline the task :)

@Ladsgroup wrote:

There is also the security aspect. Auto-discovery can open a lot of RCE holes.
As long as it's not touching production, I'm happy ^_^

I originally wrote the patch to save us from having to manually maintain wmf-config/extension-list and remove one extra step when adding a new extension. It still needs to be added to the branch script in order to make its way to production and enabling it also requires an extra wfLoadExtension(). I don't see what kind of remote code execution that would enables.

A compromised host or container as a non-mwdeploy user, they can add file but not change them. This is defense in depth.

Not to mention that list is used by codesearch to determine what extensions are in production.

My thinking is: in production, we need to find every extension.json on disk and we can do that without a list.

There is also the security aspect. Auto-discovery can open a lot of RCE holes.

What if we used autodiscovery for beta only? That'd resolve the issue described above, and it also wouldn't change anything in production. We would have a lot of additional room for RCE in Beta, but that's a significantly smaller problem (perhaps insignificant enough to ignore), especially given the amount of people with deployment-prep access (and given any of them has full root, thus giving them more than enough opportunities to create as many RCE holes as they want to).

One thing that worries me about beta is there are a ton of extensions there—960 on disk. We generate l10n for only what's in extension-list (196 extensions, ⅕th of the total extensions on disk), which takes 5.4G of space (13% of our 40G /srv/ mount). If we generate l10n via autodiscovery on beta, seems like we'd run out of space.

A separate list for beta seems a small price to pay to solve the problem you've highlighted: the need to deploy to production two weeks before you're able to Beta which seems backwards. We'd need to add some small scap support for it, too (merge request above covers it).

@hashar via IRC mentioned an alternative to my proposal: create a deployment-prep branch of mediawiki/extensions and do autodiscovery on beta.

Appealing about that idea:

  • saves disk space in beta vs. status quo.
  • only one place to update to add an extension in beta (vs my proposal which would necessitate editing extension-list-labs)
bd808 renamed this task from Scap should not rely on extension-list, instead pass --extension-dir to mergeMessageFileList.php to Scap reliance on extension-list in turn requires extensions to be deployed to WMF production before being deployed to the Beta Cluster.Dec 1 2025, 10:48 PM
bd808 removed a project: Patch-For-Review.
bd808 updated the task description. (Show Details)
bd808 changed the subtype of this task from "Task" to "Bug Report".

I note here that extension-list-labs existed and was removed almost exactly 6 years ago with commit message "Last extension-list-labs. Don't ever bring it back." But I don't think I understand why, exactly. Maybe someone with a long memory can clue me in?

https://lists.wikimedia.org/pipermail/engineering/2018-March/000520.html has @demon's explanation of why. I found that in a footnote at https://www.mediawiki.org/wiki/Writing_an_extension_for_deployment#cite_note-3

My thinking is: in production, we need to find every extension.json on disk and we can do that without a list. But in Beta, we need a seemingly random subset of extensions on disk, and a list is an easy way to solve that (we're using a list now, but it's the list for production).

One of the reasons Chad laid out in his email was:

  1. Remind everyone that code needs appropriate review (security, performance, DBA, etc) before going to beta. This has always been policy, but it sometimes comes up as a question. This makes process self documenting.

but he also said:

  1. Eventually get rid of extension-list too. Ideally we want to swap the l10n rebuild stuff to use its --extension-dir so we can just copy the code in and get picked up automatically. Yay less maintenance.

(spoiler: #5 was the reason I embarked on this adventure).

I think this is where things got a bit stuck in that Chad did not keep going down this path. If we had kept moving forward the need to couple the extensions bundled in the MediaWiki OCI container deployed in WMF production with the l10n build in Beta Cluster would disappear.

One path to Chad's vision would be changing the Beta Cluster deployment so that it used something other than a checkout of the master branch of mediawiki/extensions.git to deploy extensions. If we curated a smaller set of extensions that matched what was enabled in LocalSettings across all of the Beta Cluster wikis then we could use the mergeMessageFileList.php --extension-dir DIRECTORY method to build the l10n cache in both environments.

thcipriani closed https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/259

Draft: MergeMessageFileLists.php: Use a realm-specific extension-list

The train ran for a week and continued to run in the next week without any identified problems. I think we can call this experiment a success and proceed to the cleanup phase where we update docs on mediawiki.org to stop telling people that they need to rush to get their extension/skin on the train early.

I will take a shot at fixing up the mediawiki.org docs. I will also create a feature request about the possibility of using --extension-dir instead of an extension-list file as a follow up.