Special:Import form loses configuration data when $wgImportSources contains duplicate subprojects
Closed, Resolved | Public | Bug Report

Description

Steps to replicate the issue (include links if applicable):

What happens?:

image.png (966×972 px, 22 KB)

Many languages are not included in the importable list, such as en and zh.

What should have happened instead?:
It should include all languages available on the production site.

Details

Event Timeline

The beta cluster is testing "a new import source configuration". I'm not sure when it started, but according to the logs, importing from Chinese Wikipedia to the beta cluster worked fine two years ago.

photo_2025-11-14_15-28-19.jpg (956×902 px, 149 KB)

Additional note: Based on the current code, the import sources should include all wikis available on the beta cluster. However, the missing languages seem to follow a subtle pattern, as shown in the figure (sourced from Special:SiteMatrix, with non-existent wikis removed).

https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/refs/heads/master/wmf-config/CommonSettings-labs.php contains:

	// Test of new import source configuration on labs cluster
	$wgImportSources = [];
	require_once __DIR__ . '/import.php';
	$wgHooks['ImportSources'][] = 'wmfImportSources';

The hook handler that is registered there is from https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/refs/heads/master/wmf-config/import.php

/**
 * Set up import sources for transwiki import (T17583)
 *
 * @param array &$sources
 */
function wmfImportSources( &$sources ) {
	// $lang is the site language from, eg, CommonSettings.php
	// phpcs:ignore MediaWiki.NamingConventions.ValidGlobalName.allowedPrefix
	global $wgConf, $lang;
	$wikipedias = array_flip( WmfConfig::readDbListFile( 'wikipedia' ) );
	$privates = array_flip( WmfConfig::readDbListFile( 'private' ) );
	// REMEMBER when editing this function, the values here are *interwiki prefixes*.
	// Sometimes the interwiki map does things you don't expect.
	// Look at dumpInterwiki.php in WikimediaMaintenance for guidance.
	// Enforce a sensible order
	$sources = [
		// Put really common special wikis first
		'meta', 'commons', 'incubator',
		'wikipedia' => [],
		'wiktionary' => [],
		'wikibooks' => [],
		'wikinews' => [],
		'wikiquote' => [],
		'wikisource' => [ 'oldwikisource' ],
		'wikiversity' => [ 'betawikiversity' ],
		'wikivoyage' => [],
		'chapter' => [],
		// Add a selection of non-private special wikis
		'foundation', 'mediawikiwiki', 'nostalgia', 'outreach', 'strategy',
		'tenwiki', 'testwiki', 'test2wiki', 'testwikidata', 'usability',
		'wikidata', 'wikispecies', 'wikitech',
		'wmania' => [
			'wm2005', 'wm2006', 'wm2007', 'wm2008', 'wm2009', 'wm2010', 'wm2011',
			'wm2012', 'wm2013', 'wm2014', 'wm2015', 'wm2016',
		],
	];
	// Add all regular language projects as import sources
	foreach ( $wgConf->getLocalDatabases() as $dbname ) {
		// No importing from private wikis
		if ( isset( $privates[$dbname] ) ) {
			continue;
		}
		[ $project, $subdomain ] = $wgConf->siteFromDB( $dbname );
		if ( $project === 'wikimedia' ) {
			$sources['chapter'][] = $subdomain;
		} elseif ( $subdomain === 'en' || $subdomain === $lang ) {
			// Put $lang and en at the top for convenience
			array_unshift( $sources[$project], $subdomain );
		} elseif (
			// Don't list sites under "wikipedia" that are not Wikipedias (e.g. meta)
			( $project !== 'wikipedia' || isset( $wikipedias[$dbname] ) ) &&
			// Beta Wikiversity is handled separately
			$subdomain !== 'beta'
		) {
			$sources[$project][] = $subdomain;
		}
	}
}
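
The ordering logic in the loop above can be sketched in isolation. This is a minimal illustration, not the production handler: the `$sites` list and the hard-coded `$lang` are hypothetical stand-ins for what the real function reads from `$wgConf` and the global `$lang`, and the private-wiki and non-Wikipedia filters are left out.

```php
<?php
// Hypothetical stand-ins for $wgConf->getLocalDatabases() / siteFromDB() and the global $lang.
$lang = 'zh';
$sites = [
	[ 'wikipedia', 'aa' ],
	[ 'wikipedia', 'en' ],
	[ 'wikipedia', 'fr' ],
	[ 'wikipedia', 'zh' ],
];

$sources = [ 'wikipedia' => [] ];
foreach ( $sites as [ $project, $subdomain ] ) {
	if ( $subdomain === 'en' || $subdomain === $lang ) {
		// en and the local language are moved to the front of their family
		array_unshift( $sources[$project], $subdomain );
	} else {
		// everything else is appended in database-list order
		$sources[$project][] = $subdomain;
	}
}

echo implode( ', ', $sources['wikipedia'] ), "\n";
// prints "zh, en, aa, fr"
```

Because each matching subdomain is unshifted as it is encountered, the one seen last ends up first in the list.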

T17583: Enable importing across all Wikimedia projects is referenced in the code comments.

The config itself came from https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/157338 (2015-12-07)

The wmfImportSources function is fundamentally unchanged since its introduction. It also seems to be abandonware with 10+ years of soak time in Beta with no follow up.

@TTO do you have any objections to ripping this config experiment out and going back to a static map like prod uses?

I also pinged @TTO about this on their mw.o talk page.

bd808 changed the task status from Open to In Progress. Mar 19 2026, 1:00 AM
bd808 claimed this task.
bd808 triaged this task as Medium priority.
bd808 moved this task from To Triage to Backlog on the Beta-Cluster-Infrastructure board.

Thanks for the ping, @bd808. The idea was to allow any WMF wiki to import from any other WMF wiki without requiring a config change every time a wiki wants to import from a previously unlisted wiki. I guess something is broken with it.

Obviously I would prefer for it to be fixed, and eventually deployed in production. But I'm not going to do any further work on it, so do as you will.

bd808@deployment-deploy04:~$ mwscript shell.php zhwiki
Psy Shell v0.12.20 (PHP 8.3.30 — cli) by Justin Hileman
> $wgImportSources;
= []

> wmfImportSources( $wgImportSources );
= null

> $wgImportSources;
= [
    0 => "meta",
    1 => "commons",
    2 => "incubator",
    "wikipedia" => [
      "zh",
      "en",
      "aa",
      "ar",
      "bn",
      "ca",
      "crh",
      "cs",
      "de",
      "eo",
      "es",
      "fa",
      "fr",
      "he",
      "hi",
      "ja",
      "ko",
      "nl",
      "ru",
      "simple",
      "sq",
      "sr",
      "sv",
      "test2",
      "test",
      "uk",
      "vi",
    ],
    "wiktionary" => [
      "en",
      "de",
      "fr",
      "he",
    ],
    "wikibooks" => [
      "en",
      "es",
    ],
    "wikinews" => [
      "en",
    ],
    "wikiquote" => [
      "en",
    ],
    "wikisource" => [
      "en",
      "oldwikisource",
    ],
    "wikiversity" => [
      "en",
      "betawikiversity",
    ],
    "wikivoyage" => [
      "zh",
      "en",
      "pl",
    ],
    "chapter" => [],
    3 => "foundation",
    4 => "mediawikiwiki",
    5 => "nostalgia",
    6 => "outreach",
    7 => "strategy",
    8 => "tenwiki",
    9 => "testwiki",
    10 => "test2wiki",
    11 => "testwikidata",
    12 => "usability",
    13 => "wikidata",
    14 => "wikispecies",
    15 => "wikitech",
    "wmania" => [
      "wm2005",
      "wm2006",
      "wm2007",
      "wm2008",
      "wm2009",
      "wm2010",
      "wm2011",
      "wm2012",
      "wm2013",
      "wm2014",
      "wm2015",
      "wm2016",
    ],
  ]

>

The zh and en language variants for the wikipedia family show up in the expected place in $wgImportSources after running the hook handler. When I load the https://zh.wikipedia.beta.wmcloud.org/wiki/Special:Import page and pick wikipedia as the source wiki, those two language variants are missing from the dropdown list entirely. They are neither at the head of the list nor in their alphabetical positions. At the moment I am not sure where those elements are being dropped.

I suspect the conversion of the import form to OOUI was done badly.

Have a look at the data-ooui attribute on the subproject dropdown (this is taken from the delivered HTML source of the page before any JS has run):

{"_":"OO.ui.DropdownInputWidget","options":[{"data":"wikivoyage::zh","label":"zh"},{"data":"wikivoyage::en","label":"en"},{"data":"wikipedia::aa","label":"aa"},{"data":"wikipedia::ar","label":"ar"},{"data":"wikipedia::bn","label":"bn"},{"data":"wikipedia::ca","label":"ca"},{"data":"wikipedia::crh","label":"crh"},{"data":"wikipedia::cs","label":"cs"},{"data":"wiktionary::de","label":"de"},{"data":"wikipedia::eo","label":"eo"},{"data":"wikibooks::es","label":"es"},{"data":"wikipedia::fa","label":"fa"},{"data":"wiktionary::fr","label":"fr"},{"data":"wiktionary::he","label":"he"},{"data":"wikipedia::hi","label":"hi"},{"data":"wikipedia::ja","label":"ja"},{"data":"wikipedia::ko","label":"ko"},{"data":"wikipedia::nl","label":"nl"},{"data":"wikipedia::ru","label":"ru"},{"data":"wikipedia::simple","label":"simple"},{"data":"wikipedia::sq","label":"sq"},{"data":"wikipedia::sr","label":"sr"},{"data":"wikipedia::sv","label":"sv"},{"data":"wikipedia::test2","label":"test2"},{"data":"wikipedia::test","label":"test"},{"data":"wikipedia::uk","label":"uk"},{"data":"wikipedia::vi","label":"vi"},{"data":"wikisource::oldwikisource","label":"oldwikisource"},{"data":"wikiversity::betawikiversity","label":"betawikiversity"},{"data":"wikivoyage::pl","label":"pl"},{"data":"wmania::wm2005","label":"wm2005"},{"data":"wmania::wm2006","label":"wm2006"},{"data":"wmania::wm2007","label":"wm2007"},{"data":"wmania::wm2008","label":"wm2008"},{"data":"wmania::wm2009","label":"wm2009"},{"data":"wmania::wm2010","label":"wm2010"},{"data":"wmania::wm2011","label":"wm2011"},{"data":"wmania::wm2012","label":"wm2012"},{"data":"wmania::wm2013","label":"wm2013"},{"data":"wmania::wm2014","label":"wm2014"},{"data":"wmania::wm2015","label":"wm2015"},{"data":"wmania::wm2016","label":"wm2016"}],"$overlay":true,"name":"subproject","value":"wikivoyage::zh","inputId":"ooui-php-24","required":false}

The expected entry wikipedia::en is getting clobbered by the entry for wikivoyage::en.

Brokenness is at https://github.com/wikimedia/mediawiki/blob/master/includes/Specials/SpecialImport.php#L388. This assumes subprojects are unique across the entire list of import sources, which we know not to be the case.
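
The clobbering can be reproduced outside MediaWiki. The sketch below reuses the inner loop quoted from SpecialImport.php in this task and runs it over a subset of the families from the shell session above (only four families are included, as an assumption to keep it short). The `$lost` bookkeeping is added purely for illustration and is not part of core.

```php
<?php
// Subset of the $wgImportSources value from the shell session (four families only).
$importSources = [
	'meta',
	'wikipedia' => [
		'zh', 'en', 'aa', 'ar', 'bn', 'ca', 'crh', 'cs', 'de', 'eo', 'es', 'fa',
		'fr', 'he', 'hi', 'ja', 'ko', 'nl', 'ru', 'simple', 'sq', 'sr', 'sv',
		'test2', 'test', 'uk', 'vi',
	],
	'wiktionary' => [ 'en', 'de', 'fr', 'he' ],
	'wikibooks' => [ 'en', 'es' ],
	'wikivoyage' => [ 'zh', 'en', 'pl' ],
];

// Same keying as the pre-fix loop: the bare subproject name is the array key,
// so a later family silently overwrites an earlier one.
$subprojects = [];
foreach ( $importSources as $key => $value ) {
	if ( is_array( $value ) ) {
		foreach ( $value as $subproject ) {
			$subprojects[ $subproject ] = $key . '::' . $subproject;
		}
	}
}

// Illustration only: which wikipedia entries no longer point at wikipedia?
$lost = [];
foreach ( $importSources['wikipedia'] as $subproject ) {
	if ( $subprojects[ $subproject ] !== 'wikipedia::' . $subproject ) {
		$lost[] = $subproject;
	}
}

echo implode( ', ', $lost ), "\n";
// prints "zh, en, de, es, fr, he"
```

PHP keeps the position of the first insertion for a key but the value of the last assignment, which is consistent with the data-ooui dump: wikivoyage::zh and wikivoyage::en sit at the head of the list where the wikipedia entries were expected.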

And indeed this bad behaviour appears to have been introduced by the OOUI conversion in T108792 by @Esanders.

In local testing this diff makes the form functional again by de-duplicating the select labels:

diff --git i/includes/Specials/SpecialImport.php w/includes/Specials/SpecialImport.php
index 7fcec70856d..6c8773b5d4f 100644
--- i/includes/Specials/SpecialImport.php
+++ w/includes/Specials/SpecialImport.php
@@ -385,7 +385,7 @@ class SpecialImport extends SpecialPage {
                 foreach ( $this->importSources as $key => $value ) {
                     if ( is_array( $value ) ) {
                         foreach ( $value as $subproject ) {
-                            $subprojects[ $subproject ] = $key . '::' . $subproject;
+                            $subprojects[ $subproject . '.' . $key ] = $key . '::' . $subproject;
                         }
                     }
                 }

I added the $wgImportSources generated in T410109#11729016 to my LocalSettings.php for test data.
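
As a rough standalone check of the diff above, the same loop with the composite key can be run over a small, hypothetical two-family sample. This only exercises the array keying; it says nothing about how HTMLForm or OOUI render the resulting labels.

```php
<?php
// Hypothetical two-family sample with overlapping subprojects.
$importSources = [
	'wikipedia' => [ 'zh', 'en', 'aa' ],
	'wikivoyage' => [ 'zh', 'en', 'pl' ],
];

$subprojects = [];
foreach ( $importSources as $key => $value ) {
	if ( is_array( $value ) ) {
		foreach ( $value as $subproject ) {
			// Keying on "subproject.family" keeps entries from different families apart.
			$subprojects[ $subproject . '.' . $key ] = $key . '::' . $subproject;
		}
	}
}

echo implode( ', ', array_values( $subprojects ) ), "\n";
// prints "wikipedia::zh, wikipedia::en, wikipedia::aa, wikivoyage::zh, wikivoyage::en, wikivoyage::pl"
```

With this keying all six entries survive. The visible label for each option becomes the composite key (for example zh.wikipedia), assuming the form keeps using the array key as the label.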

@Esanders does this look remotely correct as a fix for the duplicate label stripping bug? It feels like I must be missing something.

bd808 renamed this task from "Some production sites on the beta cluster were unable to be selected as import targets" to "Special:Import form loses configuration data when $wgImportSources contains duplicate subprojects". Mar 23 2026, 3:55 PM

You'd need to look at HTMLSelectField.php and how that builds the OOUI widget, but your patch seems reasonable. It's possible the array key of $subprojects is just used for de-duplication. You could test this by just pushing to a plain array instead (no keys).

@bd808 Looks correct to me.

The issue here is essentially that HTMLForm (not just OOUI) does not support having two options in a dropdown with the same label, because of the data structure (labels are array keys). The same issue comes up in other places, e.g. T360326. Your change makes the labels unique, so it should fix the problem.

I'll merge that patch if someone submits it to Gerrit.

Change #1259243 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[mediawiki/core@master] SpecialImport: de-duplicate select labels by appending subproject value

https://gerrit.wikimedia.org/r/1259243

Change #1259243 merged by jenkins-bot:

[mediawiki/core@master] SpecialImport: ensure sub-select labels are unique

https://gerrit.wikimedia.org/r/1259243