Page MenuHomePhabricator

Consolidate wgUrlShortenerAllowedDomains and wgUrlShortenerApprovedDomain (no regexes in config)
Open, LowPublic

Description

Status quo

Example from operations/mediawiki-config:
$wgUrlShortenerAllowedDomains = [
	'(.*\.)?wikipedia\.org',
	'(.*\.)?wiktionary\.org',
	'(.*\.)?wikibooks\.org',
	'(.*\.)?wikinews\.org',
	'(.*\.)?wikiquote\.org',
	'(.*\.)?wikisource\.org',
	'(.*\.)?wikiversity\.org',
	'(.*\.)?wikivoyage\.org',
	'(.*\.)?wikimedia\.org',
	'(.*\.)?wikidata\.org',
	'(.*\.)?wikifunctions\.org',
	'(.*\.)?mediawiki\.org',
	'.*\.svc\.toolforge\.org',
];
$wgUrlShortenerApprovedDomains = [
	'*.wikipedia.org',
	'*.wiktionary.org',
	'*.wikibooks.org',
	'*.wikinews.org',
	'*.wikiquote.org',
	'*.wikisource.org',
	'*.wikiversity.org',
	'*.wikivoyage.org',
	'*.wikimedia.org',
	'*.wikidata.org',
	'*.wikifunctions.org',
	'*.mediawiki.org',
];

Problem

When changing the UrlShortener extension configuration, there are two problems:

  1. One must know to update both wgUrlShortenerAllowedDomains (used for the UI) and wgUrlShortenerApprovedDomain (used for the backend validation). Otherwise it will quietly fail by working in the backend but being hidden from the UI, or showing in the UI but not actually working.
  2. One must write a regex without mistakes and without unit tests.

In particular:

  • We can derive both the validation rules and the user interface message from a single human-readable array. We already do this today in MediaWiki core for $wgNoFollowDomainExceptions, as well as $wgExternalLinksIgnoreDomains (T405005) which is backed by UrlUtils::matchesDomainList, which is a stable API provided by MediaWiki core. And the SpecialLinkSearch UI uses code almost identical to UrlShortener to display these in a user-friendly way (SpecialLinkSearch.php vs SpecialUrlShortener.php)
  • This does not require regexes on either site, not in configuration, and not generated internally, either.

Proposal 1: Simple with subdomains

  • Migrate validation logic to use wgUrlShortenerApprovedDomains directly with UrlUtils::matchesDomainList, like MediaWiki core does. The UI stays the same as today.
  • Remove wgUrlShortenerAllowedDomains, no replacement or deprecation needed.

Given that AllowedDomains takes a regex today, it could in theory be used for any arbitrary pattern that matches partial domains or match only apex-without-subdomain. We would no longer support that. In practice, this is AFAIK only used with one or more escaped domain names and (.*\.)? prefix (allowing subdomains).

For back-compat, we would strip any *. prefix from the ApprovedDomains list, and UrlUtils::matchesDomainList allows subdomains by default.

Proposal 2: Keep subdomain control

While the above would suffice for the default configuration and for the WMF configuration, if we want to support allowing a specific domain, without allowing its subdomains, we could keep subdomain control like so:

  • Migrate validation logic to:
    • Take entries in wgUrlShortenerApprovedDomains that start with .*, strip it, and check with UrlUtils::matchesDomainList (allows subdomains)
    • Any other entries are checked with in_array against $urlParts['host'] (does not allow subdomains).
  • Remove wgUrlShortenerAllowedDomains, no replacement or deprecation needed.

Event Timeline

Krinkle triaged this task as Low priority.

One thing to consider too (which is not a very strong desire, don't get me wrong) was to actually not advertise some domains we are allowing (explicitly the svc for toolforge). Not a big deal but it'd be nice to have two different values. One for showing to users and one for checking. Of course, it doesn't need to stay as is either.

Aklapper added a subscriber: DAlangi_WMF.

Unassigning inactive task assignee (please do so as part of offboarding)