
UrlShortener is shortening URLs from all domains
Closed, ResolvedPublicSecurity

Description

The summary: https://meta.wikimedia.org/wiki/Special:UrlShortener can currently shorten URLs pointing to any domain.

In T255491, we want to rename $wgUrlShortenerDomainsWhitelist to $wgUrlShortenerAllowedDomains for the reasons given in the parent task.

Because the default value of the variable differs from the Wikimedia configuration, we introduced the new one in r609572 and made both work. They both work currently, and the intent is that, once all wikis are on 1.35.0-wmf.41 (which contains the new variable, I think), we can switch to it and later remove the old one from the extension.

Since $wgUrlShortenerAllowedDomains is not yet defined in WMF config, it caused T258056, which we worked around, but unfortunately one issue still remains. Checking this morning, first locally and then on Meta-wiki, I found that I can shorten a URL from any domain.

The allowed domains are correctly returned from the WMF config:

$allowed = UrlShortenerUtils::getAllowedDomainsRegex()
=> "^(.*\.)?wikipedia\.org$|^(.*\.)?wiktionary\.org$|^(.*\.)?wikibooks\.org$|^(.*\.)?wikinews\.org$|^(.*\.)?wikiquote\.org$|^(.*\.)?wikisource\.org$|^(.*\.)?wikiversity\.org$|^(.*\.)?wikivoyage\.org$|^(.*\.)?wikimedia\.org$|^(.*\.)?wikidata\.org$|^(.*\.)?mediawiki\.org$|"

But note the offending vertical bar at the end of the line: ...mediawiki\.org$|". Because this character comes after the final $ anchor, it creates an empty alternative, which causes the regex to match literally anything.
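The effect is easy to reproduce standalone (a minimal repro using a single-domain pattern rather than the full config regex):

```php
<?php
// A trailing '|' leaves an empty alternative in the pattern. The empty
// alternative matches the empty string at the start of ANY subject, so
// the domain check passes for every input.
$broken = '/^(.*\.)?mediawiki\.org$|/';
$fixed  = '/^(.*\.)?mediawiki\.org$/';

var_dump( preg_match( $broken, 'evil.example' ) );     // int(1): anything matches
var_dump( preg_match( $fixed, 'evil.example' ) );      // int(0)
var_dump( preg_match( $fixed, 'www.mediawiki.org' ) ); // int(1)
```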

But why did we add it? It is there to allow correct concatenation of the regexes built from the two variables. The reason it now appears at the end of the line is that the second regex is empty (the new variable is an empty array []). So the result is effectively ...mediawiki\.org$| followed by nothing, where the second regex, ending with its own $ anchor, was supposed to be.

Conversely, if we defined both $wgUrlShortenerDomainsWhitelist and $wgUrlShortenerAllowedDomains but omitted the vertical bar, the resulting regex would not be correct either.
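A quick standalone check of that failure mode, using a two-domain pattern for illustration (not the real config values):

```php
<?php
// Concatenating two anchored regexes WITHOUT the '|' produces '...$^...',
// which can never match a non-empty string: '$' must sit at the end of
// the subject, so the following '^' has nowhere left to match. Every
// domain would then be rejected.
$concatenated = '/^(.*\.)?wikipedia\.org$^(.*\.)?wikidata\.org$/';

var_dump( preg_match( $concatenated, 'www.wikipedia.org' ) ); // int(0)
var_dump( preg_match( $concatenated, 'www.wikidata.org' ) );  // int(0)
```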

To fix this we could duplicate the values of $wgUrlShortenerDomainsWhitelist into $wgUrlShortenerAllowedDomains in wmf-config/CommonSettings.php. This would make the vertical bar actually concatenate the two regexes rather than dangle at the end, and would be available on Meta-wiki as soon as possible. (But this has a pitfall, see below.)

However, it just occurred to me that we forgot one important thing through all this: since short URLs can only be created on Meta-wiki, it does not matter whether either variable is missing or even wrong on any other wiki, so there is probably no need to wait for the change to be deployed everywhere. Only what is available on Meta-wiki matters.

Another way to fix it (in the extension this time) is to add the vertical bar to the final regex only if the second regex is non-empty (which is what will happen once we do the switch in CommonSettings.php):

$allowedDomains = $allowedDomainsOld . '|' . $allowedDomainsNew; // Current code
$join = is_array( $wgUrlShortenerAllowedDomains ) ? '|' : '';
$allowedDomains = $allowedDomainsOld . $join . $allowedDomainsNew; // Only add the vertical bar if the second array is not empty

The reason the CommonSettings.php change has problems too is that it would fix the issue now, but the moment we remove $wgUrlShortenerDomainsWhitelist from there, that variable falls back to false (the default), which would cause T258056 to recur.

Probably there are better ways to fix this too.

Event Timeline

Ammarpad renamed this task from UrlShortener is shortening all URLs to UrlShortener is shortening URLs from all domains.Jul 16 2020, 7:55 AM
Ammarpad updated the task description. (Show Details)

Will something like this work?

if ( $old && !$new ) {
	return $old;
} elseif ( $new && !$old ) {
	return $new;
} else {
	return $old . '|' . $new;
}
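As a standalone sketch (joinRegexes is a hypothetical helper name, not extension code), the three-way join behaves like this:

```php
<?php
// Join two regex fragments with '|' only when both are non-empty, so no
// dangling alternation is ever produced. (The case where both are empty
// is handled elsewhere by the wgServer fallback.)
function joinRegexes( string $old, string $new ): string {
	if ( $old !== '' && $new === '' ) {
		return $old;
	} elseif ( $new !== '' && $old === '' ) {
		return $new;
	}
	return $old . '|' . $new;
}

echo joinRegexes( '^a$', '' ), "\n";    // ^a$
echo joinRegexes( '', '^b$' ), "\n";    // ^b$
echo joinRegexes( '^a$', '^b$' ), "\n"; // ^a$|^b$
```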

I'd prefer we fix this in the extension rather than config though. If you want to upload a patch to this task I can review & deploy it.

Legoktm triaged this task as Unbreak Now! priority.Jul 16 2020, 8:52 AM

Yes, I believe it will work, and the final result would look like this:

public static function getAllowedDomainsRegex() {
	global $wgUrlShortenerDomainsWhitelist, $wgUrlShortenerAllowedDomains, $wgServer;
	if ( $wgUrlShortenerDomainsWhitelist === false && $wgUrlShortenerAllowedDomains === false ) {
		// Allowed domains not configured, default to wgServer
		$serverParts = wfParseUrl( $wgServer );
		$allowedDomains = preg_quote( $serverParts['host'], '/' );
	} else {
		// Collapse the allowed domains into a single string, so we only have to run the regex check once
		$allowedDomainsOld = implode( '|', array_map(
			function ( $item ) {
				return '^' . $item . '$';
			},
			is_array( $wgUrlShortenerDomainsWhitelist )
				? $wgUrlShortenerDomainsWhitelist
				: []
		) );

		// Transition period: collect and merge the regex from the new config flag.
		$allowedDomainsNew = implode( '|', array_map(
			function ( $item ) {
				return '^' . $item . '$';
			},
			is_array( $wgUrlShortenerAllowedDomains )
				? $wgUrlShortenerAllowedDomains
				: []
		) );

		if ( $wgUrlShortenerDomainsWhitelist && !$wgUrlShortenerAllowedDomains ) {
			$allowedDomains = $allowedDomainsOld;
		} elseif ( $wgUrlShortenerAllowedDomains && !$wgUrlShortenerDomainsWhitelist ) {
			$allowedDomains = $allowedDomainsNew;
		} else {
			$allowedDomains = $allowedDomainsOld . '|' . $allowedDomainsNew;
		}
	}

	return $allowedDomains;
}

I think we can reasonably just apply this through gerrit rather than as a security patch.

Can we check whether any non-Wikimedia domains are currently using the w.wiki URL shortener, and disable those?

Deployed, and it seems to be working. Can someone else confirm?

> Can we check whether any non-Wikimedia domains are currently using the w.wiki URL shortener, and disable those?

I believe they're very few. But yes, someone can check to see whether further action is needed. The extension table is not public.

> Deployed, and it seems to be working. Can someone else confirm?

Tested, it works for me too.

Should we ask stewards to delete the ones pointing to the outside?

I think stewards can only delete them if given the short codes. I created one with a localhost link and a yahoo link (I think), but frankly I cannot even remember the codes.

Someone has to query the database table and search for URLs that do not match the wmf regex, or run the dumpURLs.php maintenance script, look for the unwanted codes, and then give them to stewards to delete.
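As a rough standalone sketch of that filtering step (assuming, hypothetically, that the dump yields one code|url pair per line, and using a simplified version of the allow-list regex rather than the real config value):

```php
<?php
// Return the dump lines whose URL host is NOT covered by the allow-list.
// The "code|url" line format and the simplified regex are assumptions.
function findDisallowed( array $lines, string $allowed ): array {
	$bad = [];
	foreach ( $lines as $line ) {
		$url = explode( '|', $line, 2 )[1];
		$host = (string)( parse_url( $url, PHP_URL_HOST ) ?? '' );
		if ( !preg_match( $allowed, $host ) ) {
			$bad[] = $line;
		}
	}
	return $bad;
}

$allowed = '/(^|\.)(wikipedia|wiktionary|wikibooks|wikinews|wikiquote|'
	. 'wikisource|wikiversity|wikivoyage|wikimedia|wikidata|mediawiki)\.org$/';

foreach ( findDisallowed( [
	'X3F|https://elcomercio.pe/',
	'ABC|https://www.wikipedia.org/wiki/Foo',
], $allowed ) as $line ) {
	echo $line, "\n"; // prints only the elcomercio.pe entry
}
```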

> I think stewards can only delete them if given the short codes. I created one with a localhost link and a yahoo link, but frankly I cannot even remember the codes.
> Someone has to query the database table and search for URLs that do not match the wmf regex, or run the dumpURLs.php maintenance script, look for the unwanted codes, and then give them to stewards to delete.

I'll do that.

Can you make a regression test in the meantime to make sure it doesn't happen again?

> Can you make a regression test in the meantime to make sure it doesn't happen again?

Yes, tests were already added in 85252b3 to cover the edge cases.

chasemp lowered the priority of this task from Unbreak Now! to Medium.Jul 16 2020, 12:46 PM
chasemp added a subscriber: chasemp.

Seems addressed, but cleanup remains? Downgrading priority for now to reflect that, if so. The security team will also discuss this in our weekly meeting.

Thanks to all, esp @Ladsgroup and @Ammarpad

These are the ones that are not in the allowed list of domains (I haven't clicked on any of those):

WzU|https://tabernacle.toolforge.org/#/tab/sparql/SELECT%20DISTINCT%20%3FObra%20%3FNome_da_Obra%20%3Fimagem%20WHERE%20%7B%0A%20%20SERVICE%20wikibase%3Alabel%20%7Bbd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cpt-br%2Cen%22.%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%3FObra%20rdfs%3Alabel%20%3FNome_da_Obra.%7D%0A%20%20%3FObra%20wdt%3AP195%20wd%3AQ56677470.%0A%20%20%7B%3FObra%20wdt%3AP31%20wd%3AQ14659.%7D%0A%20%20UNION%20%7B%3FObra%20wdt%3AP31%20wd%3AQ56556193.%7D%0A%20%20UNION%20%7B%3FObra%20wdt%3AP31%20wd%3AQ524980.%7D%0A%20%20UNION%20%7B%3FObra%20wdt%3AP31%20wd%3AQ799000.%7D%0A%20%20UNION%20%7B%3FObra%20wdt%3AP31%20wd%3AQ552345.%7D%0A%20%20UNION%20%7B%3FObra%20wdt%3AP31%20wd%3AQ161439.%7D%0A%20%20UNION%20%7B%3FObra%20wdt%3AP31%20wd%3AQ131647.%7D%0A%20%20UNION%20%7B%3FObra%20wdt%3AP31%20wd%3AQ41207.%7D%0A%20%20UNION%20%7B%3FObra%20wdt%3AP31%20wd%3AQ80071.%7D%0A%20%20UNION%20%7B%3FObra%20wdt%3AP31%20wd%3AQ18336.%7D%0A%20%20%3FObra%20wdt%3AP18%20%3Fimagem.%0A%7D/P18%3BP1684%3BP6568
Wzu|https://www.youtube.com/watch?v=wIFHO55De-c
X2D|https://tabernacle.toolforge.org/#/tab/sparql/SELECT%20%3FDescritor%20WHERE%20%7B%0A%20%20SERVICE%20wikibase%3Alabel%20%7Bbd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cpt%2Cpt-br%2Cen%2Ces%2Cfr%2Cde%22.%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%3FObra%20rdfs%3Alabel%20%3FNome_da_Obra.%7D%0A%20%20%3FObra%20wdt%3AP195%20wd%3AQ56677470.%0A%20%20%3FObra%20wdt%3AP180%20%3FDescritor.%0A%20%20MINUS%20%7B%0A%20%20%20%20%3FDescritor%20rdfs%3Alabel%20%3FNome_do_Descritor.%0A%20%20%20%20FILTER(LANG(%3FNome_do_Descritor)%3D%22pt-br%22)%0A%20%20%7D%0A%7D/Lpt-br%3BLpt
X2W|https://wiki.tockdom.com/wiki/MKWii_Network_Protocol/SELECT
X3F|https://elcomercio.pe/
X3M|https://docs.google.com/document/d/1FtvCBFPDIzROxykhVks_1Q2THpnJg4Q9C-m_iND3jjs/edit?disco=AAAAJ82g3tk
X3m|https://localhost/core/index.php/Special:Statistics

Deleted, per Ladsgroup's request.

> I think stewards can only delete them if given the short codes. I created one with a localhost link and a yahoo link (I think), but frankly I cannot even remember the codes.

I have never had to delete a short URL, but from the look of the interface, yes, we could oversight (suppress) one only if we know the offending short code, as we don't have access to the shorturl table, which was non-public last time I checked.

Thanks to @Ladsgroup for querying the table and to @Urbanecm for the deletions. I checked https://w.wiki/X3F and it returns a "Short URL not found" message, so I think those deletions worked as expected.

What's left to do here?

Put the patches in Gerrit, backport where appropriate and open up the task?

> What's left to do here?
> Put the patches in Gerrit, backport where appropriate and open up the task?

Only one backport patch remains now: r614772. The patches are already uploaded to Gerrit.

> Only one backport patch remains now: r614772. The patches are already uploaded to Gerrit.

Ok, we can make this task public then. Should these fixes not also go to 1.31, 1.34 and 1.35 if possible?

sbassett lowered the priority of this task from Medium to Low.Jul 20 2020, 8:52 PM
sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)".

> Ok, we can make this task public then. Should these fixes not also go to 1.31, 1.34 and 1.35 if possible?

I am not sure about 1.31 and 1.34; the problem does not exist there. It was introduced in 1.35, and the first two fixes are already backported to that branch; only the final one remains now. But @Ladsgroup would know better.

> Ok, we can make this task public then. Should these fixes not also go to 1.31, 1.34 and 1.35 if possible?
> I am not sure about 1.31 and 1.34; the problem does not exist there. It was introduced in 1.35, and the first two fixes are already backported to that branch; only the final one remains now. But @Ladsgroup would know better.

Yup, that's correct.

Change 745651 had a related patch set uploaded (by Reedy; author: Ammarpad):

[mediawiki/extensions/UrlShortener@REL1_35] Fix config variables regex concatenation

https://gerrit.wikimedia.org/r/745651

Change 745651 merged by jenkins-bot:

[mediawiki/extensions/UrlShortener@REL1_35] Fix config variables regex concatenation

https://gerrit.wikimedia.org/r/745651