
Files which come from ForeignAPIRepo and are also foreign on the upstream wiki break thumbnailing and file description pages
Open, Low, Public

Description

When a file is "recursively" foreign (e.g. the local wiki has English Wikipedia as a ForeignAPIRepo, and the file is actually hosted on Commons, which is a ForeignDBViaLBRepo for English Wikipedia), the local wiki ends up with an empty file description page and, when hotlinking, broken thumbnails. The URLs used for both assume the file is local to the configured foreign repo.

Sample page: https://sandbox.semantic-mediawiki.org/wiki/MultipleFileRepos

Can be worked around by making sure that upstreams of upstreams have their own $wgForeignFileRepos entry and come sooner in the list, but (besides being non-intuitive, cumbersome and fragile) that will give the wrong result when the upstream and the upstream's upstream have different files with the same name.

Event Timeline

Restricted Application added a subscriber: Aklapper.

@Tgr We briefly talked about this in Portland. Admittedly this report is a bit thin on information. I will now try to set up multiple foreign repos via $wgForeignFileRepos on a MediaWiki 1.32.x.

Basically it was possible to fetch the files but not the descriptions. This went berserk on the file pages.

The use case would be, e.g., a German-language wiki on French history that embeds files from commons, dewiki and frwiki.

Some kind of cache key conflict I imagine? In theory it shouldn't be problematic.

@Tgr Now I know again what the issue was:

Setting this is cool as expected:

$wgUseInstantCommons = true;

Setting this breaks only InstantCommons (404 Not Found), i.e. en and fr are cool:

# commonswiki
$wgUseInstantCommons = true;

# enwiki
$wgForeignFileRepos[] = [
	'class' => 'ForeignAPIRepo',
	'name' => 'enwiki',
	'apibase' => 'https://en.wikipedia.org/w/api.php',
	'url' => 'https://upload.wikimedia.org/wikipedia/en',
	'thumbUrl' => 'https://upload.wikimedia.org/wikipedia/en/thumb',
	'hashLevels' => 2,
	'transformVia404' => true,
	'fetchDescription' => true,
	'descriptionCacheExpiry' => 43200,
	'apiThumbCacheExpiry' => 86400
];

# frwiki
$wgForeignFileRepos[] = [
	'class' => 'ForeignAPIRepo',
	'name' => 'frwiki',
	'apibase' => 'https://fr.wikipedia.org/w/api.php',
	'url' => 'https://upload.wikimedia.org/wikipedia/fr',
	'thumbUrl' => 'https://upload.wikimedia.org/wikipedia/fr/thumb',
	'hashLevels' => 2,
	'transformVia404' => true,
	'fetchDescription' => true,
	'descriptionCacheExpiry' => 43200,
	'apiThumbCacheExpiry' => 86400
];

Since most files are on commons I prefer to use InstantCommons only. It would, however, be nice if all three worked. See this page for a live demo.

Setup (updated)

  • MediaWiki 1.32.0 (8a2f437) 00:17, 10. Feb. 2019
  • PHP 7.0.33-0+deb9u1 (apache2handler)
  • MariaDB 10.1.37-MariaDB-0+deb9u1

$wgUseInstantCommons adds the foreign repo declaration to the end of the list, so enwiki is the first foreign repo where the file is found, and then thumbnail URL calculation gets messed up (you can see the thumbnail links to https://upload.wikimedia.org/wikipedia/en/thumb/... while the real URL is https://upload.wikimedia.org/wikipedia/commons/thumb/...).
The same goes for the description page: it is fetched from enwiki, so you end up with the contents of https://en.wikipedia.org/wiki/File:Mallnitz_Seebachtal_Wasserf%C3%A4lle_02.jpg?action=render
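
To make the thumbnail mix-up concrete, here is a rough sketch of how a hotlinked thumbnail URL is put together from the repo's thumbUrl and the hashed file path. sketchThumbUrl() is a made-up helper for illustration, not a MediaWiki function, and the 120px width is an arbitrary assumption; the real code has more special cases.

```php
<?php
// Sketch only: mirrors the hashed directory layout of Wikimedia upload URLs.
// sketchThumbUrl() is a hypothetical helper, not part of MediaWiki.
function sketchThumbUrl( string $thumbUrl, string $name, int $width, int $hashLevels = 2 ): string {
	// File names are stored with underscores instead of spaces.
	$dbKey = str_replace( ' ', '_', $name );
	$hash = md5( $dbKey );
	$path = '';
	for ( $i = 1; $i <= $hashLevels; $i++ ) {
		// Level 1 is the first hex digit of the hash, level 2 the first two, and so on.
		$path .= substr( $hash, 0, $i ) . '/';
	}
	$encoded = rawurlencode( $dbKey );
	return "$thumbUrl/$path$encoded/{$width}px-$encoded";
}

$name = 'Mallnitz Seebachtal Wasserfälle 02.jpg';
// What the local wiki composes when enwiki is the first repo that reports the file:
echo sketchThumbUrl( 'https://upload.wikimedia.org/wikipedia/en/thumb', $name, 120 ), "\n";
// Where the thumbnail actually lives:
echo sketchThumbUrl( 'https://upload.wikimedia.org/wikipedia/commons/thumb', $name, 120 ), "\n";
```

The hash part and the file name are identical in both URLs; only the prefix differs, and that prefix is taken from whichever repo definition matched first.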

Possible fixes:

  • disable transformVia404. I have no idea if this has any effect on how the file is fetched from the upstream, vs. how it is linked locally... but worth a shot. Also if it works it is probably still not suitable for busy servers as non-404 transforms are expensive.
  • put Commons first (you'll have to define the foreign repo manually for that). If both Commons and enwiki/frwiki have a file with the same name, you get the wrong file.
  • disable hotlinking (i.e. remove the thumbUrl line - note btw that having both that and a thumb cache expiry does not make much sense). Probably fixes thumbnail loading but puts more load on your server and is more fragile, as thumbnailing is done locally. Does not fix description page issues.
  • add a new 'localOnly' flag to the foreign file repo definition, make ForeignAPIRepo ignore non-local files with that flag. This would be the nice way. Not sure if there is an easy way to figure out where the file is from when doing description page fetches.
  • somehow figure out when a "recursively foreign" file is actually from a repo that exists lower down your $wgForeignFileRepos list, and use that repo instead. A bit better than the previous one (imagine the situation where your foreign repos are wiki1, wiki2 and wiki3 in that order, wiki3 is also a foreign repo for wiki2, and wiki2 and wiki3 have an identically named file; although I doubt that kind of situation would come up in practice), not sure how realistic it is (we don't have any kind of global wiki ID; I guess file URL prefixes could serve as keys?). Also has the same problem for description pages as the previous one.
  • in combination with some of the above, maybe change action=render behavior for file pages of non-local files? I think the (very long term) plan is to do something like that during the dismantling of WikiPage. Would be nice to fetch the description page via the API, but that's blocked on having efficiently cacheable REST APIs, which does not seem to be happening any time soon...
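
For the 'localOnly' idea, one possible signal is the imagerepository field that prop=imageinfo queries report per page ('local' when the queried wiki stores the file itself, 'shared' when it comes from one of that wiki's own foreign repos). A minimal sketch, assuming that field is good enough for the purpose: isLocalToUpstream() is a hypothetical helper, File:Example.jpg is a placeholder title, and the two responses are hand-written samples rather than real API output. Whether ForeignAPIRepo could use this without extra requests is untested.

```php
<?php
// Hypothetical helper: given an already-decoded API response, tell whether the
// upstream wiki hosts the file itself or only re-exports it from its own shared repo.
function isLocalToUpstream( array $apiResponse ): bool {
	foreach ( $apiResponse['query']['pages'] ?? [] as $page ) {
		// Only the first page is inspected; the sketch assumes a single-title query.
		return ( $page['imagerepository'] ?? '' ) === 'local';
	}
	return false;
}

// The kind of request that returns such a response.
$requestUrl = 'https://en.wikipedia.org/w/api.php?' . http_build_query( [
	'action' => 'query',
	'format' => 'json',
	'prop' => 'imageinfo',
	'iiprop' => 'url',
	'titles' => 'File:Example.jpg',
] );

// Hand-written sample responses (assumed shape, trimmed to the relevant field).
$fromCommons = [ 'query' => [ 'pages' => [ [ 'title' => 'File:Example.jpg', 'imagerepository' => 'shared' ] ] ] ];
$fromEnwiki = [ 'query' => [ 'pages' => [ [ 'title' => 'File:Example.jpg', 'imagerepository' => 'local' ] ] ] ];

var_dump( isLocalToUpstream( $fromCommons ) ); // bool(false)
var_dump( isLocalToUpstream( $fromEnwiki ) ); // bool(true)
```

A repo flagged 'localOnly' would then skip any file for which this returns false and let the lookup continue with the next entry in $wgForeignFileRepos.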
Tgr renamed this task from "Possibility to configure more than one foreign file repo via $wgForeignFileRepos' ForeignAPIRepo class" to "Files which come from ForeignAPIRepo and are also foreign on the upstream wiki break thumbnailing and file description pages". · Feb 13 2019, 9:52 PM
Tgr updated the task description.

This will probably still happen if the wiki has a single foreign repo (and that wiki has its own foreign repo), in which case none of the above solutions work. One obstacle to fixing this properly is that file description pages are not fetched via the API, as mentioned above. The other is that thumbnail URLs have to be composed locally (or fetched via the API, which would work recursively but is expensive). In theory we could have a hotlink property instead of the current thumbUrl hack and fetch the correct prefix via the API. That actually might not be too hard.
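
A sketch of what "fetch the correct prefix" could look like, assuming the upstream API hands back the full URL of the original file (as iiprop=url does) and that the upstream uses the usual hashed layout with a thumb directory directly under the repo URL. deriveRepoPrefix() is a hypothetical helper and the sample URL is made up.

```php
<?php
// Hypothetical helper: strip the file name and the hash directories from a full
// file URL to recover the repo's base URL.
function deriveRepoPrefix( string $fileUrl, int $hashLevels = 2 ): ?string {
	$parts = explode( '/', $fileUrl );
	// One segment for the file name plus one per hash level must be removable.
	if ( count( $parts ) <= $hashLevels + 1 ) {
		return null;
	}
	return implode( '/', array_slice( $parts, 0, -( $hashLevels + 1 ) ) );
}

// Made-up example of a URL as the upstream wiki might report it for a Commons file.
$reported = 'https://upload.wikimedia.org/wikipedia/commons/a/ab/Example.jpg';
$prefix = deriveRepoPrefix( $reported );
echo $prefix, "\n"; // https://upload.wikimedia.org/wikipedia/commons
echo $prefix . '/thumb', "\n"; // https://upload.wikimedia.org/wikipedia/commons/thumb
```

The derived prefix could be cached per file alongside the other API results, so the extra information would not need a separate request for every thumbnail.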

$wgUseInstantCommons adds the foreign repo declaration to the end of the list so enwiki is the first foreign repo where it is found,

I believe this was the decisive tip. After adding "commonswiki" directly, without depending on the InstantCommons feature, all files show up and seem to be fetched from the correct location, including the description etc.

So this is the working setup which includes 404 handling:

## Repositories

# commonswiki
$wgForeignFileRepos[] = [
	'class' => 'ForeignAPIRepo',
	'name' => 'commonswiki',
	'apibase' => 'https://commons.wikimedia.org/w/api.php',
	'url' => 'https://upload.wikimedia.org/wikipedia/commons',
	'thumbUrl' => 'https://upload.wikimedia.org/wikipedia/commons/thumb',
	'hashLevels' => 2,
	'transformVia404' => true,
	'fetchDescription' => true,
	'descriptionCacheExpiry' => 43200
];
    
# enwiki
$wgForeignFileRepos[] = [
	'class' => 'ForeignAPIRepo',
	'name' => 'enwiki',
	'apibase' => 'https://en.wikipedia.org/w/api.php',
	'url' => 'https://upload.wikimedia.org/wikipedia/en',
	'thumbUrl' => 'https://upload.wikimedia.org/wikipedia/en/thumb',
	'hashLevels' => 2,
	'transformVia404' => true,
	'fetchDescription' => true,
	'descriptionCacheExpiry' => 43200
];

# frwiki
$wgForeignFileRepos[] = [
	'class' => 'ForeignAPIRepo',
	'name' => 'frwiki',
	'apibase' => 'https://fr.wikipedia.org/w/api.php',
	'url' => 'https://upload.wikimedia.org/wikipedia/fr',
	'thumbUrl' => 'https://upload.wikimedia.org/wikipedia/fr/thumb',
	'hashLevels' => 2,
	'transformVia404' => true,
	'fetchDescription' => true,
	'descriptionCacheExpiry' => 43200
];

Thanks for the note about the thumb cache expiry.

I have not tested what happens if files with the same name exist locally and/or on one or more of the foreign file repos. To me it appears that the local file should always take precedence, and when no local file is available, the repos should be checked in the order they are defined in the configuration, stopping at the first one that has the file.

Local files do always take precedence (LocalRepo is always the first to be checked). If the file does not exist locally but does exist on both enwiki and commonswiki, the above config uses the one from commonswiki, which is probably not what you'd want. A fairly minor issue though, such files are rare.
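
The lookup order described above boils down to a first-match search. A toy sketch, with made-up file names and with repo contents reduced to plain name lists (findFileRepo() is hypothetical; the real lookup queries each repo rather than a list):

```php
<?php
// Hypothetical helper: return the name of the first repo, in configured order,
// that has a file with the given name.
function findFileRepo( string $name, array $orderedRepos ): ?string {
	foreach ( $orderedRepos as $repoName => $fileNames ) {
		if ( in_array( $name, $fileNames, true ) ) {
			return $repoName;
		}
	}
	return null;
}

// Made-up contents; the key order mirrors the config above with the local repo first.
$repos = [
	'local' => [ 'Logo.png' ],
	'commonswiki' => [ 'Logo.png', 'Map.svg' ],
	'enwiki' => [ 'Map.svg', 'Cover.jpg' ],
	'frwiki' => [],
];

echo findFileRepo( 'Logo.png', $repos ), "\n"; // local
echo findFileRepo( 'Map.svg', $repos ), "\n"; // commonswiki, although enwiki has its own Map.svg
echo findFileRepo( 'Cover.jpg', $repos ), "\n"; // enwiki
```

The second call is the name-clash case: with Commons listed first, enwiki's own Map.svg can never be reached.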

Kghbln triaged this task as Low priority.
Kghbln added a subscriber: Paladox.

Sorry, I obviously have no clue how Phabricator works. :(