
FileImporter should not percent encode URLs in log entries
Open, Needs Triage, Public


In ukwiki deletion log, I see entries like

11:17, 13 June 2020 AlexKozur talk contribs block deleted page Файл:Берлін 2011-2.JPG (This file is now on Wikimedia Commons as (moved using FileImporter).) (view/restore) (thank)

Instead, the links should show the readable, non-percent-encoded title, as in Берлін_2011-2.JPG (better still would be if the extension could detect the interwiki map entry and use the shortest form possible).

This seems to come from the fileimporter-delete-summary message, with the link passed in, but there is also a similar message, fileimporter-cleanup-summary, so it would be good to look into this more broadly.

Event Timeline

Base created this task. Jun 13 2020, 1:58 PM
thiemowmde added subscribers: thiemowmde, Lena_WMDE.

The FileImporter extension doesn't do anything special here. This is standard MediaWiki core behavior. All we do is use Title::getFullURL(), which returns the URL like this. All URLs are encoded like this.

Undoing this encoding is probably not a good idea, as it might result in a broken URL. After all, core does this encoding for a reason.
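To illustrate why blindly undoing the encoding is risky, here is a minimal Python sketch (not MediaWiki code). The host `commons.example.org` and the helper `full_url` are hypothetical and only mimic the kind of percent-encoded URL described above. A title containing a reserved character such as `?` survives in encoded form, but a full decode turns that character into the start of a query string.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical base URL, used only for illustration.
BASE = "https://commons.example.org/wiki/"


def full_url(title: str) -> str:
    """Build a percent-encoded URL for a page title (UTF-8, spaces as underscores)."""
    return BASE + quote(title.replace(" ", "_"), safe=":/")


encoded = full_url("File:Що?.jpg")   # "?" is encoded as %3F and stays in the path
naive = unquote(encoded)             # decodes everything, including %3F

# In the encoded URL the whole title is part of the path.
print(urlsplit(encoded).path)
# After naive decoding, "?" splits the title into a path and a query string.
print(urlsplit(naive).path, "|", urlsplit(naive).query)
```

The encoded form is unambiguous; the fully decoded form no longer points at the same page.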

One possible solution is already mentioned above: Format the link as [[commons:…]] with the correct interwiki prefix. We work with interwiki links like this in other places in the codebase already. However, most of these are the other way around, from Commons back to the source wiki. We need to request the source wiki's interwiki map (example). Look at the existing InterwikiTablePrefixLookup, which does something very similar already. Note that just hard-coding commons: is not sustainable, even if it might work in >99% of the cases.
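The interwiki idea can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the logic of InterwikiTablePrefixLookup: the map contents, the host names, and the function `interwiki_link` are all hypothetical. It looks up which prefixes in a wiki's interwiki map point at the target URL, picks the shortest one, and formats a wikitext link instead of a full URL.

```python
from typing import Optional
from urllib.parse import unquote

# Hypothetical excerpt of a source wiki's interwiki map:
# prefix -> URL pattern, where "$1" stands for the page title.
INTERWIKI_MAP = {
    "commons": "https://commons.example.org/wiki/$1",
    "c": "https://commons.example.org/wiki/$1",
    "wikt": "https://wikt.example.org/wiki/$1",
}


def interwiki_link(url: str, interwiki_map: dict) -> Optional[str]:
    """Return [[prefix:Title]] using the shortest matching prefix, or None."""
    matches = []
    for prefix, pattern in interwiki_map.items():
        base = pattern.split("$1")[0]
        if url.startswith(base):
            matches.append((prefix, base))
    if not matches:
        return None  # no interwiki entry; the caller would fall back to the URL
    # Shortest prefix first; ties are broken alphabetically.
    prefix, base = min(matches, key=lambda m: (len(m[0]), m[0]))
    # The title sits inside a wikitext link, not a URL, so it can be decoded.
    title = unquote(url[len(base):])
    return f"[[{prefix}:{title}]]"


url = "https://commons.example.org/wiki/File:%D0%91%D0%B5%D1%80%D0%BB%D1%96%D0%BD_2011-2.JPG"
print(interwiki_link(url, INTERWIKI_MAP))
```

Because the prefix comes from the map rather than a hard-coded `commons:`, a wiki whose map lacks that entry simply gets no link from this helper and can fall back to the plain URL.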

Change 620944 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Use wfExpandIRI() for compact, readable URLs in summaries

Change 620944 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Use wfExpandIRI() for compact, readable URLs in summaries
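
For readers unfamiliar with the approach named in the patch title, the following is a rough Python analogue, not the PHP implementation. It assumes that the core idea is to decode only percent-escapes that represent non-ASCII (high-bit) UTF-8 bytes while leaving ASCII escapes such as `%3F` untouched; that assumption, the function name `expand_iri`, and the example host are mine and are not taken from the patch.

```python
import re
from urllib.parse import unquote_to_bytes

# Runs of percent-escapes whose byte value is >= 0x80 (first hex digit 8-F).
_HIGH_BYTE_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def expand_iri(url: str) -> str:
    """Decode non-ASCII percent-escapes to readable text; keep ASCII escapes."""
    def decode(match: re.Match) -> str:
        raw = unquote_to_bytes(match.group(0))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: leave the escapes exactly as they were.
            return match.group(0)
    return _HIGH_BYTE_RUN.sub(decode, url)


print(expand_iri("https://commons.example.org/wiki/File:%D0%A9%D0%BE%3F.jpg"))
```

The result stays a working link because reserved ASCII characters remain encoded, while the Cyrillic part of the title becomes readable in the log entry.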