Collect information about old Parser `getTargetLanguage` usage
Closed, Resolved · Public

Description

Background Information

To understand how getTargetLanguage behaves in practice, i.e. whether the target language is the page language or the user language, we should collect some information about its usage. This will help us decide how to create parity in the Parsoid interface.

How

Add a logging message in the target language function when page and user language are different.
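A minimal sketch of that check, written in Python for illustration only (the actual change lives in PHP in the Kartographer extension). The function name and logger channel are hypothetical; the message format follows the log entries quoted later in this task.

```python
import logging

# Hypothetical logger channel name, chosen for illustration only.
logger = logging.getLogger("Kartographer")


def log_target_language_mismatch(target_lang: str, page_lang: str) -> bool:
    """Log when the target language differs from the page language.

    Returns True if a message was logged, False otherwise.
    """
    if target_lang == page_lang:
        # Nothing to report: both languages agree.
        return False
    logger.info(
        "Target language (%s) is different than page language (%s) (T311592)",
        target_lang,
        page_lang,
    )
    return True
```

Logging only on a mismatch keeps the volume low, since the common case (both languages equal) produces no entry.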

Event Timeline

Restricted Application added a subscriber: Aklapper.
MSantos triaged this task as Medium priority. Jun 29 2022, 8:02 AM

Change 809587 had a related patch set uploaded (by MSantos; author: MSantos):

[mediawiki/core@master] Log target language behavior

https://gerrit.wikimedia.org/r/809587

Change 809937 had a related patch set uploaded (by MSantos; author: MSantos):

[mediawiki/extensions/Kartographer@master] Log target language behavior

https://gerrit.wikimedia.org/r/809937

Change 809587 abandoned by MSantos:

[mediawiki/core@master] Log target language behavior

Reason:

As per the discussion in Slack, this change is spammy and we should do it in Kartographer only, hence abandoning it in favor of I3f944b81ff8b4e998741dc53a3cd6d92b1b2f254

https://gerrit.wikimedia.org/r/809587

Change 809937 merged by jenkins-bot:

[mediawiki/extensions/Kartographer@master] Log target language behavior

https://gerrit.wikimedia.org/r/809937

Now that this log has been available for a while, we can draw some insights about language usage: https://logstash.wikimedia.org/goto/ed727275e91709dcc1187f4a31a9ed58.

There are a few occurrences of the user language not being the same as the page language. The top 3 results are below:

normalized_message                                                     Count
Target language (hu) is different than page language (en) (T311592)    7,463
Target language (fr) is different than page language (en) (T311592)    4,676
Target language (de) is different than page language (en) (T311592)    3,170
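For anyone re-checking these numbers from an export of the log, the rows can be grouped by language pair with a small script. This is only a sketch: the regular expression assumes the message format shown in the table above, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumes the message format shown in the table above.
MESSAGE_RE = re.compile(
    r"Target language \((?P<target>[^)]+)\) is different than "
    r"page language \((?P<page>[^)]+)\)"
)


def parse_mismatch(message):
    """Return (target_language, page_language), or None if the format differs."""
    match = MESSAGE_RE.search(message)
    if match is None:
        return None
    return match.group("target"), match.group("page")


def count_by_target(rows):
    """Sum counts per target language from (message, count) rows."""
    totals = Counter()
    for message, count in rows:
        pair = parse_mismatch(message)
        if pair is not None:
            totals[pair[0]] += count
    return totals
```

Rows whose message does not match the expected format are skipped rather than guessed at.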

The wikis affected are listed below:

server                   Count
uk.wikipedia.org         115,580
ar.wikipedia.org         49,937
vec.wikipedia.org        28,870
commons.wikimedia.org    8,888
ru.wikipedia.org         6,291

Assuming that the page language is always the same as the target language is not safe, although the impact is low. cc/ @ssastry, @Arlolra, and @cscott

I'm closing this ticket since the investigation is done, but some follow-up work will be needed for Kartographer's compatibility with Parsoid.

Change 883845 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Kartographer@master] [DNM] Remove obsolete page language logging

https://gerrit.wikimedia.org/r/883845

As far as I can tell, the log entries are all from Wikimedia Commons. On Commons we split the parser cache by language. The basic wiki language, and therefore the page language for all pages, is English. Still, we render the page in the user language and store it as such in the parser cache. The user language becomes the target language.

What this means, as far as I understand it, is that we need to continue using the parser's target language. We cannot simplify this to use the page language instead.
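The behavior described above can be modeled roughly as follows. This is an illustrative Python sketch, not MediaWiki code: `RenderOptions`, `split_cache_by_user_language`, and the cache key format are all hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderOptions:
    """Hypothetical stand-in for the options that drive a render."""

    page_language: str
    user_language: str
    # True on a wiki that splits its parser cache by user language.
    split_cache_by_user_language: bool

    def target_language(self) -> str:
        # When the cache is split by language, the page is rendered in the
        # user language, which therefore becomes the target language.
        if self.split_cache_by_user_language:
            return self.user_language
        return self.page_language


def cache_key(page_id: int, options: RenderOptions) -> str:
    """Build a hypothetical cache key that varies on the target language."""
    return f"parsercache:{page_id}:lang={options.target_language()}"
```

Under these assumptions, two users with different interface languages viewing the same English-language page get separate cache entries, which is why substituting the page language for the target language would change what is rendered.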

Change #883845 merged by jenkins-bot:

[mediawiki/extensions/Kartographer@master] Remove obsolete page language logging

https://gerrit.wikimedia.org/r/883845