
LanguageFallbackChain does not end in 'en' for language codes that are not a valid format
Closed, Resolved (Public)

Description

We expect all language fallback chains for Terms (i.e. Labels, Descriptions, and Aliases) to end with 'en'.
These chains are constructed in \Wikibase\Lib\LanguageFallbackChainFactory.
This is consistent with the current expected behaviour and with MediaWiki core: https://github.com/wikimedia/mediawiki/blob/master/includes/language/LanguageFallback.php#L95

Acceptance criteria

  • adjust TermLanguageFallbackChain itself to always return at least en
  • add a test case to TermLanguageFallbackChainTest that uses a language code that is not valid (such as an emoji or ⧼Lang⧽)
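A minimal Python sketch of the intended behaviour, for illustration only: Wikibase itself is PHP, and `is_valid_language_code` and `build_term_fallback_chain` are hypothetical names with a deliberately simplified validity rule, not MediaWiki's real one. Invalid codes are dropped and 'en' is appended whenever it is missing, so the chain can never be empty:

```python
import re

# Hypothetical, simplified validity rule (lowercase ASCII letters, digits and
# hyphens, starting with a letter). MediaWiki's real rules differ.
_VALID_CODE = re.compile(r"[a-z][a-z0-9-]*")


def is_valid_language_code(code: str) -> bool:
    return _VALID_CODE.fullmatch(code) is not None


def build_term_fallback_chain(requested: str, fallbacks: list[str]) -> list[str]:
    """Build a de-duplicated term fallback chain that always contains 'en'."""
    chain: list[str] = []
    for code in [requested, *fallbacks]:
        # Skip invalid codes (emoji, '⧼Lang⧽', ...) and duplicates.
        if is_valid_language_code(code) and code not in chain:
            chain.append(code)
    # Safety net: append 'en' last if nothing in the chain provided it.
    if "en" not in chain:
        chain.append("en")
    return chain


# Test cases in the spirit of the second acceptance criterion.
assert build_term_fallback_chain("de-at", ["de"]) == ["de-at", "de", "en"]
assert build_term_fallback_chain("\U0001F642", []) == ["en"]  # an emoji
assert build_term_fallback_chain("⧼Lang⧽", []) == ["en"]
assert build_term_fallback_chain("%E2%A7%BCLang%E2%A7%BD", []) == ["en"]
```

The point of the design is that the 'en' safety net lives in the chain itself rather than in each caller, which is what the first acceptance criterion asks for.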

Notes:

This has already caused the following incidents:

Event Timeline

Adding Pablo as the IM in charge of the ongoing incident, Lydia as the PM for Wikidata, Thiemo as the dev who provided the patch, and Adam as the person in charge of tech tickets.

I believe this caused both T259744 and T259745. I already created T259779, where I describe an idea that differs slightly from "should always end in en". One of the two tickets might now be a duplicate.

Michael renamed this task from "LanguageFallbackChain does not end in 'en'" to "LanguageFallbackChain does not end in 'en' (i.e. repo wiki content language)" (Aug 12 2020, 3:20 PM).
Michael updated the task description.

This is not technically a regression, as it was always broken for "languages" like ⧼Lang⧽ or %E2%A7%BCLang%E2%A7%BD. However, since we added validation to our TermLanguageFallbackChain, the problem has become more apparent: the chain can now be empty, whereas before it at least contained the invalid interface language.
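To make the mechanism concrete, here is a toy Python model. The function names and the validity rule are made up for illustration, and it assumes no fallbacks are known for the invalid code. The old chain still held the useless code, while the validated chain ends up empty:

```python
import re

# Hypothetical, simplified validity rule; not MediaWiki's real one.
_VALID_CODE = re.compile(r"[a-z][a-z0-9-]*")


def chain_before_validation(interface_lang: str, fallbacks: list[str]) -> list[str]:
    # Old behaviour: the requested code is kept even if it is not a valid code.
    return [interface_lang, *fallbacks]


def chain_after_validation(interface_lang: str, fallbacks: list[str]) -> list[str]:
    # With validation: invalid codes are filtered out, but nothing adds 'en'.
    return [c for c in [interface_lang, *fallbacks] if _VALID_CODE.fullmatch(c)]


# Assuming no fallbacks are known for the bogus code:
assert chain_before_validation("⧼Lang⧽", []) == ["⧼Lang⧽"]
assert chain_after_validation("⧼Lang⧽", []) == []  # the now-empty chain
```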

I also think this still causes rare-ish production errors. While they may not have a big impact, they do show up in our logs. Confusingly, they have only shown up since the train of August 12th, not since the week before as I would have expected: T260384: Wikimedia\Rdbms\Database::makeList: empty input for field wbxl_language

[Screenshot: image.png, 25 KB]
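Why an empty chain turns into a database error: the error suggests the chain is used as the value list of a condition on `wbxl_language`, and an empty `IN ()` list is not valid SQL. The Python below is a hypothetical stand-in, not MediaWiki's `Database::makeList`, reusing only the error wording from T260384:

```python
def make_in_condition(field: str, values: list[str]) -> str:
    """Hypothetical SQL IN-list builder; NOT MediaWiki's Database::makeList."""
    if not values:
        # "field IN ()" would be invalid SQL, so fail early instead.
        raise ValueError(f"empty input for field {field}")
    # Naive quoting for illustration only; real code must use the DB layer's escaping.
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"{field} IN ({quoted})"


print(make_in_condition("wbxl_language", ["de", "en"]))
# wbxl_language IN ('de', 'en')
```

Calling `make_in_condition("wbxl_language", [])`, i.e. with the empty chain produced for an invalid interface language, raises `ValueError: empty input for field wbxl_language`. Guaranteeing that the chain contains at least 'en' removes this error path.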

Change 620689 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Inject MediaWiki language services into LanguageFallbackChainFactory

https://gerrit.wikimedia.org/r/620689

Change 620689 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Inject MediaWiki language services into LanguageFallbackChainFactory

https://gerrit.wikimedia.org/r/620689

Is this done?

Unfortunately, no. This was only a minor refactoring that removed some calls to deprecated methods. The underlying problem this ticket is about (which might be the source of some production errors/logspam) is still unresolved and actually quite tricky. But we are on it :)

Picked up in storytime, with the plan of initially creating an investigation ticket.

Addshore renamed this task from "LanguageFallbackChain does not end in 'en' (i.e. repo wiki content language)" to "LanguageFallbackChain does not end in 'en' for language codes that are not a valid format" (Aug 18 2020, 12:59 PM).
Addshore updated the task description.

Change 621262 had a related patch set uploaded (by Michael Große; owner: Michael Große):
[mediawiki/extensions/Wikibase@master] Fix Term Fallback chains being empty

https://gerrit.wikimedia.org/r/621262

Change 621262 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Fix Term Fallback chains being empty

https://gerrit.wikimedia.org/r/621262

Change 621305 had a related patch set uploaded (by Michael Große; owner: Michael Große):
[mediawiki/extensions/Wikibase@master] DNM: Account for empty fallback chain on Unicode language string

https://gerrit.wikimedia.org/r/621305

I'm somewhat stuck on adding the test in this patch: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/621305
The trouble is that I can't think of a test that verifies the mock has been adjusted and that would not also pass without the adjustment.

Change 621305 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Account for empty fallback chain on Unicode language string

https://gerrit.wikimedia.org/r/621305

Once this is deployed with the next train, we should check in Logstash whether T260384 still occurs.

Michael set Due Date to Aug 26 2020, 10:00 PM (Aug 20 2020, 12:45 PM).

I think this can now be closed:

[Screenshot: image.png, 23 KB]