
[20h] Why is the url key undefined in language objects for categories?
Open, Medium | Public | Production Error

Description

Notes

This warning was recorded 25 times in the last 7 days. Because it is so infrequent, it was overlooked at first and only showed up this week as a potential regression, but it turned out to be a pre-existing issue that has been around for at least 30 days and has not been addressed yet. Filing this task retroactively.

Source: https://gerrit.wikimedia.org/g/mediawiki/extensions/MobileFrontend/+/9fb7a9519d6911091d6d4738e8e7f8bde7b4d8c8/includes/specials/SpecialMobileLanguages.php#96

It was seen on multiple wikis and pages. I've attached two records as examples.

dinwiki

Request ID: W6GboQrAIEUAAIzApv0AAAAH

channel: mobile
level: WARNING
message: url key is undefined in language object
langObject.lang: jam
langObject.title: Category:Ieja
wiki: din.wikipedia.org
url: /wiki/K%C3%ABc%C3%ABweek:MobileLanguages/Bek%C3%A4takthook:Athi%C9%9B

eswiki

Request ID: W4n8cgpAADgAACAL3XwAAAAY

channel: mobile
level: WARNING
message: url key is undefined in language object
langObject.lang: bn
langObject.title: বিষয়শ্রেণী:গোল্ডেন গ্লোব পুরস্কার (সেরা অভিনেত্রী - সঙ্গীতধর্মী বা হাস্যরসাত্মক চলচ্চিত্র) বি…
wiki: es.wikipedia.org
url: /wiki/Especial:MobileLanguages/Categor%C3%ADa:Ganadoras_del_Globo_de_Oro_a_la_mejor_actriz_de_comedia_o_musical_(cine)

Developer notes

So, I took some time investigating this.

The issue seems to be impacting category pages only.

I assume languages for different categories are sourced from Wikidata.

For this page:
https://www.wikidata.org/wiki/Q5964

I see a link to jam (Jumiekan Patwa) wikipedia (https://jam.wikipedia.org/wiki/Category:Kiandidet_fi_kuik-kuik_diliishan)

However when I visit
https://din.wikipedia.org/wiki/Bek%C3%A4takthook:Candidates_for_speedy_deletion or https://din.m.wikipedia.org/wiki/K%C3%ABc%C3%ABweek:MobileLanguages/Bek%C3%A4takthook:Candidates_for_speedy_deletion I cannot access the Jumiekan Patwa version of that page via the Universal Language Selector or on mobile, even though it exists.

A few more examples are given at https://logstash.wikimedia.org/goto/748b767cd56ca9186f972df2f85e962a

This means we're not making certain languages available to our users and seems worthy of further investigation.

I'm a bit out of my depth here, so help from language team/wikidata in understanding how these links are made is needed.

Event Timeline

Jdlrobson moved this task from Incoming to Needs Prioritization on the Web-Team-Backlog board.
Jdlrobson lowered the priority of this task from High to Medium. Sep 19 2018, 9:46 PM
Jdlrobson added subscribers: pmiazga, Jdlrobson.

These logs seem to be intentional and were added in 456264d8 by @pmiazga
I'm guessing they should go to a different logger channel? https://github.com/wikimedia/mediawiki-extensions-MobileFrontend/blob/master/includes/specials/SpecialMobileLanguages.php#L95

The log channel is fine. But I assume the reason for the log was not about indefinitely storing analytical information that we can query directly if needed. Rather, I assume the reason is to investigate whether a problem exists, and if found, why and how to solve it.

Well, now that you have the information, I guess a decision needs to be made:

  1. E.g. the absence of url is totally normal in a use case not previously foreseen. In that case the code can be updated to accommodate that reality without the need to log anything.
  2. Or, if the circumstances under which url is missing are still suspicious, the data should help determine the cause and help fix or prevent it at the source of this data. In that case you'd probably keep the logic to detect the problem, but it would no longer be triggered under normal circumstances (as it is today).
  3. Alternatively, if this log message is not actionable, not indicative of a problem, and is still of interest to you in some way, then it would make sense to use severity info() instead of warning().

Jdlrobson renamed this task from Special:MobileLanguages emits warning "url key is undefined in language object" to Language objects for categories sometimes missing url field (Special:MobileLanguages emits warning "url key is undefined in language object"). Sep 26 2018, 9:21 PM
Jdlrobson updated the task description.
mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:08 PM

Is this still relevant?

And is this related to ULS and Compact Links?

No, it has no relation to frontend or other ULS code afaics. It's an error from PHP.

Sorry thought these were the same. Will rephrase the title.

Jdlrobson renamed this task from Language objects for categories sometimes missing url field (Special:MobileLanguages emits warning "url key is undefined in language object") to Why is the url key undefined in language objects for categories?. Nov 20 2020, 7:02 PM
Jdlrobson removed a project: MobileFrontend.

I guess this needs input from a wikidata expert?

The reproduction path is not clear. It has been speculated that one way to investigate would be to inspect the relevant Wikibase logic live on a production server to identify the point at which the sitelink is lost.

The investigation will be timeboxed to 20 hours. Results of the investigation will be reported on the ticket.

Jakob_WMDE renamed this task from Why is the url key undefined in language objects for categories? to Why is the url key undefined in language objects for categories? [20h]. Nov 26 2020, 1:27 PM
ItamarWMDE renamed this task from Why is the url key undefined in language objects for categories? [20h] to [20h] Why is the url key undefined in language objects for categories?. Nov 26 2020, 1:49 PM

I'm wondering if T233520 is potentially related.

This one turns out not to be Wikibase’s fault at all. Wikibase correctly passes a jam:Category:Kiandidet fi kuik-kuik diliishan sitelink into the ParserOutput; but MediaWiki can’t parse it – compare my sandbox:

[Screenshot: Lucas Werkmeister (WMDE) sandbox page, Nov 27 2020]

The solution to this riddle is that Jam is the Dinka name for the Talk namespace, and therefore this interwiki link is actually parsed as an internal link, and no link to Patois Wikipedia is generated. (As far as I can tell, this means it’s not possible to link from dinwiki to jamwiki at all, whether through Wikibase or manually, unless you “take a detour” with something like [[en:jam:...]].) One way to solve this would probably be to change the interface by which Wikibase communicates the sitelinks to MediaWiki: currently, we have an object like { siteId: 'jamwiki', title: 'Category:Kiandidet fi kuik-kuik diliishan' }, which we then turn into a string only for MediaWiki to try to parse it again – and MediaWiki interprets the string differently than we intend, and thinks the jam: indicates the Talk namespace.
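The shadowing described above can be illustrated with a small standalone sketch. It is a simplified model, not MediaWiki's actual title parser: the function `classifyPrefix()`, the namespace map, and the interwiki list are all hypothetical. The only assumption taken from this discussion is that a local namespace name wins over an interwiki prefix with the same spelling.

```php
<?php
// Simplified model of prefix resolution; NOT MediaWiki's actual title parser.
// All names and data below are hypothetical and for illustration only.

/**
 * Classify the first "prefix:" of a link target the way a wiki with the
 * given local namespace names and interwiki prefixes would.
 */
function classifyPrefix( string $text, array $localNamespaces, array $interwikiPrefixes ): array {
	$parts = explode( ':', $text, 2 );
	if ( count( $parts ) < 2 ) {
		return [ 'type' => 'none', 'rest' => $text ];
	}
	[ $prefix, $rest ] = $parts;
	$key = strtolower( $prefix );
	// Local namespace names are checked first, so they shadow interwiki prefixes.
	if ( isset( $localNamespaces[$key] ) ) {
		return [ 'type' => 'namespace', 'name' => $localNamespaces[$key], 'rest' => $rest ];
	}
	if ( in_array( $key, $interwikiPrefixes, true ) ) {
		return [ 'type' => 'interwiki', 'name' => $key, 'rest' => $rest ];
	}
	return [ 'type' => 'none', 'rest' => $text ];
}

$interwiki = [ 'jam', 'en', 'ta' ];
$link = 'jam:Category:Kiandidet fi kuik-kuik diliishan';

// On a wiki where "Jam" names the Talk namespace (as on dinwiki):
$onDinwiki = classifyPrefix( $link, [ 'jam' => 'Talk' ], $interwiki );
// On a wiki without such a namespace name:
$elsewhere = classifyPrefix( $link, [ 'talk' => 'Talk' ], $interwiki );

echo $onDinwiki['type'], "\n"; // namespace
echo $elsewhere['type'], "\n"; // interwiki
```

In this model the same string yields a Talk-namespace title on the first wiki and an interwiki link on the second, which matches the behaviour seen in the sandbox.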

But that doesn’t explain why e. g. the Tamil sitelink for Q7086090 is missing on Spanish and English Wikipedia.

Edit: For Jam being the Dinka name for the Talk namespace, see also T309688: Test that namespace names are not identical to language codes.

But that doesn’t explain why e. g. the Tamil sitelink for Q7086090 is missing on Spanish and English Wikipedia.

Okay, this is a different underlying cause, but the same general issue that Wikibase puts string language links into the ParserOutput, and then MediaWiki fails to parse them correctly. This time, the title “பகுப்பு:பன்னாட்டு இயற்கைப் பாதுகாப்புச் சங்கத்தின் செம்பட்டியல் - தீவாய்ப்புக் கவலை குறைந்த இனம்” is valid on tawiki, because the “பகுப்பு” is the Category namespace and the remaining part is only 244 bytes long; but on enwiki, the title “ta:பகுப்பு:பன்னாட்டு இயற்கைப் பாதுகாப்புச் சங்கத்தின் செம்பட்டியல் - தீவாய்ப்புக் கவலை குறைந்த இனம்” doesn’t parse, because enwiki doesn’t know about the namespace prefix, and the whole string (minus “ta:”) is over 255 bytes long.
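The byte arithmetic can be sketched roughly as follows. This is a hedged illustration, not MediaWiki code: `countedBytes()` and `MAX_TITLE_BYTES` are hypothetical names, and the page name is an ASCII stand-in of 244 bytes rather than the real Tamil title (which is easy to corrupt when copying). The sketch assumes that a recognised namespace prefix does not count against the limit, while an unrecognised one does.

```php
<?php
// Simplified model of the length check; NOT MediaWiki's actual code.
// Function name, constant name and sample data are hypothetical.

const MAX_TITLE_BYTES = 255; // byte limit discussed in this task

/**
 * Return the number of bytes that count against the limit: the namespace
 * prefix is free only if the wiki recognises it as one of its namespaces.
 */
function countedBytes( string $title, array $knownNamespaces ): int {
	$parts = explode( ':', $title, 2 );
	if ( count( $parts ) === 2 && in_array( $parts[0], $knownNamespaces, true ) ) {
		return strlen( $parts[1] ); // strlen() counts bytes, not characters
	}
	return strlen( $title );
}

// Stand-in for the real page name: 244 bytes, as reported above.
$pageName = str_repeat( 'a', 244 );
$title = 'பகுப்பு:' . $pageName;

$onTawiki = countedBytes( $title, [ 'பகுப்பு' ] ); // prefix is a known namespace
$onEnwiki = countedBytes( $title, [ 'Category' ] ); // prefix is unknown here

var_dump( $onTawiki <= MAX_TITLE_BYTES ); // bool(true)
var_dump( $onEnwiki <= MAX_TITLE_BYTES ); // bool(false)
```

The multi-byte namespace prefix plus the colon is enough to push the same title over the limit once the prefix is no longer recognised.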

(Side note: be careful when copying that title around – on my system, some diacritic dots like to get lost for some reason, which significantly changes the string. This can be confusing.)

(Aside: this was jointly debugged in a 1½h call between @Silvan_WMDE, @toan, @noarave, @Tonina_Zhelyazkova_WMDE, and myself; I reckon that means we used up about 14h of the 20h timebox. But I think the investigation concluded successfully, and now someone™ needs to figure out who is even responsible for fixing this.)

I started looking a bit through the related code, and it seems we’re not the first to notice that squashing the interwiki prefix and title into one string is not the best way to store this data:

LinksUpdate::__construct( Title $title, ParserOutput $parserOutput, $recursive = true )
# Convert the format of the interlanguage links
# I didn't want to change it in the ParserOutput, because that array is passed all
# the way back to the skin, so either a skin API break would be required, or an
# inefficient back-conversion.
$ill = $parserOutput->getLanguageLinks();
$this->mInterlangs = [];
foreach ( $ill as $link ) {
	list( $key, $title ) = explode( ':', $link, 2 );
	$this->mInterlangs[$key] = $title;
}

That “I didn’t want to change it in the ParserOutput” comment dates back to 2006, when the langlinks table was first introduced.

(This also suggests a way to fix this issue without breaking ParserOutput compatibility: use explode( ':', $link, 2 ) instead of Title::newFromText( $languageLinkText ) wherever the links are used, such as in Skin::getLanguages(). I’m not sure if that’s a good idea, though… it’s probably better to add a list of languageLinkTitles to the ParserOutput and fix this properly. Wikibase would then presumably use makeTitle() to create Titles that don’t suffer from the namespace or length issues.)
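A minimal sketch of the explode-based idea, assuming the links keep their current "prefix:title" string form. The helper `splitLanguageLink()` is hypothetical and not part of MediaWiki; it only shows that splitting at the first colon keeps the interwiki prefix and the remote title apart without consulting local namespace names or length limits.

```php
<?php
// Sketch only; the function name is hypothetical and not part of MediaWiki.

/**
 * Split a "prefix:title" language link at the first colon only, so the
 * title part is never re-interpreted against local namespace names.
 */
function splitLanguageLink( string $link ): array {
	$parts = explode( ':', $link, 2 );
	if ( count( $parts ) !== 2 ) {
		throw new InvalidArgumentException( "Not a prefixed link: $link" );
	}
	return [ 'prefix' => $parts[0], 'title' => $parts[1] ];
}

$split = splitLanguageLink( 'jam:Category:Kiandidet fi kuik-kuik diliishan' );
echo $split['prefix'], "\n"; // jam
echo $split['title'], "\n";  // Category:Kiandidet fi kuik-kuik diliishan
```

This keeps the string format intact but still relies on every consumer splitting the string the same way, which is one reason a structured list in ParserOutput may be the cleaner fix.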

One way to solve this would probably be to change the interface by which Wikibase communicates the sitelinks to MediaWiki

it’s probably better to add a list of languageLinkTitles to the ParserOutput and fix this properly. Wikibase would then presumably use makeTitle() to create Titles that don’t suffer from the namespace or length issues.

I’ve started to sketch this out at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/645403, if anyone wants to see if that looks like a reasonable approach or not. (The change still needs more work, but I don’t think I’ll spend much more time on it for now.)

Hello @AMooney, any chance we could also get someone from PET have a look at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/645403 and advise whether it looks like a right/preferred/sensible approach to the issue reported here? Thanks!

daniel subscribed.

Putting this on the expedition board, since the proposed patch overlaps with refactoring work we are doing.

We discussed it, and I commented on the patch. We support the general idea (represent language links as objects, not strings, in ParserOutput and OutputPage). The devil is in the detail, though.

Thanks. I’ll copy one comment here, since it seems more relevant for anyone else looking at this task (emphasis mine):

The length restriction is a hard limit imposed by the langlinks table. We could represent longer titles in ParserOutput, but they won't go into the database. If the insert query fails because of this, this could even lead to no links at all going into the database for the page at hand.

This might mean that a proper solution for part of this issue – the inability to store some interwiki links whose namespace is too long – would require a database change.

We moved this off the campsite iteration board while checking the stalled/waiting column as it does not seem immediately actionable for us.

Open questions include:

  • Who would be the best team to work on it: us or the Platform team?
    • Should this maybe be split between tackling the issue of one wiki's interwiki prefix being another wiki's talk namespace and the issue of some namespace+pagename being too long?
  • What would be the next steps here for solving the technical challenges?
  • What priority does this have both from a product and a tech perspective? (Lucas' patch is from half a year ago)