MF interlanguage labels are incomplete when the page title contains an en dash
Open, Low | Public | 2 Estimated Story Points | BUG REPORT

Description

See https://it.wikipedia.org/wiki/Mission:_Impossible_-_Dead_Reckoning_-_Parte_uno?useformat=mobile#/languages: for many languages (e.g., English), the label is just "Mission: Impossible", but if you click the link you will see that the actual page title contains an en dash and then more text. For instance, the page title on enwiki is "Mission: Impossible – Dead Reckoning Part One". The en dash and everything that follows are stripped from the label.

Event Timeline

Oh, I see. MF uses the en dash as the delimiter for the language name. But the en dash is a perfectly valid character in page titles. In fact, there are roughly 73k articles on the English Wikipedia with " – " in their title. Lots of them are redirects, and while I don't know exactly how many (T204089), based on the first 5000 results roughly half of them are NOT redirects. All of these pages are affected by this bug, and again, that's for enwiki alone.

Jdlrobson subscribed.

The language links have a title with the format <page title> – <page language>.

MobileFrontend splits to get the two. It hasn't factored in the edge case where the title also contains that character.
It should be updated to account for multiple delimiters. Patch very much welcome.

https://github.com/wikimedia/mediawiki-extensions-MobileFrontend/blob/0652ba81c60c25d7714b7e965c251cd415caedda/src/mobile.startup/PageHTMLParser.js#L243
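
To illustrate the failure mode described above, here is a minimal sketch. It is not the actual code at the linked line; `naiveParseTitle` and the constants are hypothetical names, and the delimiter is assumed to be an en dash surrounded by spaces.

```javascript
// Hypothetical simplification of the parsing step, NOT the real MobileFrontend code.
const EN_DASH = '\u2013';
// Assumption: the delimiter is an en dash with a space on each side.
const DELIMITER = ` ${EN_DASH} `;

function naiveParseTitle(linkTitle) {
  // Splitting on every delimiter and keeping only the first piece
  // drops everything after the first en dash in the page title.
  return linkTitle.split(DELIMITER)[0];
}

const linkTitle = `Mission: Impossible ${EN_DASH} Dead Reckoning Part One ${EN_DASH} English`;
console.log(naiveParseTitle(linkTitle)); // "Mission: Impossible"
```

Any title that contains " – " itself loses its tail this way, which matches the truncated label reported in the description.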

Jdlrobson renamed this task from MF interlanguage labels are malformed when the page title contains an en dash to MF interlanguage labels are incomplete when the page title contains an en dash. Oct 16 2023, 3:49 PM

I was going to fix this by simply splitting the string at the last delimiter. However, that also doesn't work because the string is not guaranteed to contain the delimiter (i.e., the language name) at all. That being the case, the fix needs to be different. Honestly, I don't think the title and language name should be put together at all, if possible. That'd be more complex though, and I'm not sufficiently familiar with the code base.
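
A rough sketch of the "split at the last delimiter" idea, and of why it is not enough on its own. The function name and the return shape are hypothetical and only meant to illustrate the point.

```javascript
const EN_DASH = '\u2013';
// Assumption: the delimiter is an en dash with a space on each side.
const DELIMITER = ` ${EN_DASH} `;

// Hypothetical helper: split the link title at the LAST delimiter only.
function splitAtLastDelimiter(linkTitle) {
  const index = linkTitle.lastIndexOf(DELIMITER);
  if (index === -1) {
    // No delimiter at all: treat the whole string as the title.
    return { title: linkTitle, langname: null };
  }
  return {
    title: linkTitle.slice(0, index),
    langname: linkTitle.slice(index + DELIMITER.length)
  };
}

// Works when the language name is present:
console.log(splitAtLastDelimiter(
  `Mission: Impossible ${EN_DASH} Dead Reckoning Part One ${EN_DASH} English`
));

// Fails when the string carries no language name: the last en dash
// belongs to the page title, so the title is still truncated.
console.log(splitAtLastDelimiter(
  `Mission: Impossible ${EN_DASH} Dead Reckoning Part One`
));
```

In the second call the result is a title of "Mission: Impossible" and a bogus language name, so the parser cannot tell the two cases apart from the string alone.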

@Daimona the title is generated here: https://github.com/wikimedia/mediawiki/blob/0258088e16489bd7c5cbdac8f25d64c346664244/includes/skins/Skin.php#L1301
Adding data-title as a key there should make the page title accessible to Minerva on its own, i.e. provided without wrapping it in the interlanguage-link-title message.

Change 968652 had a related patch set uploaded (by Jon Harald Søby; author: Jon Harald Søby):

[mediawiki/extensions/MobileFrontend@master] Fix en dash display in interlanguage link list

https://gerrit.wikimedia.org/r/968652

ovasileva set the point value for this task to 2. Oct 26 2023, 5:27 PM

Change 968652 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@master] Fix en dash display in interlanguage link list

https://gerrit.wikimedia.org/r/968652

Just to reiterate that the concerns I raised on gerrit were not purely hypothetical, https://es.m.wikipedia.beta.wmflabs.org/w/index.php?title=Misi%C3%B3n_imposible%3A_sentencia_mortal_-_Parte_1#/languages still shows a truncated title, suggesting that the patch above does not fix the issue.

@Daimona the title does not look truncated to me. What am I missing?

Screenshot 2023-10-27 at 4.32.11 PM.png (546×1 px, 47 KB)

It seems to depend on the interface language. I was reading the page in Spanish (default):

image.png (479×662 px, 25 KB)

If I switch to English, then I get the dash as in your screenshot.

Interesting. In Spanish the language appears like so:

"Mission: Impossible – Dead Reckoning Part One (inglés)"

I guess there is some localization happening here, so we will need to extract the language name differently depending on the interface language, using the message (interlanguage-link-title).

Indeed, there are quite a few languages that have customized that message so it doesn't contain the hard-coded delimiter: https://translatewiki.net/wiki/Special:Translations?message=Interlanguage-link-title&namespace=8 . The way I see it, there are two ways forward:

  1. Extract the necessary components from the message using the message and some regex.
  2. Like you suggested in T349000#9254760, add a data-title attribute to the a element so we can use that.

I already have a prototype for the first one working, but it feels a bit messy, so I feel like the second one is cleaner and more robust. What do you think?
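
For reference, the first option could look roughly like the sketch below. This is not the prototype mentioned above; `buildTitleParser` and `escapeRegExp` are hypothetical helpers, and the two message patterns used in the examples ("$1 – $2" and "$1 ($2)") are assumptions inferred from the English and Spanish labels quoted in this thread.

```javascript
const EN_DASH = '\u2013';

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: turn a localized message pattern into a parser.
// Assumes the pattern contains both $1 (page title) and $2 (language name).
function buildTitleParser(messagePattern) {
  const titleFirst = messagePattern.indexOf('$1') < messagePattern.indexOf('$2');
  // Escape first, then swap the (now escaped) placeholders for capture groups.
  const source = escapeRegExp(messagePattern)
    .replace('\\$1', '(.+)')
    .replace('\\$2', '(.+)');
  const regex = new RegExp(`^${source}$`);
  return function parse(linkTitle) {
    const match = regex.exec(linkTitle);
    if (!match) {
      return null;
    }
    return titleFirst
      ? { title: match[1], langname: match[2] }
      : { title: match[2], langname: match[1] };
  };
}

const parseEnglish = buildTitleParser(`$1 ${EN_DASH} $2`);
const parseSpanish = buildTitleParser('$1 ($2)');

console.log(parseEnglish(`Mission: Impossible ${EN_DASH} Dead Reckoning Part One ${EN_DASH} English`));
console.log(parseSpanish(`Mission: Impossible ${EN_DASH} Dead Reckoning Part One (inglés)`));
```

The greedy first group keeps en dashes inside the title, but the result is still ambiguous whenever a language name itself contains the delimiter, which is part of why this route feels messy.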

Yeah, I think #2 seems like the right solution here.

Actually, I thought of a 3rd way to do it: Use the href attribute and use mw.util.percentDecodeFragment() to get the title from that. I can upload a patch for that, but the one I have doesn't get the langname (i.e. translated language name) for the return object in getLanguages, which might be a problem.
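
A minimal sketch of the href-based idea, under some loudly stated assumptions: it uses the standard `URL` and `decodeURIComponent` as stand-ins for `mw.util.percentDecodeFragment()` so that it runs outside MediaWiki, it assumes article URLs of the form `/wiki/<title>`, and `titleFromHref` is a hypothetical name.

```javascript
const EN_DASH = '\u2013';

// Hypothetical helper: recover the page title from an interlanguage link's href.
// Assumption: the article path is "/wiki/<title>"; other URL layouts return null.
function titleFromHref(href) {
  const path = new URL(href).pathname;
  const prefix = '/wiki/';
  if (!path.startsWith(prefix)) {
    return null;
  }
  // decodeURIComponent stands in for mw.util.percentDecodeFragment() here.
  // Underscores in the URL form of a title correspond to spaces.
  return decodeURIComponent(path.slice(prefix.length)).replace(/_/g, ' ');
}

console.log(titleFromHref(
  'https://en.wikipedia.org/wiki/Mission:_Impossible_%E2%80%93_Dead_Reckoning_Part_One'
)); // "Mission: Impossible – Dead Reckoning Part One"
```

As noted above, this only yields the title; the localized language name would still have to come from somewhere else.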

I think this would be fine. I was wondering how this works with display titles, e.g. https://en.m.wikipedia.org/wiki/IOS#/languages, but it seems the existing interface does not consider that (possibly another bug?). If we want to fix that bug, perhaps the 2nd way is the better approach.

Jdlrobson added a subscriber: Edtadros.

Change 973890 had a related patch set uploaded (by Jon Harald Søby; author: Jon Harald Søby):

[mediawiki/extensions/MobileFrontend@master] Make getLanguages() more robust

https://gerrit.wikimedia.org/r/973890

What are the downsides of using a data-title attribute?

That would certainly be easier, but we'd also need a data-langname attribute for the localized language names.
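
If both attributes existed, the client side could shrink to something like the sketch below. Note that `data-title` and `data-langname` are only proposals in this thread, `parseLanguageLink` is a hypothetical name, and the stub object merely imitates the `getAttribute` method of a DOM element so the example runs without a browser.

```javascript
const EN_DASH = '\u2013';

// Hypothetical helper: read title and language name straight from attributes,
// with no string splitting. Assumes the proposed data-* attributes are present.
function parseLanguageLink(link) {
  return {
    href: link.getAttribute('href'),
    title: link.getAttribute('data-title'),
    langname: link.getAttribute('data-langname')
  };
}

// Stand-in for an <a> element; a real DOM element would be passed in practice.
const fakeLink = {
  attrs: {
    href: 'https://en.wikipedia.org/wiki/Mission:_Impossible_%E2%80%93_Dead_Reckoning_Part_One',
    'data-title': `Mission: Impossible ${EN_DASH} Dead Reckoning Part One`,
    'data-langname': 'inglés'
  },
  getAttribute(name) {
    return Object.prototype.hasOwnProperty.call(this.attrs, name) ? this.attrs[name] : null;
  }
};

console.log(parseLanguageLink(fakeLink));
```

Because nothing is parsed out of a localized string, this does not depend on how any language has customized the interlanguage-link-title message.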

That does sound like it would be useful to have?

Indeed. Where would those need to be added? Core?