
[Compact links] Prioritise interwiki to language "X" if a lang="X" attribute is present in the page's HTML
Closed, Resolved; Public

Description

Copied from Pau's specifications:
Surface languages used in content. Articles contain content in different languages (e.g., an article about a Russian city will give the city's name in Russian even on the English Wikipedia). Those languages indicate a special connection between the topic and the language. This may not be a language the user understands (so this criterion should not be a priority), but it may be a good option if slots remain in the initial list of languages.

Example:

  • I visit https://en.wikipedia.org/wiki/Leo_Tolstoy with compact interlanguage links enabled
  • The page contains <b>Lev Nikolayevich Tolstoy</b> (<a href="/wiki/Russian_language" title="Russian language">Russian</a>: <span lang="ru">Лев Никола́евич Толсто́й</span>
  • Whatever interface language and other conditions apply to me, the interwiki to "ru" is displayed.

As long as we limit it to 1-2 languages per page, this should be a rather effective trick in the lucky (but important) cases where it applies, but it depends on how efficient it is to scan the whole page like that.

Details

Reference
bz68077

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 3:35 AM
bzimport set Reference to bz68077.
bzimport added a subscriber: Unknown Object (MLST).
Arrbee set Security to None.
Amire80 triaged this task as Normal priority. Dec 4 2014, 8:42 AM
Amire80 removed a project: Language-Team.
jayvdb added a subscriber: jayvdb. Jun 20 2015, 8:10 AM
Restricted Application added a subscriber: Aklapper. Sep 5 2015, 9:04 AM
Amire80 moved this task from Backlog to Other on the ULS-CompactLinks board. Mar 26 2016, 4:05 PM
Elitre added a subscriber: Elitre. May 16 2016, 7:09 PM

Change 296615 had a related patch set uploaded (by Amire80):
Show languages that appear in the page's text

https://gerrit.wikimedia.org/r/296615

This issue has been mentioned several times in recent community feedback, so I committed a very simple patch.

It needs review not just for correctness and functionality, but also for performance and browser compatibility: is the $( '#mw-content-text [lang]' ) query reasonably fast, and could it break in some browsers?

It's very simplistic, so if anybody believes that there are problems with this patch or that now is not a good time to change functionality, feel free to delay or abandon it.
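
For reference, here is a minimal sketch of the kind of scan being discussed. It is not the code from the patch: the function name languagesInText, its parameters, and the default limit of 2 (taken from the "1-2 languages per page" idea in the description) are assumptions made for illustration. It uses the standard querySelectorAll DOM method rather than jQuery and stops as soon as the limit is reached.

```javascript
// Hypothetical helper, not taken from the patch: collect up to `limit`
// distinct language codes from elements that carry a lang attribute,
// keeping only codes for which an interlanguage link exists.
function languagesInText(root, availableLanguages, limit = 2) {
  const available = new Set(availableLanguages.map((c) => c.toLowerCase()));
  const found = [];
  for (const el of root.querySelectorAll('[lang]')) {
    const tag = (el.getAttribute('lang') || '').trim().toLowerCase();
    if (!tag) {
      continue;
    }
    // Try the full tag first (e.g. "zh-yue"), then its primary subtag ("zh").
    const match = [tag, tag.split('-')[0]].find((c) => available.has(c));
    if (match && !found.includes(match)) {
      found.push(match);
      if (found.length >= limit) {
        break; // stop inspecting elements once enough languages are found
      }
    }
  }
  return found;
}

// In a browser this might be called as (assumed usage):
//   languagesInText(document.querySelector('#mw-content-text'), ['ru', 'fr', 'de']);
```

Note that querySelectorAll still matches every element with a lang attribute before the loop starts, so the early break only bounds the per-element work, not the selector match itself.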

Pginer-WMF added a comment. Edited Jun 30 2016, 7:56 AM

Some aspects are not clear to me from the description or the patch set info:

  • Will this override the user's previous choices? Browser settings? Geo-IP-based guesses? (I.e., where does it sit in terms of priority among the different criteria?)
  • Will this add more languages to the list of suggested languages?
  • Will this act as a fallback when there are no previous choices by the user?
  • Will there be any performance impact in longer articles for which a large amount of content needs to be processed?

Some aspects are not clear to me from the description or the patch set info:

  • Will this override the user's previous choices? Browser settings? Geo-IP-based guesses? (I.e., where does it sit in terms of priority among the different criteria?)
  • Will this add more languages to the list of suggested languages?
  • Will this act as a fallback when there are no previous choices by the user?

The answer to the first three questions is that it comes after previous languages and ULS common languages (which includes geo-IP), and before the extra fallback (T135366).
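
To make that ordering concrete, here is a rough sketch of how a fixed number of slots could be filled by running the criteria in priority order. The function pickLanguages and the strategy names are hypothetical and are not identifiers from the extension; each strategy is modelled as a function returning language codes, and a strategy is only called while slots remain.

```javascript
// Hypothetical sketch: fill `slots` positions by running strategies in
// priority order. Names are illustrative, not the extension's own.
function pickLanguages(strategies, availableLanguages, slots) {
  const available = new Set(availableLanguages);
  const picked = [];
  for (const strategy of strategies) {
    if (picked.length >= slots) {
      break; // remaining (lower-priority) strategies are never executed
    }
    for (const code of strategy()) {
      if (available.has(code) && !picked.includes(code)) {
        picked.push(code);
        if (picked.length >= slots) {
          break;
        }
      }
    }
  }
  return picked;
}

// Order described above: previous languages, ULS common languages
// (which include geo-IP), languages found in the page text, extra fallback.
const exampleStrategies = [
  () => ['he'],        // previously chosen languages (example data)
  () => ['fr', 'he'],  // ULS common languages (example data)
  () => ['ru'],        // languages that appear in the page's text (example data)
  () => ['de'],        // extra fallback (example data)
];

console.log(pickLanguages(exampleStrategies, ['he', 'fr', 'ru', 'de', 'es'], 3));
```

With three slots, the example fills them from the first three criteria and never calls the fallback. In this arrangement the page-text scan only costs anything when the higher-priority criteria leave slots open.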

  • Will there be any performance impact in longer articles for which a large amount of content needs to be processed?

Not sure. That's what I asked in my previous comment. Maybe Niklas or Santhosh will have a definitive answer.

Will there be any performance impact in longer articles for which a large amount of content needs to be processed?

Yes, there is a performance cost. To avoid or minimize it, I rewrote how the compacting strategies are executed in https://gerrit.wikimedia.org/r/296710

Change 298255 had a related patch set uploaded (by Santhosh):
Show languages that appear in the page's text

https://gerrit.wikimedia.org/r/298255

Change 296615 abandoned by Santhosh:
Show languages that appear in the page's text

Reason:
Abandoned because of new patch https://gerrit.wikimedia.org/r/298255

https://gerrit.wikimedia.org/r/296615

santhosh claimed this task. Jul 11 2016, 9:20 AM
santhosh moved this task from Backlog to In Review on the Language-Q1-2016-17 Sprint 1 board.

Change 298255 merged by jenkins-bot:
Show languages that appear in the page's text

https://gerrit.wikimedia.org/r/298255

Amire80 closed this task as Resolved. Aug 4 2016, 9:19 AM

Verified in production in the Hebrew Wikipedia.

Amire80 moved this task from QA to Done on the Language-Q1-2016-17 Sprint 2 board. Aug 4 2016, 9:19 AM
Elitre awarded a token. Aug 4 2016, 2:42 PM