CX2: Apply the right target when creating a missing link
Closed, Resolved, Public

Description

In Content Translation, when a link from the source article cannot be added to the translation because the corresponding article does not exist in the target wiki, it is marked as unadapted (in grey) and users have the option to mark it as missing (in red) as described in T193233.

The link target of a red link points to an article that does not exist in the target Wikipedia, and determining such a target is not easy. For example, a link that reads "lemons" in the source article (English) points to the "Lemon" article. In the translation to German, assuming the article does not exist in German, the expected result would be a red link with the text "Zitronen" that points to the missing "Zitrone" article.

The current approach is to use the title of the source article ("Lemon" in the example) as the link target of the red link. That results in a red link labelled "Zitronen" (if MT is used) that points to "Lemon" in German Wikipedia. This is problematic because it encourages the creation of an article titled in a different language than the one used in the wiki, and the title can collide with other existing articles (e.g., "Lemon" in German Wikipedia is a disambiguation page that links to articles about people with that surname).

Proposed solution

The proposed solution consists of the following approaches, which should be attempted in order until one succeeds:

  1. If there is a label in the target language for the Wikidata item, use it as the link target for the red link. For example, Q500 shows that "Lemon" has a "Zitrone" label for German. Even for wikis where the article does not exist, such as Wolof Wikipedia, it still has the "limoŋ" label in Wolof, which would very likely be the article title to use there (better than just "Lemon").
  2. If machine translation is available for the language pair and applied to the paragraph, a translation of the source article name will be used for the target. In the example we'll take "Lemon" (not "lemons") and send it to Yandex to hopefully get "Zitrone".
  3. (Covered as part of a follow-up ticket: T224408) If there is no machine translation available (i.e., "copy from source" approach is used), the source link target will be kept for the translation, but the user will be able to edit/confirm as if the link tool were used. In the example, the "lemons" link will be transferred to German as "lemons", and the link card will show the non-existing "Lemon" page for the user to select. In this case the user will have to correct both the label and link target. An example sequence for this case is illustrated below.
[Three screenshots (2019-02-05) illustrating the example sequence for the "copy from source" case]
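The fallback order above can be sketched as a single function. This is a minimal illustration under stated assumptions, not cxserver's code: the function name is hypothetical, the Wikidata label is assumed to be already fetched and passed in as a plain string, and the MT service is modelled as a callable that returns an empty string when it has nothing useful.

```python
from typing import Callable, Optional


def resolve_red_link_target(
    source_title: str,
    wikidata_label: Optional[str] = None,
    mt: Optional[Callable[[str], str]] = None,
) -> str:
    """Choose the target title for a red link, trying each approach in order.

    Hypothetical helper for illustration only.
    """
    # 1. Use the Wikidata item's label in the target language, if there is one.
    if wikidata_label:
        return wikidata_label
    # 2. Otherwise machine-translate the source *target title* ("Lemon"),
    #    not the link label ("lemons"), when MT is available.
    if mt is not None:
        translated = mt(source_title).strip()
        if translated:
            return translated
    # 3. "Copy from source": keep the source title; the user is expected to
    #    confirm or correct it (the follow-up covered in T224408).
    return source_title
```

With only a Wolof label available, `resolve_red_link_target("Lemon", wikidata_label="limoŋ")` returns "limoŋ"; with neither a label nor MT, it falls back to "Lemon". Passing the MT service in as a callable keeps the ordering logic independent of which engine (Yandex, Apertium, etc.) is configured for the language pair.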

What if there is already another page with the selected title?
(Covered as part of a follow-up ticket: T224408)

Approaches 1 and 2 above should produce a red-link target that is applied automatically. However, the page title found may already be taken by an existing article (which may or may not be the equivalent one). In such cases, the user is given the option to edit/confirm the link target.

As in the example above, even though "Lemon" in English is equivalent to "Zitrone" in German, there may also be an unrelated "Lemon" article in German. If we decide that the right target for the red link in German is "Lemon", it won't be possible to create such a red link because the article is not missing.

If an article (or redirect) already exists for the intended target, clicking "mark as missing" won't create a red link directly. Instead, the insert link box will be opened with the intended text as the search term. This way the user can modify it (e.g., making it "Lemon (fruit)" instead) to create the appropriate red link (or link to the article if it happens to exist in the target language). The expected result is shown below:

[Screenshot (2018-06-20): expected result]
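The decision made when "mark as missing" is clicked can be sketched as follows. This is a hypothetical illustration: the function name and the returned action strings are invented, and the set of existing titles stands in for a real existence check against the target wiki (articles and redirects). Title normalization (first-letter capitalization, underscores vs. spaces) is deliberately ignored here.

```python
from typing import Set, Tuple


def mark_as_missing(intended_title: str, existing_titles: Set[str]) -> Tuple[str, str]:
    """Decide what clicking "mark as missing" does for an intended target.

    existing_titles is a stand-in for querying the target wiki for
    articles and redirects; it is an assumption for illustration.
    """
    if intended_title in existing_titles:
        # The title is taken (possibly by an unrelated page, such as the
        # German "Lemon" disambiguation page), so don't create a red link:
        # open the insert link box with the title as the search term.
        return ("open_link_inspector", intended_title)
    # The page is genuinely missing, so the red link can be created directly.
    return ("create_red_link", intended_title)
```

For a target wiki that contains only "Lemon", `mark_as_missing("Lemon", {"Lemon"})` opens the link inspector, while `mark_as_missing("Lemon (fruit)", {"Lemon"})` creates the red link directly.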

Event Timeline

Jul 1 2018, 1:02 AM: Vvjjkkii renamed this task from CX2: Apply the right target when creating a missing link to 8kaaaaaaaa.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
Jul 2 2018, 11:33 AM: CommunityTechBot renamed this task from 8kaaaaaaaa to CX2: Apply the right target when creating a missing link.
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
Sep 19 2018, 10:58 AM: Pginer-WMF lowered the priority of this task from Medium to Low.
Nov 26 2018, 10:59 AM: Pginer-WMF raised the priority of this task from Low to Medium.

As I wrote in T210397 (which I merged into this task), using the same target name as in the source wiki is probably the worst choice, and we currently apply it by default almost without telling the user anything. The target is shown in a relatively small font at the bottom of the card with the "Mark as missing" button, and it's easy to miss. This results in many articles with irrelevant links.

My suggestion for a simple solution is:

  1. Add an input box above the "Mark as missing" button.
  2. By default, fill it with the text that is already displayed as the link target in the translation column, and let the translator change it.
  3. Optional: If a Wikidata label is present, put it in the input box.
  4. When "Mark as missing" is clicked, use the text from the input box.
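The four steps above can be modelled as a small piece of card state. This is a hypothetical sketch of the suggested UI behaviour, not ContentTranslation's actual frontend code; the class and method names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MissingLinkCard:
    """Hypothetical model of the link card with the proposed input box (step 1)."""

    displayed_target: str                 # target already shown in the translation column
    wikidata_label: Optional[str] = None  # optional target-language label (step 3)
    input_value: str = field(init=False, default="")

    def __post_init__(self) -> None:
        # Steps 2-3: prefill with the Wikidata label if present,
        # otherwise with the target that is already displayed.
        self.input_value = self.wikidata_label or self.displayed_target

    def edit(self, text: str) -> None:
        # The translator can change the proposed target before confirming.
        self.input_value = text

    def mark_as_missing(self) -> str:
        # Step 4: the red link uses whatever is in the input box.
        return self.input_value.strip()
```

A card created as `MissingLinkCard("Lemon", wikidata_label="Zitrone")` starts with "Zitrone" in the box; if the translator calls `edit(" Zitrone (Frucht) ")`, marking as missing yields "Zitrone (Frucht)".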

Good point @Amire80.
I think it is worth exploring ways for the user to be more aware about where the link target will be pointing to, even if it relies on more explicit user intervention to confirm/correct (while keeping the process as fluent as possible).

Yes, this makes sense: awareness is key here. The process should be fluent, but the current process is too fluent. I didn't count, but I very often see CX create articles with red links that point to titles from the source language, so letting the user intervene is OK.

The existing design of the link inspector could probably be reused for setting the correct link target.

May 21 2019, 7:32 AM: Pginer-WMF raised the priority of this task from Medium to High.

It may be useful to split this task into two parts: one is using Wikidata and machine translation to change the default value, another one to show the link editing dialog on the special cases.

I am currently working on the first one and trying to implement that inside cxserver.

Ok. I created a follow-up ticket (T224408: Ask for user confirmation when adding a missing link and the target page name is unclear) and marked the parts covered there. Feel free to add any further clarification if needed.

Change 512687 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/services/cxserver@master] [WIP] Target title fallbacks for red links

https://gerrit.wikimedia.org/r/512687

Change 512687 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Target title fallbacks for red links

https://gerrit.wikimedia.org/r/512687

Testing was done according to https://gerrit.wikimedia.org/r/#/c/mediawiki/services/cxserver/+/512687/

The patch adds a new property, targetFrom, with the possible values link (via the linked page through Wikidata), label (via the translated label through Wikidata), mt (machine translation), and source (no change).
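One reading of the targetFrom values is as a record of which candidate in an ordered chain produced the final target. The sketch below assumes that order (Wikidata sitelink first, then label, then MT, then the unchanged source title); the enum, function name, and dictionary shape are hypothetical and do not reproduce cxserver's actual code or response format.

```python
from enum import Enum
from typing import Dict, Optional, Sequence, Tuple


class TargetFrom(str, Enum):
    LINK = "link"      # page linked to the same Wikidata item in the target wiki
    LABEL = "label"    # Wikidata label in the target language
    MT = "mt"          # machine translation of the source title
    SOURCE = "source"  # source title, unchanged


def annotate_target(
    source_title: str,
    candidates: Sequence[Tuple[TargetFrom, Optional[str]]],
) -> Dict[str, str]:
    """Return the first non-empty candidate, tagged with its targetFrom value.

    candidates are (origin, title) pairs already ordered by preference;
    if none of them yields a title, the source title is kept.
    """
    for origin, title in candidates:
        if title:
            return {"target": title, "targetFrom": origin.value}
    return {"target": source_title, "targetFrom": TargetFrom.SOURCE.value}
```

For a Wolof translation with no sitelink but a Wikidata label, `annotate_target("Lemon", [(TargetFrom.LINK, None), (TargetFrom.LABEL, "limoŋ")])` yields `{"target": "limoŋ", "targetFrom": "label"}`. Recording the origin lets a client tell an existing-page match apart from a guessed title.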

| Translation | Option | Links behavior | Result |
|---|---|---|---|
| en->es | Copy original content | Rotor (mathematics) does not exist in eswiki (the link is grey); Curl (mathematics) exists in eswiki (the link is active) | [screenshot] |
| en->es | Apertium | The links are not translated, but Curl (mathematics) exists in eswiki | It's a bug [2 screenshots] |
| en->ca | Copy original content | The link inserted by a user gets an incorrect suggestion: Juice matches en:Juice (aggregator) | Acceptable behavior [screenshot] |

> en->es Apertium: The links are not translated, but Curl (mathematics) exists in eswiki - it's a bug

The markup reconstruction algorithm used with Apertium can sometimes lose mark-up. I'm not sure if there is a task yet collecting examples of that.

Since Apertium only supports plain text translations, reconstructing the markup is not going to be 100% reliable.
It would be great to have more clarity about how much of the markup gets lost in this process, but it is not clear how much effort is worth investing in improving the current algorithm on our side. We expect Apertium to improve its support for HTML in the future, which would make our post-processing unnecessary.
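A toy illustration of why reconstructing markup around plain-text MT output is lossy. This is not the algorithm cxserver uses; the function, its naive substring matching, and the fake MT table are hypothetical, chosen only to show how a link disappears when its anchor text is translated differently in isolation than in context.

```python
from typing import Callable, List, Tuple


def reconstruct_links(
    source_text: str,
    links: List[Tuple[str, str]],
    mt: Callable[[str], str],
) -> Tuple[str, List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Toy markup reconstruction for a plain-text-only MT engine.

    links holds (anchor_text, target) pairs found in source_text.
    Returns the translated plain text, the links that could be re-attached
    (translated anchor found in the output) and the links that were lost.
    """
    translated = mt(source_text)
    kept: List[Tuple[str, str]] = []
    lost: List[Tuple[str, str]] = []
    for anchor, target in links:
        # Translate the anchor on its own and look for it in the sentence output.
        anchor_mt = mt(anchor)
        if anchor_mt and anchor_mt in translated:
            kept.append((anchor_mt, target))
        else:
            # Context changed the wording (sense, inflection, reordering),
            # so there is nowhere to re-attach the link: it is lost.
            lost.append((anchor, target))
    return translated, kept, lost
```

If a fake engine renders "The curl and the rotor" as "El rotacional y el rotor" but translates "curl" alone as "rizo", the link to Curl (mathematics) cannot be re-attached while the one to Rotor (mathematics) survives, similar to the loss seen in the Apertium test above.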

For the purpose of this task, it may be worth testing with a translation service that supports HTML such as Google or Yandex.

(1) Copy original content option

The test article is HC Dukla Prague, which does not exist in cs (Czech Wikipedia); the article was found via Wikidata Maintenance Queries.

  • links that exist in cswiki are successfully matched
  • links that are missing in cswiki display a warning to the user that they are missing, with the original link in English
  • links that a user wants to enter get the matching links (if possible)
  • links that a user inserts and that have no matching article in the target language are adapted as blue links (filed as T225986)

(2) Since Apertium supports only plain text translations, the actual testing needs to be done in production.