Content translation sometimes fails to add a sitelink for new articles
Open, Medium · Public

Description

In the Europeana Art History Challenge (https://www.wikidata.org/wiki/Wikidata:Europeana_Art_History_Challenge) we're using content translation. Users find an article to translate and do so. Turns out that content translation isn't adding the sitelink to the Wikidata item so the translation goes unnoticed. We already had a case where two users translated the same article without knowing because the sitelink wasn't added. Especially for new users it's almost impossible to explain that after the translation you have to manually add the link again.

What should happen:

  • User goes through the content translation interface
  • User translates an article from source language to destination language
  • User saves the page
  • A process starts to add the new article as a sitelink to the existing Wikidata item (or to a new one)

Probably the process that adds the missing link in the background could be similar to the page-move implementation we currently have.

Edit by @Amire80, 2017-01-23:
I discussed this with @daniel and @Lydia_Pintscher at Dev Summit 2017, and we more or less agree that the addition of the sitelink should be moved from the frontend to the backend, although it has a few challenges.

The current process is that the article is published using the cxpublish API. It's a simple process that uses the usual edit API to save a wiki page with a bit of extra processing—edit summary, change tag, Echo notification, etc. Then the Wikibase JavaScript Library is invoked to add the sitelink.

The advantage of using the Wikibase JavaScript Library is that it is simple: it just takes source and target titles and does everything else automatically. The disadvantage is that it may fail silently if the user closes the window too quickly, and possibly for other reasons.

It would be better to stop using JavaScript for this and make the sitelink adding part of the publishing API. Wikibase's LinkTitles class can be used for the actual linking and executed in a job similar to UpdateRepoOnMoveJob. However, doing this is more complicated than using the Wikibase JavaScript Library. In particular, the linking should be done with the same username as the one that was used for publishing the translated article, but since the repo is on another site, it's non-trivial to reuse the log-in session in a backend job.
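
As a rough illustration of the server-side direction described above, the sketch below builds the parameters for one request to the Wikibase `wblinktitles` API action, which links two pages on different sites to the same item. This is not the ContentTranslation or Wikibase code: the helper name `build_link_request`, the example titles, and the site IDs are hypothetical, and the token is a placeholder. How a backend job would obtain a valid token for the publishing user is exactly the open problem mentioned above, and the sketch does not solve it.

```python
from urllib.parse import urlencode

# Endpoint of the repo wiki; sending the request is left out on purpose.
WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_link_request(from_site, from_title, to_site, to_title, csrf_token):
    """Return the POST parameters for one wblinktitles call (hypothetical helper)."""
    if not from_title or not to_title:
        raise ValueError("both titles are required")
    return {
        "action": "wblinktitles",
        "fromsite": from_site,    # site ID of the source article
        "fromtitle": from_title,
        "tosite": to_site,        # site ID of the newly published translation
        "totitle": to_title,
        "token": csrf_token,      # placeholder; must belong to the acting user
        "format": "json",
    }


params = build_link_request(
    "enwiki", "Example source", "nlwiki", "Voorbeeld", "PLACEHOLDER_TOKEN"
)
body = urlencode(params)  # form-encoded body that a job would POST to WIKIDATA_API
```

Keeping the request construction separate from the sending step makes it easy to queue the parameters in a job and retry later, which is the main thing the frontend approach cannot do once the window is closed.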

Event Timeline

Restricted Application added a subscriber: Aklapper. · Apr 26 2016, 6:07 PM

I thought it does that already?
@Amire80 ?

Yes, this is supposed to happen, using the Wikibase JavaScript library, and it's automatic and transparent to the user. Usually it works, but occasionally it does not and I'm not sure why and when. Examples would be useful. In which articles did it happen?

(FWIW, @Ladsgroup sometimes runs a bot that connects articles that weren't properly auto-connected by ContentTranslation, but I don't know how regularly.)

Interesting! Is there any way to alert when the user hits save that the wikidata item is or could not be updated? I mean I have in the past saved to a userpage as a test, so obviously linking that to the wikidata item would not work (I just saved under another name). Then when I move the article to mainspace, the original item should be shown again somehow. Maybe if the link is unsuccessful that the saved article should just include wikitext in the form of [[Category:Articles without Wikidata items]] or just <!---Original Wikidata item is Q9999--->. I suppose only the first is visible to the Visual editors though.

Interesting! Is there any way to alert when the user hits save that the wikidata item is or could not be updated?

Mmm... probably. Can you please open a separate task?

I mean I have in the past saved to a userpage as a test, so obviously linking that to the wikidata item would not work (I just saved under another name).

The Content Translation code only auto-links a translation if it is in the article space, under the following assumptions:

  • That other namespaces, such as User:, can be disallowed in Wikidata.
  • That people who translate to other namespaces know what they are doing :)
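
The rule above amounts to a single namespace check. A minimal sketch, assuming the published page's namespace ID is already known (in MediaWiki the article namespace has ID 0 and User: has ID 2); the function name is hypothetical and this is not the actual Content Translation code:

```python
MAIN_NAMESPACE = 0  # MediaWiki ID of the article (main) namespace


def should_auto_link(namespace_id: int) -> bool:
    """Auto-link only translations published in the article namespace."""
    return namespace_id == MAIN_NAMESPACE


link_article = should_auto_link(0)  # published as an article
link_userpage = should_auto_link(2)  # published under User:
```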

Then when I move the article to mainspace, the original item should be shown again somehow.

Making this automatic would be out of scope of Content Translation, which tries to focus on the article creation stage and not on further editing or curation (this may change in the future, but I don't expect this to happen for at least a year).

Maybe if the link is unsuccessful that the saved article should just include wikitext in the form of [[Category:Articles without Wikidata items]] or just <!---Original Wikidata item is Q9999--->. I suppose only the first is visible to the Visual editors though.

The original Wikidata item can be found more or less easily by checking the first revision of the translated article. Its edit summary includes a link to the source article, and from it the Wikidata item can be found.
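
Once the source article is known, the item can be looked up with the `wbgetentities` API action by site and title. The sketch below only builds the request parameters and reads the item ID out of a response; no request is sent. The helper names are hypothetical, `Q9999` is the placeholder ID used earlier in this thread, and the two sample responses are hand-written to mirror the shape the API returns for a found and a missing page.

```python
def build_lookup_params(site, title):
    """Parameters for a wbgetentities lookup by site ID and page title."""
    return {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "sitelinks",
        "format": "json",
    }


def extract_item_id(response):
    """Return the item ID from a wbgetentities response, or None if no item exists."""
    for key, entity in response.get("entities", {}).items():
        # A page without an item comes back under a negative key with a "missing" flag.
        if "missing" not in entity and key.startswith("Q"):
            return key
    return None


# Hand-written sample responses (illustrative, not real data).
sample_found = {
    "entities": {
        "Q9999": {
            "type": "item",
            "id": "Q9999",
            "sitelinks": {"enwiki": {"site": "enwiki", "title": "Example source"}},
        }
    }
}
sample_missing = {
    "entities": {"-1": {"site": "enwiki", "title": "No such page", "missing": ""}}
}

found_id = extract_item_id(sample_found)
missing_id = extract_item_id(sample_missing)
```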

@Ladsgroup's bot does something like this, but he may add further details.

Hey,
Sorry for the late response.
I reworked my bot and reran it a week ago. Before that, the bot searched for all articles created by CX, checked whether each one was connected, and, depending on the scenario, added the link to Wikidata. But since CX got really big (\o/), this method eventually became very slow. What I do now is search for unconnected articles that were created by CX (which is a SQL query) and act based on that. I'm running it on a weekly basis and I'm going to publish the source code after some minor polishing. It fixed a lot of cases, but a significant proportion of them were left out. The biggest issue here is interwiki conflicts. I can list them anywhere you want (or update a certain page). Which way would be the most convenient for you?
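
The bot's source was not published in this thread, so the following is only a toy illustration of the "one SQL query for unconnected CX articles" idea. It uses an in-memory SQLite database with a deliberately simplified, hypothetical schema (`page`, `page_tag`, `page_item`); the real MediaWiki tables are different, and the tag name `contenttranslation` is an assumption here.

```python
import sqlite3


def find_unconnected(conn):
    """Titles of main-namespace pages tagged as CX-created that have no linked item."""
    rows = conn.execute(
        """
        SELECT p.title
        FROM page AS p
        JOIN page_tag AS t ON t.page_id = p.id
        LEFT JOIN page_item AS i ON i.page_id = p.id
        WHERE t.tag = 'contenttranslation'
          AND p.namespace = 0
          AND i.item_id IS NULL
        ORDER BY p.title
        """
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (id INTEGER PRIMARY KEY, namespace INTEGER, title TEXT);
    CREATE TABLE page_tag (page_id INTEGER, tag TEXT);
    CREATE TABLE page_item (page_id INTEGER, item_id TEXT);
    INSERT INTO page VALUES
        (1, 0, 'Linked article'),
        (2, 0, 'Unlinked article'),
        (3, 0, 'Hand-written article'),
        (4, 2, 'Someone/Draft');
    INSERT INTO page_tag VALUES
        (1, 'contenttranslation'),
        (2, 'contenttranslation'),
        (4, 'contenttranslation');
    INSERT INTO page_item VALUES (1, 'Q9999');
    """
)

unconnected = find_unconnected(conn)
```

The `LEFT JOIN ... IS NULL` pattern is what makes this a single pass over the data instead of one connection check per article, which matches the performance motivation given above.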

@Ladsgroup Thanks a lot! If the results can also be published somewhere (number of articles, number that failed to link, etc.), that will help us.

Amire80 renamed this task from Content translation should add sitelink for new article to Content translation sometimes fails to add a sitelink for new articles. · May 1 2016, 1:44 PM
Amire80 added a project: WorkType-Maintenance.
Amire80 moved this task from Needs Triage to Bugs on the ContentTranslation board.

@Ladsgroup Thanks a lot! If the results can also be published somewhere (number of articles, number that failed to link, etc.), that will help us.

This is a list of the articles that had this issue. I fixed some of them by bot, but I think this will help you figure out why it's happening. Sorry it's not very user-friendly.

Thanks!

(Tip to whoever is viewing this: It was initially shown with bad special characters, and I used Firefox's View->Encoding->Unicode to fix this. I haven't touched this menu for at least a year ;) )

Amire80 added a subscriber: hoo. · May 1 2016, 7:02 PM

... But there's still the question of why it happens. From a quick skim of the list, I cannot see anything in common.

There may be other reasons for this, but I mainly suspect that using JavaScript for this is the culprit: if the user closes the window, the article might not get linked. When I implemented the linking a year ago (T87410), @hoo suggested using the JavaScript library instead of doing it server-side; I'm not sure why. This mostly worked, but as we see, there are occasional failures. Of course I'd like it to be more robust. Maybe @hoo or somebody else has a better idea now?

I have some wild guesses; we need to check more carefully. Maybe the editor's IP is blocked on Wikidata, but since the user is IP-block exempt on the local wiki (e.g. an admin), they can still edit there.

I inspected the three cases in fa.wp:

  • One case happened because the article was created in July 2015, and then someone else created another article on the same topic and added its link to Wikidata. So now there are two articles about one topic under two different names, and the first one can't be connected because the second one got connected to Wikidata sooner.
  • The other two cases are very recent. It seems the author published the article, then translated it again and published it again. See this: https://www.wikidata.org/w/index.php?title=Q1428898&action=history
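
The first case is an interwiki conflict: the item already has a sitelink for the target wiki, but it points to a different title. A minimal sketch of how a bot or job could tell the three situations apart, assuming the item's data has the `sitelinks` layout returned by `wbgetentities`; the function name, the labels, and the sample titles are hypothetical:

```python
def classify_sitelink(entity, site, title):
    """Classify the state of `site` on an item before trying to add `title`.

    Returns "free" (no sitelink for the site yet), "linked" (already points
    to this title), or "conflict" (points to a different title).
    """
    existing = entity.get("sitelinks", {}).get(site)
    if existing is None:
        return "free"
    if existing["title"] == title:
        return "linked"
    return "conflict"


# Hand-written sample item (illustrative, not real data).
item = {
    "id": "Q9999",
    "sitelinks": {"fawiki": {"site": "fawiki", "title": "Second article"}},
}

state_first = classify_sitelink(item, "fawiki", "First article")
state_second = classify_sitelink(item, "fawiki", "Second article")
state_other = classify_sitelink(item, "arwiki", "Some article")
```

Only the "free" state is safe to fix automatically; a "conflict" needs a human to decide whether the two articles should be merged.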

This looks like a more concerning issue. It seems that even though we already had the article in fa.wp, the user was allowed to use CX to create the article, and CX just changed the sitelink in Wikidata (or the user did it directly, I can't say for sure) instead of sending a warning or logging it somewhere.
Maybe that's a caching issue. Do you know that if someone adds a sitelink, it's not shown to the clients (e.g. English Wikipedia) unless a forced purge is made? (?action=purge&forcesitelinks=1 or something like this)

hoo added a comment. · May 3 2016, 2:38 PM

Maybe that's a caching issue. Do you know that if someone adds a sitelink, it's not shown to the clients (e.g. English Wikipedia) unless a forced purge is made? (?action=purge&forcesitelinks=1 or something like this)

I don't know how CX tries to find out whether a page for a given item already exists on a wiki. If it uses wbgetentities for that, changes should be reflected there almost in real time (in less than 1s for sure).

This looks like a more concerning issue. It seems that even though we already had the article in fa.wp, the user was allowed to use CX to create the article, and CX just changed the sitelink in Wikidata (or the user did it directly, I can't say for sure) instead of sending a warning or logging it somewhere.

CX allows users to translate even if the target article exists, but big warnings are shown in the CX interface and there is a dialog that asks the user to confirm this explicitly. There are some valid use cases where people like to use CX to overwrite existing articles (for example, one-line articles in the target wiki).

I have tried but have not been able to replicate this issue yet. It does appear to update the target item in real time or very close to real time.

Amire80 triaged this task as Medium priority. · May 19 2016, 5:45 PM
Amire80 updated the task description. · Jan 23 2017, 2:14 PM
Amire80 added a subscriber: daniel.

I discussed this with @daniel and @Lydia_Pintscher at Dev Summit 2017. It was pretty long, I updated the task description.

Amire80 updated the task description. · Jan 23 2017, 2:17 PM

Hello, are there any updates on this task? A few days ago, new complaints were raised about it on arwiki. Also, using this Quarry query we found 535 pages that were translated with the Content Translation tool and linked to an empty WD item (by a bot or by users) instead of the item of the source article.

Ata added a subscriber: Ata. · Aug 4 2020, 2:06 PM