
run a bot that adds sitelinks to articles that were created using ContentTranslation and not linked to other languages
Closed, Resolved, Public

Description

Hi,

We started deploying ContentTranslation in January without adding interlanguage links to the created translated articles. Initially we were publishing pages as drafts, and we assumed that people would add interlanguage links after properly publishing them to the main space.

Later we started publishing directly to the main space, so it made sense to add interlanguage links automatically. After some delays we have a patch to do this. See T87410 and https://gerrit.wikimedia.org/r/#/c/214119/ .

We have over 4000 published articles, and all of them are supposed to have interlanguage links, but some still don't because people forgot to add them. I linked some of them manually, but with so many pages, doing it by hand is inefficient.

There should be a bot that does the following:

  • Goes over all the articles created using ContentTranslation. I guess that it's better to get the list from the central contenttranslation database, but maybe it's better to get them from the tags? @santhosh, @Nikerabbit, your opinion?
  • If the article has a site link - all good, nothing more to do. (This will probably be the majority.)
  • If the article doesn't have a site link, then link it with the article from which it was translated.
  • Quite a lot of translated articles were moved. Some redirects were kept and some weren't. If a redirect was kept, link the target page with the article from which it was translated.
  • If an article cannot be found, then it was either deleted or moved without leaving a redirect. The bot should create a list of those to check manually. It would be nice for the list to at least say what happened to each article (deleted, or moved without leaving a redirect); it should be possible for a bot to do this.
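The steps above could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the bot that was actually run: the data classes and the three callbacks (`lookup`, `link`, `last_log_action`) are hypothetical stand-ins for real API calls, for example reading page info from the target wiki, adding the created article as a sitelink on the source article's Wikidata item, and reading the page's deletion and move logs. Keeping the decision logic apart from the API calls makes it easy to check against fake data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List, Optional, Tuple


class Outcome(Enum):
    ALREADY_LINKED = "already has a sitelink"
    LINKED = "linked"
    LINKED_VIA_REDIRECT = "redirect target linked"
    DELETED = "deleted"
    MOVED_WITHOUT_REDIRECT = "moved without leaving a redirect"
    MISSING_UNKNOWN = "missing, reason unknown"


# Outcomes that go on the list for manual checking.
NEEDS_MANUAL_CHECK = {Outcome.DELETED, Outcome.MOVED_WITHOUT_REDIRECT, Outcome.MISSING_UNKNOWN}


@dataclass
class PageInfo:
    """Hypothetical snapshot of a page on the target wiki."""
    exists: bool
    redirect_target: Optional[str] = None  # title the page redirects to, if any
    has_sitelink: bool = False             # already connected to a Wikidata item


@dataclass
class Translation:
    """One published translation: source article -> created article."""
    source_wiki: str
    source_title: str
    target_wiki: str
    target_title: str


# Hypothetical callbacks, to be backed by real API calls.
Lookup = Callable[[str, str], PageInfo]              # (wiki, title) -> PageInfo
Link = Callable[[Translation, str], None]            # link this title to the source's item
LastLogAction = Callable[[str, str], Optional[str]]  # e.g. "delete" or "move", else None


def process(t: Translation, lookup: Lookup, link: Link, last_log_action: LastLogAction) -> Outcome:
    page = lookup(t.target_wiki, t.target_title)
    if not page.exists:
        # Not found: use the title's logs to tell a deletion from a move without redirect.
        action = last_log_action(t.target_wiki, t.target_title)
        if action == "delete":
            return Outcome.DELETED
        if action == "move":
            return Outcome.MOVED_WITHOUT_REDIRECT
        return Outcome.MISSING_UNKNOWN

    title, via_redirect = t.target_title, False
    if page.redirect_target is not None:
        # Moved with a redirect kept: work on the redirect target instead.
        title, via_redirect = page.redirect_target, True
        page = lookup(t.target_wiki, title)
        if not page.exists:  # broken redirect
            return Outcome.MISSING_UNKNOWN

    if page.has_sitelink:
        return Outcome.ALREADY_LINKED  # all good, nothing more to do
    link(t, title)
    return Outcome.LINKED_VIA_REDIRECT if via_redirect else Outcome.LINKED


def run_bot(translations: Iterable[Translation], lookup: Lookup, link: Link,
            last_log_action: LastLogAction) -> List[Tuple[Translation, Outcome]]:
    """Process every translation and return the ones that need a manual check."""
    to_check = []
    for t in translations:
        outcome = process(t, lookup, link, last_log_action)
        if outcome in NEEDS_MANUAL_CHECK:
            to_check.append((t, outcome))
    return to_check
```

Resolving a redirect before checking for a sitelink means a moved article is linked under its current title rather than under the redirect, which matches the rule for moved pages above.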

Event Timeline

Amire80 raised the priority of this task from to Low.
Amire80 updated the task description.
Amire80 added subscribers: Amire80, Ladsgroup, Romaine and 2 others.

(This is not a bug in Wikidata or in ContentTranslation, but I tag both projects because they are related.)

I guess that it's better to get the list from the central contenttranslation database

Is that available on labs? And is the database documented somewhere?

Which labs do you mean? There are separate tables for production and beta-labs. Those are in the wikishared database and named cx_drafts, cx_translations, cx_translators. I have not checked whether those are available on the database copies on labs.

I suspect that we don't have a convenient API to get a list of all translations, but this is probably a one-time thing, for which it's possible to write a simple one-time query that lists all the published articles.

I suspect that we don't have a convenient API to get a list of all translations

https://www.mediawiki.org/wiki/Content_translation/Published_translations

Thanks, this should work.

Another option is to run the following query in every wiki:
https://www.mediawiki.org/wiki/Content_translation/analytics/queries#Pages_created_in_the_main_namespace

I'm not sure what's more efficient - it's up to the bot writer.

Writing something for that is pretty easy, and I'd like to help with ContentTranslation. Just give me the list of wikis where ContentTranslation is enabled, then consider it done :)

@Ladsgroup, thanks!

See https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php

Search for wmgUseContentTranslation. You can go over all the true values, and you can skip "testwiki", of course.
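One way to collect that list is to scan the setting's block in InitialiseSettings.php for entries set to true. The sketch below is only an illustration and assumes a layout: one `'wikiname' => true,` entry per line (optionally followed by a comment), with the block ending at a line that contains only the closing bracket. The function name `wikis_with_setting` is hypothetical. The `default` key is skipped because it is not a wiki, and the result should still be checked by eye, since keys in that file can also name groups of wikis rather than single wikis.

```python
import re
from typing import Iterable, List

# One "'key' => true," entry per line, optionally followed by a // or # comment.
ENTRY = re.compile(r"""^\s*['"]([\w-]+)['"]\s*=>\s*true\s*,?\s*(?://.*|#.*)?$""")
CLOSERS = {")", "),", "]", "],"}


def wikis_with_setting(php_source: str, setting: str = "wmgUseContentTranslation",
                       skip: Iterable[str] = ("testwiki",)) -> List[str]:
    """Return the keys set to true inside the given setting's array block."""
    lines = php_source.splitlines()
    start = next((i for i, line in enumerate(lines) if f"'{setting}'" in line), None)
    if start is None:
        return []
    skipped = set(skip) | {"default"}
    wikis = []
    for line in lines[start + 1:]:
        if line.strip() in CLOSERS:  # end of this setting's array
            break
        match = ENTRY.match(line)
        if match and match.group(1) not in skipped:
            wikis.append(match.group(1))
    return wikis
```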

Amire80 removed subscribers: Ricordisamoa, Bene, Romaine.

Reopened.

The auto-adding of interlanguage links seems to work most of the time, but occasionally it seems to fail (we are trying to understand why).

Until it's totally reliable, it would be nice to run this bot periodically, for example every week. Can it be scheduled?

I added the code to crontab. And the code is here in case I get hit by a truck :)