
Content Translation occasionally publishes in the wrong language
Closed, Resolved, Public

Description

This is the original report: https://www.mediawiki.org/wiki/Topic:T0di3reb139m62qu

One example is "Dr. Kucho!" in the English Wikipedia. This revision was created by CX, and should have been in pt.wikipedia, but it appears in en.wikipedia. The revision tags appear correct: {"from":"en","to":"pt"} (run select * from change_tag where change_tag.ct_rev_id = 758536335; in the enwiki database), but how is it possible to have "to":"pt" in any Wikipedia other than pt?

I found 62 total instances in the English Wikipedia (select * from change_tag where ct_tag = 'contenttranslation' and ct_params not like '%"to":"en"%';), so this is not totally esoteric:

mysql:wikiadmin@db1089 [enwiki]> select * from change_tag where ct_tag = 'contenttranslation' and ct_params not like '%"to":"en"%';
+-----------+-----------+-----------+--------------------+-----------------------------+
| ct_rc_id  | ct_log_id | ct_rev_id | ct_tag             | ct_params                   |
+-----------+-----------+-----------+--------------------+-----------------------------+
| 757576423 |      NULL | 677163407 | contenttranslation | {"from":"en","to":"fa"}     |
| 757641069 |      NULL | 677218942 | contenttranslation | {"from":"en","to":"pt"}     |
| 759049796 |      NULL | 678414993 | contenttranslation | {"from":"en","to":"de"}     |
| 759640134 |      NULL | 678904083 | contenttranslation | {"from":"en","to":"zh"}     |
| 773981023 |      NULL | 690146422 | contenttranslation | {"from":"en","to":"eo"}     |
| 774010472 |      NULL | 690172078 | contenttranslation | {"from":"en","to":"es"}     |
| 774011600 |      NULL | 690173040 | contenttranslation | {"from":"en","to":"es"}     |
| 774012199 |      NULL | 690173553 | contenttranslation | {"from":"en","to":"es"}     |
| 774012503 |      NULL | 690173791 | contenttranslation | {"from":"en","to":"es"}     |
| 774017040 |      NULL | 690177299 | contenttranslation | {"from":"en","to":"es"}     |
| 774017134 |      NULL | 690177364 | contenttranslation | {"from":"en","to":"es"}     |
| 774903508 |      NULL | 690838686 | contenttranslation | {"from":"en","to":"it"}     |
| 775452654 |      NULL | 691253359 | contenttranslation | {"from":"en","to":"zh"}     |
| 778639761 |      NULL | 693544079 | contenttranslation | {"from":"eo","to":"fr"}     |
| 778857069 |      NULL | 693696564 | contenttranslation | {"from":"eo","to":"fr"}     |
| 780500518 |      NULL | 694721653 | contenttranslation | {"from":"en","to":"pt"}     |
| 781672055 |      NULL | 695512931 | contenttranslation | {"from":"en","to":"pt"}     |
| 786358382 |      NULL | 698681273 | contenttranslation | {"from":"en","to":"pt"}     |
| 787015300 |      NULL | 699117024 | contenttranslation | {"from":"en","to":"it"}     |
| 790139419 |      NULL | 701208315 | contenttranslation | {"from":"en","to":"id"}     |
| 795566308 |      NULL | 704057341 | contenttranslation | {"from":"en","to":"ta"}     |
| 801628382 |      NULL | 707287237 | contenttranslation | {"from":"en","to":"pt"}     |
| 805054540 |      NULL | 709213435 | contenttranslation | {"from":"en","to":"ru"}     |
| 805056343 |      NULL | 709214537 | contenttranslation | {"from":"en","to":"ru"}     |
| 807581773 |      NULL | 710582766 | contenttranslation | {"from":"en","to":"pt"}     |
| 809272240 |      NULL | 711547351 | contenttranslation | {"from":"en","to":"zh"}     |
| 809273029 |      NULL | 711547815 | contenttranslation | {"from":"en","to":"zh"}     |
| 814967862 |      NULL | 714796704 | contenttranslation | {"from":"en","to":"fr"}     |
| 814967904 |      NULL | 714796735 | contenttranslation | {"from":"en","to":"fr"}     |
| 814968387 |      NULL | 714797046 | contenttranslation | {"from":"en","to":"fr"}     |
| 815803061 |      NULL | 715259970 | contenttranslation | {"from":"en","to":"pt"}     |
| 815906799 |      NULL | 715321689 | contenttranslation | {"from":"en","to":"fa"}     |
| 817059447 |      NULL | 715968444 | contenttranslation | {"from":"en","to":"zh"}     |
| 817805061 |      NULL | 716322208 | contenttranslation | {"from":"en","to":"pt"}     |
| 818308404 |      NULL | 716612415 | contenttranslation | {"from":"en","to":"pt"}     |
| 824645421 |      NULL | 719765530 | contenttranslation | {"from":"en","to":"pa"}     |
| 829205751 |      NULL | 722203663 | contenttranslation | {"from":"en","to":"simple"} |
| 830798899 |      NULL | 723035021 | contenttranslation | {"from":"en","to":"ta"}     |
| 835268919 |      NULL | 725431268 | contenttranslation | {"from":"en","to":"bn"}     |
| 835269320 |      NULL | 725431488 | contenttranslation | {"from":"en","to":"bn"}     |
| 836999840 |      NULL | 726371307 | contenttranslation | {"from":"en","to":"pt"}     |
| 837701048 |      NULL | 726776516 | contenttranslation | {"from":"en","to":"ko"}     |
| 838151632 |      NULL | 727033418 | contenttranslation | {"from":"pt","to":"ca"}     |
| 839794574 |      NULL | 727860892 | contenttranslation | {"from":"de","to":"cs"}     |
| 841446268 |      NULL | 728714530 | contenttranslation | {"from":"en","to":"zh"}     |
| 843290522 |      NULL | 729766412 | contenttranslation | {"from":"en","to":"uk"}     |
| 844205482 |      NULL | 730303777 | contenttranslation | {"from":"en","to":"jv"}     |
| 845564166 |      NULL | 731058743 | contenttranslation | {"from":"en","to":"zh"}     |
| 845725471 |      NULL | 731143074 | contenttranslation | {"from":"en","to":"th"}     |
| 846221381 |      NULL | 731417488 | contenttranslation | {"from":"en","to":"jv"}     |
| 846221812 |      NULL | 731417760 | contenttranslation | {"from":"en","to":"jv"}     |
| 846225757 |      NULL | 731419770 | contenttranslation | {"from":"en","to":"jv"}     |
| 848930931 |      NULL | 732966339 | contenttranslation | {"from":"en","to":"ur"}     |
| 851907192 |      NULL | 734571980 | contenttranslation | {"from":"en","to":"ur"}     |
| 853102614 |      NULL | 735219005 | contenttranslation | {"from":"en","to":"da"}     |
| 853178017 |      NULL | 735265334 | contenttranslation | {"from":"en","to":"da"}     |
| 855169546 |      NULL | 736061011 | contenttranslation | {"from":"en","to":"ar"}     |
| 856128034 |      NULL | 736599553 | contenttranslation | {"from":"en","to":"ur"}     |
| 864963860 |      NULL | 741264669 | contenttranslation | {"from":"en","to":"ur"}     |
| 879065168 |      NULL | 748305847 | contenttranslation | {"from":"en","to":"el"}     |
| 890316159 |      NULL | 754242746 | contenttranslation | {"from":"en","to":"es"}     |
| 898934041 |      NULL | 758536335 | contenttranslation | {"from":"en","to":"pt"}     |
+-----------+-----------+-----------+--------------------+-----------------------------+
62 rows in set (0.04 sec)

Note that in some of those neither "from" nor "to" equals "en".
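The `not like '%"to":"en"%'` filter matches on the raw JSON text, so it depends on the exact serialization of ct_params. A more robust option is to parse the JSON and compare the "to" field with the wiki's language. Below is a minimal sketch, assuming the rows have been exported as (ct_rev_id, ct_params) pairs. `mismatched_targets` is a hypothetical helper, and the last two sample rows are made up.

```python
import json
from collections import Counter

def mismatched_targets(rows, wiki_lang):
    """Return (rev_id, from, to) for Content Translation tags whose "to"
    language differs from the language of the wiki being queried.

    rows: iterable of (ct_rev_id, ct_params) pairs, where ct_params is the
    JSON text stored with the tag. Rows with NULL params (None) are skipped,
    because there is nothing to compare.
    """
    mismatches = []
    for rev_id, params in rows:
        if params is None:
            continue
        data = json.loads(params)
        if data.get("to") != wiki_lang:
            mismatches.append((rev_id, data.get("from"), data.get("to")))
    return mismatches

rows = [
    (758536335, '{"from":"en","to":"pt"}'),
    (727033418, '{"from":"pt","to":"ca"}'),
    (1, '{"from":"fr","to": "en"}'),  # hypothetical: the space would defeat NOT LIKE '%"to":"en"%'
    (2, None),                         # hypothetical row with NULL params
]
found = mismatched_targets(rows, "en")
print(found)
print(Counter(to for _, _, to in found))
```

Grouping the result with `Counter` gives per-target-language counts like the ones listed further down in this task.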

Also, to check the dates I ran the following:

mysql:wikiadmin@db1089 [enwiki]> select ct_rev_id, ct_params, rev_timestamp from change_tag, revision where ct_tag = 'contenttranslation' and ct_params not like '%"to":"en"%' and ct_rev_id = rev_id;
+-----------+-----------------------------+----------------+
| ct_rev_id | ct_params                   | rev_timestamp  |
+-----------+-----------------------------+----------------+
| 690146422 | {"from":"en","to":"eo"}     | 20151111160352 |
| 690838686 | {"from":"en","to":"it"}     | 20151116003037 |
| 691253359 | {"from":"en","to":"zh"}     | 20151118170631 |
| 693544079 | {"from":"eo","to":"fr"}     | 20151203063649 |
| 707287237 | {"from":"en","to":"pt"}     | 20160227233701 |
| 709213435 | {"from":"en","to":"ru"}     | 20160309190947 |
| 710582766 | {"from":"en","to":"pt"}     | 20160317205101 |
| 711547351 | {"from":"en","to":"zh"}     | 20160323153451 |
| 715259970 | {"from":"en","to":"pt"}     | 20160414174656 |
| 715968444 | {"from":"en","to":"zh"}     | 20160419012929 |
| 716322208 | {"from":"en","to":"pt"}     | 20160421034413 |
| 716612415 | {"from":"en","to":"pt"}     | 20160422185704 |
| 719765530 | {"from":"en","to":"pa"}     | 20160511170819 |
| 722203663 | {"from":"en","to":"simple"} | 20160526161659 |
| 725431268 | {"from":"en","to":"bn"}     | 20160615162536 |
| 725431488 | {"from":"en","to":"bn"}     | 20160615162712 |
| 726776516 | {"from":"en","to":"ko"}     | 20160624090358 |
| 727033418 | {"from":"pt","to":"ca"}     | 20160626053711 |
| 730303777 | {"from":"en","to":"jv"}     | 20160718042951 |
| 731143074 | {"from":"en","to":"th"}     | 20160723075832 |
| 732966339 | {"from":"en","to":"ur"}     | 20160804131419 |
| 734571980 | {"from":"en","to":"ur"}     | 20160815070606 |
| 735219005 | {"from":"en","to":"da"}     | 20160819091428 |
| 735265334 | {"from":"en","to":"da"}     | 20160819160714 |
| 736061011 | {"from":"en","to":"ar"}     | 20160824221331 |
| 736599553 | {"from":"en","to":"ur"}     | 20160828161021 |
| 741264669 | {"from":"en","to":"ur"}     | 20160926115833 |
| 754242746 | {"from":"en","to":"es"}     | 20161211163811 |
| 758536335 | {"from":"en","to":"pt"}     | 20170106011655 |
+-----------+-----------------------------+----------------+
29 rows in set (0.05 sec)

There are far fewer results here (29 instead of 62). My guess is that the missing rows belong to deleted pages, whose revisions are no longer in the revision table that this query joins against.
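The deleted-pages guess for the 33 missing rows (62 − 29) can be checked with an anti-join. Keep the tag rows whose ct_rev_id has no match in revision, then look them up in archive through ar_rev_id. Below is a minimal, self-contained sketch using Python's sqlite3. The toy tables only mimic the relevant MediaWiki columns, and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy versions of the MediaWiki tables, with only the columns used below.
conn.executescript("""
CREATE TABLE change_tag (ct_rev_id INTEGER, ct_tag TEXT, ct_params TEXT);
CREATE TABLE revision   (rev_id INTEGER PRIMARY KEY, rev_timestamp TEXT);
CREATE TABLE archive    (ar_rev_id INTEGER, ar_title TEXT);
INSERT INTO change_tag VALUES
  (1, 'contenttranslation', '{"from":"en","to":"pt"}'),
  (2, 'contenttranslation', '{"from":"en","to":"es"}'),
  (3, 'contenttranslation', '{"from":"en","to":"fa"}');
INSERT INTO revision VALUES (1, '20170106011655');
INSERT INTO archive  VALUES (2, 'Deleted_page');
""")

# Tagged revisions missing from `revision`, plus whether `archive` still has them.
rows = conn.execute("""
SELECT ct.ct_rev_id, ar.ar_rev_id IS NOT NULL AS in_archive
FROM change_tag ct
LEFT JOIN revision r ON r.rev_id = ct.ct_rev_id
LEFT JOIN archive ar ON ar.ar_rev_id = ct.ct_rev_id
WHERE ct.ct_tag = 'contenttranslation'
  AND ct.ct_params NOT LIKE '%"to":"en"%'
  AND r.rev_id IS NULL
ORDER BY ct.ct_rev_id
""").fetchall()
print(rows)
```

A row with `in_archive = 0` would be missing from both tables. That would need a different explanation than page deletion.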

Details

Related Gerrit Patches:
mediawiki/extensions/ContentTranslation (master): Actually enforce valid token for translation view
mediawiki/extensions/ContentTranslation (master): Disallow opening the translation view in the wrong wiki

Event Timeline

Amire80 created this task. Jan 9 2017, 4:26 AM
Restricted Application added a subscriber: Aklapper. Jan 9 2017, 4:26 AM

In other top target languages:

  • es: 5 articles
  • ca: 5 articles
  • fr: 10 articles
  • ru: 8 articles
  • ar: 6 articles
  • it: 3 articles
  • uk, pt: 0 articles

Amire80 added a comment. (edited) Jan 17 2017, 12:15 AM

A query with page titles:

mysql:wikiadmin@db1089 [enwiki]> select ct_rev_id, ct_params, page_title from change_tag, revision, page where ct_rev_id = rev_id and rev_page = page_id and ct_tag = 'contenttranslation' and ct_params not like '%"to":"en"%';
+-----------+-----------------------------+----------------------------------+
| ct_rev_id | ct_params                   | page_title                       |
+-----------+-----------------------------+----------------------------------+
| 690146422 | {"from":"en","to":"eo"}     | Hamza_Hakimzade_Niyazi           |
| 690838686 | {"from":"en","to":"it"}     | Philippe_C._Schmitter            |
| 691253359 | {"from":"en","to":"zh"}     | 甲基异丙基酮                     |
| 693544079 | {"from":"eo","to":"fr"}     | Aleks_Andre/Humphrey_Tonkin      |
| 707287237 | {"from":"en","to":"pt"}     | Sensuous_Chill                   |
| 709213435 | {"from":"en","to":"ru"}     | Cramond                          |
| 710582766 | {"from":"en","to":"pt"}     | Saudades_do_Brasil               |
| 711547351 | {"from":"en","to":"zh"}     | Jacob_Lurie                      |
| 715259970 | {"from":"en","to":"pt"}     | Victor_Lopes/sandbox             |
| 715968444 | {"from":"en","to":"zh"}     | Alliance_for_Open_Media          |
| 716322208 | {"from":"en","to":"pt"}     | Victor_Lopes/sandbox             |
| 716612415 | {"from":"en","to":"pt"}     | Victor_Lopes/sandbox             |
| 719765530 | {"from":"en","to":"pa"}     | Yuanyang_County,_Henan           |
| 722203663 | {"from":"en","to":"simple"} | Sunni_Cultural_Center,_Karanthur |
| 725431268 | {"from":"en","to":"bn"}     | Duncan_Passage                   |
| 725431488 | {"from":"en","to":"bn"}     | Duncan_Passage                   |
| 726776516 | {"from":"en","to":"ko"}     | Chigozie_Agbim                   |
| 727033418 | {"from":"pt","to":"ca"}     | Underdog_(terme)                 |
| 730303777 | {"from":"en","to":"jv"}     | Hannah_Arendt                    |
| 731143074 | {"from":"en","to":"th"}     | Park_Bo-gum                      |
| 732966339 | {"from":"en","to":"ur"}     | Vadi-e_Rahmat                    |
| 734571980 | {"from":"en","to":"ur"}     | Kojid_Rural_District             |
| 735219005 | {"from":"en","to":"da"}     | PHansen/Rewilding_Europe         |
| 735265334 | {"from":"en","to":"da"}     | PHansen/Orcagna                  |
| 736061011 | {"from":"en","to":"ar"}     | IT_portfolio_management          |
| 736599553 | {"from":"en","to":"ur"}     | Guest_star_(astronomy)           |
| 741264669 | {"from":"en","to":"ur"}     | Bhajji_State                     |
| 754242746 | {"from":"en","to":"es"}     | José_María_Cabral_Bermúdez       |
| 758536335 | {"from":"en","to":"pt"}     | Dr._Kucho!                       |
+-----------+-----------------------------+----------------------------------+
29 rows in set (1.44 sec)

(This only has results for pages that weren't deleted.)

Arrbee triaged this task as Medium priority. Jan 18 2017, 7:34 PM
Pginer-WMF added a subscriber: Pginer-WMF. (edited) Feb 18 2019, 3:05 PM

This recent talk page comment reported the issue:

When I translated the article on Lukas Jacobs from Dutch to French, there was a bug with the translation tool:
https://nl.wikipedia.org/w/index.php?title=Lukas_Jacobs&type=revision&diff=53198485&oldid=52731206&diffmode=source
The translation replaced the Dutch page and was not created on the French Wikipedia. I use Firefox.
I reverted the change and created the article manually. But where does the bug come from?
In the translation tool, my published translations list it correctly:
Nederlands -> français

The edit tags show that the article was created using CX2.

@Pginer-WMF
It seems like a serious issue, and one that is not straightforward to reproduce. Below is updated information for the queries in this task, plus some investigation of the most recently reported case.

(1) For the case from your comment above (https://phabricator.wikimedia.org/T154888#4961728), Nederlands -> français, there are no records of an article published in the wrong language:

research@analytics-store.eqiad.wmnet [nlwiki]> select count(*), ct_tag_id, ct_params, ctd_name  from change_tag, change_tag_def  where ct_tag_id=ctd_id and  ct_tag_id in (select ctd_id from change_tag_def where ctd_name in ('contenttranslation', 'contenttranslation-v2')) and  ct_params not like '%"to":"nl"%';
+----------+-----------+-----------+----------+
| count(*) | ct_tag_id | ct_params | ctd_name |
+----------+-----------+-----------+----------+
|        0 |      NULL | NULL      | NULL     |
+----------+-----------+-----------+----------+
1 row in set (0.03 sec)

(2) and (3) below simply re-check the numbers previously reported in this task.
(2) For enwiki, the number of records with ct_tag = 'contenttranslation' and ct_params not like '%"to":"en"%' has increased:
research@analytics-store.eqiad.wmnet [enwiki]> select count(*) from change_tag where ct_tag = 'contenttranslation' and ct_params not like '%"to":"en"%';

+----------+
| count(*) |
+----------+
|      109 |
+----------+
1 row in set (48.60 sec)

(3) And the latest timestamp:

research@analytics-store.eqiad.wmnet [enwiki]> select  ct_params,max(rev_timestamp) from change_tag, revision where ct_tag = 'contenttranslation' and ct_params not like '%"to":"en"%' and ct_rev_id = rev_id;
+-------------------------+--------------------+
| ct_params               | max(rev_timestamp) |
+-------------------------+--------------------+
| {"from":"en","to":"eo"} | 20180310100648     |
+-------------------------+--------------------+
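The query above selects ct_params next to max(rev_timestamp) without a GROUP BY. In MySQL/MariaDB with ONLY_FULL_GROUP_BY disabled, such a bare column may come from any matching row. So the "eo" shown is not guaranteed to belong to the latest revision. Sorting and keeping a single row keeps both columns from the same revision. Below is a small sketch with sqlite3 and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy tables with made-up rows, containing only the columns the query needs.
conn.executescript("""
CREATE TABLE change_tag (ct_rev_id INTEGER, ct_tag TEXT, ct_params TEXT);
CREATE TABLE revision   (rev_id INTEGER PRIMARY KEY, rev_timestamp TEXT);
INSERT INTO change_tag VALUES
  (10, 'contenttranslation', '{"from":"en","to":"eo"}'),
  (11, 'contenttranslation', '{"from":"en","to":"pt"}');
INSERT INTO revision VALUES (10, '20170101000000'), (11, '20180101000000');
""")

# Sort by timestamp and keep one row, so ct_params and the timestamp
# are guaranteed to come from the same revision.
latest = conn.execute("""
SELECT ct.ct_params, r.rev_timestamp
FROM change_tag ct
JOIN revision r ON r.rev_id = ct.ct_rev_id
WHERE ct.ct_tag = 'contenttranslation'
  AND ct.ct_params NOT LIKE '%"to":"en"%'
ORDER BY r.rev_timestamp DESC
LIMIT 1
""").fetchone()
print(latest)
```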

(4) Looking into different wikis:

  • The 'contenttranslation-v2' tag always has NULL as ct_params. A wrongly published translation might still be identified by its revision ID, since both tags ('contenttranslation' and 'contenttranslation-v2') are attached to the same revision
  • the query that I ran:
[eswiki]> select ct_tag_id, ctd_name,count(*) from change_tag, change_tag_def where ct_tag_id=ctd_id and  ct_tag_id in (select ctd_id from change_tag_def where ctd_name in ('contenttranslation', 'contenttranslation-v2')) and ct_params not like '%"to":"es"%' group by ct_tag_id;
wiki    ctd_name            count(*)
eswiki  contenttranslation  10
frwiki  contenttranslation  6
cawiki  contenttranslation  6
ruwiki  contenttranslation  5
arwiki  contenttranslation  7
itwiki  contenttranslation  6
ukwiki  contenttranslation  0
ptwiki  contenttranslation  4

Do note that ct_params has not been populated after d106a3922827674200a64add94e3165afcc5bb7f.


Is that limiting our ability to investigate/confirm/fix the issue? Should we ask for these events to be recorded again for a given period of time?

Another comment may be about a related problem:

As the title of my post says, it is translating to Croatian even though I am clearly translating to Serbian. What is happening? It worked fine yesterday.

As long as we have user reports, the ct_params data is not necessary. Its biggest benefit would be a reliable number for how often this happens, and once we fix the issue it could help confirm that the fix worked.

Taking the last reported example, here is the data we have:

MariaDB [wikishared]> select * from cx_translations where cx_translations.translation_target_title = 'Lukas Jacobs'\G
*************************** 1. row ***************************
                    translation_id: 604613
          translation_source_title: Lukas Jacobs
          translation_target_title: Lukas Jacobs
       translation_source_language: nl
       translation_target_language: fr
            translation_source_url: //nl.wikipedia.org/wiki/Lukas Jacobs
            translation_target_url: //fr.wikipedia.org/wiki/Lukas Jacobs
                translation_status: published
       translation_start_timestamp: 20190213095813
translation_last_updated_timestamp: 20190213101000
              translation_progress: {"any":0.9166666666666666,"human":0.6666666666666666,"mt":0.25,"mtSectionsCount":3,"translatedSectionsCount":11}
            translation_started_by: [snip]
        translation_last_update_by: [snip]
    translation_source_revision_id: 52731206
    translation_target_revision_id: 53198485
            translation_cx_version: 2

Correlating this with our event logging:

[log]> select timestamp, wiki, event_action, event_sourceLanguage, event_targetLanguage from ContentTranslation_11628043 where event_targetTitle = 'Lukas Jacobs'\G
*************************** 1. row ***************************
           timestamp: 20190213100829
                wiki: nlwiki
        event_action: publish-failure
event_sourceLanguage: nl
event_targetLanguage: fr
*************************** 2. row ***************************
           timestamp: 20190213101001
                wiki: nlwiki
        event_action: publish
event_sourceLanguage: nl
event_targetLanguage: fr
2 rows in set (0.07 sec)

The publish-failure reason is {"code":"abusefilter-warning","message":{"key":"abusefilter-warning-tekstnacatofiw","params":["Tekst na interwiki of categorie",20]}} (an AbuseFilter warning; the Dutch filter name means "text after interwiki or category") and probably isn't relevant.

The two causes I can think of:

  • Sitemapper gets confused and accidentally uses the wrong wiki
  • Our redirection to the target wiki does not happen, so the translation is done on the current wiki and published to the current wiki

Based on the logs (both events were recorded on nlwiki), it is definitely the latter.

The place where redirection implicitly happens is https://gerrit.wikimedia.org/g/mediawiki/extensions/ContentTranslation/+/2ea3bb31821c92ce98cd0ba517df532a3eba1ab9/modules/base/ext.cx.sitemapper.js#199

Theory one: wgContentTranslationTranslateInTarget is not true. I don't see how this is possible. It is always set in the ResourceLoaderGetConfigVars hook, and in our configuration it is always true except on testwiki. mwgrep shows no code that would manipulate wgContentTranslationTranslateInTarget in any way. The only way I see it could be unset is that some other code either modifies mw.config.values directly or modifies the values in ResourceLoaderGetConfigVars hook. I haven't seen evidence of either.

Theory two: mw.Uri() fails and causes an invalid URL to be generated. This seems ruled out, because everything breaks if that happens: I tested https://dev.translatewiki.net/wiki/Special:ContentTranslation?from=nl1&to=fr1&dubug=faa%&#%qg and it throws "Uncaught TypeError: Cannot read property 'protocol' of undefined", so the article would not load at all.

Theory three: wgContentTranslationSiteTemplates.view is an empty string. This seems even less plausible than theory one; most other values cause errors.
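Theories two and three both depend on how the target-wiki URL is built from wgContentTranslationSiteTemplates.view. Below is a rough Python model. It assumes the template has the form `//$1.wikipedia.org/wiki/$2`, with $1 as the domain code and $2 as the page title. `page_url` and `navigation_target` are hypothetical names, not sitemapper functions. The model shows why an empty template would fail silently: resolving an empty relative URL just yields the current page, so no cross-wiki redirect happens.

```python
from urllib.parse import urljoin, quote

def page_url(template, domain_code, title):
    """Expand a view template such as '//$1.wikipedia.org/wiki/$2' (assumed format)."""
    return template.replace("$1", domain_code).replace("$2", quote(title.replace(" ", "_")))

def navigation_target(current_url, template, domain_code, title):
    """Where a browser on current_url would end up after following the expanded URL."""
    return urljoin(current_url, page_url(template, domain_code, title))

current = "https://nl.wikipedia.org/wiki/Special:ContentTranslation"
good = navigation_target(current, "//$1.wikipedia.org/wiki/$2", "fr", "Lukas Jacobs")
empty = navigation_target(current, "", "fr", "Lukas Jacobs")
print(good)   # https://fr.wikipedia.org/wiki/Lukas_Jacobs
print(empty)  # https://nl.wikipedia.org/wiki/Special:ContentTranslation
```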

Theory four: someone manipulates the URL manually after starting a translation. As long as the cookie is still valid, we do not redirect to the dashboard, so translating in the "wrong" wiki is possible. Why anyone would do that I don't know, but I can confirm it is possible. This should be easy to detect, and we can add a forced redirect either client-side or server-side.
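The check proposed for theory four boils down to comparing the wiki's own language with the `to` parameter of the translation view, and redirecting when they differ. Below is a minimal sketch of that decision. It assumes a `//$1.wikipedia.org/wiki/$2` view template and a Special:ContentTranslation page that takes `page`, `from` and `to` query parameters. `translation_view_redirect` is a hypothetical name; this is not the code from the Gerrit changes listed above.

```python
from urllib.parse import urlencode

VIEW_TEMPLATE = "//$1.wikipedia.org/wiki/$2"  # assumed template format

def translation_view_redirect(wiki_lang, params):
    """Return None if the translation view may open on this wiki,
    otherwise the URL of the same view on the target wiki.

    wiki_lang: language code of the wiki serving the request (hypothetical input).
    params: query parameters of Special:ContentTranslation.
    Note: this ignores cases where the domain code differs from the language code.
    """
    target = params.get("to")
    if target is None or target == wiki_lang:
        return None  # already on the right wiki, or nothing to compare
    base = VIEW_TEMPLATE.replace("$1", target).replace("$2", "Special:ContentTranslation")
    return base + "?" + urlencode(params)

same_wiki = translation_view_redirect("fr", {"page": "Lukas Jacobs", "from": "nl", "to": "fr"})
wrong_wiki = translation_view_redirect("nl", {"page": "Lukas Jacobs", "from": "nl", "to": "fr"})
print(same_wiki)   # None
print(wrong_wiki)  # //fr.wikipedia.org/wiki/Special:ContentTranslation?page=Lukas+Jacobs&from=nl&to=fr
```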

Pginer-WMF raised the priority of this task from Medium to High. Mar 5 2019, 8:19 AM

I am going to add code that checks for (and prevents) theory four. I will also investigate whether we can add logging, old or new, to know whether this is happening.

Nikerabbit removed Nikerabbit as the assignee of this task. Apr 19 2019, 2:47 PM

Unclaiming while I am away.

Change 511411 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/extensions/ContentTranslation@master] Disallow opening the translation view in the wrong wiki

https://gerrit.wikimedia.org/r/511411

Change 511411 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Disallow opening the translation view in the wrong wiki

https://gerrit.wikimedia.org/r/511411

We no longer have the ct_params field to check whether this is working. My plan is as follows:
Collect revision IDs from cx_translations.translation_target_revision_id where translation_target_language != 'en'. Then check on enwiki whether these revisions can be found in the revision or archive table with the CX tag. If they are found before this patch was deployed, but not after, the fix can be considered to work.

This can be done either manually or with a small PHP snippet that can be fed to eval.php, or even placed as a maintenance script in CX.
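The plan above can be sketched as a single query. In this minimal version using Python's sqlite3, toy tables stand in for wikishared.cx_translations and for enwiki's revision, archive and change_tag. The real schemas differ: change_tag uses ct_tag_id with change_tag_def, and the two databases live on different hosts, so the ID list would have to be copied across. The deployment timestamp is a placeholder.

```python
import sqlite3

DEPLOY_TS = "20190601000000"  # placeholder, not the actual deployment time

conn = sqlite3.connect(":memory:")
# Toy stand-ins: cx_translations lives in wikishared, the rest in enwiki.
conn.executescript("""
CREATE TABLE cx_translations (translation_target_language TEXT,
                              translation_target_revision_id INTEGER);
CREATE TABLE revision   (rev_id INTEGER, rev_timestamp TEXT);
CREATE TABLE archive    (ar_rev_id INTEGER, ar_timestamp TEXT);
CREATE TABLE change_tag (ct_rev_id INTEGER, ct_tag TEXT);
INSERT INTO cx_translations VALUES ('fr', 100), ('pt', 101), ('es', 102), ('en', 103);
-- 100 and 101 ended up on enwiki, and 101 was later deleted (moved to archive)
INSERT INTO revision   VALUES (100, '20190501000000'), (103, '20190701000000');
INSERT INTO archive    VALUES (101, '20190701000000');
INSERT INTO change_tag VALUES (100, 'contenttranslation-v2'),
                              (101, 'contenttranslation-v2'),
                              (103, 'contenttranslation-v2');
""")

rows = conn.execute("""
WITH candidates AS (
  SELECT translation_target_revision_id AS id FROM cx_translations
  WHERE translation_target_language != 'en'
    AND translation_target_revision_id IS NOT NULL
),
found AS (
  SELECT rev_id AS id, rev_timestamp AS ts FROM revision
  UNION ALL
  SELECT ar_rev_id, ar_timestamp FROM archive
)
SELECT f.id, f.ts >= ? AS after_deploy
FROM found f
JOIN candidates c ON c.id = f.id
WHERE EXISTS (
  -- requiring the CX tag reduces false positives from unrelated enwiki
  -- revisions that happen to share an ID with a revision on another wiki
  SELECT 1 FROM change_tag ct
  WHERE ct.ct_rev_id = f.id
    AND ct.ct_tag IN ('contenttranslation', 'contenttranslation-v2')
)
ORDER BY f.id
""", (DEPLOY_TS,)).fetchall()
print(rows)  # any row with after_deploy = 1 means the fix did not hold
```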

It looks like this is not fixed yet. Using my approach to check for pages published on enwiki in 2019 with the CX2 tag whose target language is not en, I get the following list. There is a very small chance of false positives.

page_namespace  page_title                                                  rev_id     rev_timestamp
118             Gastronomía_kurda                                           877347095  20190108030850
118             List_of_royal_consorts_of_the_Kingdom_of_the_Two_Sicilies   879996738  20190124181826
0               Dog_Days                                                    883957555  20190218175540
2               Ribeiro2002Rafael/Festival_Eurovisão_da_Canção_Júnior_2007  889905757  20190328185316
0               Tutush_I                                                    890610459  20190402120921
0               GeoFS                                                       892168469  20190412171910
0               Redirection_(computing)                                     893726507  20190423060234
2               Mr._TTG/VnStat                                              894034319  20190425053142
0               Almost_Family                                               899635783  20190531100346
0               Donald_Harvey                                               900568157  20190606121436

https://en.wikipedia.org/w/index.php?title=Donald_Harvey&oldid=900568157
https://en.wikipedia.org/w/index.php?title=Almost_Family&direction=next&oldid=897103126

And this doesn't even include the archive table.

By testing the scenario I attempted to fix, I noticed that it is not actually fixed. Going back to read the code.

Change 516137 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/extensions/ContentTranslation@master] Actually enforce valid token for translation view

https://gerrit.wikimedia.org/r/516137

Change 516137 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Actually enforce valid token for translation view

https://gerrit.wikimedia.org/r/516137

Etonkovidova added a subscriber: Catrope.

The following query (and its modifications) can be used to monitor the number and details of translations published in the wrong wikis (thx, @Catrope).

select count(*)  from cx_translations WHERE SUBSTRING(translation_target_url, 3, LENGTH(translation_target_language)) != translation_target_language and translation_status !='deleted';
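For reference, the SUBSTRING check above relies on translation_target_url being protocol-relative (`//fr.wikipedia.org/...`). Position 3 (1-based) is where the domain starts, and only as many characters as the language code has are compared. Below is a rough Python equivalent, applied to illustrative URLs rather than real rows. It shows what the check flags and what slips through; `substring_check_flags` is a hypothetical name.

```python
def substring_check_flags(target_url, target_language):
    """Mirror of: SUBSTRING(translation_target_url, 3, LENGTH(translation_target_language))
                  != translation_target_language
    SQL SUBSTRING is 1-based, so position 3 is Python index 2 (just after '//')."""
    return target_url[2:2 + len(target_language)] != target_language

# Illustrative URLs, not rows from cx_translations.
assert not substring_check_flags("//fr.wikipedia.org/wiki/Lukas Jacobs", "fr")
assert substring_check_flags("//no.wikipedia.org/wiki/Oslo", "nb")       # domain code differs from language code
assert not substring_check_flags("//zh-yue.wikipedia.org/wiki/X", "zh")  # only a prefix is compared
```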

@Nikerabbit - two questions:

  1. It seems that the majority of wrong translation_target_url values are for language variants. Is that helpful to know?
  2. I was looking at max(translation_last_updated_timestamp): 20190626175624 as indicative of the most recent incorrectly published translations. I assumed that it's a publishing date, but now I am not so sure.

The SUBSTRING query above won't detect this issue, since in these cases the target_url has the correct value (even though the translation was published to the wrong wiki). Also, for some languages the wiki domain code differs from the target language code, as per our codemap.
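Because some language codes map to different wiki domain codes, a URL-based comparison needs to translate the language code first. Below is a minimal sketch. The mapping is a small illustrative subset of well-known Wikipedia domain codes (for example, Norwegian Bokmål `nb` is served from `no.wikipedia.org`), not the codemap CX actually uses, which should replace it in practice. Even a correct comparison here would not catch this bug, because the stored URL is right.

```python
from urllib.parse import urlsplit

# Illustrative subset of language-code -> Wikipedia domain-code differences.
# Assumption: replace with the codemap CX actually uses before relying on it.
CODEMAP = {
    "nb": "no",
    "sgs": "bat-smg",
    "vro": "fiu-vro",
    "yue": "zh-yue",
    "lzh": "zh-classical",
}

def domain_code(target_url):
    """'//no.wikipedia.org/wiki/Oslo' -> 'no' (first label of the host name)."""
    host = urlsplit(target_url).hostname or ""
    return host.split(".", 1)[0]

def url_matches_language(target_url, language):
    """True if the URL's wiki is the one expected for this language code."""
    return domain_code(target_url) == CODEMAP.get(language, language)

assert url_matches_language("//fr.wikipedia.org/wiki/Lukas_Jacobs", "fr")
assert url_matches_language("//no.wikipedia.org/wiki/Oslo", "nb")
assert not url_matches_language("//zh-yue.wikipedia.org/wiki/X", "zh")
```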

Query to find translations published in languages other than English:

SELECT translation_target_title, translation_last_updated_timestamp, translation_target_revision_id
FROM cx_translations
WHERE translation_target_language != 'en' AND translation_start_timestamp > '20190425000000' AND translation_target_revision_id IS NOT NULL;

Executed as sql wikishared < cx.sql > output.csv. Then I ran cat output.csv | cut -f3 | egrep '(88|89|90|91).......' (basically keeping only revision IDs in the range of recent English Wikipedia revision IDs), with this result:
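The egrep pattern keeps any line containing 88, 89, 90 or 91 followed by at least seven more characters. On a column of revision IDs, that amounts to the nine-digit window 880000000 to 919999999, although longer IDs containing such a run would also match. A numeric filter states the intended range directly. Below is a sketch that assumes the tab-separated output of the query above, with the revision ID in the third column. `ids_in_window` is a hypothetical helper, the bounds are derived from the pattern, and the sample rows are made up.

```python
LOW, HIGH = 880_000_000, 919_999_999  # nine-digit IDs starting with 88..91

def ids_in_window(tsv_text, column=2):
    """Return revision IDs from the given tab-separated column (0-based;
    2 is the third column, like `cut -f3`) that fall inside [LOW, HIGH]."""
    ids = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) <= column or not fields[column].isdigit():
            continue  # header line, NULL, or short line
        rev_id = int(fields[column])
        if LOW <= rev_id <= HIGH:
            ids.append(rev_id)
    return ids

sample = (
    "translation_target_title\ttranslation_last_updated_timestamp\ttranslation_target_revision_id\n"
    "Page A\t20190601000000\t900000001\n"  # made-up rows
    "Page B\t20190602000000\t12345678\n"
)
print(ids_in_window(sample))  # [900000001]
```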

894034319
899635783
900568157

These are the same ones I found earlier:

2  Mr._TTG/VnStat  894034319  20190425053142
0  Almost_Family   899635783  20190531100346
0  Donald_Harvey   900568157  20190606121436

While this only checks one wiki for wrongly published articles, I am fairly confident that this issue is now fixed, since there have been no new entries since my follow-up fix was deployed almost two weeks ago.

@Pginer-WMF @Jpita Do you think this is enough evidence?


Sounds good to me. I think we can declare it as resolved. If we hear about similar incidents in the future we can create a new ticket and analyze the specifics.

Pginer-WMF closed this task as Resolved. Jul 5 2019, 7:35 AM