Extend CX2 translations graph to show also published translations that need review
Open, High · Public

Description

The graph for CX2 translations shows the number of articles that editors create using version 2 of Content translation over time.

Some of the published articles may be published with too much unmodified content (T190279). These are added to a special tracking category (T190798) on each wiki, but it is hard to get the overall number for all languages. Extending the graph to include those articles would be useful. To support this, the following is proposed:

  • Keep the current line for published translations. We may want to rename it from "cx2_published_translations" to "All published" or similar to emphasise that this line includes all translations (both those marked as having too much unmodified content and those not marked that way).
  • Add a new line named "Published needing review" to represent the articles created despite the "too much unmodified content" warning.

The image below illustrates the expected result:

Details

Related Gerrit Patches:
analytics/limn-language-data : master · Add need-review chart to published CX2 translation
mediawiki/extensions/ContentTranslation : master · Log need-review events
mediawiki/extensions/ContentTranslation : master · Add a counter for published translations with unreviewed MT

Event Timeline

Restricted Application added a subscriber: Aklapper. · Nov 19 2018, 5:59 PM
Pginer-WMF triaged this task as Medium priority. · Nov 19 2018, 5:59 PM
Pginer-WMF edited projects, added ContentTranslation; removed CX-analytics.
Pginer-WMF edited projects, added CX-analytics; removed ContentTranslation.
Pginer-WMF added a subscriber: Amire80.

Change 476480 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/extensions/ContentTranslation@master] Add a counter for published translations with unreviewed MT

https://gerrit.wikimedia.org/r/476480

Change 476480 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Add a counter for published translations with unreviewed MT

https://gerrit.wikimedia.org/r/476480

Petar.petkovic moved this task from In Review to QA on the Language-Team (Language-2018-October-December) board.

@Petar.petkovic, I don't see the new line in the graph. Is this waiting for deployment or is further development needed?

I checked today (the task is on my list of tasks to check after wmf.8 deployment) - the link https://language-reportcard.wmflabs.org/cx2/scripts.js?v=abce880:46 displays the error in the Console:

Wed, 12 Dec 2018 23:23:12 GMT https://language-reportcard.wmflabs.org/cx2/scripts.js?v=abce880:46

TypeError: e is undefined scripts.js:36:7482

The link was working last week.

This is what I think happened: @Amire80 updated the tab names, so that the new link is https://language-reportcard.wmflabs.org/cx2/#cx-2-translations (cf. https://language-reportcard.wmflabs.org/cx2/#translations) but the link in the task description was not updated.

Thx, @Nikerabbit - https://language-reportcard.wmflabs.org/cx2/#cx-2-translations is working, but the new graph "Published needing review" is still not present.

santhosh reassigned this task from santhosh to Amire80. · Dec 14 2018, 5:40 AM
santhosh added a subscriber: santhosh.

I don't know if @Amire80 implemented anything for the language-reportcard. The patches above are mine, and they add a graph to Grafana: https://grafana.wikimedia.org/d/000000598/content-translation . See the graph titled "Published translations with high amount of unreviewed MT".

I will unassign myself and assign this to Amir.

Given that this may take some more time due to technical complexities, I created a ticket for an initial report that will provide some initial insights and useful data for upcoming communications: T218020: Measure percentage of translations published with and without the expected level of modified content

Pginer-WMF raised the priority of this task from Medium to High. · Apr 5 2019, 4:40 PM

Change 502312 had a related patch set uploaded (by Amire80; owner: Amire80):
[mediawiki/extensions/ContentTranslation@master] Log need-review events

https://gerrit.wikimedia.org/r/502312

Change 502312 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Log need-review events

https://gerrit.wikimedia.org/r/502312

Amire80 added a subscriber: mforns. · May 8 2019, 1:30 PM

So I have the events being logged, and to show them on the chart, I'll have to do something like this regularly:

select
  count(distinct(concat_ws(',', event.sourceTitle, event.sourceLanguage, event.targetTitle, event.targetLanguage, event.token))) as count
from
  event.contenttranslation
where
  event.action = 'need-review' and
  year = 2019 and
  month = 5 and
  day = 3;

This will have to be in hive. As a first step, I'll probably make a new dashboard for this, and once it works, I'll merge the existing dashboard into it.

A question for @mforns: The query above selects just one number, per day. If I understand correctly, the output is supposed to include two columns: a date and a number. If I'm going to use RU, what's the right way to add the number? And what's the most robust way to pass this date to the query in a way that will show it as a column, and will break it correctly for the year/month/day conditions?

mforns added a comment. · May 8 2019, 2:19 PM

@Amire80
When we use RU for Hive, we have to use a script instead of the query.
That's because RU doesn't yet have a Hive client, so we use a bash script that calls hive -e "<query>".
The way RU passes dates (and other params) to the script is different from the way it passes dates to SQL files.
In a nutshell, to add a date column in a Hive query (bash script) use:

SELECT
    ...
    '$1' AS date,
    ...

$1 is the first parameter that RU passes to the script, which is the date in question.
You can find this and other information in the RU documentation:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Reportupdater
Also, take a look at this example of another Hive-based RU report:
https://github.com/wikimedia/analytics-limn-language-data/blob/master/interlanguage/percent_interlanguage_navigation_curr
You can basically copy the way hive is called (hive -e "..." 2> /dev/null | grep -v parquet.hadoop).
And also, copy the way $1 is used.
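
To make the advice above concrete, here is a minimal sketch of how such an RU script might look for the need-review count. It is not the script that was actually deployed. It assumes that RU passes the date as $1 in YYYY-MM-DD format; the function name build_query is hypothetical, and the query body is the one Amire80 posted earlier in this task.

```shell
#!/usr/bin/env bash
# Sketch of a Reportupdater (RU) script for the daily need-review count.
# Assumption: RU passes the report date as $1 in YYYY-MM-DD format.

# Print the Hive query for a single day (hypothetical helper).
# The date itself is emitted as the first output column.
build_query() {
    local date="$1" year month day
    # Split YYYY-MM-DD into its three parts.
    IFS=- read -r year month day <<< "$date"
    # 10# forces base 10, so that "08" and "09" are not parsed as octal.
    cat <<EOF
SELECT
    '$date' AS date,
    COUNT(DISTINCT CONCAT_WS(',', event.sourceTitle, event.sourceLanguage,
        event.targetTitle, event.targetLanguage, event.token)) AS count
FROM event.contenttranslation
WHERE event.action = 'need-review'
    AND year = $((10#$year))
    AND month = $((10#$month))
    AND day = $((10#$day));
EOF
}

# Only call Hive when a date argument was actually given.
if [[ $# -gt 0 ]]; then
    hive -e "$(build_query "$1")" 2> /dev/null | grep -v parquet.hadoop
fi
```

Keeping the query construction in a separate function makes it possible to inspect the generated query (for example with build_query 2019-05-03) without access to a Hive client.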

I eventually decided to do this in mysql, and perhaps later move everything to hive if it's desirable. This is supposed to make the initial deployment easier.

Change 509007 had a related patch set uploaded (by Amire80; owner: Amire80):
[analytics/limn-language-data@master] Add need-review chart to published CX2 translation

https://gerrit.wikimedia.org/r/509007

Change 509007 merged by jenkins-bot:
[analytics/limn-language-data@master] Add need-review chart to published CX2 translation

https://gerrit.wikimedia.org/r/509007

OK, so now, thanks to @mforns, the chart works at https://language-reportcard.wmflabs.org/cx2/#cx-2-translations-that-need-review . However, this task asks to combine the chart at the "CX2 translations that need review" tab with the one at the "CX2 translations" tab, as a single tab with two lines. I guess that this may be possible by editing the JSON configuration at https://meta.wikimedia.org/wiki/Config:Dashiki:CX2Translations , but I'm not sure how exactly.

Pginer-WMF removed Amire80 as the assignee of this task. · Oct 10 2019, 9:24 AM