Extend CX2 translations graph to show also published translations that need review
Open, High, Public


The graph for CX2 translations shows the number of articles that editors create using version 2 of Content translation over time.

Some of the published articles may be published with too much unmodified content (T190279). These are added to a special tracking category (T190798) on each wiki, but it is hard to get the overall number for all languages. Extending the graph to include those articles would be useful. To support this, the following is proposed:

  • Keep the current line for published translations. We may want to rename it from "cx2_published_translations" to "All published" or similar to emphasise that this line includes all translations (those marked as having too much unmodified content, and those not marked in such way).
  • Add a new line named "Published needing review" to represent the articles created despite the "too much unmodified content" warning.

The image below illustrates the expected result:

Event Timeline

Restricted Application added a subscriber: Aklapper. Nov 19 2018, 5:59 PM
Pginer-WMF triaged this task as Medium priority. Nov 19 2018, 5:59 PM
Pginer-WMF edited projects, added ContentTranslation; removed CX-analytics.
Pginer-WMF edited projects, added CX-analytics; removed ContentTranslation.
Pginer-WMF added a subscriber: Amire80.

Change 476480 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/extensions/ContentTranslation@master] Add a counter for published translations with unreviewd MT

Change 476480 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Add a counter for published translations with unreviewed MT

Petar.petkovic moved this task from In Review to QA on the Language-Team (Language-2018-October-December) board.

@Petar.petkovic, I don't see the new line in the graph. Is this waiting for deployment or is further development needed?

I checked today (the task is on my list of tasks to check after wmf.8 deployment) - the link displays the error in the Console:

Wed, 12 Dec 2018 23:23:12 GMT

TypeError: e is undefined scripts.js:36:7482

The link was working last week.

This is what I think happened: @Amire80 updated the tab names, so that the new link is (cf. but the link in the task description was not updated.

Thx, @Nikerabbit - is working, but the new graph "Published needing review" is still not present.

santhosh added a subscriber: santhosh.

I don't know if @Amire80 implemented anything for the language-reportcard. The patches above are mine, and they add a graph to Grafana. See the graph titled "Published translations with high amount of unreviewed MT".

I will unassign and assign to Amir

Given that this may take some more time due to technical complexities, I created a ticket for an initial report that will provide some initial insights and useful data for upcoming communications: T218020: Measure percentage of translations published with and without the expected level of modified content

Pginer-WMF raised the priority of this task from Medium to High. Apr 5 2019, 4:40 PM

Change 502312 had a related patch set uploaded (by Amire80; owner: Amire80):
[mediawiki/extensions/ContentTranslation@master] Log need-review events

Change 502312 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Log need-review events

Amire80 added a subscriber: mforns. May 8 2019, 1:30 PM

So I have the events being logged, and to show them on the chart, I'll have to do something like this regularly:

  select
    count(distinct(concat_ws(',', event.sourceTitle, event.sourceLanguage, event.targetTitle, event.targetLanguage, event.token))) as count
  from <event table>
  where
    event.action = 'need-review' and
    year = 2019 and
    month = 5 and
    day = 3;

This will have to be in hive. As a first step, I'll probably make a new dashboard for this, and once it works, I'll merge the existing dashboard into it.

A question for @mforns: The query above selects just one number, per day. If I understand correctly, the output is supposed to include two columns: a date and a number. If I'm going to use RU, what's the right way to add the number? And what's the most robust way to pass this date to the query in a way that will show it as a column, and will break it correctly for the year/month/day conditions?

mforns added a comment. May 8 2019, 2:19 PM

When we use RU for Hive, we have to use a script instead of a query.
That is because RU doesn't yet have a Hive client, so we use a bash script that calls hive -e "<query>".
The way RU passes dates (and other params) to the script is different from the way it passes dates to SQL files.
In a nutshell, to add a date column in a Hive query (bash script) use:

    '$1' AS date,

$1 is the first parameter that RU passes to the script, which is the date in question.
You can find this and other information in the RU documentation:
Also, take a look at this example of another Hive-based RU report:
You can basically copy the way hive is called (hive -e "..." 2> /dev/null | grep -v parquet.hadoop).
And also, copy the way $1 is used.
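
A minimal sketch of such a script, combining the query from the earlier comment with the `'$1' AS date` column and the `hive -e` call described above. It is an illustration under assumptions, not the script that was deployed: it assumes `$1` arrives as YYYY-MM-DD, and the `build_query` helper and the `EVENT_TABLE` value are hypothetical names standing in for details the thread does not give.

```shell
#!/bin/bash
# Sketch of a script-based RU report for Hive.
# Assumptions (not from the thread): $1 is formatted YYYY-MM-DD, and
# EVENT_TABLE is a hypothetical placeholder for the real events table.
EVENT_TABLE="${EVENT_TABLE:-example_db.need_review_events}"

# Print the HiveQL for a single day, given a date such as 2019-05-03.
build_query() {
    local day_start year rest month day
    day_start="$1"
    # Split YYYY-MM-DD into its parts with plain parameter expansion.
    year="${day_start%%-*}"
    rest="${day_start#*-}"
    month="${rest%%-*}"
    day="${rest#*-}"
    # Drop one leading zero so the values match the integer partition columns.
    month="${month#0}"
    day="${day#0}"
    cat <<EOF
SELECT
    '$day_start' AS date,
    COUNT(DISTINCT CONCAT_WS(',', event.sourceTitle, event.sourceLanguage, event.targetTitle, event.targetLanguage, event.token)) AS count
FROM $EVENT_TABLE
WHERE
    event.action = 'need-review' AND
    year = $year AND
    month = $month AND
    day = $day;
EOF
}

# Run the query only when a date argument is given.
if [ "$#" -ge 1 ]; then
    hive -e "$(build_query "$1")" 2> /dev/null | grep -v parquet.hadoop
fi
```

Keeping the query text in a separate function makes it possible to print and inspect the generated HiveQL for a given date without a Hive installation. The date string is reused twice: once verbatim as the output column, and once split into the year, month and day conditions.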

I eventually decided to do this in MySQL, and perhaps later move everything to Hive if that's desirable. This should make the initial deployment easier.

Change 509007 had a related patch set uploaded (by Amire80; owner: Amire80):
[analytics/limn-language-data@master] Add need-review chart to published CX2 translation

Change 509007 merged by jenkins-bot:
[analytics/limn-language-data@master] Add need-review chart to published CX2 translation

OK, so now thanks to @mforns the chart works at . However, this task asks to combine the chart in the "CX2 translations that need review" tab with the one in the "CX2 translations" tab, as a single tab with two lines. I guess that this may be possible by editing the JSON configuration at , but I'm not sure how exactly.

Pginer-WMF removed Amire80 as the assignee of this task. Oct 10 2019, 9:24 AM