
Extend CX2 translations graph to show also deleted translations
Open, Medium, Public


The graph for CX2 translations shows the number of articles that editors create using version 2 of Content translation over time.

To get a wider picture of what happens with the content created with Content Translation, we want to include a line in the graph for "deleted translations", representing the translations created with CX2 that have been deleted.

This, combined with a similar addition depicting translations "published needing review" (T209868), will provide more perspective on the quality of the content created, and on the impact of any measures we may incorporate into CX2.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript · Nov 22 2018, 11:07 AM
Pginer-WMF triaged this task as Medium priority. · Nov 22 2018, 11:08 AM
Pginer-WMF removed Amire80 as the assignee of this task.
Pginer-WMF assigned this task to Amire80.
Nikerabbit updated the task description. (Show Details) · Jan 10 2019, 12:47 PM

I'm trying to resolve it together with T209868#5232506.

I need the data about published translations, deleted translations, and translations that need review all in one chart with three lines. If I understand correctly, Dashiki is not able to show one chart with data from several files.

I rewrote the queries that are currently used for creating the Published and the Need-review charts as a Bash script, and added a query that shows information about deleted translations to the same script.

This script can run on mwmaint1002, and the output that it produces looks like this:

2019-05-20	Published translations	2734
2019-05-20	Translations that need review	1821
2019-05-20	Deleted translations	85
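
The formatting step of such a script could look roughly like this (a hypothetical sketch, not the actual script: the three counts here are hard-coded placeholders standing in for the results of the SQL queries against the wikishared, EventLogging, and wiki databases):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the output-formatting step: emit one
# tab-separated line per metric (date, metric name, count).
day=$(date +%Y-%m-%d)
published=2734      # placeholder for the wikishared query result
need_review=1821    # placeholder for the EventLogging query result
deleted=85          # placeholder for the wiki-databases query result

printf '%s\t%s\t%s\n' "$day" 'Published translations' "$published"
printf '%s\t%s\t%s\n' "$day" 'Translations that need review' "$need_review"
printf '%s\t%s\t%s\n' "$day" 'Deleted translations' "$deleted"
```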

So, two questions before I submit a patch about it:

  1. Does this look like output that Dashiki can process into a graph with three lines?
  2. In this script I need to query three kinds of databases: the Wikipedia databases (enwiki, eswiki, and all the other languages), the EventLogging database, and the wikishared database. What's the right way to connect to the EventLogging database? I can run my script in the shell on mwmaint1002, and I can connect to wikishared and to enwiki, eswiki, etc. using the sql command. Connecting to the EventLogging database, however, as far as I know requires running something like mysql -pBLABLA -u research_prod -h db1108.eqiad.wmnet (with BLABLA replaced by the actual password).
Nuria added a subscriber: Nuria. · Jun 13 2019, 3:10 PM

We recommend you look at EventLogging data in Hadoop; the MySQL EventLogging hosts will probably be deprecated next quarter. Also, please get in touch with Product-Analytics for recommendations on how to do this work; there are tools such as Superset that we think can help here.

I'll be fine with whatever allows me to build a chart that shows the three things:

  1. Published articles (from wikishared)
  2. Articles that need review (from EventLogging)
  3. Articles that were deleted (from wiki databases)

Are all three accessible from Hadoop?

While we're waiting for a proper solution with Dashiki, I made a simple, publicly readable spreadsheet that presents all of this:

Nuria added a subscriber: kzimmerman. · Edited · Jul 15 2019, 10:36 PM

> While we're waiting for a proper solution with Dashiki, I made a simple, publicly readable spreadsheet that presents all of this:

@Amire80: Just clarifying that we are not waiting for any Dashiki work; as I mentioned before, Superset would be a better alternative for dashboarding. An example from cx translation (per

Pages created since January 2019 from cx_translations:

As you can see, the wikishared db is available in Superset to be queried; you can follow the example I provided and start creating some of your dashboards there. It would be best to talk to Product-Analytics.
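
A chart like that could be backed by a query along these lines (a hypothetical sketch, not the actual dashboard query; translation_status, translation_cx_version, and translation_start_timestamp are real columns of the cx_translations table, with MediaWiki-style YYYYMMDDHHMMSS timestamps, but the dashboard may filter differently):

```shell
# Build the SQL as a shell heredoc, as the Bash script above might do.
read -r -d '' QUERY <<'SQL' || true
SELECT LEFT(translation_start_timestamp, 8) AS day,
       COUNT(*) AS pages_created
FROM cx_translations
WHERE translation_status = 'published'
  AND translation_cx_version = 2
  AND translation_start_timestamp >= '20190101000000'
GROUP BY day
ORDER BY day;
SQL
echo "$QUERY"
```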

CC @kzimmerman so she knows of this work.

Pginer-WMF removed Amire80 as the assignee of this task. · Oct 10 2019, 9:24 AM