Extend translations graph to show also deleted translations
Open, Medium, Public

Description

The Content Translation Key metrics dashboard shows published translations in several graphs, but it gives no information about how many of them were later deleted.

To get a wider picture of what happens to the content created with Content Translation, we want to show, across all languages, the number of "deleted translations" next to the published ones.

This, combined with similar additions showing translations "published needing review" (T209868), will give more perspective on the quality of the content created and on how improvements to the tool may affect it.

That is, at a glance we can identify periods when most of the published translations were problematic, by spotting peaks of deletions or of translations needing review.
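One way to make "peaks of deletions" concrete is to compare deletions with publications per period instead of looking at raw deletion counts, since raw counts rise whenever overall publishing rises. The following is a minimal Python sketch. It assumes that "deleted" means translations published in that period that were later deleted. The function names, the median baseline and the threshold factor are hypothetical choices and are not specified in this task.

```python
from statistics import median


def deletion_share(published: int, deleted: int) -> float:
    """Fraction of a period's published translations that were later deleted.

    Assumption: `deleted` counts translations published in the same period.
    Periods with no publications return 0.0 to avoid division by zero.
    """
    return deleted / published if published else 0.0


def deletion_peaks(series: dict[str, tuple[int, int]], factor: float = 2.0) -> list[str]:
    """Return periods whose deletion share exceeds `factor` times the median share.

    `series` maps a period label to (published, deleted). The median baseline
    and the default factor of 2 are arbitrary, illustrative choices.
    """
    shares = {p: deletion_share(pub, deleted) for p, (pub, deleted) in series.items()}
    if not shares:
        return []
    baseline = median(shares.values())
    return sorted(p for p, s in shares.items() if s > factor * baseline)
```

For example, the 2019-05-20 figures quoted further down (2734 published, 85 deleted) give a deletion share of about 3.1%. Using a ratio rather than raw counts keeps a busy publishing period from looking like a deletion spike.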

Event Timeline

Pginer-WMF removed Amire80 as the assignee of this task.
Pginer-WMF assigned this task to Amire80.

I'm trying to resolve it together with T209868#5232506.

I need the data about published translations, deleted translations, and translations that need review all in one chart with three lines. If I understand correctly, Dashiki is not able to show one chart with data from several files.

I rewrote the queries that are currently used for creating the Published and the Need-review charts as a Bash script, and added a query that shows information about deleted translations to the same script.

This script can run on mwmaint1002, and the output that it produces looks like this:

2019-05-20	Published translations	2734
2019-05-20	Translations that need review	1821
2019-05-20	Deleted translations	85
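The output above is in a "long" layout: one row per date and metric. If a charting tool instead expects one row per date with one column per line, the rows could be pivoted. That wide layout is an assumption about what a dashboard might need and is not confirmed in this thread. The following is a minimal Python sketch that assumes tab-separated input exactly as shown. The metric order and the function name `long_to_wide` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Sample in the long, tab-separated format shown above: date, metric name, count.
LONG_TSV = (
    "2019-05-20\tPublished translations\t2734\n"
    "2019-05-20\tTranslations that need review\t1821\n"
    "2019-05-20\tDeleted translations\t85\n"
)

# Column order for the wide output (hypothetical, not taken from any dashboard config).
METRICS = [
    "Published translations",
    "Translations that need review",
    "Deleted translations",
]


def long_to_wide(text: str) -> str:
    """Pivot 'date, metric, count' rows into one row per date, one column per metric."""
    rows = defaultdict(dict)
    for date, metric, count in csv.reader(io.StringIO(text), delimiter="\t"):
        rows[date][metric] = int(count)

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["date", *METRICS])
    for date in sorted(rows):
        # A metric missing for a date becomes an empty cell rather than 0,
        # so a gap in the data is not mistaken for a real zero.
        writer.writerow([date, *(rows[date].get(m, "") for m in METRICS)])
    return out.getvalue()


print(long_to_wide(LONG_TSV), end="")
```

ISO dates (YYYY-MM-DD) sort correctly as plain strings, which is why `sorted(rows)` is enough to order the output chronologically.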

So, two questions before I submit a patch about it:

  1. Does this look like output that Dashiki can process into a graph with three lines?
  2. In this script I need to query three kinds of databases: the Wikipedia databases (enwiki, eswiki, and all other languages), the EventLogging database, and the wikishared database. What's the right way to connect to the EventLogging database? I can run my script in the shell on mwmaint1002, and I can connect to wikishared and to enwiki, eswiki, etc. using the sql command, but connecting to the EventLogging database, as far as I know, requires running something like mysql -pBLABLA -u research_prod -h db1108.eqiad.wmnet (replace BLABLA with the actual password).

We recommend looking at EventLogging data in Hadoop; the MySQL EventLogging hosts will probably be deprecated next quarter. Also, please get in touch with Product-Analytics for recommendations on how to do this work. There are tools such as Superset that we think can help here.

I'll be fine with whatever allows me to build a chart that shows the three things:

  1. Published articles (from wikishared)
  2. Articles that need review (from EventLogging)
  3. Articles that were deleted (from wiki databases)

Are all three accessible from Hadoop?

While we're waiting for a proper solution with Dashiki, I made a simple, publicly readable spreadsheet that presents all of this:
https://docs.google.com/spreadsheets/d/1hLQyLq3oQ11BLhU3_EJsb3hNgeUG6TutJcXjZ2g22bk/edit#gid=1343851748

@Amire80: Just to clarify, we are not waiting for any Dashiki work. As I mentioned before, Superset would be a better alternative for dashboarding. Here is an example based on CX translation data (per https://www.mediawiki.org/wiki/Content_translation/analytics/queries):

Pages created since January 2019 from cx_translations:
https://bit.ly/2XJpJnh

As you can see, the wikishared database is available for querying in Superset, so you can follow the example I provided and start creating some of your dashboards there. It would be best to talk to Product-Analytics.

CC @kzimmerman so she knows of this work.

We may want to review the task request, probably incorporating the new data in the new Superset dashboard.
We may consider deprecating the specific CX2 tags, since CX2 is now the only version (it no longer coexists with CX1).

Pginer-WMF renamed this task from Extend CX2 translations graph to show also deleted translations to Extend translations graph to show also deleted translations. Aug 3 2021, 4:04 PM
Pginer-WMF updated the task description.