
Update the external automatic translation Oozie job
Closed, Declined, Public


The external automatic translation Oozie job has not run since December, when Analytics Engineering updated the URL of the Hive endpoint. The last data in neilpquinn.toledo_pageviews is for 2020-12-21. Source data is deleted after 90 days, so the last day before we start losing data is 2021-03-20.
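The deadline can be double-checked with a short date calculation. This is a minimal sketch, assuming a day's source data is deleted exactly 90 days after that day, and counting conservatively from the last processed day (2020-12-21) as the estimate above does. The function name `last_safe_day` is hypothetical, and the exact cutoff depends on how the deletion job actually counts days.

```python
from datetime import date, timedelta

# Values taken from the task description.
LAST_PROCESSED = date(2020, 12, 21)  # last day present in neilpquinn.toledo_pageviews
RETENTION_DAYS = 90  # source data is deleted after this many days


def last_safe_day(last_processed: date, retention_days: int) -> date:
    """Conservative deadline: the day before `last_processed` ages out.

    Assumption (not confirmed by the task): a day's source data is
    deleted exactly `retention_days` days after that day.
    """
    return last_processed + timedelta(days=retention_days) - timedelta(days=1)


print(last_safe_day(LAST_PROCESSED, RETENTION_DAYS))  # 2021-03-20
```

Under a less conservative convention (counting from the first unprocessed day, 2020-12-22), the deadline would move a day or two later, so 2021-03-20 leaves a small safety margin.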

We need to update and relaunch the job. At the same time, we should move it to the analytics/wmf-product/jobs repo and adapt it to take advantage of the new ownership and deployment process (T267940).


Due Date
Mar 16 2021, 6:30 PM

Event Timeline

Before we do this, we should confirm that this job is actually worth keeping. There is no longer any organizational focus on the ExternalGuidance tool, and we've decided there's no need to keep the automated report based on this job (T246250). The cost of keeping it isn't huge, but if we can't imagine any future use, there's no reason to pay even this moderate cost.

LGoto triaged this task as Medium priority. Jan 12 2021, 6:17 PM
nshahquinn-wmf set Due Date to Mar 16 2021, 6:30 PM.
nshahquinn-wmf added subscribers: Pginer-WMF, Arrbee.

I consulted @Pginer-WMF and @Arrbee, and they believe it's worth continuing to collect this data. I will need to do this soon, since we are approaching the point of data loss.

nshahquinn-wmf raised the priority of this task from Medium to High. Mar 18 2021, 6:57 PM

Mostly done now (though there's less point because of T277781); raising to High because (more) data loss is imminent.

The job is running again, but the daily pageview counts generated by the new job are only 20% of those generated by the old job, even though I didn't make any substantive change to the query. I'll need to figure out why if we want to keep producing meaningful data.

nshahquinn-wmf lowered the priority of this task from High to Medium. Mar 24 2021, 12:39 PM

I've discussed this with @Pginer-WMF: we do want to keep this job running properly, but I'll do it after T254891.

I've discussed this with Pau, and instead of doing this, we can use the edit tag data to get a rough idea of the impact of the Google Translate entry point for Content Translation.