Update the external automatic translation Oozie job
The external automatic translation Oozie job has not been able to run since December, when Analytics Engineering changed the URL of the Hive endpoint. The last data in neilpquinn.toledo_pageviews is for 2020-12-21. Because source data is deleted after 90 days, the last day before we start losing data is 2021-03-20.

We need to update and relaunch the job. When we do, we should also move it to the analytics/wmf-product/jobs repo and adapt it to take advantage of the new ownership and deployment process (T267940).


Mar 16 2021, 6:30 PM

Before we do this, we should confirm that this job is actually worth keeping. There is no longer any organizational focus on the ExternalGuidance tool, and we've decided there's no need to keep the automated report based on this job (T246250). Keeping the job isn't very costly, but if we can't imagine any future use for it, there's no reason to pay even that moderate cost.

I consulted @Pginer-WMF and @Arrbee, and they believe it's worth continuing to collect this data. I need to do this soon, since we are approaching the point of data loss.

Mostly done now (though it matters less because of T277781); raising the priority to High because (further) data loss is imminent.

The job is running again, but the daily pageview counts generated by the new job are only 20% of those generated by the old job, even though I made no substantive change to the query. I'll need to find the cause if we want to keep producing meaningful data.

I've discussed this with @Pginer-WMF: we do want to keep this job running properly, but I'll work on it after T254891/T254891.

I've discussed this with Pau: instead of fixing the job, we can use the edit tag data to get a rough idea of the impact of the Google Translate entry point for Content Translation.