Fix external automatic translation notebook
Closed, Resolved (Public)

Description

@Isaac helpfully pointed out that the external automatic translation notebook has been broken since the start of September.

Unless there is no longer any value in this notebook, we should get it running again as soon as we can.

Event Timeline

LGoto triaged this task as Medium priority. Dec 9 2019, 5:48 PM

I discussed this with @Pginer-WMF today, and we do want to prioritize this above other Language work. We are not currently doing any development work, but there will be more community conversations as Google expands Toledo to new languages, so we want this data to be publicly available.

nshahquinn-wmf renamed this task from External automatic translation notebook broken for several months to Fix external automatic translation notebook. Dec 11 2019, 6:13 PM
nshahquinn-wmf claimed this task.

I've put a lot of effort into getting this running again, but unfortunately it's not quite there yet. I've fixed the underlying job that collects data on Toledo pageviews (T240975) and done a bunch of cleanup on the notebook, but I'm having trouble with the job that reruns the notebook every day, since our move to Kerberos authentication changed how we run cron jobs.

Currently, I'm running a one-off version of the notebook update script (which doesn't run into the Kerberos authentication issues); it may complete successfully, in which case the published notebook will be updated. However, it'll take a while to finish, so I'm going to head out of the office now and pick the project up again after All Hands.

Nope, it failed with several errors. This is turning out to be a much bigger project than I anticipated 🙁

I've figured out the underlying issue in the wmfdata library that was causing some of the queries in the notebook to fail. I've now done a one-time update; the published notebook now has current data.

One graph ("Revert rate for edits from external guidance extension" in section 4C) still has data only through last August. In order to update it, I would need to do some further work on the custom script that checks for reverts.

I would also need to do additional work in order to make it auto-update again.

@Arrbee, I'll wait for the outcome of your meeting with Toby before I spend the time on those remaining steps. It would take 2-4 days of additional work, but a lot of that would benefit other projects using Jupyter dashboards (including T226171), so if it's important to have continued visibility into external guidance edits, it would be worth it.

@Arrbee has concluded that we don't need to complete this. I filed T246250 for the work involved in cleanly sunsetting the dashboard.

Big thanks @nshahquinn-wmf for working on this! I'll jump to the sunset task if I have other questions/requests, but if it's very simple to do, I would request that you list the basic steps / things that would have to be fixed to get this notebook running so that I (or someone else) can more easily take up the work if our team deems this notebook useful in the near future.

@Isaac, makes sense! I've added collecting the open issues to T246250.