ErfgoedBot doesn't work since 5 January 2020
Closed, Resolved (Public)

Description

ErfgoedBot hasn't worked since 5 January 2020. (Global user contributions)

Reported:

No answer yet.

Event Timeline

The main focus right now has been around T224405: Migrate heritage to py3 and the subsequent T243741: Migrate heritage to Kubernetes 2020 cluster. I'll try to take a look at the logs, but unless it is trivial it will probably have to wait on at least the first of those.

Reported:

@JeanFred Should we (soft) redirect all of these discussion pages to the Commons one?

Since January 5th the cron job hasn't run because there is a job named 'update_monuments' already active.

Mentioned in SAL (#wikimedia-cloud) [2020-02-05T08:45:23Z] <wm-bot> <lokal-profil> Killed update_monuments job (T244213)

Contents of logs/update_monuments.log

2020-01-04_20:18:06 Done with the update!
[Sun Jan  5 03:00:14 2020] there is a job named 'update_monuments' already active
[Mon Jan  6 03:00:12 2020] there is a job named 'update_monuments' already active
[Tue Jan  7 03:00:18 2020] there is a job named 'update_monuments' already active
[Wed Jan  8 03:00:15 2020] there is a job named 'update_monuments' already active
[Thu Jan  9 03:00:16 2020] there is a job named 'update_monuments' already active
[Fri Jan 10 03:00:15 2020] there is a job named 'update_monuments' already active
[Sat Jan 11 03:00:14 2020] there is a job named 'update_monuments' already active
[Sun Jan 12 03:00:12 2020] there is a job named 'update_monuments' already active
[Mon Jan 13 03:00:16 2020] there is a job named 'update_monuments' already active
[Tue Jan 14 03:00:13 2020] there is a job named 'update_monuments' already active
[Wed Jan 15 03:00:14 2020] there is a job named 'update_monuments' already active
[Thu Jan 16 03:00:20 2020] there is a job named 'update_monuments' already active
[Fri Jan 17 03:00:11 2020] there is a job named 'update_monuments' already active
[Sat Jan 18 03:00:16 2020] there is a job named 'update_monuments' already active
[Sun Jan 19 03:00:16 2020] there is a job named 'update_monuments' already active
[Mon Jan 20 03:00:18 2020] there is a job named 'update_monuments' already active
[Tue Jan 21 03:00:17 2020] there is a job named 'update_monuments' already active
[Wed Jan 22 03:00:16 2020] there is a job named 'update_monuments' already active
[Thu Jan 23 03:00:12 2020] there is a job named 'update_monuments' already active
[Fri Jan 24 03:00:13 2020] there is a job named 'update_monuments' already active
[Sat Jan 25 03:00:13 2020] there is a job named 'update_monuments' already active
[Sun Jan 26 03:00:14 2020] there is a job named 'update_monuments' already active
[Mon Jan 27 03:00:17 2020] there is a job named 'update_monuments' already active
[Tue Jan 28 03:00:15 2020] there is a job named 'update_monuments' already active
[Wed Jan 29 03:00:18 2020] there is a job named 'update_monuments' already active
[Thu Jan 30 03:00:14 2020] there is a job named 'update_monuments' already active
[Fri Jan 31 03:00:14 2020] there is a job named 'update_monuments' already active
[Sat Feb  1 03:00:14 2020] there is a job named 'update_monuments' already active
[Sun Feb  2 03:00:16 2020] there is a job named 'update_monuments' already active
[Mon Feb  3 03:00:16 2020] there is a job named 'update_monuments' already active
[Tue Feb  4 03:00:13 2020] there is a job named 'update_monuments' already active
[Wed Feb  5 03:00:14 2020] there is a job named 'update_monuments' already active
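
To spot this kind of stall earlier, the refused runs can be counted straight from the log. The following is only a minimal sketch: the function name `skipped_runs` and the regular expression are illustrative assumptions based on the line format shown above, not part of the heritage code base.

```python
import re
from datetime import datetime

# Matches lines such as:
# [Sun Jan  5 03:00:14 2020] there is a job named 'update_monuments' already active
SKIP_LINE = re.compile(
    r"^\[(?P<ts>[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})\] "
    r"there is a job named '(?P<job>[^']+)' already active$"
)


def skipped_runs(lines, job="update_monuments"):
    """Return the timestamps of scheduled runs that were refused for `job`."""
    skipped = []
    for line in lines:
        match = SKIP_LINE.match(line.strip())
        if match and match.group("job") == job:
            # Collapse the space padding used for single-digit days.
            ts = " ".join(match.group("ts").split())
            skipped.append(datetime.strptime(ts, "%a %b %d %H:%M:%S %Y"))
    return skipped
```

Fed the lines of logs/update_monuments.log, this returns one timestamp per refused run; several consecutive days in that list would suggest a stuck job rather than a long-running one.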

Not the first time this happens; it is the same thing that we noticed with @Multichill at Wikimania.

I wanted to unstick it but there was nothing displayed in qstat 🤔 did someone beat me to it? Anyhow, resubmitted an update.

Mentioned in SAL (#wikimedia-cloud) [2020-02-05T08:50:17Z] <wm-bot> <jeanfred> Triggered a manual jsub update_monuments.sh (T244213)

did someone beat me to it?

Yup, @Lokal_Profil had ;)

Killed the running job, let's see if it works by itself when it starts at 03 UTC.

In case it hangs again, here is an overview of things which failed on the 4th (from the logs):

  • SPARQL harvesting of pt-wd
  • SQL warning on rs_sr harvest
  • SQL warning on statistics output
  • Massive failures when writing monuments without id lists (SQL connection lost) starting from be-vlg
  • Massive failures when categorizing (unclear exactly what, but it looks like the SQL database went away)
  • Python error in MySQL call for making stats for categorization
  • Python error in MySQL call for images of monuments without id
  • Lost DB connection on creating the dump
  • Lots of invalid coordinates in the au_en dataset

Looking at the user contributions, the update seems to have run OK, terminating a few hours shy of the scheduled one starting (so ~12.5 hours for a run).

Yep, looks all good.

Updating categorization statistics. Total: 229 Categorized: 0 Leftover: 229

that’s not so good >_> But a story for another ticket.

Shall we close as resolved @Lokal_Profil ?

I think the categorization problems fall under T244333: Re-enable categorization of Ukraine images; however, it looks (based on global contribs) as though the cron job didn't start last night.

On the evening of 6 February, Missing commons category links and Images of cultural heritage monuments in Czech Republic without id were updated almost correctly, but Nevyužité obrázky failed (it was not updated).

Around midnight on 5/6 February, all 3 pages were updated correctly.

  • SQL warning on statistics output
  • Massive failures when writing monuments without id lists (SQL connection lost) starting from be-vlg
  • Massive failures when categorizing (unclear exactly what, but it looks like the SQL database went away)

@JeanFred Do you think a connection.ping(reconnect=True) at regular intervals (e.g. prior to each sql query) would work to alleviate some of these issues?
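
A rough sketch of what pinging before each query could look like. It assumes a PyMySQL-style connection object (where `ping(reconnect=True)` reopens a dropped connection); the helper name `execute_with_ping` is hypothetical and not taken from the heritage code.

```python
def execute_with_ping(connection, query, params=None):
    """Ping (and reconnect if needed) before running `query`.

    Assumption: `connection` behaves like a PyMySQL connection, i.e. it
    offers ping(reconnect=True) and cursor() usable as a context manager.
    """
    # Re-establish the connection if the server dropped it since the last call.
    connection.ping(reconnect=True)
    with connection.cursor() as cursor:
        cursor.execute(query, params)
        return cursor.fetchall()
```

This costs one extra round trip per query and would not help if the connection is lost while a long query is running, so it would likely only cover the "connection went away between steps" cases.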

Ciell subscribed.

@JeanFred: I can see ErfgoedBot is working again.
Can I close this task?

I believe so yes :)