Page MenuHomePhabricator

l10nupdate fails and runs twice since 2015-07-14
Closed, ResolvedPublic

Description

From IRC tonight:

02:03 < logmsgbot> !log LocalisationUpdate failed (1.26wmf14) at 2015-07-22 02:03:18+00:00
02:03 < logmsgbot> !log LocalisationUpdate failed (1.26wmf15) at 2015-07-22 02:03:19+00:00

Looking at the SAL, the last time that completed successfully was July 13th, iow, it coincides with the deploy of wmf14.

Event Timeline

greg raised the priority of this task from to Needs Triage.
greg updated the task description. (Show Details)
greg added a project: Deployments.
greg added subscribers: greg, mmodell, thcipriani, demon.

Did someone just run it manually?

[19:37:46] <logmsgbot> !log LocalisationUpdate completed (1.26wmf14) at 2015-07-22 02:37:45+00:00
[20:10:25] <logmsgbot> !log LocalisationUpdate completed (1.26wmf15) at 2015-07-22 03:10:23+00:00

From tonight:

02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-23 02:03:02+00:00
02:03 logmsgbot: LocalisationUpdate failed (1.26wmf15) at 2015-07-23 02:03:03+00:00
02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 23 02:07:12 UTC 2015 (duration 7m 11s)
02:34 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 07m 13s)
02:37 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-23 02:37:55+00:00
03:00 logmsgbot: l10nupdate Synchronized php-1.26wmf15/cache/l10n: (no message) (duration: 07m 24s)
03:04 logmsgbot: LocalisationUpdate completed (1.26wmf15) at 2015-07-23 03:04:48+00:00

are we syncing out broken l10n builds? Or just not updating strings?

I'm not sure how to tell. It is difficult to debug issues with LocalisationUpdate

Did someone just run it manually?

[19:37:46] <logmsgbot> !log LocalisationUpdate completed (1.26wmf14) at 2015-07-22 02:37:45+00:00
[20:10:25] <logmsgbot> !log LocalisationUpdate completed (1.26wmf15) at 2015-07-22 03:10:23+00:00

@Legoktm: I think I did that accidentally.

It looks like it has been run again on each night since it started failing, and succeeded... I'm sitting here just after it's failed, but watching it write logs on tin, and I don't think anyone is running it manually.

No idea. If it is really failing, then it's also retrying successfully.

I don't know what causes it to run twice though.

It should be running once for each deployed branch, which is currently 2 branches from Tuesday afternoon through Thursday afternoon, and 1 from Thursday evening through Tuesday morning.

16:17 < bd808> ori: that l10n.php stuff that Krenair was trying to debug last night is whatever voodoo you did to build the LCStoreStaticArray data files on tin. That was the thing I asked you to look at and undo as you were about to get on a plane to leave Mexico City.
16:18 < bd808> l10nupdate is running twice, once to build the cdbs and once to build the php caches that we aren't using. The php cache run is failing every night I think
16:18 < bd808> Sam poked about on tin and couldn't find the second cron
16:18 < Krenair> there is only one cron
16:19 < Krenair> it doesn't make sense
16:19 < Krenair> where is /usr/local/bin/l10nupdate being called from anyway? the cron uses the -l one
16:21 < Krenair> sorry, "-1", not "-l"
16:21 < bd808> Krenair: my guess would be that ori cron'ed it from his personal crontab but I don't have the server rights to verify that

greg renamed this task from l10nupdate fails since 2015-07-14 to l10nupdate fails and runs twice since 2015-07-14.Jul 30 2015, 6:36 PM
greg set Security to None.

<Krenair> ori, are you running l10nupdate from your crontab on tin?
<ori> ....no
<ori> [tin:/srv/mediawiki-staging] $ crontab -l
<ori> no crontab for ori

I still wonder whether it's being run from a place other than l10nupdate's crontab though. The line there is:
0 2 * * * /usr/local/bin/l10nupdate-1 --verbose >> /var/log/l10nupdatelog/l10nupdate.log 2>&1
Which means that if it's the only place running this, all logs must be going to /var/log/l10nupdatelog/l10nupdate.log. Except we know that when this failure message is logged to SAL, "FAILED" must be echoed, and therefore get logged, which clearly isn't happening:

krenair@tin:~$ zgrep FAILED /var/log/l10nupdatelog/l10nupdate.log-201507*
krenair@tin:~$

It's because mira is trying to run l10nupdate and failing.

krenair@mira:/var/log/l10nupdatelog$ grep FAILED l10nupdate.log -c
97

(Mira fails all mwscript calls, because on deployment servers that reads from mediawiki-staging, which isn't getting updated by scap properly yet)

Change 228125 had a related patch set uploaded (by BryanDavis):
Send $USER and $HOSTNAME with dologmsg messages

https://gerrit.wikimedia.org/r/228125

Change 228126 had a related patch set uploaded (by Alex Monk):
Don't try to run l10nupdate on mira

https://gerrit.wikimedia.org/r/228126

Change 228131 had a related patch set uploaded (by BryanDavis):
Send $USER and $HOSTNAME with !log messages

https://gerrit.wikimedia.org/r/228131

Change 228126 merged by Ori.livneh:
Don't try to run l10nupdate on mira

https://gerrit.wikimedia.org/r/228126

Change 228125 merged by Ori.livneh:
Send $USER and $HOSTNAME with dologmsg messages

https://gerrit.wikimedia.org/r/228125

Krenair closed this task as Resolved.EditedJul 31 2015, 2:22 AM

No sign of a l10nupdate run failing early tonight, am watching the log of tin working through the whole update fine though. Will open a separate bug against the MW extension about the warnings we were seeing (and Ori emailed me a helpful copy of).

Change 228131 merged by jenkins-bot:
Send $USER and $HOSTNAME with !log messages

https://gerrit.wikimedia.org/r/228131