Daily errors on webperf1002 & webperf2002 /usr/local/bin/arclamp-generate-svgs > /dev/null
Closed, Resolved · Public

Description

It seems to have been happening since December, giving this output:

Can't open /srv/xenon/logs/hourly/2020-01-08_20.excimer.load.log: No such file or directory at /usr/local/bin/flamegraph.pl line 491.
ERROR: No stack counts found

Please ask if you need more context from the root@ mails, in case you don't have access to them.

I am not advocating for 2> /dev/null, but this seems like a real issue you may want to fix or work around. I saw some other tickets related to performance logging and monitoring, but no mention of this.

Event Timeline

jcrespo created this task. Jan 27 2020, 11:59 AM
Restricted Application added a subscriber: Aklapper. Jan 27 2020, 11:59 AM
jijiki triaged this task as Medium priority. Jan 28 2020, 12:53 PM

I've ruled out any structural failure (e.g. the script looking in the wrong directory) because it does still reliably do what it's supposed to do: create SVG flame graph files for performance.wikimedia.org.

It is most likely a race condition caused by old log files being removed on rotation.

Unless the error makes the script stop early on every run, it has no impact on the application, because the purpose of arclamp-generate-svgs is to ensure that an SVG file exists for each incoming log file. If a log file no longer exists, it doesn't need an SVG any more. And even if the error did cause the script to stop early, the next run would get to the remaining files.

Either way, we should fix it by making sure the error is caught and ignored, so that the script can move on.
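A minimal sketch of that "catch, ignore, move on" idea, written in Python purely for illustration. It is not the actual arclamp-generate-svgs script or the patch that was merged; `render_all` and the `render` callback are hypothetical names, and the `.svg` output naming is an assumption.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def render_all(
    log_paths: Iterable[Path],
    render: Callable[[str], str],
) -> Tuple[List[Path], List[Path]]:
    """Write one SVG per log file, skipping logs that have disappeared.

    `render` is a hypothetical stand-in for the flame graph renderer:
    it takes the log text and returns SVG markup.
    """
    rendered: List[Path] = []
    skipped: List[Path] = []
    for log_path in log_paths:
        try:
            text = log_path.read_text()
        except FileNotFoundError:
            # The log was removed (e.g. by rotation) after it was listed.
            # A missing log needs no SVG, so note it and keep going.
            skipped.append(log_path)
            continue
        # Assumed naming: replace the trailing ".log" with ".svg".
        svg_path = log_path.with_suffix(".svg")
        svg_path.write_text(render(text))
        rendered.append(svg_path)
    return rendered, skipped
```

Only the "file vanished" case is swallowed here; any other failure still surfaces, so a genuine problem would keep producing cron mail.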

Krinkle moved this task from Untriaged to Jan2020/1.35-wmf.14 on the Wikimedia-production-error board.
Krinkle added a subscriber: dpifke.

Assuming no impact on actual functionality, this could be lower priority and drop the Wikimedia-production-error tag. It would just be kept open (and eventually solved or worked around) to try to reduce root cron spam.

I consider log spam an impact. It's okay, we'll get it fixed. We only have one other open prod-error and I'd rather we not get in the habit of building up a backlog.

Thanks, let us know how we can help.

Gilles moved this task from Inbox to Doing on the Performance-Team board. Jan 28 2020, 11:24 PM
Gilles assigned this task to dpifke. Jan 28 2020, 11:26 PM

Change 568117 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[operations/puppet@production] Fix log spam from arclamp-generate-svgs

https://gerrit.wikimedia.org/r/568117

Change 568117 merged by Jcrespo:
[operations/puppet@production] Fix log spam from arclamp-generate-svgs

https://gerrit.wikimedia.org/r/568117

Krinkle closed this task as Resolved. Jan 29 2020, 7:41 PM