
access.log files are not being updated
Closed, Resolved (Public)

Description

The access.log files for tools don't show new lines since yesterday. (Related to NFS outage?)

Event Timeline

Emijrp set the priority of this task to Needs Triage.
Emijrp updated the task description.
Emijrp added a project: Cloud-Services.
Emijrp added a subscriber: Emijrp.

Possibly. The error.log file descriptors are still functional, but the access.log descriptor now points to a deleted file:

tools.gerrit-reviewer-bot@tools-webgrid-lighttpd-1402:~$ ls -l /proc/8835/fd
(...)
l-wx------ 1 tools.gerrit-reviewer-bot tools.gerrit-reviewer-bot 64 Aug 14 15:39 2 -> /data/project/gerrit-reviewer-bot/error.log
l-wx------ 1 tools.gerrit-reviewer-bot tools.gerrit-reviewer-bot 64 Aug 14 15:39 3 -> /data/project/gerrit-reviewer-bot/error.log
(...)
l-wx------ 1 tools.gerrit-reviewer-bot tools.gerrit-reviewer-bot 64 Aug 14 15:39 5 -> /data/project/gerrit-reviewer-bot/access.log (deleted)

Restarting the webservice solves this issue, so we should probably reschedule all webservice tasks.
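
To check whether another webservice process is affected in the same way, something like the following could be used. It is only a minimal sketch: the function name `list_deleted_fds` is made up for illustration, and it assumes a Linux /proc filesystem and permission to read the target process's fd directory.

```shell
#!/bin/sh
# list_deleted_fds PID
# Hypothetical helper: prints every file descriptor of PID whose target
# has been deleted. Linux appends " (deleted)" to such /proc/PID/fd links.
list_deleted_fds() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Skip anything that is not a symlink (e.g. an unexpanded glob
        # when the directory is missing or unreadable).
        [ -L "$fd" ] || continue
        target=$(readlink "$fd")
        case $target in
            *' (deleted)') printf '%s -> %s\n' "${fd##*/}" "$target" ;;
        esac
    done
}
```

Run as the tool user, `list_deleted_fds 8835` should report descriptor 5 for the process shown above, as long as that process is still running.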

And looking at your work in the past months, you probably already have a command that lists all web service jobs started before $time? :-)

Sort of.

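# Collect the webgrid hosts: strip everything up to '@' and any trailing XML tag.
qstat -f -xml | grep 'tools-webgrid' | sed -e 's/.*@//' | sed -e 's/<.*//' > webgrid_hosts
# Keep the first field of each qhost line that starts with a digit (the job IDs).
qhost -j -h $(cat webgrid_hosts) | sed -e 's/^\s*//' | cut -d ' ' -f 1 | grep -E '^[0-9]' > webgrid_jobs
sort webgrid_jobs > webgrid_jobs_sorted # spreads affected hosts a bit

# Reschedule the jobs one at a time, pausing 5 seconds between them.
for i in $(cat webgrid_jobs_sorted); do qmod -rj "$i"; sleep 5; done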

This will take approximately 40 minutes.
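
For reference, the estimate follows from the 5-second sleep: 40 minutes corresponds to roughly 480 jobs, ignoring the time qmod itself needs. A rough sketch of that arithmetic (the function name `estimate_minutes` is an assumption made here for illustration):

```shell
#!/bin/sh
# estimate_minutes JOBFILE DELAY_SECONDS
# Hypothetical helper: lower bound on the loop's run time in whole minutes,
# counting only the sleep between jobs (one job ID per line in JOBFILE).
estimate_minutes() {
    count=$(wc -l < "$1")
    echo $(( count * $2 / 60 ))
}

# Example: estimate_minutes webgrid_jobs_sorted 5
# A file with 480 job IDs and a 5-second delay prints 40.
```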

scfc assigned this task to valhallasw.
scfc set Security to None.