
analytics1032 has / mounted ro
Closed, Resolved, Public

Description

root@analytics1032:~# puppet agent -vt
Error: Could not run Puppet configuration client: Read-only file system - /var/lib/puppet/state/agent_catalog_run.lock
root@analytics1032:~# touch foo
touch: cannot touch ‘foo’: Read-only file system
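You can also confirm the symptom without waiting for a write to fail: read the mount options for / from /proc/mounts. The sketch below is illustrative and was not used in this task. The helper names `mount_options` and `is_read_only` are hypothetical. It splits the option field on commas because an option such as `errors=remount-ro` also contains the substring "ro". It takes the last matching entry because later mounts on the same path shadow earlier ones.

```python
from typing import List, Optional


def mount_options(mounts_text: str, mount_point: str = "/") -> Optional[List[str]]:
    """Return the option list of the last entry for mount_point, or None."""
    options = None
    for line in mounts_text.splitlines():
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep overwriting so the last (visible) mount on this path wins.
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text: str, mount_point: str = "/") -> bool:
    """True if the visible mount at mount_point has the exact option 'ro'."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "ro" in opts


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        text = f.read()
    print("/ is read-only" if is_read_only(text) else "/ is read-write")
```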

Event Timeline

faidon raised the priority of this task to Needs Triage.
faidon updated the task description.
faidon subscribed.

Hm, this happened on analytics1038 a week ago. I rebooted it and ran fsck, which found no errors, and after that it came back up. I'm doing the same here, but twice in 7 days on different nodes doesn't sound good!

One reboot did not bring it back up. I then dropped into Ubuntu recovery mode, got a root shell, and ran fsck on /dev/mapper/analytics1032--vg-root, which showed no errors. I then continued with the normal boot, and now things seem fine.

There are plenty of oom-killer invocations on java processes in syslog. One of them looks suspiciously like the culprit (at Nov 7 17:10:50).
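To list those invocations with their timestamps, a minimal sketch follows. It assumes the traditional rsyslog line format `Mon DD HH:MM:SS host kernel: ...` and the kernel's `<command> invoked oom-killer` message. The function name `oom_invocations` and the `/var/log/syslog` path are assumptions and do not come from this task.

```python
import re
from typing import Iterable, List, Tuple

# Assumed format: "Mon DD HH:MM:SS host kernel: [...] <command> invoked oom-killer: ..."
OOM_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*?(\S+) invoked oom-killer"
)


def oom_invocations(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, command) pairs for every OOM-killer invocation line."""
    hits = []
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            hits.append((m.group(1), m.group(2)))
    return hits


if __name__ == "__main__":
    with open("/var/log/syslog", errors="replace") as f:
        for ts, command in oom_invocations(f):
            print(ts, command)
```

The script only lists candidates. You still have to match their timestamps against the moment the root filesystem went read-only by hand.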

(The OOM killing here is just the trigger; the data corruption is still an unrelated kernel bug. The kernel on 1032 is over a year old, so the bug might have been fixed in one of the 3.13 stable updates since then.)

I saw the OOM killer invocations too, but wasn't sure how they could cause the root partition to go read-only.

Milimetric moved this task from Incoming to Backlog on the Analytics-Engineering board.
Milimetric moved this task from Backlog to Prioritized on the Analytics-Engineering board.
Milimetric set Security to None.

This was fixed by a reboot, but we don't know why it originally happened.