
Host mw2250 is not in mediawiki-installation dsh group
Closed, Resolved | Public | 0 Estimated Story Points

Description

CRITICAL (for 5d 8h 23m 54s)

From https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=mw2250&service=mediawiki-installation+DSH+group

I don't understand what needs to be done from the linked doc: https://wikitech.wikimedia.org/wiki/Application_servers#Apache_setup_checklist
The runbook should probably be improved.

A quick search turns up T82798, but the file mentioned in the Gerrit CR doesn't seem to exist anymore (that's also why I CCed @Dzahn).
Bryan pointed me to this depool from @MoritzMuehlenhoff - https://tools.wmflabs.org/sal/log/AWu4jHBdOwpQ-3Pkztx-

Event Timeline

ayounsi triaged this task as Medium priority. Jul 9 2019, 1:18 AM
ayounsi created this task.

I assume it was depooled because it had a degraded RAID (T226948).

The server could be synced again because a new scap version was deployed (T228482), which fixes scap pull (T228328).

After doing a scap pull, the alert recovers automatically:

20:13 <+icinga-wm> RECOVERY - mediawiki-installation DSH group on mw2250 is OK: OK https://wikitech.wikimedia.org/wiki/Application_servers%23Apache_setup_checklist

Nowadays dsh groups are no longer a file; they are just the state in conftool.
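
To illustrate what "group membership is just state" can mean, here is a minimal Python sketch. It does not talk to conftool and is not how the Icinga check is implemented. The host names other than mw2250, the state values, and the rule for which states count as membership (`member_states`) are all assumptions made up for the example.

```python
from typing import Dict, List, Set

# Hypothetical per-host state snapshot. The values are illustrative only and
# were not read from a real conftool instance.
HOST_STATE: Dict[str, str] = {
    "mw2249": "yes",
    "mw2250": "inactive",
    "mw2251": "no",
}

GROUP_NAME = "mediawiki-installation"


def derive_group(state: Dict[str, str], member_states: Set[str]) -> List[str]:
    """Return the sorted hosts whose state counts as group membership.

    There is no group file: the list is recomputed from state on every call.
    """
    return sorted(host for host, value in state.items() if value in member_states)


def check_membership(host: str, state: Dict[str, str], member_states: Set[str]) -> str:
    """Return an Icinga-style status line for one host."""
    if host in derive_group(state, member_states):
        return "OK"
    return f"CRITICAL: Host {host} is not in {GROUP_NAME} dsh group"


if __name__ == "__main__":
    # Assumption for this sketch: only "inactive" hosts are left out of the group.
    members = {"yes", "no"}
    print(check_membership("mw2250", HOST_STATE, members))
    # Changing the state is enough to change membership; no file is edited.
    HOST_STATE["mw2250"] = "yes"
    print(check_membership("mw2250", HOST_STATE, members))
```

The design point is that nothing needs to be edited by hand to get the host back into the group: once the state changes, the derived list changes with it.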

The issue is resolved; keeping the task open to improve the docs.

Change 526561 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mediawiki: use a better notes_url for the "DSH groups" Icinga alert

https://gerrit.wikimedia.org/r/526561

Change 526561 merged by Dzahn:
[operations/puppet@production] mediawiki: use a better notes_url for the "DSH groups" Icinga alert

https://gerrit.wikimedia.org/r/526561