
Investigate the phabricator-prod-1001 alert
Closed, Resolved, Public

Description

There is a permanent "Project devtools instance phabricator-prod-1001 is down" alert in Alertmanager [1], but the phabricator-prod-1001 instance is shown as active in the OpenStack browser [2]. Can we silence the alert, shut down the host, or maybe both?

[1] https://alerts.wikimedia.org/?q=alertname%3DInstanceDown&q=project%3Ddevtools&q=%40receiver%3Dblackhole
[2] https://openstack-browser.toolforge.org/server/phabricator-prod-1001.devtools.eqiad1.wikimedia.cloud

Event Timeline

@brennen now that we have phorge-1001 in devtools, can phabricator-prod-1001 be decommissioned?

Jelto claimed this task.
Jelto subscribed.

The metric is flapping: https://prometheus.wmcloud.org/graph?g0.expr=up%7Bjob%3D%22node%22%2C%20instance%3D%22phabricator-prod-1001%22%7D&g0.tab=0&g0.stacked=0&g0.range_input=30m&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D
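
For readers who don't want to open the graph: the link above encodes the query up{job="node", instance="phabricator-prod-1001"}. Below is a rough sketch of how that expression can be pulled out of the URL and how "flapping" could be quantified from a list of up samples. The function names and the sample values are made up for illustration; nothing here was run against prometheus.wmcloud.org.

```python
from urllib.parse import urlsplit, parse_qs


def prometheus_expr(graph_url: str) -> str:
    """Return the PromQL expression stored in the g0.expr parameter of a graph URL."""
    params = parse_qs(urlsplit(graph_url).query)
    return params["g0.expr"][0]


def count_flaps(samples: list[int]) -> int:
    """Count up/down transitions (1 -> 0 or 0 -> 1) between consecutive samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


if __name__ == "__main__":
    url = (
        "https://prometheus.wmcloud.org/graph?"
        "g0.expr=up%7Bjob%3D%22node%22%2C%20instance%3D%22phabricator-prod-1001%22%7D"
        "&g0.tab=0&g0.range_input=30m"
    )
    print(prometheus_expr(url))
    # Hypothetical samples, not real data from the instance.
    print(count_flaps([1, 1, 0, 1, 0, 0]))
```

A steadily down instance would show zero or one transition over the window; many transitions in 30 minutes is what distinguishes flapping from a plain outage.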

The confd.service and networking.service units are also failing. There was an IPv6 address configured in /etc/network/interfaces. I commented out this IPv6 address and moved the "source /etc/network/interfaces.d/*" line to the bottom of the file (the files in that directory contain some generic cloud-init settings, which should run after our custom configuration). After a restart the instance is "up" again.
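
In case the same fix is needed on another instance, here is a minimal sketch of the edit described above: comment out any inet6 stanza and move the "source" lines to the end of the file. It assumes a simple file layout where stanza options are indented under their "iface" line; the function name, the sample file contents, and the 2001:db8:: documentation address are hypothetical, not the actual configuration from phabricator-prod-1001.

```python
def fix_interfaces(text: str) -> str:
    """Comment out inet6 stanzas and move 'source' lines to the bottom."""
    kept: list[str] = []
    sources: list[str] = []
    in_inet6 = False
    for line in text.splitlines():
        stripped = line.strip()
        words = stripped.split()
        if words and words[0] == "source":
            # Hold back source lines so they can be appended last.
            sources.append(line)
            in_inet6 = False
            continue
        if words and words[0] == "iface":
            # "iface <name> <family> <method>": family is the third word.
            in_inet6 = len(words) >= 3 and words[2] == "inet6"
        elif stripped and not line[0].isspace():
            # Any other unindented line ends the current stanza.
            in_inet6 = False
        if in_inet6 and stripped and not stripped.startswith("#"):
            kept.append("# " + line)
        else:
            kept.append(line)
    return "\n".join(kept + sources) + "\n"


if __name__ == "__main__":
    # Hypothetical file contents for illustration only.
    sample = (
        "source /etc/network/interfaces.d/*\n"
        "auto eth0\n"
        "iface eth0 inet dhcp\n"
        "iface eth0 inet6 static\n"
        "    address 2001:db8::10/64\n"
    )
    print(fix_interfaces(sample), end="")
```

This only transforms text; writing the result back and restarting networking.service would still be manual steps, and a backup of the original file is advisable first.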

I'll close this task.

@brennen now that we have phorge-1001 in devtools, can phabricator-prod-1001 be decommissioned?

I added this to the topic for the next Collab-RelEng sync meeting.

@brennen now that we have phorge-1001 in devtools, can phabricator-prod-1001 be decommissioned?

IIUC yes; see also T334519#8787677

https://phab.wmflabs.org/ is pointed at phabricator-prod-1001, where we did an in-place upgrade. I believe the phorge-1001 instance was a from-scratch install that dzahn did early in the process of exploring the migration.

I'm not sure why the alert is flapping here, since phab.wmflabs.org is up at the moment. (Edit: Failed to fully read Jelto's comment above.)

This alert has been firing again for some time. I noticed it today in metrics review.

https://phab.wmflabs.org/ is pointed at phabricator-prod-1001, where we did an in-place upgrade. I believe the phorge-1001 instance was a from-scratch install that dzahn did early in the process of exploring the migration.

Yes, phorge-1001 was the first "proof of concept" install of phorge, using puppet but a separate class / role from the phab production role.