
ops-monitoring-bot (Operations Monitoring Bot)
User · Bot

Today

  • Clear sailing ahead.

Tomorrow

  • Clear sailing ahead.

Wednesday

  • Clear sailing ahead.

User Details

User Since
Aug 12 2016, 1:45 PM (179 w, 3 d)
Roles
Bot
Availability
Available
LDAP User
Unknown
MediaWiki User
Unknown
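The age shown next to "User Since" is the gap between the creation date and the day this page was captured, split into whole weeks and leftover days. The sketch below checks that arithmetic. It assumes the page was captured on Mon, Jan 20 2020 (the date of the "Today" entries in Recent Activity) and ignores time of day. Phabricator's own rounding rule is not confirmed here, and the function name is illustrative.

```python
from datetime import date


def age_in_weeks_and_days(start: date, end: date) -> tuple[int, int]:
    """Return (whole weeks, leftover days) between two calendar dates."""
    return divmod((end - start).days, 7)


# Creation date from "User Since"; the end date is an assumption (see above).
weeks, days = age_in_weeks_and_days(date(2016, 8, 12), date(2020, 1, 20))
print(f"({weeks} w, {days} d)")  # (179 w, 3 d)
```

1256 days separate the two dates, which is 179 weeks plus 3 days, so the displayed age is consistent with a Jan 20 2020 capture.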

Bot managed by Operations for automated interaction with Phabricator from monitoring tools.

Recent Activity

Today

ops-monitoring-bot added a comment to T224551: Migrate URL downloaders to Buster.

cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: actinium.wikimedia.org

  • actinium.wikimedia.org (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Wiped bootloaders
    • Shutdown issued. Verify it manually, verification not yet supported
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Mon, Jan 20, 12:08 PM · Patch-For-Review, Operations
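Each decommission report lists one bullet per step under every host. A host is marked FAIL when any step was not completed automatically. The bot highlights those steps in bold, but that markup did not survive in this plain-text copy, so the Python sketch below identifies them by wording instead. The marker phrases and the name steps_needing_attention are assumptions based on the reports on this page. They are not part of the cookbook.

```python
import re

# Assumed marker phrases, taken from the FAIL reports on this page. The bot's
# bold formatting is lost in plain text, so wording is the only signal left.
MANUAL_MARKERS = (
    "manual intervention required",
    "Verify it manually",
    "Unable to connect",
)

# "  • host (PASS|FAIL)" opens a host section; other bullets are steps.
HOST_LINE = re.compile(r"^\s*•\s+(\S+)\s+\((PASS|FAIL)\)\s*$")
STEP_LINE = re.compile(r"^\s*•\s+(.*\S)\s*$")


def steps_needing_attention(report: str) -> dict[str, list[str]]:
    """Map each FAIL host to the steps whose wording suggests manual follow-up."""
    result: dict[str, list[str]] = {}
    current = None  # FAIL host whose steps are being read, or None
    for line in report.splitlines():
        host = HOST_LINE.match(line)
        if host:
            name, status = host.groups()
            current = name if status == "FAIL" else None
            if current is not None:
                result[current] = []
            continue
        step = STEP_LINE.match(line)
        if step and current is not None:
            text = step.group(1)
            if any(marker in text for marker in MANUAL_MARKERS):
                result[current].append(text)
    return result


sample = """\
  • alsafi.wikimedia.org (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Wiped bootloaders
    • Shutdown issued. Verify it manually, verification not yet supported
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
"""

for host, steps in steps_needing_attention(sample).items():
    print(host)
    for step in steps:
        print("  -", step)
```

For the VM reports on this page, this heuristic leaves the unverified shutdown and the Netbox status update as the manual follow-ups. For the hosts Cumin could not reach, it also flags the skipped bootloader wipe.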
ops-monitoring-bot added a comment to T224551: Migrate URL downloaders to Buster.

cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: alsafi.wikimedia.org

  • alsafi.wikimedia.org (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Wiped bootloaders
    • Shutdown issued. Verify it manually, verification not yet supported
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Mon, Jan 20, 11:41 AM · Patch-For-Review, Operations
ops-monitoring-bot added a comment to T224551: Migrate URL downloaders to Buster.

cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: alcyone.wikimedia.org

  • alcyone.wikimedia.org (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Wiped bootloaders
    • Shutdown issued. Verify it manually, verification not yet supported
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Mon, Jan 20, 10:04 AM · Patch-For-Review, Operations
ops-monitoring-bot added a comment to T224551: Migrate URL downloaders to Buster.

cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: aluminium.wikimedia.org

  • aluminium.wikimedia.org (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Wiped bootloaders
    • Shutdown issued. Verify it manually, verification not yet supported
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Mon, Jan 20, 10:01 AM · Patch-For-Review, Operations

Fri, Jan 17

ops-monitoring-bot added a comment to T239151: Gerrit VM to test data migration.

cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: gerrit-test.wikimedia.org

  • gerrit-test.wikimedia.org (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Failed to shutdown, manual intervention required: Cumin execution failed (exit_code=2)
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Fri, Jan 17, 6:19 PM · Patch-For-Review, Gerrit, vm-requests, Operations

Thu, Jan 16

ops-monitoring-bot added a comment to T242481: d-i fails to install on servers with BRCM 2P 1G BT + 2P 10G SFP NDC.

Completed auto-reimage of hosts:

['es2020.codfw.wmnet']
Thu, Jan 16, 12:47 PM · Patch-For-Review, Operations, DBA, ops-codfw
ops-monitoring-bot added a comment to T242481: d-i fails to install on servers with BRCM 2P 1G BT + 2P 10G SFP NDC.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['es2020.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202001161221_marostegui_159494.log.

Thu, Jan 16, 12:22 PM · Patch-For-Review, Operations, DBA, ops-codfw
ops-monitoring-bot added a comment to T242481: d-i fails to install on servers with BRCM 2P 1G BT + 2P 10G SFP NDC.

Completed auto-reimage of hosts:

['es2020.codfw.wmnet']
Thu, Jan 16, 12:21 PM · Patch-For-Review, Operations, DBA, ops-codfw
ops-monitoring-bot added a comment to T242481: d-i fails to install on servers with BRCM 2P 1G BT + 2P 10G SFP NDC.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['es2020.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202001161221_marostegui_159400.log.

Thu, Jan 16, 12:21 PM · Patch-For-Review, Operations, DBA, ops-codfw
ops-monitoring-bot added a comment to T239835: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster.

cookbooks.sre.hosts.decommission executed by akosiaris@cumin1001 for hosts: etcd[1004-1006].eqiad.wmnet

  • etcd1004.eqiad.wmnet (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Wiped bootloaders
    • Shutdown issued. Verify it manually, verification not yet supported
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • etcd1005.eqiad.wmnet (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Wiped bootloaders
    • Shutdown issued. Verify it manually, verification not yet supported
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • etcd1006.eqiad.wmnet (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Wiped bootloaders
    • Shutdown issued. Verify it manually, verification not yet supported
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Thu, Jan 16, 11:20 AM · Patch-For-Review, serviceops
ops-monitoring-bot added a comment to T242481: d-i fails to install on servers with BRCM 2P 1G BT + 2P 10G SFP NDC.

Completed auto-reimage of hosts:

['es2020.codfw.wmnet']
Thu, Jan 16, 11:15 AM · Patch-For-Review, Operations, DBA, ops-codfw
ops-monitoring-bot added a comment to T242481: d-i fails to install on servers with BRCM 2P 1G BT + 2P 10G SFP NDC.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['es2020.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202001161014_marostegui_138642.log.

Thu, Jan 16, 10:14 AM · Patch-For-Review, Operations, DBA, ops-codfw

Tue, Jan 14

ops-monitoring-bot added a comment to T242481: d-i fails to install on servers with BRCM 2P 1G BT + 2P 10G SFP NDC.

Completed auto-reimage of hosts:

['es2020.codfw.wmnet']
Tue, Jan 14, 4:53 PM · Patch-For-Review, Operations, DBA, ops-codfw
ops-monitoring-bot added a comment to T242481: d-i fails to install on servers with BRCM 2P 1G BT + 2P 10G SFP NDC.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

es2020.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202001141627_marostegui_202850_es2020_codfw_wmnet.log.

Tue, Jan 14, 4:27 PM · Patch-For-Review, Operations, DBA, ops-codfw
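The wmf-auto-reimage log paths on this page seem to follow a fixed pattern. Each path contains a YYYYMMDDHHMM start time, the launching user, and a number whose meaning is not stated (it may be a process ID). Newer entries also append the target host, with dots replaced by underscores. The sketch below parses this inferred pattern. The regex, the ReimageLog type, and parse_log_name are hypothetical and not taken from the script itself.

```python
import os
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Inferred layout: <YYYYMMDDHHMM>_<user>_<number>[_<host with dots as underscores>].log
LOG_NAME = re.compile(
    r"^(?P<ts>\d{12})_(?P<user>[^_]+)_(?P<num>\d+)(?:_(?P<host>[A-Za-z0-9_-]+))?\.log$"
)


class ReimageLog(NamedTuple):
    started: datetime     # start time as encoded in the file name
    user: str             # user who launched wmf-auto-reimage
    number: int           # unexplained numeric field (possibly a PID)
    host: Optional[str]   # target host, present only in the newer naming form


def parse_log_name(path: str) -> ReimageLog:
    """Split a wmf-auto-reimage log path into its (inferred) components."""
    m = LOG_NAME.match(os.path.basename(path))
    if m is None:
        raise ValueError(f"unrecognised log name: {path!r}")
    host = m["host"].replace("_", ".") if m["host"] else None
    return ReimageLog(
        started=datetime.strptime(m["ts"], "%Y%m%d%H%M"),
        user=m["user"],
        number=int(m["num"]),
        host=host,
    )


print(parse_log_name("/var/log/wmf-auto-reimage/202001161221_marostegui_159494.log"))
print(parse_log_name(
    "/var/log/wmf-auto-reimage/202001020923_filippo_73650_ms-fe2007_codfw_wmnet.log"
))
```

The encoded start times agree with the comment timestamps shown on this page (for example, 202001141627 and Tue, Jan 14, 4:27 PM). Converting underscores back to dots relies on hostnames never containing underscores.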
ops-monitoring-bot added a comment to T242702: Test MariaDB 10.4 in production.

Completed auto-reimage of hosts:

['db1107.eqiad.wmnet']
Tue, Jan 14, 3:07 PM · DBA
ops-monitoring-bot added a comment to T242702: Test MariaDB 10.4 in production.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

db1107.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202001141449_marostegui_183985_db1107_eqiad_wmnet.log.

Tue, Jan 14, 2:49 PM · DBA

Mon, Jan 13

ops-monitoring-bot added a comment to T224567: decom debug proxies (was: Migrate debug proxies to Stretch/Buster).

cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: hassium.eqiad.wmnet

  • hassium.eqiad.wmnet (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Wiped bootloaders
    • Shutdown issued. Verify it manually, verification not yet supported
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Mon, Jan 13, 3:18 PM · serviceops, Operations
ops-monitoring-bot added a comment to T224567: decom debug proxies (was: Migrate debug proxies to Stretch/Buster).

cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: hassaleh.codfw.wmnet

  • hassaleh.codfw.wmnet (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Failed to shutdown, manual intervention required: Cumin execution failed (exit_code=2)
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Mon, Jan 13, 2:58 PM · serviceops, Operations

Sat, Jan 11

ops-monitoring-bot added projects to T242511: Degraded RAID on ms-be1039: Operations, ops-eqiad.
Sat, Jan 11, 2:47 PM · SRE-swift-storage, ops-eqiad, Operations

Fri, Jan 10

ops-monitoring-bot added projects to T242472: Degraded RAID on cloudvirt1013: Operations, ops-eqiad.
Fri, Jan 10, 10:18 PM · cloud-services-team (Hardware), ops-eqiad, Operations
ops-monitoring-bot added projects to T242471: Degraded RAID on ms-be1035: Operations, ops-eqiad.
Fri, Jan 10, 9:58 PM · SRE-swift-storage, ops-eqiad, Operations

Wed, Jan 8

ops-monitoring-bot added a comment to T238957: decommission phab1003.eqiad.wmnet.

cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: phab1003.eqiad.wmnet

  • phab1003.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
Wed, Jan 8, 9:30 PM · Patch-For-Review, serviceops, Release-Engineering-Team

Sat, Jan 4

ops-monitoring-bot added projects to T241886: Degraded RAID on cloudvirt1024: Operations, ops-eqiad.
Sat, Jan 4, 3:42 PM · ops-eqiad, Operations
ops-monitoring-bot added projects to T241884: Degraded RAID on cloudvirt1024: Operations, ops-eqiad.
Sat, Jan 4, 2:25 PM · cloud-services-team (Hardware), Patch-For-Review, ops-eqiad, Operations
ops-monitoring-bot added projects to T241881: Degraded RAID on cloudvirt1024: Operations, ops-eqiad.
Sat, Jan 4, 1:53 PM · ops-eqiad, Operations
ops-monitoring-bot added projects to T241873: Degraded RAID on cloudvirt1024: Operations, ops-eqiad.
Sat, Jan 4, 7:14 AM · cloud-services-team (Kanban), ops-eqiad, Operations

Thu, Jan 2

ops-monitoring-bot added projects to T241714: Degraded RAID on ms-be2035: ops-codfw, Operations.
Thu, Jan 2, 10:08 AM · Operations, ops-codfw
ops-monitoring-bot added a comment to T239805: ms-fe2007 NIC failure.

Completed auto-reimage of hosts:

['ms-fe2007.codfw.wmnet']
Thu, Jan 2, 9:33 AM · User-fgiunchedi, ops-codfw, Operations
ops-monitoring-bot added a comment to T239805: ms-fe2007 NIC failure.

Script wmf-auto-reimage was launched by filippo on cumin1001.eqiad.wmnet for hosts:

ms-fe2007.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202001020923_filippo_73650_ms-fe2007_codfw_wmnet.log.

Thu, Jan 2, 9:23 AM · User-fgiunchedi, ops-codfw, Operations

Sun, Dec 29

ops-monitoring-bot added projects to T241535: Degraded RAID on ms-be2035: ops-codfw, Operations.
Sun, Dec 29, 1:30 PM · SRE-swift-storage, Operations, ops-codfw
ops-monitoring-bot added projects to T241534: Degraded RAID on ms-be2035: ops-codfw, Operations.
Sun, Dec 29, 1:11 PM · SRE-swift-storage, Operations, ops-codfw

Sat, Dec 28

ops-monitoring-bot added projects to T241506: Degraded RAID on db1100: Operations, ops-eqiad.
Sat, Dec 28, 6:02 AM · DBA, ops-eqiad, Operations

Fri, Dec 27

ops-monitoring-bot added projects to T241494: Degraded RAID on cloudvirt1014: Operations, ops-eqiad.
Fri, Dec 27, 5:32 PM · ops-eqiad, Operations

Mon, Dec 23

ops-monitoring-bot added a comment to T224557: Migrate ldap/corp replicas to Stretch/Buster.

cookbooks.sre.hosts.decommission executed by jmm@cumin1001 for hosts: dubnium.wikimedia.org

  • dubnium.wikimedia.org (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Failed to shutdown, manual intervention required: Cumin execution failed (exit_code=2)
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Mon, Dec 23, 11:06 AM · Operations

Dec 19 2019

ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers.

Completed auto-reimage of hosts:

['mw1286.eqiad.wmnet', 'mw1269.eqiad.wmnet', 'mw2235.codfw.wmnet', 'mw2216.codfw.wmnet']
Dec 19 2019, 2:27 PM · Operations, serviceops
ops-monitoring-bot added a comment to T224585: Migrate labmon* to Buster.

Completed auto-reimage of hosts:

['cloudmetrics1001.eqiad.wmnet']
Dec 19 2019, 2:21 PM · Cloud-VPS (Debian Jessie Deprecation), cloud-services-team (Kanban), Operations
ops-monitoring-bot added a comment to T224585: Migrate labmon* to Buster.

Script wmf-auto-reimage was launched by phamhi on cumin1001.eqiad.wmnet for hosts:

labmon1001.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201912191353_phamhi_153387_labmon1001_eqiad_wmnet.log.

Dec 19 2019, 1:53 PM · Cloud-VPS (Debian Jessie Deprecation), cloud-services-team (Kanban), Operations
ops-monitoring-bot added a comment to T227432: Replace Varnish backends with ATS on cache text nodes.

Completed auto-reimage of hosts:

['cp2023.codfw.wmnet']
Dec 19 2019, 1:25 PM · Patch-For-Review, Operations, Traffic
ops-monitoring-bot added a comment to T227432: Replace Varnish backends with ATS on cache text nodes.

Completed auto-reimage of hosts:

['cp1089.eqiad.wmnet']
Dec 19 2019, 1:24 PM · Patch-For-Review, Operations, Traffic
ops-monitoring-bot added a comment to T227432: Replace Varnish backends with ATS on cache text nodes.

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp1089.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912191259_ema_139391.log.

Dec 19 2019, 12:59 PM · Patch-For-Review, Operations, Traffic
ops-monitoring-bot added a comment to T227432: Replace Varnish backends with ATS on cache text nodes.

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp2023.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912191258_ema_139094.log.

Dec 19 2019, 12:58 PM · Patch-For-Review, Operations, Traffic
ops-monitoring-bot added a comment to T227432: Replace Varnish backends with ATS on cache text nodes.

Completed auto-reimage of hosts:

['cp3064.esams.wmnet']
Dec 19 2019, 9:56 AM · Patch-For-Review, Operations, Traffic
ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers.

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['mw1286.eqiad.wmnet', 'mw1269.eqiad.wmnet', 'mw2235.codfw.wmnet', 'mw2216.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912190955_jiji_101664.log.

Dec 19 2019, 9:56 AM · Operations, serviceops
ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers.

Completed auto-reimage of hosts:

['mw1321.eqiad.wmnet', 'mw1320.eqiad.wmnet', 'mw1314.eqiad.wmnet', 'mw2271.codfw.wmnet', 'mw2255.codfw.wmnet']
Dec 19 2019, 9:49 AM · Operations, serviceops
ops-monitoring-bot added a comment to T227432: Replace Varnish backends with ATS on cache text nodes.

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3064.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912190926_ema_95045.log.

Dec 19 2019, 9:26 AM · Patch-For-Review, Operations, Traffic
ops-monitoring-bot added a comment to T227432: Replace Varnish backends with ATS on cache text nodes.

Completed auto-reimage of hosts:

['cp1087.eqiad.wmnet']
Dec 19 2019, 9:24 AM · Patch-For-Review, Operations, Traffic
ops-monitoring-bot added a comment to T227432: Replace Varnish backends with ATS on cache text nodes.

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp1087.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912190859_ema_88921.log.

Dec 19 2019, 9:00 AM · Patch-For-Review, Operations, Traffic
ops-monitoring-bot added a comment to T227432: Replace Varnish backends with ATS on cache text nodes.

Completed auto-reimage of hosts:

['cp1085.eqiad.wmnet']
Dec 19 2019, 8:23 AM · Patch-For-Review, Operations, Traffic
ops-monitoring-bot added a comment to T227432: Replace Varnish backends with ATS on cache text nodes.

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp1085.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912190755_ema_74472.log.

Dec 19 2019, 7:56 AM · Patch-For-Review, Operations, Traffic
ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers.

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['mw1321.eqiad.wmnet', 'mw1320.eqiad.wmnet', 'mw1314.eqiad.wmnet', 'mw2271.codfw.wmnet', 'mw2255.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912190444_jiji_33118.log.

Dec 19 2019, 4:45 AM · Operations, serviceops

Dec 18 2019

ops-monitoring-bot added a comment to T224557: Migrate ldap/corp replicas to Stretch/Buster.

cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: pollux.wikimedia.org

  • pollux.wikimedia.org (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Failed to shutdown, manual intervention required: Cumin execution failed (exit_code=2)
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Dec 18 2019, 2:37 PM · Operations
ops-monitoring-bot added a comment to T239835: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster.

Icinga downtime for 1 day, 0:00:00 set by akosiaris@cumin1001 on 1 host(s) and their services with reason: alex reinit kubernetes cluster

acrux.codfw.wmnet
Dec 18 2019, 10:05 AM · Patch-For-Review, serviceops
ops-monitoring-bot added a comment to T239835: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster.

Icinga downtime for 1 day, 0:00:00 set by akosiaris@cumin1001 on 1 host(s) and their services with reason: alex reinit kubernetes cluster

acrab.codfw.wmnet
Dec 18 2019, 10:05 AM · Patch-For-Review, serviceops
ops-monitoring-bot added a comment to T239835: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster.

Icinga downtime for 1 day, 0:00:00 set by akosiaris@cumin1001 on 6 host(s) and their services with reason: alex reinit kubernetes cluster

kubetcd[2001-2006].codfw.wmnet
Dec 18 2019, 10:04 AM · Patch-For-Review, serviceops
ops-monitoring-bot added a comment to T239835: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster.

Icinga downtime for 1 day, 0:00:00 set by akosiaris@cumin1001 on 6 host(s) and their services with reason: alex reinit kubernetes cluster

kubernetes[2001-2006].codfw.wmnet
Dec 18 2019, 10:04 AM · Patch-For-Review, serviceops
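The duration "1 day, 0:00:00" in these downtime messages matches the output of Python's str() on a datetime.timedelta. This suggests, but does not prove, that the tool formats a timedelta directly. Below is a hypothetical downtime_message helper that rebuilds the message, together with the implied expiry time.

```python
from datetime import datetime, timedelta


def downtime_message(duration: timedelta, actor: str, host_count: int, reason: str) -> str:
    """Hypothetical helper rebuilding the downtime comment seen on this page."""
    # Formatting a timedelta in an f-string uses str(), e.g. "1 day, 0:00:00".
    return (
        f"Icinga downtime for {duration} set by {actor} on {host_count} host(s) "
        f"and their services with reason: {reason}"
    )


duration = timedelta(days=1)
print(downtime_message(duration, "akosiaris@cumin1001", 6, "alex reinit kubernetes cluster"))

# Assuming the downtime started when the comment was posted (Dec 18 2019,
# 10:04 AM in the page's time zone), it would expire one day later.
print(datetime(2019, 12, 18, 10, 4) + duration)  # 2019-12-19 10:04:00
```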

Dec 17 2019

ops-monitoring-bot added a comment to T239821: decommission elastic10[18-31].eqiad.wmnet.

cookbooks.sre.hosts.decommission executed by gehel@cumin1001 for hosts: elastic[1019-1020,1022-1031].eqiad.wmnet

  • elastic1019.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • elastic1020.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • elastic1022.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • elastic1023.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • elastic1024.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • elastic1025.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • elastic1026.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • elastic1027.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • elastic1028.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • elastic1029.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • elastic1030.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • elastic1031.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
Dec 17 2019, 1:53 PM · Discovery-Search (Current work), Operations, DC-Ops, decommission
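Host arguments such as etcd[1004-1006].eqiad.wmnet and elastic[1019-1020,1022-1031].eqiad.wmnet use bracketed range notation, which appears to be ClusterShell's NodeSet syntax as used by Cumin. The hypothetical expand_hosts function below is a simplified sketch. It handles only comma-separated ranges, single values, and zero padding. In practice, ClusterShell.NodeSet.NodeSet handles the full syntax, including steps and set operations.

```python
import re

BRACKET = re.compile(r"\[([^\]]+)\]")


def expand_hosts(pattern: str) -> list[str]:
    """Expand a simplified ClusterShell-style range such as 'etcd[1004-1006].eqiad.wmnet'."""
    m = BRACKET.search(pattern)
    if m is None:
        return [pattern]
    prefix, suffix = pattern[:m.start()], pattern[m.end():]
    expanded: list[str] = []
    for part in m.group(1).split(","):
        lo, sep, hi = part.partition("-")
        if not sep:  # a single value like "1018" rather than a range
            hi = lo
        # Keep zero padding when the range start is written with a leading zero.
        width = len(lo) if lo.startswith("0") else 0
        for n in range(int(lo), int(hi) + 1):
            # Recurse so that any further bracket groups in the suffix expand too.
            for rest in expand_hosts(suffix):
                expanded.append(f"{prefix}{str(n).zfill(width)}{rest}")
    return expanded


hosts = expand_hosts("elastic[1019-1020,1022-1031].eqiad.wmnet")
print(len(hosts), hosts[0], hosts[-1])  # 12 elastic1019.eqiad.wmnet elastic1031.eqiad.wmnet
```

This accounts for the twelve hosts in the report above. elastic1021 is not in the range, and elastic1018 was decommissioned in a separate run.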
ops-monitoring-bot added a comment to T239821: decommission elastic10[18-31].eqiad.wmnet.

cookbooks.sre.hosts.decommission executed by gehel@cumin1001 for hosts: elastic1018.eqiad.wmnet

  • elastic1018.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
Dec 17 2019, 1:49 PM · Discovery-Search (Current work), Operations, DC-Ops, decommission

Dec 15 2019

ops-monitoring-bot added projects to T240798: Degraded RAID on ms-be2016: ops-codfw, Operations.
Dec 15 2019, 8:29 PM · Operations, ops-codfw

Dec 12 2019

ops-monitoring-bot added a comment to T228657: Upgrade Puppet Masters and Puppet DB servers.

cookbooks.sre.hosts.decommission executed by jmm@cumin1001 for hosts: puppetdb1001.eqiad.wmnet

  • puppetdb1001.eqiad.wmnet (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Failed to shutdown, manual intervention required: Cumin execution failed (exit_code=2)
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Dec 12 2019, 3:19 PM · User-jbond, Patch-For-Review, Puppet
ops-monitoring-bot added a comment to T228657: Upgrade Puppet Masters and Puppet DB servers.

cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: puppetdb2001.codfw.wmnet

  • puppetdb2001.codfw.wmnet (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Wiped bootloaders
    • Shutdown issued. Verify it manually, verification not yet supported
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Dec 12 2019, 11:46 AM · User-jbond, Patch-For-Review, Puppet
ops-monitoring-bot added projects to T240534: Degraded RAID on db1123: Operations, ops-eqiad.
Dec 12 2019, 8:42 AM · DBA, ops-eqiad, Operations
ops-monitoring-bot added a comment to T239684: Decommission db2070.codfw.wmnet.

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db2070.codfw.wmnet

  • db2070.codfw.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
Dec 12 2019, 5:58 AM · Patch-For-Review, DC-Ops, ops-codfw, decommission, Operations

Dec 11 2019

ops-monitoring-bot added a comment to T239188: Decommission db1062.eqiad.wmnet.

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db1062.eqiad.wmnet

  • db1062.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
Dec 11 2019, 6:07 AM · Operations, DC-Ops, ops-eqiad, decommission

Dec 9 2019

ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers.

Completed auto-reimage of hosts:

['mw2270.codfw.wmnet', 'mw2269.codfw.wmnet', 'mw2268.codfw.wmnet']
Dec 9 2019, 4:01 PM · Operations, serviceops
ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers.

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['mw2270.codfw.wmnet', 'mw2269.codfw.wmnet', 'mw2268.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912091356_jiji_197404.log.

Dec 9 2019, 1:56 PM · Operations, serviceops

Dec 6 2019

ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Completed auto-reimage of hosts:

['cloudcephosd1003.wikimedia.org']
Dec 6 2019, 9:31 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Script wmf-auto-reimage was launched by jeh on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1003.wikimedia.org']

The log can be found in /var/log/wmf-auto-reimage/201912062110_jeh_258731.log.

Dec 6 2019, 9:10 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Completed auto-reimage of hosts:

['cloudcephosd1002.wikimedia.org']
Dec 6 2019, 9:08 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Script wmf-auto-reimage was launched by jeh on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1002.wikimedia.org']

The log can be found in /var/log/wmf-auto-reimage/201912062047_jeh_254049.log.

Dec 6 2019, 8:47 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Completed auto-reimage of hosts:

['cloudcephosd1001.wikimedia.org']
Dec 6 2019, 8:42 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Script wmf-auto-reimage was launched by jeh on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1001.wikimedia.org']

The log can be found in /var/log/wmf-auto-reimage/201912062022_jeh_248828.log.

Dec 6 2019, 8:22 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Completed auto-reimage of hosts:

['cloudcephosd1001.wikimedia.org']
Dec 6 2019, 8:19 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Script wmf-auto-reimage was launched by jeh on cumin1001.eqiad.wmnet for hosts:

['cloudcephosd1001.wikimedia.org']

The log can be found in /var/log/wmf-auto-reimage/201912062007_jeh_246635.log.

Dec 6 2019, 8:07 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Completed auto-reimage of hosts:

['cloudcephmon1003.wikimedia.org']
Dec 6 2019, 7:04 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Script wmf-auto-reimage was launched by jeh on cumin1001.eqiad.wmnet for hosts:

['cloudcephmon1003.wikimedia.org']

The log can be found in /var/log/wmf-auto-reimage/201912061842_jeh_229478.log.

Dec 6 2019, 6:42 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Completed auto-reimage of hosts:

['cloudcephmon1002.wikimedia.org']
Dec 6 2019, 6:40 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Script wmf-auto-reimage was launched by jeh on cumin1001.eqiad.wmnet for hosts:

['cloudcephmon1002.wikimedia.org']

The log can be found in /var/log/wmf-auto-reimage/201912061821_jeh_224856.log.

Dec 6 2019, 6:21 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Completed auto-reimage of hosts:

['cloudcephmon1001.wikimedia.org']
Dec 6 2019, 6:20 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T240021: Reimage cloudceph mon and osd hosts.

Script wmf-auto-reimage was launched by jeh on cumin1001.eqiad.wmnet for hosts:

['cloudcephmon1001.wikimedia.org']

The log can be found in /var/log/wmf-auto-reimage/201912061754_jeh_218664.log.

Dec 6 2019, 5:54 PM · Epic, cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T236290: Deploy a Ceph testing environment using Rook.io on VMs.

Completed auto-reimage of hosts:

['cloudcephmon1003.wikimedia.org']
Dec 6 2019, 5:29 PM · cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T236290: Deploy a Ceph testing environment using Rook.io on VMs.

Script wmf-auto-reimage was launched by jeh on cumin1001.eqiad.wmnet for hosts:

['cloudcephmon1003.wikimedia.org']

The log can be found in /var/log/wmf-auto-reimage/201912061712_jeh_201578.log.

Dec 6 2019, 5:13 PM · cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T236290: Deploy a Ceph testing environment using Rook.io on VMs.

Completed auto-reimage of hosts:

['cloudcephmon1002.wikimedia.org']
Dec 6 2019, 5:09 PM · cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T236290: Deploy a Ceph testing environment using Rook.io on VMs.

Script wmf-auto-reimage was launched by jeh on cumin1001.eqiad.wmnet for hosts:

['cloudcephmon1002.wikimedia.org']

The log can be found in /var/log/wmf-auto-reimage/201912061652_jeh_190443.log.

Dec 6 2019, 4:52 PM · cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T236290: Deploy a Ceph testing environment using Rook.io on VMs.

Completed auto-reimage of hosts:

['cloudcephmon1001.wikimedia.org']
Dec 6 2019, 4:51 PM · cloud-services-team (Kanban)
ops-monitoring-bot added a comment to T236290: Deploy a Ceph testing environment using Rook.io on VMs.

Script wmf-auto-reimage was launched by jeh on cumin1001.eqiad.wmnet for hosts:

['cloudcephmon1001.wikimedia.org']

The log can be found in /var/log/wmf-auto-reimage/201912061625_jeh_184172.log.

Dec 6 2019, 4:26 PM · cloud-services-team (Kanban)

Dec 5 2019

ops-monitoring-bot added projects to T239957: Degraded RAID on cloudelastic1002: Operations, ops-eqiad.
Dec 5 2019, 10:35 PM · Discovery-Search (Current work), Discovery, ops-eqiad, Operations
ops-monitoring-bot added a comment to T239667: Convert DNS servers to Buster.

Completed auto-reimage of hosts:

['authdns1001.wikimedia.org']
Dec 5 2019, 9:05 PM · Patch-For-Review, netops, Operations, Traffic
ops-monitoring-bot added a comment to T239667: Convert DNS servers to Buster.

Script wmf-auto-reimage was launched by bblack on cumin1001.eqiad.wmnet for hosts:

['authdns1001.wikimedia.org']

The log can be found in /var/log/wmf-auto-reimage/201912052044_bblack_216791.log.

Dec 5 2019, 8:44 PM · Patch-For-Review, netops, Operations, Traffic
ops-monitoring-bot added a comment to T239667: Convert DNS servers to Buster.

Completed auto-reimage of hosts:

['authdns2001.wikimedia.org']
Dec 5 2019, 8:03 PM · Patch-For-Review, netops, Operations, Traffic
ops-monitoring-bot added a comment to T239667: Convert DNS servers to Buster.

Script wmf-auto-reimage was launched by bblack on cumin1001.eqiad.wmnet for hosts:

['authdns2001.wikimedia.org']

The log can be found in /var/log/wmf-auto-reimage/201912051935_bblack_203116.log.

Dec 5 2019, 7:36 PM · Patch-For-Review, netops, Operations, Traffic
ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers .

Completed auto-reimage of hosts:

['mw2260.codfw.wmnet']
Dec 5 2019, 5:54 PM · Operations, serviceops
ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers .

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['mw2260.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912051719_jiji_174984.log.

Dec 5 2019, 5:19 PM · Operations, serviceops
ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers .

Completed auto-reimage of hosts:

['mw2260.codfw.wmnet']
Dec 5 2019, 4:05 PM · Operations, serviceops
ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers .

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['mw2260.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912051420_jiji_139305.log.

Dec 5 2019, 2:20 PM · Operations, serviceops
ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers .

Completed auto-reimage of hosts:

['mw2261.codfw.wmnet']
Dec 5 2019, 1:03 PM · Operations, serviceops
ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers .

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['mw2261.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912051222_jiji_116344.log.

Dec 5 2019, 12:22 PM · Operations, serviceops
ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers .

Completed auto-reimage of hosts:

['mw2260.codfw.wmnet']
Dec 5 2019, 11:05 AM · Operations, serviceops
ops-monitoring-bot added a comment to T239054: Reimage all mediawiki servers .

Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts:

['mw2260.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201912051027_jiji_93252.log.

Dec 5 2019, 10:27 AM · Operations, serviceops
ops-monitoring-bot added a comment to T239046: decommission db2065.codfw.wmnet.

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db2065.codfw.wmnet

  • db2065.codfw.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
Dec 5 2019, 6:51 AM · Patch-For-Review, Operations, DC-Ops, ops-codfw, decommission
ops-monitoring-bot added a comment to T238956: switch prod Phabricator from phab1003 to phab1001.

Completed auto-reimage of hosts:

['phab1001.eqiad.wmnet']
Dec 5 2019, 1:56 AM · serviceops, Release-Engineering-Team