
Site: eqiad|codfw VM request for Kafka Burrow Lag monitoring
Closed, Resolved (Public)

Description

Site/Location: eqiad and codfw
Number of systems: 2 (one for each DC)
Service: Kafka Burrow
Networking Requirements: internal IP, no specific requirement
Processor Requirements: 2 or 4 vCPUs
Memory: 8GB
Disks: 50/100GB
Other Requirements: None

Reference: https://phabricator.wikimedia.org/T187805

Event Timeline

What host names do we want to use?

(because DNS needs to exist before anything else, and only then can the VMs be created)

How about "kafkamon", akin to "netmon"?

Change 413281 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] introduce kafkamon1001/2001

https://gerrit.wikimedia.org/r/413281

Change 413283 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] partman: add kafkamon[1-2]00[0-9]

https://gerrit.wikimedia.org/r/413283

Change 413281 merged by Dzahn:
[operations/dns@master] introduce kafkamon1001/2001

https://gerrit.wikimedia.org/r/413281

Change 413283 merged by Dzahn:
[operations/puppet@production] partman: add kafkamon[1-2]00[0-9]

https://gerrit.wikimedia.org/r/413283
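
For readers unsure which host names the `kafkamon[1-2]00[0-9]` expression from the partman change covers, here is a minimal sketch. It assumes the expression is read as a shell-style glob, which may not be exactly how the partman configuration evaluates it. The helper name `matches_pattern` is hypothetical and not part of any Wikimedia tooling.

```python
from fnmatch import fnmatchcase

# Pattern copied from the partman change title. Treating it as a
# shell-style glob is an assumption made for illustration only.
PATTERN = "kafkamon[1-2]00[0-9]"


def matches_pattern(hostname: str) -> bool:
    """Return True if the short host name matches the glob pattern."""
    return fnmatchcase(hostname, PATTERN)


if __name__ == "__main__":
    # The first two names come from this task; the last two are made-up
    # counterexamples.
    for name in ("kafkamon1001", "kafkamon2001", "kafkamon1010", "kafkamon3001"):
        print(name, matches_pattern(name))
```

Under that reading, the pattern covers both new VMs and leaves room for further hosts numbered 1000-1009 and 2000-2009.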

Mentioned in SAL (#wikimedia-operations) [2018-02-23T04:53:35Z] <mutante> ganeti: creating new VM kafkamon1001 - vcpus=2,memory=8g,disk=60G, row_A eqiad (T187901)

Mentioned in SAL (#wikimedia-operations) [2018-02-23T04:56:27Z] <mutante> ganeti: ganeti2004 - creating new VM kafkamon2001 - vcpus=2,memory=8g,disk=60G, row_A codfw (T187901)
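
Both SAL entries carry the same resource string, `vcpus=2,memory=8g,disk=60G`. The sketch below parses that string and compares it with the request in the task description. It is illustrative only: the function names are hypothetical, and reading "Disks: 50/100GB" as a 50 to 100 GB range is an assumption, not something the task states.

```python
def parse_spec(spec: str) -> dict:
    """Split 'key=value,key=value' into a dict of strings."""
    return dict(item.split("=", 1) for item in spec.split(","))


def size_in_gb(value: str) -> int:
    """Convert a size such as '8g' or '60G' to an integer number of GB."""
    return int(value.rstrip("gG"))


def within_request(spec: str) -> bool:
    """Check a spec against the request in the task description.

    Assumption: 'Disks: 50/100GB' is read as 'between 50 and 100 GB'.
    """
    fields = parse_spec(spec)
    return (
        int(fields["vcpus"]) in (2, 4)
        and size_in_gb(fields["memory"]) == 8
        and 50 <= size_in_gb(fields["disk"]) <= 100
    )


if __name__ == "__main__":
    # Resource string as logged in the SAL entries above.
    print(within_request("vcpus=2,memory=8g,disk=60G"))
```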

Fri Feb 23 05:18:56 2018 - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Fri Feb 23 05:18:57 2018 - INFO: Instance kafkamon1001.eqiad.wmnet's disks are in sync
Fri Feb 23 05:18:57 2018 - INFO: Waiting for instance kafkamon1001.eqiad.wmnet to sync disks
Fri Feb 23 05:18:57 2018 - INFO: Instance kafkamon1001.eqiad.wmnet's disks are in sync
..
Fri Feb 23 05:23:28 2018 - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Fri Feb 23 05:23:28 2018 - INFO: Instance kafkamon2001.codfw.wmnet's disks are in sync
Fri Feb 23 05:23:28 2018 - INFO: Waiting for instance kafkamon2001.codfw.wmnet to sync disks
Fri Feb 23 05:23:28 2018 - INFO: Instance kafkamon2001.codfw.wmnet's disks are in sync

Change 413670 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] DHCP: add MACs for kafkamon1001/2001

https://gerrit.wikimedia.org/r/413670

Change 413670 merged by Dzahn:
[operations/puppet@production] DHCP: add MACs for kafkamon1001/2001

https://gerrit.wikimedia.org/r/413670

Mentioned in SAL (#wikimedia-operations) [2018-02-23T05:40:14Z] <mutante> ganeti1004 - initial startup of kafkamon1001 - booting to PXE, installing stretch (T187901)

Mentioned in SAL (#wikimedia-operations) [2018-02-23T05:58:03Z] <mutante> puppetmaster1001 - signing puppet certs for kafkamon1001/kafkamon2001 - initial puppet runs, adding as role spare (T187901)

Dzahn claimed this task.
  • created VMs
  • installed with stretch
  • signed puppet certs on master, added to site with role(test)
  • confirmed added in Icinga

https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=kafkamon

The VMs exist and the request is done. From here, the details continue on T187805.