
rack/setup/install new kafka nodes kafka-jumbo100[1-6]
Closed, Resolved · Public

Description

This task will track the setup and installation of 6 new kafka nodes in eqiad, ordered on procurement task T161636.

Hostname Review: The existing kafka systems were NOT renamed to a normal standard. They were analytics10XX hosts that were simply renamed to kafka10XX, which leaves a large range of unused kafka hostnames. kafka100[1-3] are in use, as are kafka101[2348] and kafka102[02]. I'd suggest we either fill out the remainder of the hostnames or simply jump straight to kafka1023. I've assigned this to @Ottomata for his review. If the older kafka machines are going away, this is easier to answer, and @RobH suggests kafka100[4-9].

Racking Proposal: The existing Kafka nodes are not racked consistently: some are in the analytics VLAN and others in the private VLAN. These new nodes will go into the normal private VLAN, spread across all 4 rows, with no two of the 6 new nodes sharing a rack.

Since the hostnames still have to be decided, please do the on-site steps with only the asset tag in use for now.

kafka-jumbo1001:

  • - receive in system on procurement task T161636
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

kafka-jumbo1002:

  • - receive in system on procurement task T161636
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

kafka-jumbo1003:

  • - receive in system on procurement task T161636
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

kafka-jumbo1004:

  • - receive in system on procurement task T161636
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

kafka-jumbo1005:

  • - receive in system on procurement task T161636
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

kafka-jumbo1006:

  • - receive in system on procurement task T161636
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

Event Timeline


Change 368186 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding dns entries for kafka-jumbo100[1-6] T167992

https://gerrit.wikimedia.org/r/368186

I'd ask, if possible, to pause naming and configuration of these hosts, since @Ottomata and I are thinking about the optimal solution for this cluster. It might be possible to migrate one Kafka broker at a time (deprecate an old one and replace it with one from the new batch), in which case it would be really handy to keep the current nomenclature (kafka1012 currently means the Kafka broker with id 12). This is not a strict requirement, and we already got Rob's feedback that kafka-jumbo100[1-6] would be better, but we'd need some time to think about it (a couple of days).

Thanks and sorry!

So we had a discussion about this earlier in IRC, and after that agreed to document some of it here. @elukey has done so above.

My stance is that the oldest hardware servers in a cluster should (when possible) have the lowest sequence numbers. If that is not possible, it is best to keep ALL systems from the same age group within a single sequence range. Example: if kafka1009, kafka1011, and kafka1019 were to become kafka-jumbo, I'd recommend they be kafka-jumbo100[123], and these newer systems be kafka-jumbo100[4-9]. If that isn't possible (because we don't know how many old kafka hosts will be renamed to kafka-jumbo), then I'd call these new hosts kafka-jumbo100[1-6], and the older hosts should all move into hostname numbers in direct sequence, kafka-jumbo100[789], not kafka-jumbo1009, kafka-jumbo1011, kafka-jumbo1019.

I think the above is clear, please advise if not!

The hostnames for all our other clusters follow the above standard. Analytics varied from it (without my knowing about it at the time) with the analytics-to-kafka hostname change, and it's caused some confusion. Even if it takes more work for the initial migration, falling in line with the rest of the clusters seems like a better plan to me than continuing to have these analytics clusters vary from all the others. (This has been a regular issue with analytics naming, etc.) When hosts are in a sequence, we know they will age out at roughly the same rate, and they are usually also grouped by similar specifications within that sequence.

@elukey & @Ottomata are already aware of the above ^ I'm just echoing it to the task for record keeping.

Since kafka1012->kafka1022 are going to be decommissioned and kafka-jumbo is, from our point of view, a completely new cluster (that may share old Kafka broker ids), I'd be in favor of sticking with convention and calling the nodes kafka-jumbo100[1-6]. The mapping between broker id and node name will be less obvious for us (e.g. 12 -> kafka-jumbo1001), but it will be better from the procurement point of view.

Going to wait for Andrew's thoughts :)

Change 368186 abandoned by Cmjohnson:
Adding dns entries for kafka-jumbo100[1-6] T167992

Reason:
abandoning this until naming has been figured out

https://gerrit.wikimedia.org/r/368186

If we decide to keep broker ids (seeming less likely, I will test some stuff in labs this week), then I think we should keep the node numbers as they are. Otherwise, we'll def start with 1001. But! Let's not fight that fight until we decide to do that.

Alright! Luca and I have tested some things, and discussed this migration a little more. We're going to stick with the original plan of spinning this up as a new separate cluster, and then migrating clients over one by one.

So! We can proceed. These can be installed as kafka-jumbo100[1-6]. Thanks @Cmjohnson !

Note about disk config: we are going for a 12-disk RAID10 data partition plus a RAID1 root (on separate disks). I can work on it if you guys are busy; otherwise I'll be happy to review the partman recipe :)

@elukey: I'm happy to help with partman, but I want to confirm:

These systems have dual 1TB OS disks, which will be placed in a raid1 for the OS. The 12 disks for DATA will be placed in a raid10. These systems come with hardware raid controllers, so I would NOT use software raid.

I'd suggest that all the systems be set up with hardware raid as follows:

2 * 1 TB SFF as raid1 SDA
12 * 4 TB LFF as raid10 SDB

Then the partitioning recipe just needs to set up a single / on sda and put all your data in an LVM on sdb; sound right?
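In other words (an illustrative sketch only; partman does this during the install, and the VG/LV names below are assumptions, not the final recipe), the data side would be roughly equivalent to:

pvcreate /dev/sdb1                  # PV on the hardware-RAID10 virtual disk
vgcreate vg-data /dev/sdb1          # VG/LV names are assumptions
lvcreate -l 90%VG -n srv vg-data    # leave some extents unallocated for growth
mkfs.ext4 /dev/vg-data/srv
mount /dev/vg-data/srv /srv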

Sounds good to me; for some reason I was under the impression that we preferred sw raid over hw-controlled raid, which is why I mentioned it. Never mind, the partman recipe will be simpler :)

I'm not sure of a reason to prefer software raid other than ease of management. Likely performance is better with HW raid.

@elukey: So we do prefer sw raid over hw raid when purchasing servers. However, servers in this particular chassis (R730xd) have to have a full hw raid controller installed. Since it's already there, we use it; the performance of this particular raid controller is better than software raid.

Change 369725 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt and production dns for kafka-jumbo100[1-6] T167992

https://gerrit.wikimedia.org/r/369725

FYI: These nodes should be installed with Debian Stretch.

@elukey: So we do prefer sw raid over hw raid when purchasing servers. However, servers in this particular chassis (R730xd) have to have a full hw raid controller installed. Since it's already there, we use it; the performance of this particular raid controller is better than software raid.

Thanks for explaining, completely trust your judgement so no issues from my side :)

Change 369725 merged by Cmjohnson:
[operations/dns@master] Adding mgmt and production dns for kafka-jumbo100[1-6] T167992

https://gerrit.wikimedia.org/r/369725

Hello people, any timeline for these hosts? I don't mean to pressure, I'd just like to know the timing so I can organize/schedule all the work over the next weeks :)

@Cmjohnson Heyaaa, we are pretty ready and excited to start working with these. Can you let us know when they'll be worked on?

Thank you!

@Ottomata okay, understood. I will get them going as soon as I can; they are in my being-worked-on queue with a few other things: https://phabricator.wikimedia.org/tag/ops-eqiad/

The issue with 1004 has been resolved; assigning to @RobH to do the installs.

Change 373328 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] kafka-jumbo install params

https://gerrit.wikimedia.org/r/373328

Change 373328 abandoned by RobH:
kafka-jumbo install params

https://gerrit.wikimedia.org/r/373328

Change 373357 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] setting kafka-jumbo100[1-6].eqiad.wmnet dns

https://gerrit.wikimedia.org/r/373357

Change 373357 merged by RobH:
[operations/dns@master] setting kafka-jumbo100[1-6].eqiad.wmnet dns

https://gerrit.wikimedia.org/r/373357

Ok, kafka-jumbo1001 has odd issues.

It is confirmed to have the correct MAC address in DHCP, and DNS is right. The VLAN is correct, and I can see the DHCP request come in on the correct subnet/vlan. I'm not sure why it is getting no free leases.

I've moved on to the rest of the systems, which so far boot fine via DHCP. However, there is no working partman recipe for a hardware raid setup like this. I've created a kafka-jumbo.cfg recipe and I'm tweaking it now.

So far, I have it booting, installing the OS to the sda raid1, and then putting a large LVM across sdb. It's so far failing to mount /srv on sdb. Still working on it.

So we solved the issue with partman and I was able to install the OS on kafka-jumbo100[12], but the other nodes fail to PXE boot (they hang after selecting the boot option, AFAICS). Is there anything else to configure on them to proceed with the OS/puppet/etc. deployment?

All hosts up with OS installed and puppet/salt running.

elukey updated the task description.

Change 376336 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Apply kafka::jumbo::broker on new kafka-jumbo100* hosts

https://gerrit.wikimedia.org/r/376336

Change 376336 merged by Ottomata:
[operations/puppet@production] Apply kafka::jumbo::broker on new kafka-jumbo100* hosts

https://gerrit.wikimedia.org/r/376336

Change 376339 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Un-apply kafka role -- these should be stretch, not jessie! :/

https://gerrit.wikimedia.org/r/376339

Change 376340 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install kafka-jumbo as Stretch

https://gerrit.wikimedia.org/r/376340

Change 376339 merged by Ottomata:
[operations/puppet@production] Un-apply kafka role -- these should be stretch, not jessie! :/

https://gerrit.wikimedia.org/r/376339

Change 376340 merged by Ottomata:
[operations/puppet@production] Install kafka-jumbo as Stretch

https://gerrit.wikimedia.org/r/376340

Script wmf_auto_reimage was launched by otto on neodymium.eqiad.wmnet for hosts:

['kafka-jumbo1001.eqiad.wmnet', 'kafka-jumbo1002.eqiad.wmnet', 'kafka-jumbo1003.eqiad.wmnet', 'kafka-jumbo1004.eqiad.wmnet', 'kafka-jumbo1005.eqiad.wmnet', 'kafka-jumbo1006.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201709061953_otto_25477.log.

Completed auto-reimage of hosts:

['kafka-jumbo1001.eqiad.wmnet', 'kafka-jumbo1002.eqiad.wmnet', 'kafka-jumbo1003.eqiad.wmnet', 'kafka-jumbo1004.eqiad.wmnet', 'kafka-jumbo1005.eqiad.wmnet', 'kafka-jumbo1006.eqiad.wmnet']

Of which those FAILED:

set(['kafka-jumbo1001.eqiad.wmnet'])

Change 376377 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add debug notifies to figure out error message in prod

https://gerrit.wikimedia.org/r/376377

Change 376377 merged by Ottomata:
[operations/puppet@production] Add debug notifies to figure out error message in prod

https://gerrit.wikimedia.org/r/376377

Change 376379 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Debugging

https://gerrit.wikimedia.org/r/376379

Change 376379 merged by Ottomata:
[operations/puppet@production] Debugging

https://gerrit.wikimedia.org/r/376379

Change 376407 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Allow new kafka-jumbo hosts to talk to zookeeper on conf*

https://gerrit.wikimedia.org/r/376407

Change 376407 merged by Ottomata:
[operations/puppet@production] Allow new kafka-jumbo hosts to talk to zookeeper on conf*

https://gerrit.wikimedia.org/r/376407

Change 376428 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add kafka rack (row) awareness configs

https://gerrit.wikimedia.org/r/376428

Change 376428 merged by Ottomata:
[operations/puppet@production] Add kafka rack (row) awareness configs

https://gerrit.wikimedia.org/r/376428

elukey@kafka-jumbo1001:/usr/share/jmxtrans$ source /etc/default/jmxtrans
elukey@kafka-jumbo1001:/usr/share/jmxtrans$ ./jmxtrans.sh start
elukey@kafka-jumbo1001:/usr/share/jmxtrans$ OpenJDK 64-Bit Server VM warning: ignoring option PermSize=384m; support was removed in 8.0
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=384m; support was removed in 8.0
MaxTenuringThreshold of 16 is invalid; must be between 0 and 15
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Replacing 16 with 15 in jmxtrans.sh made everything work again. Since this file is copied over from the deb package, it should only be a matter of changing our stretch-wikimedia package.
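For reference, the manual workaround on each host amounts to a one-line substitution (hedged sketch; the proper fix is the package change below):

# Sketch of the manual workaround; path taken from the session above.
sudo sed -i 's/MaxTenuringThreshold=16/MaxTenuringThreshold=15/' /usr/share/jmxtrans/jmxtrans.sh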

Change 376663 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/debs/jmxtrans@master] jmxtrans.sh: reduce MaxTenuringThreshold to 15

https://gerrit.wikimedia.org/r/376663

Applied the following to all the nodes to remove the placeholder logical volumes:

root@kafka-jumbo1001:/home/elukey# lvremove /dev/vg-flex/root-placeholder
Do you really want to remove active logical volume vg-flex/root-placeholder? [y/n]: y
  Logical volume "root-placeholder" successfully removed
root@kafka-jumbo1001:/home/elukey# lvremove /dev/vg-data/srv-placeholder
Do you really want to remove active logical volume vg-data/srv-placeholder? [y/n]: y
  Logical volume "srv-placeholder" successfully removed
root@kafka-jumbo1001:/home/elukey# pvs
  PV         VG      Fmt  Attr PSize   PFree
  /dev/sda2  vg-flex lvm2 a--  926.34g 93.13g
  /dev/sdb1  vg-data lvm2 a--   21.83t  2.73t
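(If the freed extents are later needed by the data volume, something like the following would grow it; this is purely illustrative and assumes the data LV is vg-data/srv with an ext4 filesystem:)

# Illustrative only: grow the (assumed) vg-data/srv LV and resize its
# filesystem into the freed extents.
lvextend -r -l +100%FREE vg-data/srv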

I could be wrong but from cr1/cr2 eqiad the hosts seem to be in the Analytics VLAN, and they shouldn't be:

elukey@re0.cr1-eqiad> show route kafka-jumbo1006.eqiad.wmnet

inet.0: 649448 destinations, 3244861 routes (649322 active, 0 holddown, 132 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both

10.64.53.0/24      *[Direct/0] 24w1d 21:25:02
                    > via ae4.1023

{master}
elukey@re0.cr1-eqiad> show configuration interfaces ae4.1023
description "Subnet analytics1-d-eqiad";
vlan-id 1023;
family inet {
    filter {
        input analytics-in4;
    }

@RobH, @Cmjohnson: can you guys double check? If I am right what is the procedure to move those hosts out of the Analytics VLAN? (I guess new IPs + reimage?)
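(A quick host-side spot check is also possible, purely as an illustration: an inet address inside 10.64.53.0/24, i.e. analytics1-d-eqiad, would confirm it.)

# Host-side spot check (illustrative): a match here means the host still
# has an address in the Analytics VLAN subnet.
ip -4 addr show | grep 'inet 10.64.53'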

@Ottomata: let's also remember to whitelist the jumbo IPs in the Analytics VLAN firewall rules, otherwise hosts like analytics1003 will not be able to contact them.

Change 376663 merged by Elukey:
[operations/debs/jmxtrans@master] jmxtrans.sh: reduce MaxTenuringThreshold to 15

https://gerrit.wikimedia.org/r/376663

@Ottomata: I merged https://gerrit.wikimedia.org/r/#/c/376663 but then realized that the master/debian branches are a bit weird: master contains the debian directory and it is out of sync with the debian branch. From what I can see it seems that the current debian package is built from the debian branch, but it's difficult to say. If you have time, can you double-check and let me know your opinion? With my current understanding I'd simply cherry-pick https://gerrit.wikimedia.org/r/#/c/376663 to debian and build HEAD from copper.

Assigning back to Chris as discussed on IRC: we need to move the Kafka Jumbo hosts out of the analytics VLAN and then reimage them (I will take care of that step).

Change 377329 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Updating dns entries for kafka-jumbo100[1-6] to reflect change in vlan T167992

https://gerrit.wikimedia.org/r/377329

Change 377329 merged by Cmjohnson:
[operations/dns@master] Updating dns entries for kafka-jumbo100[1-6] to reflect change in vlan T167992

https://gerrit.wikimedia.org/r/377329

@elukey: updated dns entries and switch ports to reflect vlan-private1-row-eqiad

Script wmf_auto_reimage was launched by volans on sarin.codfw.wmnet for hosts:

['kafka-jumbo1002.eqiad.wmnet', 'kafka-jumbo1003.eqiad.wmnet', 'kafka-jumbo1004.eqiad.wmnet', 'kafka-jumbo1005.eqiad.wmnet', 'kafka-jumbo1006.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201709120902_volans_9858.log.

Completed auto-reimage of hosts:

['kafka-jumbo1002.eqiad.wmnet', 'kafka-jumbo1003.eqiad.wmnet', 'kafka-jumbo1004.eqiad.wmnet', 'kafka-jumbo1005.eqiad.wmnet', 'kafka-jumbo1006.eqiad.wmnet']

Of which those FAILED:

set(['kafka-jumbo1003.eqiad.wmnet', 'kafka-jumbo1002.eqiad.wmnet', 'kafka-jumbo1005.eqiad.wmnet', 'kafka-jumbo1004.eqiad.wmnet', 'kafka-jumbo1006.eqiad.wmnet'])

For the record, they were reimaged correctly; the new reimage script hit a small bug in the post-reimage part. I've already re-run it for the "failed" hosts to complete the post-reimage steps.

Change 377417 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] network::constants: update IP addresses of the new Kafka hosts

https://gerrit.wikimedia.org/r/377417

Change 377417 merged by Elukey:
[operations/puppet@production] network::constants: update IP addresses of the new Kafka hosts

https://gerrit.wikimedia.org/r/377417

Proposed new term for the analytics-in4 filter on cr1/cr2 eqiad:

term kafka {
    from {
        destination-address {
            10.64.0.175/32;
            10.64.0.176/32;
            10.64.16.99/32;
            10.64.32.159/32;
            10.64.32.160/32;
            10.64.48.117/32;
        }
        protocol tcp;
        destination-port 9092;
    }
    then accept;
}

EDIT: Just checked and a kafka term is already present (with kafka1012->1022) so it should be as easy as executing these:

set firewall family inet filter analytics-in4 term kafka from destination-address 10.64.0.175/32
set firewall family inet filter analytics-in4 term kafka from destination-address 10.64.0.176/32
set firewall family inet filter analytics-in4 term kafka from destination-address 10.64.16.99/32
set firewall family inet filter analytics-in4 term kafka from destination-address 10.64.32.159/32
set firewall family inet filter analytics-in4 term kafka from destination-address 10.64.32.160/32
set firewall family inet filter analytics-in4 term kafka from destination-address 10.64.48.117/32

LGTM. Minor nitpick: I love comments noting which hostname each IP maps to, e.g.

term puppet {
        from {
            destination-address {
                /* puppetmaster1001 */
                10.64.16.73/32;
                /* puppetmaster2001 */
                10.192.0.27/32;
            }
            protocol tcp;
            destination-port 8140;
        }
        then accept;
}
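Applied to the kafka term above, that would look roughly like the following (the IP-to-hostname mapping shown is only assumed from the list order and needs to be double-checked):

term kafka {
    from {
        destination-address {
            /* kafka-jumbo1001 (assumed mapping) */
            10.64.0.175/32;
            /* kafka-jumbo1002 (assumed mapping) */
            10.64.0.176/32;
            /* kafka-jumbo1003 (assumed mapping) */
            10.64.16.99/32;
            /* kafka-jumbo1004 (assumed mapping) */
            10.64.32.159/32;
            /* kafka-jumbo1005 (assumed mapping) */
            10.64.32.160/32;
            /* kafka-jumbo1006 (assumed mapping) */
            10.64.48.117/32;
        }
        protocol tcp;
        destination-port 9092;
    }
    then accept;
}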

Cleaned up the placeholder LVM partitions; the next steps are:

  1. decide on a TLS port for the Kafka cluster and whitelist it in the analytics vlan and ferm firewalls
  2. try the Prometheus JMX exporter as an alternative to jmxtrans (see the sketch after this list); if feasible, go for it, otherwise just rebuild jmxtrans with the last commit in master.
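A minimal sketch of how the Prometheus JMX exporter attaches to the broker JVM (the jar path, port, and config file name here are assumptions, not the values that will land in operations/puppet):

# Sketch only: the exporter runs as a javaagent inside the Kafka broker
# JVM rather than as a separate process like jmxtrans; port and paths assumed.
export KAFKA_OPTS="-javaagent:/usr/share/java/prometheus/jmx_prometheus_javaagent.jar=7800:/etc/kafka/jmx_exporter.yaml"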

Change 377753 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] [WIP] role::kafka::jumbo::broker: enable Prometheus JMX monitoring

https://gerrit.wikimedia.org/r/377753

Change 378876 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::kafka::broker: add the monitoring_enabled option

https://gerrit.wikimedia.org/r/378876

Change 378876 merged by Elukey:
[operations/puppet@production] profile::kafka::broker: add the monitoring_enabled option

https://gerrit.wikimedia.org/r/378876