
setup kafka2001 & kafka2002
Closed, Resolved | Public | 8 Estimated Story Points

Description

This is the tracking task for the installation and implementation of kafka2001 and kafka2002. These were requested on hardware-requests task T114191 and allocated from spares ordered via T120246.

By default, requests such as these systems should go into two different racks/rows for redundancy.

  • received in, asset tagged, added to racktables
  • sub-task created for onsite to add label to systems and update racktables
  • allocate mgmt dns
  • system bios and drac setup
  • allocate production dns
  • allocate port and vlan (internal)
  • install_module updates
  • install OS (jessie)
  • service implementation (typically handed off to requestor of hardware, in this case @aaron)

Event Timeline

RobH claimed this task.
RobH raised the priority of this task to Medium.
RobH updated the task description.
RobH added subscribers: faidon, mobrovac, RobH and 13 others.

For the task I will be using wmf6377 in row A and wmf6379 in row B.

Papaul set Security to None.

@RobH, for the install-module update section, what RAID level will I be using?

These systems have 4 × 3TB disks, so we need to use GPT to make use of that disk space.

raid10-gpt.cfg should be used for these, as it sets up the following:

# * four disks, sda, sdb, sdc, sdd
# * primary partitions, no LVM
# * GPT layout (large disks, > 2TB)
# * layout:
#   - /	:   ext3, RAID10, 50GB
#   - /srv: xfs,  RAID10, rest of the space
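As a rough check of what this layout yields, here is a minimal Python sketch of the arithmetic. It assumes decimal gigabytes (3TB = 3000GB) and ignores RAID metadata, boot partitions and filesystem overhead, so the real figures will be somewhat lower. The function name is hypothetical.

```python
def raid10_usable_gb(disks, disk_gb):
    """Usable capacity of a RAID10 array: half of the raw space (mirrored pairs)."""
    if disks < 4 or disks % 2:
        raise ValueError("this sketch assumes an even number of disks, at least 4")
    return disks * disk_gb // 2


total_gb = raid10_usable_gb(4, 3000)  # four 3TB disks, decimal GB (assumption)
root_gb = 50                          # / as listed in the layout
srv_gb = total_gb - root_gb           # /srv takes the rest of the space
print(total_gb, srv_gb)               # prints: 6000 5950
```

So roughly 6TB is usable in total, of which about 5.95TB would go to /srv.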

I think that will be fine, but be warned that I may need to delete the /srv partition and recreate JBOD for Kafka. TBD.

/ ext RAID10 across all 4 sounds good.

kafka2001 & kafka2002 port descriptions and vlans updated.

We discussed this on IRC and agreed that the following RAID configuration needs to be used:
raid10-gpt-srv-ext4.cfg

@RobH I am getting this error message during install.

Screen Shot 2015-12-15 at 7.23.55 PM.png (364×540 px, 48 KB)

Just continue past it, we don't need swap space on it. Thanks for checking!

signing puppet certs, salt key complete

Hey Aaron, the installation is complete on kafka200[1-2]. If you have any questions, please let me know.

Thanks

ok! looking good! We may have a problem for codfw -- we don't have a zookeeper cluster there. For some reason I assumed there was already a mirror of the conf100x hosts in codfw, but there is not. We will have to think about this.

I actually did have Kafka set up on these for a few minutes before I realized this. I think we can go ahead and close this ticket and we can track the final set up and zookeeper problem elsewhere.

@Ottomata can you indeed file a task for codfw zookeeper?

Change 285958 had a related patch set uploaded (by Elukey):
Enable kafka200[12] to host Kafka and Event Bus

https://gerrit.wikimedia.org/r/285958

JAllemandou set the point value for this task to 5. (Apr 28 2016, 4:30 PM)

Change 285958 merged by Elukey:
Enable kafka200[12] to host Kafka and EventBus.

https://gerrit.wikimedia.org/r/285958

Change 286134 had a related patch set uploaded (by Elukey):
Add eventbus_codfw to the monitoring hiera variables.

https://gerrit.wikimedia.org/r/286134

Change 286134 merged by Elukey:
Add eventbus_codfw to the monitoring hiera variables.

https://gerrit.wikimedia.org/r/286134

Change 286136 had a related patch set uploaded (by Elukey):
Add kafka200[12] codfw hosts to scap.

https://gerrit.wikimedia.org/r/286136

Change 286136 merged by Mobrovac:
Add kafka200[12] codfw hosts to scap.

https://gerrit.wikimedia.org/r/286136

  • Icinga configuration updated
  • Added kafka200[12] to eventbus scap config (thanks to Marko) + deploy
  • Run puppet on both nodes, no errors

Next steps:

  • Add LVS/PyBal configurations (need to pre-allocate an IP I guess?)

https://wikitech.wikimedia.org/wiki/LVS#LVS_installation

Change 286621 had a related patch set uploaded (by Elukey):
Add LVS configuration for EventBus in codfw (DNS reverse/wmnet config already in place).

https://gerrit.wikimedia.org/r/286621

Double checked that health checks work fine:

elukey@kafka1001:~$ curl http://kafka2002.codfw.wmnet:8085/v1/topics
{"change-prop.retry.change-prop.backlinks.continue": {"schema_name": "retry"}, "change-prop.retry.mediawiki.page_delete": {"schema_name": "retry"}, "mediawiki.page_delete": {"schema_name": "page_delete"}, "mediawiki.page_move": {"schema_name": "page_move"}, "change-prop.backlinks.continue": {"schema_name": "continue"}, "change-prop.retry.mediawiki.page_restore": {"schema_name": "retry"}, "resource_change": {"schema_name": "resource_change"}, "mediawiki.revision_visibility_set": {"schema_name": "revision_visibility_set"}, "change-prop.retry.mediawiki.revision_visibility_set": {"schema_name": "retry"}, "mediawiki.user_block": {"schema_name": "user_block"}, "mediawiki.page_restore": {"schema_name": "page_restore"}, "change-prop.retry.mediawiki.user_block": {"schema_name": "retry"}, "change-prop.retry.mediawiki.page_move": {"schema_name": "retry"}, "mediawiki.revision_create": {"schema_name": "revision_create"}, "change-prop.retry.mediawiki.revision_create": {"schema_name": "retry"}, "test.event": {"schema_name": "test_event"}, "change-prop.retry.resource_change": {"schema_name": "retry"}}

elukey@kafka1001:~$ curl http://kafka2001.codfw.wmnet:8085/v1/topics
{"change-prop.retry.change-prop.backlinks.continue": {"schema_name": "retry"}, "change-prop.retry.mediawiki.page_delete": {"schema_name": "retry"}, "mediawiki.page_delete": {"schema_name": "page_delete"}, "mediawiki.page_move": {"schema_name": "page_move"}, "change-prop.backlinks.continue": {"schema_name": "continue"}, "change-prop.retry.mediawiki.page_restore": {"schema_name": "retry"}, "resource_change": {"schema_name": "resource_change"}, "mediawiki.revision_visibility_set": {"schema_name": "revision_visibility_set"}, "change-prop.retry.mediawiki.revision_visibility_set": {"schema_name": "retry"}, "mediawiki.user_block": {"schema_name": "user_block"}, "mediawiki.page_restore": {"schema_name": "page_restore"}, "change-prop.retry.mediawiki.user_block": {"schema_name": "retry"}, "change-prop.retry.mediawiki.page_move": {"schema_name": "retry"}, "mediawiki.revision_create": {"schema_name": "revision_create"}, "change-prop.retry.mediawiki.revision_create": {"schema_name": "retry"}, "test.event": {"schema_name": "test_event"}, "change-prop.retry.resource_change": {"schema_name": "retry"}}
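The comparison between the two hosts could also be scripted. The following is a minimal Python sketch, not part of the actual deployment tooling: it assumes /v1/topics returns a JSON object keyed by topic name, as in the output above, and the helper names (parse_topics, diff_topics, fetch_topics, main) are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_topics(body):
    """Return the set of topic names from a /v1/topics JSON response body."""
    return set(json.loads(body))


def diff_topics(body_a, body_b):
    """Return (only_in_a, only_in_b) as sorted lists of topic names."""
    a, b = parse_topics(body_a), parse_topics(body_b)
    return sorted(a - b), sorted(b - a)


def fetch_topics(url, timeout=5):
    """Fetch a /v1/topics response body over HTTP (needs network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def main():
    """Compare the topic lists served by the two codfw hosts."""
    hosts = ["kafka2001.codfw.wmnet", "kafka2002.codfw.wmnet"]
    bodies = [fetch_topics(f"http://{h}:8085/v1/topics") for h in hosts]
    only_a, only_b = diff_topics(*bodies)
    print("only on", hosts[0], only_a)
    print("only on", hosts[1], only_b)
```

Calling main() from a machine that can reach the codfw hosts should print two empty lists when both nodes expose the same topics.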

Note that the topics' names ought to be prefixed with "codfw.". @elukey, I guess you created them by running ./bin/ensure-kafka-topics-exist?
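A quick way to spot unprefixed topics in such a listing is sketched below in Python. This is only an illustration: it assumes the datacenter prefix belongs at the very start of each topic name, and the function name and the prefixed sample entry are hypothetical.

```python
def unprefixed(topics, prefix="codfw."):
    """Return the topic names that do not start with the datacenter prefix."""
    return sorted(t for t in topics if not t.startswith(prefix))


# Two names taken from the listing above plus one hypothetical prefixed name.
sample = ["mediawiki.page_delete", "codfw.mediawiki.page_move", "test.event"]
print(unprefixed(sample))  # prints: ['mediawiki.page_delete', 'test.event']
```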

Created topics with mobrovac, all good. Last step is to enable LVS.

Change 286621 merged by Elukey:
Add LVS configuration for EventBus in codfw (DNS reverse/wmnet config already in place).

https://gerrit.wikimedia.org/r/286621

LVS configuration set up on lvs200[36]:

curl http://eventbus.svc.codfw.wmnet:8085/v1/topics

Pending verification, but the work should be done :)

elukey changed the point value for this task from 5 to 8. (May 4 2016, 11:25 AM)