Puppetize codesearch
Closed, Resolved · Public

Description

Really thought I had already filed a bug for this.

Nearly all of the steps for setting up codesearch are well documented in the README, but it would be nice if it were all taken care of by puppet. I think some of the steps (such as adjusting /srv's formatting) may still need to be done manually, but let's see how much we can automate.

Event Timeline

Legoktm created this task. Jan 9 2020, 8:05 AM
Restricted Application added a subscriber: Aklapper. Jan 9 2020, 8:05 AM

Change 563114 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] Initial puppetization of codesearch

https://gerrit.wikimedia.org/r/563114

Change 563114 merged by Dzahn:
[operations/puppet@production] Initial puppetization of codesearch

https://gerrit.wikimedia.org/r/563114

Change 563599 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] codesearch: fix systemd template name for hound_proxy service

https://gerrit.wikimedia.org/r/563599

Change 563599 merged by Dzahn:
[operations/puppet@production] codesearch: fix systemd template name for hound_proxy service

https://gerrit.wikimedia.org/r/563599

Change 563602 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] codesearch: create system group, fix system user membership

https://gerrit.wikimedia.org/r/563602

Change 563602 merged by Dzahn:
[operations/puppet@production] codesearch: create system group, fix system user membership

https://gerrit.wikimedia.org/r/563602

Change 563607 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] codesearch: fix typo in user groups attribute

https://gerrit.wikimedia.org/r/563607

Change 563607 merged by Dzahn:
[operations/puppet@production] codesearch: fix typo in user groups attribute

https://gerrit.wikimedia.org/r/563607

Change 563612 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] codesearch: fix dependency cycle over clone directories

https://gerrit.wikimedia.org/r/563612

Change 563612 merged by Dzahn:
[operations/puppet@production] codesearch: fix dependency cycle over clone directories

https://gerrit.wikimedia.org/r/563612

Change 564466 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] codesearch: Ensure /srv/hound is writable by codesearch user

https://gerrit.wikimedia.org/r/564466

Change 564467 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[labs/codesearch@master] Update instructions after initial puppetization

https://gerrit.wikimedia.org/r/564467

Legoktm added a subscriber: Dzahn. Jan 14 2020, 7:45 AM

The codesearch5 instance is now running Debian Buster plus the role::codesearch puppet role, with lots of help from @Dzahn.

Remaining todos:

Change 564466 merged by Dzahn:
[operations/puppet@production] codesearch: Ensure /srv/hound is writable by codesearch user

https://gerrit.wikimedia.org/r/564466

Change 564467 merged by jenkins-bot:
[labs/codesearch@master] Update instructions after initial puppetization

https://gerrit.wikimedia.org/r/564467

Dzahn added a comment. Jan 14 2020, 5:49 PM

Yeah, it should be possible. You can run any command with the puppet exec resource type, and one way to do something only once rather than on every run is to use the unless parameter to check whether something already exists.
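For readers less familiar with puppet, the "unless" guard described above can be modelled in a few lines of Python. This is only an illustrative sketch of the semantics (run the main command unless the check command exits 0), not puppet code and not anything taken from the codesearch patches; the function name run_unless is made up for this example.

```python
import subprocess


def run_unless(command, unless):
    """Run `command` only when the `unless` check exits non-zero.

    Hypothetical helper mirroring the behaviour of puppet's exec/unless:
    exit code 0 from the check means "already done", so nothing is run.
    Returns True if `command` was executed, False if it was skipped.
    """
    check = subprocess.run(
        unless,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if check.returncode == 0:
        # The check succeeded, so the desired state already exists.
        return False
    # The check failed, so do the one-time work; raise if it fails.
    subprocess.run(command, check=True)
    return True
```

As a made-up usage example, run_unless(["mkdir", "/tmp/example"], unless=["test", "-d", "/tmp/example"]) creates the directory on the first call and skips it on every later call, which is what makes the step safe to repeat on each puppet run.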

By the way, cool that you did buster right away and we were able to skip stretch :)

Change 564857 merged by Dzahn:
[operations/puppet@production] codesearch: Migrate ./write_config.py cron job to systemd timer

https://gerrit.wikimedia.org/r/564857

Change 564858 merged by Dzahn:
[operations/puppet@production] codesearch: Generate hound-${name} systemd units

https://gerrit.wikimedia.org/r/564858

Dzahn added a comment. Jan 16 2020, 9:54 PM

sounds like it's resolved already :)

Everything seems to work, but there are a few issues when bootstrapping a new node:

  • The hound-* instances shouldn't start until after a successful run of codesearch-write-config.service. If they start early, they create /srv/hound/hound-${name}/ owned by root, and then write-config can no longer write to it.
  • Puppet starts all the hound-${name} services at the same time, which OOMs the instance and can cause docker to segfault or get into a weird state. We need some progressive startup instead of starting them all at once.

Also, the hound services are running as root, not the codesearch user. That should be easy to fix, and maybe it fixes the first issue.

More of a note for myself, but I need to remember that if I stop a service puppet will restart it unless I stop puppet too.

Hm, some puppet thing transitively installed ferm (good I guess), but that means we need a config file to open up port 3002 that hound_proxy uses.

Change 565451 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] codesearch: Open up port 3002

https://gerrit.wikimedia.org/r/565451

Change 565508 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] codesearch: Work around bootstrapping problems

https://gerrit.wikimedia.org/r/565508

From IRC:

16:19:35 <mutante> legoktm: just 2 quick thoughts for now.  first is in systemd unit files there is "After=" and it could be used to start one service after another service instead of all after "network.target". other thought is that systemd itself also supports templates to start multiple services from the same unit file with just some small things changing. "By appending the @ symbol to the unit file name, 
16:19:41 <mutante> it becomes a template unit file and can be called multiple times."
16:20:53 <mutante> also.. one unit can have stuff like "Requires=worker@1.service worker@2.service worker@3.service"

I looked into that, but the problem is that the process will have started (so systemd marks it as running), but the start up process will still be going on, which is memory/CPU hungry. So I wrote wait.py for now and it seems to do the job.
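The general idea of a progressive start can be sketched as follows. This is not the actual wait.py, whose contents are not shown in this task; it is a minimal illustration that assumes a service counts as "ready" once its TCP port accepts connections, and the names wait_for_port and start_progressively are invented for the example.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout.

    Assumption: an open port is a good enough readiness signal.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def start_progressively(names, start, wait_ready):
    """Start services one at a time, waiting for each before the next.

    `start` and `wait_ready` are caller-supplied callables taking a name;
    `wait_ready` must return True when the service has finished starting.
    """
    started = []
    for name in names:
        start(name)
        if not wait_ready(name):
            raise RuntimeError(f"{name} did not become ready in time")
        started.append(name)
    return started
```

Because only one instance is in its memory-hungry startup phase at any moment, peak usage stays close to that of a single start rather than the sum of all of them.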

Change 565514 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[labs/codesearch@master] Read ports from puppetized configuration if possible

https://gerrit.wikimedia.org/r/565514

Change 565514 merged by jenkins-bot:
[labs/codesearch@master] Read ports from puppetized configuration if possible

https://gerrit.wikimedia.org/r/565514

Change 565451 merged by Dzahn:
[operations/puppet@production] codesearch: Open up port 3002

https://gerrit.wikimedia.org/r/565451

Change 565508 merged by Dzahn:
[operations/puppet@production] codesearch: Work around bootstrapping problems

https://gerrit.wikimedia.org/r/565508

docker instances aren't starting because of:

Jan 19 06:05:30 codesearch6 dockerd[8955]: time="2020-01-19T06:05:30.388754452Z" level=error msg="Handler for POST /v1.40/containers/e11663b38a8102289e80993b055410f07a17cc7d7e82aa35f245aea0407166d8/start returned error: driver failed programming external connectivity on endpoint hound-search (66e6c742f41480da6cc8e187bd01341297f149d323a75265d291f47b25d5bbce):  (iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 6080 -j DNAT --to-destination 172.17.0.2:6080 ! -i docker0: iptables v1.8.2 (nf_tables): Chain 'DOCKER' does not exist\n (exit status 1))"

Based on https://github.com/moby/moby/issues/38099 I tried switching the iptables alternative ("update-alternatives: using /usr/sbin/iptables-legacy to provide /usr/sbin/iptables (iptables) in manual mode"); however, that didn't work, so I set it back to nftables.

root@codesearch6:/srv/codesearch# iptables --version
iptables v1.8.2 (nf_tables)
root@codesearch6:/srv/codesearch# docker --version
Docker version 19.03.5, build 633a0ea838

After reading https://github.com/moby/moby/issues/26824#issuecomment-412309421 which said that running iptables legacy and nftables at the same time was a bad idea, I performed the alternative switch so it was pointing to iptables-legacy and then rebooted, and now it all seems to be working.
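When debugging this kind of mismatch, it can help to check programmatically which backend the iptables binary is using. The sketch below only parses the suffix that iptables 1.8.x prints in its version string, as seen in the output above; the function name iptables_backend is made up for this example, and it does not change any system state.

```python
import re


def iptables_backend(version_output):
    """Return 'nf_tables' or 'legacy' from `iptables --version` output.

    Returns None if the string carries no backend suffix, as with
    iptables releases that predate the nftables-based variant.
    """
    match = re.search(r"\((nf_tables|legacy)\)", version_output)
    return match.group(1) if match else None
```

Feeding it the line from the transcript, "iptables v1.8.2 (nf_tables)", yields "nf_tables", which is the state in which docker failed to find its DOCKER chain.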

Change 565752 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/puppet@production] codesearch: Use iptables-legacy for docker compatibility

https://gerrit.wikimedia.org/r/565752

OK! https://codesearch6.wmflabs.org/search/ is ready for testing as a fully puppetized codesearch buster instance.

Change 565809 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[labs/codesearch@master] Drop stuff now managed by puppet

https://gerrit.wikimedia.org/r/565809

Change 565752 merged by Dzahn:
[operations/puppet@production] codesearch: Use iptables from buster-backports for docker compatibility

https://gerrit.wikimedia.org/r/565752

Change 565809 merged by jenkins-bot:
[labs/codesearch@master] Drop stuff now managed by puppet

https://gerrit.wikimedia.org/r/565809

Mentioned in SAL (#wikimedia-cloud) [2020-01-23T01:56:37Z] <legoktm> retargetting codesearch.wmflabs.org to codesearch6 (T242319)

Legoktm closed this task as Resolved. Thu, Jan 23, 2:02 AM
Legoktm claimed this task.

Woohoo! Major thank you to @Dzahn for all of his help :)

I've left codesearch4 shut down but not deleted, just in case something goes wrong. I'll delete it next week.