puppetmaster broken in the cloudstore project
Closed, Resolved, Public

Description

The cloudstore project is for testing changes and upgrades to the lab/cloudstore hosts. Its puppetmaster (cloudstore-puppetmaster-01.cloudstore.eqiad.wmflabs) has some useful local patches on it, and it was non-functional when I last checked.

Sudo is broken because it cannot load plugins, and puppet itself was returning 500s for every request.

Event Timeline

Bstorm triaged this task as Medium priority. Jan 15 2020, 5:12 PM
Bstorm created this task.

Hrm. Now I cannot seem to ssh to it. :)

Mentioned in SAL (#wikimedia-cloud) [2020-01-15T17:23:22Z] <jeh> restart cloudstore-puppetmaster01 lots of OOM events and "Cannot allocate memory" errors T242893

Hrm. Now I cannot seem to ssh to it. :)

Yeah, it's in bad shape with memory allocation.

-bash: /bin/ps: Cannot allocate memory

Reopening to track work on fixing the puppetmaster configuration.

I also don't see any untracked changes or interesting commits in /var/lib/git/operations/puppet or /etc/puppet.

/usr/local/bin/git-sync-upstream was having a hard time with the git repository in /var/lib/git/operations/puppet and consuming all available memory on the VM. I moved the git repo to /var/lib/git/operations/puppet-save-from-gtirloni and pulled down a fresh copy of the repo. I also confirmed that the puppet agent is working on all the hosts in the cloudstore project now.
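For anyone repeating the recovery described above, here is a rough sketch of the two steps (set the old repository aside, then fetch a fresh copy) as a dry-run plan. The function name `plan_repo_reset`, the placeholder upstream URL, and the use of `git clone --branch production` are assumptions for illustration; the comment above does not say which exact commands were run.

```python
from pathlib import PurePosixPath


def plan_repo_reset(repo_path, backup_path, upstream_url, branch="production"):
    """Build, but do not run, the commands for setting a repo aside and re-cloning it.

    Returning argv lists keeps this a dry run: the caller can review the plan
    and then pass each entry to subprocess.run(..., check=True) if it looks right.
    """
    repo = PurePosixPath(repo_path)
    backup = PurePosixPath(backup_path)
    if repo == backup:
        raise ValueError("backup path must differ from the repository path")
    return [
        # Keep the old checkout around instead of deleting it.
        ["mv", str(repo), str(backup)],
        # Fresh copy of the upstream repository on the expected branch.
        ["git", "clone", "--branch", branch, upstream_url, str(repo)],
    ]


if __name__ == "__main__":
    # The URL is a hypothetical placeholder, not the real upstream.
    plan = plan_repo_reset(
        "/var/lib/git/operations/puppet",
        "/var/lib/git/operations/puppet-save-from-gtirloni",
        "https://example.org/operations/puppet.git",
    )
    for argv in plan:
        print(" ".join(argv))
```

Keeping the old directory rather than removing it means any local patches can still be recovered from it later.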

Old Branches
cloudstore-puppetmaster-01:/var/lib/git/operations/puppet# git branch -v --all
  oot-branch-201912312058   5b1adde31e AWS search block: anchor regex to start
  oot-branch-201912312124   5b1adde31e AWS search block: anchor regex to start
  oot-branch-201912312130   5b1adde31e AWS search block: anchor regex to start
  oot-branch-201912312137   5b1adde31e AWS search block: anchor regex to start
  oot-branch-201912312141   5b1adde31e AWS search block: anchor regex to start
  oot-branch-201912312142   5b1adde31e AWS search block: anchor regex to start
  oot-branch-202001041855   fb1e6e6e43 toolschecker: update k8s config reading
  oot-branch-202001070009   b3fbb98afc mailman: add http redirect for renamed listinfo page of wikimediamy
  oot-branch-202001070019   b3fbb98afc mailman: add http redirect for renamed listinfo page of wikimediamy
  oot-branch-202001070020   b3fbb98afc mailman: add http redirect for renamed listinfo page of wikimediamy
  oot-branch-202001080129   78c2dba9fd phabricator: Remove comment about bans being superseded by WP0 bans
  oot-branch-202001080214   c9aa6b3b3b url_downloader: Adapt auto restart for Buster
  oot-branch-202001080817   b6f71db7f1 caching-proxy: squid vs squid3 paths
  oot-branch-202001081647   ecc43a843f Revert "ATS: assign 8G instead of 2G to RAM caches on ats-be"
  oot-branch-202001090726   b0609ef004 install_server: Do not reimage db1107
  oot-branch-202001091016   567dc3b1fc Switch rdb* to standardised Partman layout
  oot-branch-202001091017   567dc3b1fc Switch rdb* to standardised Partman layout
  oot-branch-202001100652   135a6bc256 lvs, monitoring: prometheus expects params value as string[] type
  oot-branch-202001100722   135a6bc256 lvs, monitoring: prometheus expects params value as string[] type
  oot-branch-202001110226   7e88027b76 codesearch: fix dependency cycle with git::clone
  oot-branch-202001110248   7e88027b76 codesearch: fix dependency cycle with git::clone
  oot-branch-202001110251   7e88027b76 codesearch: fix dependency cycle with git::clone
  oot-branch-202001110306   7e88027b76 codesearch: fix dependency cycle with git::clone
  oot-branch-202001132039   b51e074fd3 hieradata/labs: add etcd srv_domain parameter for deployment-prep
  oot-branch-202001140158   a83f55682f codesearch: fix parameters of apt::package_from:component
  oot-branch-202001140200   a83f55682f codesearch: fix parameters of apt::package_from:component
  oot-branch-202001140214   a83f55682f codesearch: fix parameters of apt::package_from:component
  oot-branch-202001141436   1cf47b2cc0 transparency-archive: correct template name
  oot-branch-202001141521   13ddfd93f7 Support missing /historical on transparency sites
  oot-branch-202001142117   ec7f3a1a0f installserver: Convert tftp/dhcp ferm rules to ferm services
  oot-branch-202001142124   ec7f3a1a0f installserver: Convert tftp/dhcp ferm rules to ferm services
  oot-branch-202001142129   ec7f3a1a0f installserver: Convert tftp/dhcp ferm rules to ferm services
  oot-branch-202001142131   ec7f3a1a0f installserver: Convert tftp/dhcp ferm rules to ferm services
  oot-branch-202001150933   059201c683 Ping offload, add esams text-lb VIP
  oot-branch-202001150938   059201c683 Ping offload, add esams text-lb VIP
  oot-branch-202001150939   059201c683 Ping offload, add esams text-lb VIP
  oot-branch-202001150940   059201c683 Ping offload, add esams text-lb VIP
  oot-branch-202001150954   6c6dac01f8 prometheus: fix varnishd_mmap_count pid extraction
  oot-branch-202001151728   48748def5f Partman: Add restbase202[1-3] to netboot.cfg
  oot-branch-202001151731   48748def5f Partman: Add restbase202[1-3] to netboot.cfg
* production                e89776673b [behind 14] fnm: bump pps threshold up a bit
  remotes/origin/HEAD       -> origin/production
  remotes/origin/production 48748def5f Partman: Add restbase202[1-3] to netboot.cfg
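
Many of the oot-branch-* entries in the listing above point at the same commit. A minimal sketch for grouping them by commit, assuming the `git branch -v` output format exactly as pasted here (the helper name `group_oot_branches` and the regular expression are illustrative, not part of any existing tooling):

```python
import re
from collections import defaultdict

# Matches lines such as
#   "  oot-branch-201912312058   5b1adde31e AWS search block: anchor regex to start"
# A leading "*" marks the checked-out branch. Symbolic refs such as
# "remotes/origin/HEAD -> origin/production" have no hash and do not match.
BRANCH_LINE = re.compile(
    r"^[* ]\s*(?P<name>\S+)\s+(?P<sha>[0-9a-f]{7,40})\s+(?P<subject>.*)$"
)


def group_oot_branches(listing):
    """Map abbreviated commit hash -> list of oot-branch-* names pointing at it."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        match = BRANCH_LINE.match(line)
        if match and match.group("name").startswith("oot-branch-"):
            groups[match.group("sha")].append(match.group("name"))
    return dict(groups)


# A few lines copied from the listing above.
SAMPLE = """\
  oot-branch-202001151728   48748def5f Partman: Add restbase202[1-3] to netboot.cfg
  oot-branch-202001151731   48748def5f Partman: Add restbase202[1-3] to netboot.cfg
* production                e89776673b [behind 14] fnm: bump pps threshold up a bit
  remotes/origin/HEAD       -> origin/production
"""

if __name__ == "__main__":
    for sha, names in group_oot_branches(SAMPLE).items():
        print(sha, len(names), ", ".join(names))
```

Feeding the full listing through the same function shows how many of the timestamped branches are duplicates of one another.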