Workaround merged and deployed on prod maintenance server(s). Lowering priority from High to Normal because we have new captchas and a workaround for now.
I manually ran the script by @Reedy (thanks!) from https://gerrit.wikimedia.org/r/c/operations/puppet/+/543707/2/modules/mediawiki/files/captchaloop
The changes made in T235013 added a requirement to have git-lfs installed and use a different command to pull data.
@Ladsgroup git-lfs is not installed on the prod servers cloning from this, and the puppet git::clone class does not support changing the command yet. So this breaks cloning on the prod servers.
Tue, Oct 15
- added to special groups in Phabricator to see private tickets (acl*SRE and WMF/NDA)
Very nice. Welcome @RLazarus! I'll upload a change to code review to create your shell account. Could you create an SSH key pair and paste the public part here on the ticket? Also feel free to come to IRC and ping us so we can add you to some public and private channels. Cheers, Daniel
Mon, Oct 14
- added to maint-announce shared inbox / Google group
- added to "Ops vendor maintenance" calendar and permissions
Hello Reuven and welcome to the team!
added as 306th Wikipedia
sorry, i was on 1001 and 2001 vs. 1002 and 2002 and was wondering why i don't even see /srv mounted on a separate device. yes, ACK. on 1002 / 2002 it's the xenon logs.
looking at them now i see they are only using 14% and 8% of / . I ran "apt-get clean" and now it's down to 12% and 6%. Alerting would be at 95% by default. So looks like somebody (or something like a cron?) already deleted stuff.
Sat, Oct 12
@mobrovac Yes, i agree. Making 2 new LVS and DNS services, one parsoid-php and one parsoid-js and then switching first from old parsoid to parsoid-js seems like the best plan to solve the conflict. My latest patch is the attempt to add that config for a new parsoid-php service so i could more or less copy that to make parsoid-js first. ACK.
Fri, Oct 11
OIT reports E-mail account has been created. We can start now with some of these.
List has been created
@greg List created. I let it create a random password, then added the secondary admins and ran a "reset password" command.
redirected to Management Interfaces
Wikitech has the following list of IPMI related pages:
@RobH there is a wikitech page you made back in 2012 about the ipmi_mgmt script at https://wikitech.wikimedia.org/wiki/Systems_management.
The box for production DNS removed is checked but looking at DNS repo it's still there:
assigning to Papaul per IRC chat (thanks!)
codfw db hosts - fixed
Thu, Oct 10
Oh, that was quick and easier than i thought. Thank you!
Announcement text as agreed on in P9309. Paladox is sending mail to wikitech :)
We agreed on Monday, October 21st.
@thcipriani Sounds good and Mondays work for me (from around 10am PST). This coming one is "Wikimedia holiday email / Monday, October 14 US holiday" though. Unless you want to specifically use the WMF holiday to do it for less impact?
@Jclark-ctr checked on this (thanks!), but this still needs to happen. One minute i could SSH to it just fine and 12 minutes later it was alerting in Icinga again. So it keeps being "from time to time" and Chris' comment "we will need to power off the host for 10-30secs." still stands.
Could confirm yesterday that i can log in again with the hotfix. Thanks!
mgmt password updated using cookbook.
Wed, Oct 9
replication.log shows it is replicating again and working on the backlog queue right now.
Broken by https://gerrit.wikimedia.org/r/c/operations/puppet/+/541386 when we renamed the replication target yesterday.
Tue, Oct 8
Enabled the debug log as suggested by Krenair.