Thu, Oct 10
This is an error that sometimes happens during VM creation -- I think it's something like...
Wed, Oct 9
This all sounds good to me. lmk if you need me to make the designate changes.
Tue, Oct 8
I'm merging an experimental patch to reduce the number of connections needed. It's possible that this issue was caused by the Newton upgrade (and some change in behavior), but it could also be a result of us switching to an HA setup (if the connection limit on the db side is per user/database rather than per host/user/database).
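One way to test that theory -- a sketch, assuming a stock MySQL/MariaDB setup, and 'designate' below is an example user name (as far as I know, max_user_connections limits each account, counted across all client hosts):
```
# Global ceilings on the db host
mysql -e "SHOW GLOBAL VARIABLES LIKE 'max_connections';"
mysql -e "SHOW GLOBAL VARIABLES LIKE 'max_user_connections';"

# Whether the account itself carries a per-user limit
# ('designate' is an example user name)
mysql -e "SELECT user, host, max_user_connections FROM mysql.user WHERE user = 'designate';"
```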
heh, the first forum post I found about this topic suggests raising the connection limit to 2000
Mon, Oct 7
@arturo if you wanted to submit the package upstream there's at least one other person who would appreciate it.
Fri, Oct 4
@aborrero, do you have intuition about whether packaging bdsync for Buster is hard or easy?
(nevermind, I think I see what's happening)
Thu, Oct 3
I am finally back looking at this! I'm not quite sure what to expect regarding the RAIDs here -- I reimaged (for Buster) and saw the partitioner offer to create two volumes, but in the OS I only see one:
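For anyone else checking, this is roughly how I'm inspecting what the OS actually assembled (a sketch; the device name in the last command is an example):
```
# List block devices and any md arrays built on them
lsblk

# Show software-RAID state as the kernel sees it
cat /proc/mdstat

# Details for one array; /dev/md0 is an example name
sudo mdadm --detail /dev/md0
```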
Wed, Oct 2
I'm closing this because everything is different now
The system for assigning a particular wiki to a particular db host in mediawiki-config has changed a lot since I last touched this code. @Joe, if you could write me a sample patch of how to break out labtestwiki into its own group and direct it to a different db server, I should be able to take it from there.
regarding LDAP: I just created a new project in codfw1dev and added a member. LDAP config looks correct to me, for example:
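For illustration, the sort of query I used to verify (a sketch -- the server URI, base DN, and project name are stand-ins, not necessarily the real values):
```
# Look up the new project's group entry and its members
# (host, base DN, and cn below are illustrative)
ldapsearch -x -H ldap://ldap-labs.eqiad.wikimedia.org \
  -b 'ou=groups,dc=wikimedia,dc=org' '(cn=project-newproject)' member
```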
this is moot now that labpuppetmaster1001/1002 aren't puppetmasters anymore :)
I dug a little deeper, and the primary issue is local diffs in /var/lib/git/operations/puppet on af-puppetmaster02.automation-framework.eqiad.wmflabs. If you commit those and are able to get a sensible rebase with modern upstream puppet, then you'll need to update a couple of other things:
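For the commit-and-rebase part, roughly (a sketch; the commit message is a placeholder and I'm assuming the usual production branch):
```
cd /var/lib/git/operations/puppet
# Preserve the local hacks as a commit so the rebase can carry them forward
sudo git add -A
sudo git commit -m 'local changes on af-puppetmaster02'
# Fetch current upstream and replay the local commit on top
sudo git fetch origin
sudo git rebase origin/production
```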
Tue, Oct 1
Mon, Sep 30
Things look better after that last patch
I noticed this issue because of the source IP that was detected by the dns recursor on cloudservices2002-dev. After this change, things are slightly worse:
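(For context, this is how I'm watching which source IPs the recursor sees -- a sketch, and the interface name is an example:)
```
# Watch inbound DNS queries and their source addresses on the recursor
sudo tcpdump -n -i eth0 udp port 53
```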
Is T210008.wikistats.eqiad.wmflabs associated with this bug? It has had broken puppet for many weeks -- perhaps it can be deleted?
Fri, Sep 27
Thu, Sep 26
Wed, Sep 25
(btw, I bet that exposing these IPs to production hosts breaks a lot of our 'future ideal model' rules, so if we can move towards total-outside-world-natting it might be considered forward progress in some circles)
The other question I have about this hack is... do we need it? The issue I ran into that caused me to notice it was the dns-recursors not recognizing the source IPs, but that's quite easy for me to work around.
I put a new cloudvirt online yesterday, and boosted your quotas. If things get scheduled on hdd systems and you need them moved just let me know.
Brooke took a stab at this, but writing the script turns out to be non-trivial; this happens infrequently and we have good docs now, so we're going to skip writing it.
Tue, Sep 24
@MusikAnimal nice work! I've reverted the quota boost.
Approved during WMCS meeting
If someone (arturo?) knows how to reliably forward the patch, I'm inclined to go with that for now and then refactor to other mechanisms post-upgrade, just in the interest of changing fewer things at a time. I don't know if forwarding the patch is something we can do without introducing unknowns though.
We're precariously close to upgrading to Newton, so maybe this is moot?
Mon, Sep 23
Sat, Sep 21
I just uploaded a new Stretch image (9.11); now neither buster nor stretch will have swap partitions.
Thu, Sep 19
Now that we're running designate/newton this is unblocked. Switching will probably involve downtime, though, since we need to swap in a different pdns version at the same time as a different designate backend.
Other than the database cleanup this is now done.
After this is done we can clean up the db at mysql://designate:<password>@clouddb2001-dev.codfw.wmnet/designate_pool_manager and the equivalent in prod.
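Assuming nothing references those tables once the pool manager is gone, the cleanup should amount to something like (a sketch):
```
# On clouddb2001-dev (and the prod equivalent), as the designate user
mysql -h clouddb2001-dev.codfw.wmnet -u designate -p \
  -e "DROP DATABASE designate_pool_manager;"
```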
Cloudservices1003 and 1004 are now running Designate version Newton. There are a few more steps that we should take before we're ready for Ocata there, though -- we need to move to the worker/producer model and also (probably) to pdns4.
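If memory serves, turning on the worker model is mostly a config flip in designate.conf -- something like the below, though the section/option names are from my recollection of the Newton docs and should be double-checked (crudini is just one way to set them):
```
# Enable the worker service; option names assumed from Newton-era
# worker-model docs -- verify before relying on this
sudo crudini --set /etc/designate/designate.conf service:worker enabled True
sudo crudini --set /etc/designate/designate.conf service:worker notify True
```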
Wed, Sep 18
I'm sorry I didn't get to this! It sounds like you are (probably) all set.
Tue, Sep 17
@MusikAnimal, I've temporarily doubled the RAM and CPU quotas in this project. Once you've created the new VMs and deleted the old ones, let me know and I'll revert the change.
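For the record, the change itself is just something like this (the numbers and project name here are examples, not the real values):
```
# Double cores and RAM (MB) for the project; values are illustrative
openstack quota set --cores 16 --ram 32768 musikanimal-test
```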
Approved, I'll help with this shortly.
Mon, Sep 16
Sorry, this shouldn't have alerted -- the downtime expired. This will be talking to a test database server (clouddb2001-dev).
Sep 13 2019
I fixed the file that @Krenair mentioned and confirmed that /var/lib/puppet/ssl/certs/ca.pem == /etc/ssl/certs/Puppet_Internal_CA.pem == /var/lib/puppet/client/ssl/certs/ca.pem on all hosts in tools.
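(The check itself is just comparing hashes on each host; one way to do it:)
```
# All three files should produce an identical hash on every host
sha256sum /var/lib/puppet/ssl/certs/ca.pem \
          /etc/ssl/certs/Puppet_Internal_CA.pem \
          /var/lib/puppet/client/ssl/certs/ca.pem
```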
The attached patch resolves the issue without the need for a revert.
Sep 12 2019
Here is what happens without those three reverts:
Sep 11 2019
I think this task is done but I'll let @Krenair comment and close :)
I built a second cumin host, cloud-cumin-02.cloudinfra.eqiad.wmflabs. It's partly for backup, and partly because I wanted to confirm that the existing puppetization is sufficient. It turns out that it is! The new host just required a reboot to get keyholder on board.
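The post-reboot bit was just re-arming the agent -- for reference, a sketch using the standard keyholder commands:
```
# Confirm which keys the agent holds and whether it's armed
sudo keyholder status
# Re-arm after the reboot (prompts for the key passphrase)
sudo keyholder arm
```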
Closing as the VM doesn't exist anymore
I'm no longer convinced this is a good idea, or even necessary.