This has been cleaned up for the moment. The proper fix is to add a cleanup stage in wmfkeystonehooks but for now this can be addressed via periodic runs of wmcs-novastats-puppetleaks --delete.
Works for me! Can you ping on this task when ready? Or do you have an ETA?
This is approved. You don't actually need a quota change for the disk space but we'll adjust quota to move that large VM to an xlarge vm. If you need a temporary bump in order to do rebuilds just let us know.
I've run maintain-replica-indexes and maintain-views on labsdb1011; the maintain_meta_p command fails:
wmcs steps done
I always feel like I'm flying blind when I do this, but I've run the steps on https://wikitech.wikimedia.org/wiki/Add_a_wiki#Cloud_Services and this DB is now visible on Quarry. I think that means we're done.
Tue, Jun 2
@bd808's instructions seem to have worked.
@Krenair, can you summarize the results here? It looks resolved but it's not clear if or how :)
This is fixed in our Rocky deploy and I have a pending upstream patch.
I think this is fixed.
Thu, May 28
I upgraded the puppetmaster packages on labtestpuppetmaster2001 and things are working for now. This isn't our long-term plan but it should unblock me.
Wed, May 27
I did a simple hand test of this (setting on one host using the local mcrouter port, getting on another host using that host's local mcrouter) and it looks good. I've also gotten designate-producer working, which uses mcrouter for coordination, and it seems happy.
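For anyone wanting to repeat the hand test, here is a rough sketch of the set-on-one-host, get-on-another check. It assumes mcrouter is reachable over the plain memcached text protocol; the hostnames, port, and key in the usage comment are placeholders, not the real deployment values.

```python
import socket


def build_set(key: str, value: bytes, ttl: int = 60) -> bytes:
    """Build a memcached text-protocol 'set' request (flags fixed at 0)."""
    return f"set {key} 0 {ttl} {len(value)}\r\n".encode() + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Build a memcached text-protocol 'get' request."""
    return f"get {key}\r\n".encode()


def parse_get(response: bytes):
    """Return the value from a 'get' response, or None on a miss."""
    if response.startswith(b"END"):
        return None
    header, _, rest = response.partition(b"\r\n")
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError(f"unexpected response: {response!r}")
    length = int(parts[3])
    return rest[:length]


def roundtrip(host: str, port: int, payload: bytes) -> bytes:
    """Send one request and read one reply; fine for small test values only."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)
        return conn.recv(4096)


# Hypothetical usage (host names and port are placeholders):
#   roundtrip("host-a.example", 11213, build_set("handtest", b"hello"))
#   parse_get(roundtrip("host-b.example", 11213, build_get("handtest")))
```

If the second call returns the value written by the first, the two local mcrouters are sharing state as expected.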
I warned @Vgutierrez that we'll be using acme-chief and will need to know about possible breaking changes. He seemed fine with all that.
Tue, May 26
Regardless of whether or not we move existing cloudvirts from 2 ports to 1, we can definitely rack these new servers with only one 10g connection if we take the vlan steps described in T248425.
After a recent meeting, we're going to put this project on hold while we give acme-chief another try. Acme-chief is clearly not the ideal solution for cloud-vps but there are few enough use-cases that it might be better to re-use this rather than add new code to support.
Fri, May 22
Horizon requires 2FA setup via wikitech, so probably best to not include that as a test case.
I think what we are seeing is this:
Thu, May 21
A few weeks is just fine. Thank you!
Wed, May 20
If you're able to muster everyone into backing up their homedirs, I'd love to remove the mount.
I left stack traces on, so you can reproduce and see this error without a wikitech-static login. Can we just rebuild that db?
[dbf6b55549aeaa4dd45553af] /w/index.php?search=ceph&title=Special%3ASearch&profile=default&fulltext=1 Wikimedia\Rdbms\DBQueryError from line 1603 of /srv/mediawiki/w/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? Query: SELECT page_id,page_namespace,page_title FROM `page`,`searchindex` WHERE (page_id=si_page) AND ( MATCH(si_title) AGAINST('+ceph ' IN BOOLEAN MODE) ) AND page_namespace = '0' ORDER BY MATCH(si_title) AGAINST('+ceph ' IN NATURAL LANGUAGE MODE) DESC LIMIT 21 Function: SearchMySQL::searchInternal Error: 144 Table './wikitech/searchindex' is marked as crashed and last (automatic?) repair failed (localhost)
I've removed all the mounts other than /home from all VMs in this project.
There's not a straightforward way to preserve the contents of /home outside of manual copying. For now I'll remove all the mounts other than that one, then you can ping if/when y'all think you're clear of it.
Tue, May 19
@Nemo_bis *bump* can you respond to my most recent question?
Hello again @Addshore!
Reassigning to arturo in case he knows how to proceed and/or thinks that we don't actually need to do this :)
Hello! I can't account for why your password was rejected, but I have enabled your wikitech account such that you should now be able to do a password reset. Give that a try, and let me know if you run into any new issues.
This is resolved for existing nodes. I'm keeping this open as a reference, though, because we'll need to do the same song-and-dance for any future nodes that are moved to ceph.
Mon, May 18
@elukey I don't remember why I assigned this to you :) Are you actively working on it or does it need a new home?
I deleted the last VM using that zone, and deleted the zone.
I fixed at least one thing with the 57.15.185.in-addr.arpa zone (the SOA was pointing to a currently broken resolver).
Sun, May 17
Maps are moved to another proxy host, so this is done.
All these proxies are now hosted on maps-proxy-01 in the project-proxy project, using the new profile::wmcs::proxy::static setup. Four-level domains are still a bit of a problem since we rely on the *.wmflabs.org wildcard, but once we have a reasonable Let's Encrypt setup we should be able to automate per-domain certs here.
This seems to be working -- we'll see if anyone complains! Docs are at
@EBernhardson I've switched the wdqs hosts back to local storage so you should be able to recreate the VMs that you need any time. Let me know if you run into any trouble!
Sat, May 16
Fri, May 15
1004 and 1006 are now running ceph; I've drained the wdqs nodes but not rebuilt them yet.
Fixed cloudvirt-wdqs1002 with:
I thought the install/uninstall dance was doing something with kernel modules but diffing lsmod outputs didn't get me anywhere.
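For reference, the lsmod comparison can be done roughly like this. It is only a sketch: it assumes the two lsmod outputs were captured as text before and after the install/uninstall, and it compares module names only, ignoring sizes and use counts.

```python
def loaded_modules(lsmod_output: str) -> set:
    """Return the module names from lsmod text, skipping the header row."""
    lines = lsmod_output.strip().splitlines()
    return {line.split()[0] for line in lines[1:] if line.strip()}


def diff_modules(before: str, after: str):
    """Return (added, removed) module names between two lsmod captures."""
    old = loaded_modules(before)
    new = loaded_modules(after)
    return sorted(new - old), sorted(old - new)
```

An empty result on both sides matches what I saw: no modules appeared or disappeared, so the difference has to be somewhere else.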
The fix is not to /run/ qemu version 1:2.8, but rather to have ever installed it. This, for example, fixed cloudvirt-wdqs1001:
Thu, May 14
Next steps are:
I downgraded some things on cloudvirt1004 and now I'm able to launch VMs.
It's quite possible that the MTU error represents forward progress. Possible but uncertain.
Here is an example creation command:
@EBernhardson sorry for the delay in responding! We can return these hosts to you at any time -- I'll start working on that shortly unless you're back to not needing them :)