This has been cleaned up for the moment. The proper fix is to add a cleanup stage in wmfkeystonehooks but for now this can be addressed via periodic runs of wmcs-novastats-puppetleaks --delete.

Jun 3 2020, 5:17 PM · cloud-services-team, Cloud-VPS

Andrew added a comment to T250706: Prepare and check storage layer for gomwiktionary.

works for me! Can you ping on this task when ready? Or do you have an eta?

Jun 3 2020, 5:11 PM · cloud-services-team (Kanban), Data-Services, DBA

Andrew assigned T253836: Update quotas for MWoffliner VPS to aborrero.

This is approved. You don't actually need a quota change for the disk space but we'll adjust quota to move that large VM to an xlarge vm. If you need a temporary bump in order to do rebuilds just let us know.

Jun 3 2020, 3:55 PM · Cloud-VPS (Quota-requests), affects-Kiwix-and-openZIM

Andrew added a comment to T250706: Prepare and check storage layer for gomwiktionary.

I've run maintain-replica-indexes and maintain-views on labsdb1011; the maintain_meta_p command fails:

Jun 3 2020, 2:40 PM · cloud-services-team (Kanban), Data-Services, DBA

Andrew closed T251410: Prepare and check storage layer for awawiki, a subtask of T251371: Create Awadhi Wikipedia, as Resolved.

Jun 3 2020, 2:01 PM · MW-1.35-notes (1.35.0-wmf.31; 2020-05-05), User-Kizule, Wiki-Setup (Create), User-Urbanecm

Andrew closed T251410: Prepare and check storage layer for awawiki as Resolved.

wmcs steps done

Jun 3 2020, 2:01 PM · cloud-services-team (Kanban), Data-Services, DBA

Andrew closed T250706: Prepare and check storage layer for gomwiktionary, a subtask of T249506: Create Wiktionary Konkani, as Resolved.

Jun 3 2020, 2:01 PM · User-Ladsgroup, MW-1.35-notes (1.35.0-wmf.30; 2020-04-28), User-Urbanecm, Wiki-Setup (Create)

Andrew closed T250706: Prepare and check storage layer for gomwiktionary as Resolved.

I always feel like I'm flying blind when I do this, but I've run the steps on https://wikitech.wikimedia.org/wiki/Add_a_wiki#Cloud_Services and this DB is now visible on Quarry. I think that means we're done.

Jun 3 2020, 2:00 PM · cloud-services-team (Kanban), Data-Services, DBA

Jun 2 2020

Andrew closed T243730: DBQueryError on Wikitech Static Search as Resolved.

@bd808's instructions seem to have worked.

Jun 2 2020, 6:30 PM · Wikimedia-production-error, cloud-services-team (Kanban), CirrusSearch, Discovery-Search, wikitech.wikimedia.org

Andrew added a comment to T245937: tools-acme-chief-01 is attempting to validate DNS challenge against cloud authdns IPv6 addresses.

@Krenair, can you summarize the results here? It looks resolved but it's not clear if or how :)

Jun 2 2020, 4:45 PM · Patch-For-Review, cloud-services-team, IPv6, Acme-chief

Andrew claimed T252224: stale data in the ENC API.

Jun 2 2020, 4:32 PM · cloud-services-team, Cloud-VPS

Andrew closed T253304: Designate sometimes fails to delete zones as Resolved.

This is fixed in our Rocky deploy and I have a pending upstream patch.

Jun 2 2020, 4:16 PM · cloud-services-team (Kanban)

Andrew triaged T253780: Upgrade cloudservices nodes to Debian Buster as Medium priority.

Jun 2 2020, 4:16 PM · Patch-For-Review, cloud-services-team (Kanban)

Andrew moved T253780: Upgrade cloudservices nodes to Debian Buster from Inbox to Doing on the cloud-services-team (Kanban) board.

Jun 2 2020, 4:15 PM · Patch-For-Review, cloud-services-team (Kanban)

Andrew closed T133082: Horizon: Project links on the project panel often drop user into logged-out hell as Resolved.

I think this is fixed.

Jun 2 2020, 2:20 PM · TestMe, Horizon

May 28 2020

Andrew committed rLPRIc09987cacff0: Add another ldap dummy pass.

Add another ldap dummy pass

May 28 2020, 9:47 PM

Andrew committed rLPRI8078dd56aaa3: Add a dummy ldap sync pass.

Add a dummy ldap sync pass

May 28 2020, 9:46 PM

Andrew updated the task description for T253780: Upgrade cloudservices nodes to Debian Buster.

May 28 2020, 3:33 PM · Patch-For-Review, cloud-services-team (Kanban)

Andrew updated the task description for T253780: Upgrade cloudservices nodes to Debian Buster.

May 28 2020, 2:55 PM · Patch-For-Review, cloud-services-team (Kanban)

Andrew closed T253817: broken puppet on codfw1dev VMs as Resolved.

I upgraded the puppetmaster packages on labtestpuppetmaster2001 and things are working for now. This isn't our long-term plan but it should unblock me for now.

May 28 2020, 12:24 AM · Puppet-infrastructure-modernization, Puppet, cloud-services-team (Kanban)

May 27 2020

Andrew created T253817: broken puppet on codfw1dev VMs.

May 27 2020, 11:25 PM · Puppet-infrastructure-modernization, Puppet, cloud-services-team (Kanban)

Andrew added a comment to T251574: Package mcrouter 0.41 for Debian Buster .

I did a simple hand test of this (setting on one host using the local mcrouter port, getting on another host using that host's local mcrouter) and it looks good. I've also get designate-producer working which uses mcrouter for coordination and it seems happy.

May 27 2020, 10:48 PM · cloud-services-team (Kanban), serviceops, Packaging

Andrew updated the task description for T253780: Upgrade cloudservices nodes to Debian Buster.

May 27 2020, 6:30 PM · Patch-For-Review, cloud-services-team (Kanban)

Andrew updated the task description for T253780: Upgrade cloudservices nodes to Debian Buster.

May 27 2020, 6:12 PM · Patch-For-Review, cloud-services-team (Kanban)

Andrew created T253780: Upgrade cloudservices nodes to Debian Buster.

May 27 2020, 6:02 PM · Patch-For-Review, cloud-services-team (Kanban)

Andrew updated subscribers of T252721: cloud-vps solution for Let's Encrypt.

I warned @Vgutierrez that we'll be using acme-chief and will need to know about possible breaking changes. He seemed fine with all that.

May 27 2020, 5:35 PM · cloud-services-team (Kanban), Cloud-VPS

May 26 2020

Andrew added a comment to T251627: (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet.

Regardless of whether or not we move existing cloudvirts from 2 ports to 1, we can definitely rack these new servers with only one 10g connection if we take the vlan steps described in T248425.

May 26 2020, 6:43 PM · Patch-For-Review, cloud-services-team (Hardware), ops-eqiad, SRE, DC-Ops

Andrew changed the status of T252721: cloud-vps solution for Let's Encrypt, a subtask of T161256: multi-component wmflabs.org subdomains doesn't work under simple wildcard TLS cert, from Open to Stalled.

May 26 2020, 5:26 PM · cloud-services-team (Kanban), SRE, Traffic, Maps, Cloud-VPS, DNS

Andrew changed the status of T252721: cloud-vps solution for Let's Encrypt, a subtask of T251558: multilevel domains in the 'maps' project don't use tls, from Open to Stalled.

May 26 2020, 5:26 PM · cloud-services-team (Kanban), Cloud-Services

Andrew changed the status of T252721: cloud-vps solution for Let's Encrypt, a subtask of T252199: Stop using letsencrypt::cert::integrated, from Open to Stalled.

May 26 2020, 5:26 PM · cloud-services-team (Kanban), Mail

Andrew changed the status of T252721: cloud-vps solution for Let's Encrypt, a subtask of T252734: Consider moving tools away from acme-chief, from Open to Stalled.

May 26 2020, 5:26 PM · cloud-services-team (Kanban), Tools

Andrew changed the status of T252721: cloud-vps solution for Let's Encrypt from Open to Stalled.

After a recent meeting, we're going to put this project on hold while we give acme-chief another try. Acme-chief is clearly not the ideal solution for cloud-vps but there are few enough use-cases that it might be better to re-use this rather than add new code to support.

May 26 2020, 5:26 PM · cloud-services-team (Kanban), Cloud-VPS

May 22 2020

Andrew added a comment to T253414: Problems accesing superset and horizon.wikimedia.org.

Horizon requires 2fa setup via wikitech, so probably best to not include that as a test case.

May 22 2020, 8:09 PM · SRE, LDAP

Andrew added a comment to T253304: Designate sometimes fails to delete zones.

I think what we are seeing is this:

May 22 2020, 7:16 PM · cloud-services-team (Kanban)

May 21 2020

Andrew updated the task description for T251294: Upgrade cloud-vps control plane to Debian Buster.

May 21 2020, 8:12 PM · cloud-services-team (Kanban)

Andrew closed T253124: Upgrade cloudnet2003 and cloudnet2004 to Debian Buster, a subtask of T251294: Upgrade cloud-vps control plane to Debian Buster, as Resolved.

May 21 2020, 6:41 PM · cloud-services-team (Kanban)

Andrew closed T253124: Upgrade cloudnet2003 and cloudnet2004 to Debian Buster as Resolved.

May 21 2020, 6:41 PM · Patch-For-Review, cloud-services-team (Kanban)

Andrew updated the task description for T253124: Upgrade cloudnet2003 and cloudnet2004 to Debian Buster.

May 21 2020, 3:52 PM · Patch-For-Review, cloud-services-team (Kanban)

Andrew updated the task description for T253124: Upgrade cloudnet2003 and cloudnet2004 to Debian Buster.

May 21 2020, 3:36 PM · Patch-For-Review, cloud-services-team (Kanban)

Andrew created T253304: Designate sometimes fails to delete zones.

May 21 2020, 12:57 PM · cloud-services-team (Kanban)

Andrew added a comment to T208416: Check whether wikidata-dev project requires NFS or not.

A few weeks is just fine. Thank you!

May 21 2020, 4:07 AM · User-Addshore, Patch-For-Review, Wikidata-Campsite, cloud-services-team (Kanban), [DEPRECATED] wdwb-tech, Wikidata, Cloud-VPS

May 20 2020

Andrew added a comment to T208416: Check whether wikidata-dev project requires NFS or not.

If you're able to muster everyone into backing up their homedirs, I'd love to remove the mount.

May 20 2020, 6:44 PM · User-Addshore, Patch-For-Review, Wikidata-Campsite, cloud-services-team (Kanban), [DEPRECATED] wdwb-tech, Wikidata, Cloud-VPS

Andrew added a comment to T243730: DBQueryError on Wikitech Static Search.

I left stack traces on, so you can reproduce and see this error without a wikitech-static login. Can we just rebuild that db?

May 20 2020, 6:30 PM · Wikimedia-production-error, cloud-services-team (Kanban), CirrusSearch, Discovery-Search, wikitech.wikimedia.org

Andrew added a comment to T243730: DBQueryError on Wikitech Static Search.

[dbf6b55549aeaa4dd45553af] /w/index.php?search=ceph&title=Special%3ASearch&profile=default&fulltext=1 Wikimedia\Rdbms\DBQueryError from line 1603 of /srv/mediawiki/w/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading?
Query: SELECT page_id,page_namespace,page_title FROM `page`,`searchindex` WHERE (page_id=si_page) AND ( MATCH(si_title) AGAINST('+ceph ' IN BOOLEAN MODE) ) AND page_namespace = '0' ORDER BY MATCH(si_title) AGAINST('+ceph ' IN NATURAL LANGUAGE MODE) DESC LIMIT 21
Function: SearchMySQL::searchInternal
Error: 144 Table './wikitech/searchindex' is marked as crashed and last (automatic?) repair failed (localhost)

May 20 2020, 6:28 PM · Wikimedia-production-error, cloud-services-team (Kanban), CirrusSearch, Discovery-Search, wikitech.wikimedia.org

Andrew added a comment to T208416: Check whether wikidata-dev project requires NFS or not.

I've removed all the mounts other than /home from all VMs in this project.

May 20 2020, 4:08 PM · User-Addshore, Patch-For-Review, Wikidata-Campsite, cloud-services-team (Kanban), [DEPRECATED] wdwb-tech, Wikidata, Cloud-VPS

Andrew added a comment to T208416: Check whether wikidata-dev project requires NFS or not.

There's not a straightforward way to preserve the contents of /home outside of manual copying. For now I'll remove all the mounts other than that one, then you can ping if/when y'all think you're clear of it.

May 20 2020, 2:50 AM · User-Addshore, Patch-For-Review, Wikidata-Campsite, cloud-services-team (Kanban), [DEPRECATED] wdwb-tech, Wikidata, Cloud-VPS

May 19 2020

Andrew added a comment to T208402: Check whether dumps project requires NFS or not.

@Nemo_bis *bump* can you respond to me most recent question?

May 19 2020, 6:06 PM · cloud-services-team (Kanban), Cloud-VPS

Andrew added a comment to T208416: Check whether wikidata-dev project requires NFS or not.

Hello again @Addshore!

May 19 2020, 6:05 PM · User-Addshore, Patch-For-Review, Wikidata-Campsite, cloud-services-team (Kanban), [DEPRECATED] wdwb-tech, Wikidata, Cloud-VPS

Andrew reassigned T247972: Cloud DNS: fix inconsistent ownership of reverse domains for openstack floating ip networks from Andrew to aborrero.

Reassigning to arturo in case he knows how to proceed and/or thinks that we don't actually need to do this :)

May 19 2020, 5:56 PM · cloud-services-team (Kanban)

Andrew triaged T252224: stale data in the ENC API as Medium priority.

May 19 2020, 5:54 PM · cloud-services-team, Cloud-VPS

Andrew moved T252721: cloud-vps solution for Let's Encrypt from Inbox to Needs discussion on the cloud-services-team (Kanban) board.

May 19 2020, 4:52 PM · cloud-services-team (Kanban), Cloud-VPS

Andrew claimed T252874: Can't log in to newly-created account.

Hello! I can't account for why your password was rejected, but I have enabled your wikitech account such that you should now be able to do a password reset. Give that a try, and let me know if you run into any new issues.

May 19 2020, 4:49 PM · cloud-services-team (Kanban)

Andrew added a comment to T174469: LDAP account that is not attached on wikitech has no means for password reset.

bd808 says:

May 19 2020, 4:48 PM · Infrastructure-Foundations, Bitu, Striker, wikitech.wikimedia.org

Andrew triaged T252199: Stop using letsencrypt::cert::integrated as Medium priority.

May 19 2020, 4:11 PM · cloud-services-team (Kanban), Mail

Andrew moved T252762: tools/toolsbeta: improve acme-chief integration from Inbox to Needs discussion on the cloud-services-team (Kanban) board.

May 19 2020, 4:10 PM · Acme-chief, cloud-services-team (Kanban)

Andrew triaged T252831: cloudvirt ceph nodes can't launch new VMs as Medium priority.

May 19 2020, 4:04 PM · cloud-services-team (Kanban)

Andrew moved T252831: cloudvirt ceph nodes can't launch new VMs from Inbox to Doing on the cloud-services-team (Kanban) board.

May 19 2020, 4:04 PM · cloud-services-team (Kanban)

Andrew changed the status of T252831: cloudvirt ceph nodes can't launch new VMs from Open to Stalled.

This is resolved for existing nodes. I'm keeping this open as a reference, though, because we'll need to do the same song-and-dance for any future odes that are moved to ceph.

May 19 2020, 4:04 PM · cloud-services-team (Kanban)

Andrew triaged T253124: Upgrade cloudnet2003 and cloudnet2004 to Debian Buster as Medium priority.

May 19 2020, 4:03 PM · Patch-For-Review, cloud-services-team (Kanban)

Andrew moved T253124: Upgrade cloudnet2003 and cloudnet2004 to Debian Buster from Inbox to Doing on the cloud-services-team (Kanban) board.

May 19 2020, 4:03 PM · Patch-For-Review, cloud-services-team (Kanban)

Andrew created T253124: Upgrade cloudnet2003 and cloudnet2004 to Debian Buster.

May 19 2020, 2:39 PM · Patch-For-Review, cloud-services-team (Kanban)

Andrew updated the task description for T251294: Upgrade cloud-vps control plane to Debian Buster.

May 19 2020, 2:32 AM · cloud-services-team (Kanban)

May 18 2020

Andrew updated the task description for T251294: Upgrade cloud-vps control plane to Debian Buster.

May 18 2020, 8:20 PM · cloud-services-team (Kanban)

Andrew added a comment to T251574: Package mcrouter 0.41 for Debian Buster .

@elukey I'm don't remember why I assigned this to you :) Are you actively working on it or does it need a new home?

May 18 2020, 6:58 PM · cloud-services-team (Kanban), serviceops, Packaging

Andrew closed T247134: Fix all codfw1dev.cloud references as Resolved.

I deleted the last VM using that zone, and deleted the zone.

May 18 2020, 6:56 PM · cloud-services-team (Kanban), Cloud-VPS

Andrew added a comment to T247972: Cloud DNS: fix inconsistent ownership of reverse domains for openstack floating ip networks.

I fixed at least one thing with the 57.15.185.in-addr.arpa zone (the SOA was pointing to a currently broken resolver).

May 18 2020, 4:46 PM · cloud-services-team (Kanban)

May 17 2020

Andrew renamed T251558: multilevel domains in the 'maps' project don't use tls from multilevel domains in the 'maps' project to multilevel domains in the 'maps' project don't use tls.

May 17 2020, 10:01 PM · cloud-services-team (Kanban), Cloud-Services

Andrew closed T233995: Setup a dedicated HTTPS terminating proxy for maps project as Resolved.

May 17 2020, 10:01 PM · cloud-services-team (Kanban), VPS-Projects, Cloud-VPS

Andrew closed T233995: Setup a dedicated HTTPS terminating proxy for maps project, a subtask of T251558: multilevel domains in the 'maps' project don't use tls, as Resolved.

May 17 2020, 10:01 PM · cloud-services-team (Kanban), Cloud-Services

Andrew added a parent task for T252721: cloud-vps solution for Let's Encrypt: T251558: multilevel domains in the 'maps' project don't use tls.

May 17 2020, 10:01 PM · cloud-services-team (Kanban), Cloud-VPS

Andrew added a subtask for T251558: multilevel domains in the 'maps' project don't use tls: T252721: cloud-vps solution for Let's Encrypt.

May 17 2020, 10:01 PM · cloud-services-team (Kanban), Cloud-Services

Andrew removed a subtask for T131290: Abolish use of labs proxies in domains other than .wmflabs.org: T251558: multilevel domains in the 'maps' project don't use tls.

May 17 2020, 10:00 PM · cloud-services-team (Kanban), Cloud-Services

Andrew removed a parent task for T251558: multilevel domains in the 'maps' project don't use tls: T131290: Abolish use of labs proxies in domains other than .wmflabs.org.

May 17 2020, 10:00 PM · cloud-services-team (Kanban), Cloud-Services

Andrew closed T131290: Abolish use of labs proxies in domains other than .wmflabs.org, a subtask of T131288: Make Cloud Services shared HTTP proxies enforce TLS, as Resolved.

May 17 2020, 10:00 PM · User-bd808, cloud-services-team (Kanban), Cloud-VPS

Andrew closed T131290: Abolish use of labs proxies in domains other than .wmflabs.org as Resolved.

maps are moved to another proxy host, so this is done.

May 17 2020, 10:00 PM · cloud-services-team (Kanban), Cloud-Services

Andrew added a comment to T251558: multilevel domains in the 'maps' project don't use tls.

All these proxies are now hosted on maps-proxy-01 in the project-proxy project, using the new profile::wmcs::proxy::static setup. Four-level domains are still a bit of a problem since we rely on the *.wmflabs.org wildcard, but once we have a reasonable let's encrypt setup we should be able to automate per-domain certs here.

May 17 2020, 9:59 PM · cloud-services-team (Kanban), Cloud-Services

Andrew added a comment to T233995: Setup a dedicated HTTPS terminating proxy for maps project.

This seems to be working -- we'll see if anyone complains! Docs are at

May 17 2020, 9:57 PM · cloud-services-team (Kanban), VPS-Projects, Cloud-VPS

Andrew closed T221631: Dedicated servers on WMCS to test WDQS scalability strategy, a subtask of T206636: Provide a way to have test servers on real hardware, isolated from production for Wikidata Query Service, as Resolved.

May 17 2020, 4:15 AM · cloud-services-team (Kanban), User-Smalyshev, Wikidata, SRE, Wikidata-Query-Service

Andrew closed T221631: Dedicated servers on WMCS to test WDQS scalability strategy, a subtask of T221630: [Epic] Search platform - Hardware requests for 2019-2020, as Resolved.

May 17 2020, 4:15 AM · Discovery-Search (Current work), Epic

Andrew closed T221631: Dedicated servers on WMCS to test WDQS scalability strategy as Resolved.

@EBernhardson I've switched the wdqs hosts back to local storage so you should be able to recreate the VMs that you need any time. Let me know if you run into any trouble!

May 17 2020, 4:15 AM · cloud-services-team (Kanban), Wikidata, Wikidata-Query-Service, Discovery-Search

Andrew closed T252784: Remove Ceph from cloudvirt-wdqs100x, Add ceph to cloudvirt1004 and cloudvirt1006, a subtask of T221631: Dedicated servers on WMCS to test WDQS scalability strategy, as Resolved.

May 17 2020, 4:13 AM · cloud-services-team (Kanban), Wikidata, Wikidata-Query-Service, Discovery-Search

Andrew closed T252784: Remove Ceph from cloudvirt-wdqs100x, Add ceph to cloudvirt1004 and cloudvirt1006 as Resolved.

May 17 2020, 4:13 AM · cloud-services-team (Kanban), Wikidata, Wikidata-Query-Service