
Andrew (Andrew Bogott)
User

Projects (12)

User Details

User Since
Nov 2 2014, 11:35 PM (376 w, 3 d)
Availability
Available
IRC Nick
andrewbogott
LDAP User
Unknown
MediaWiki User
Andrewbogott [ Global Accounts ]

Recent Activity

Today

Andrew closed T170355: Figure out process for deleting an unused tool as Resolved.

All of the backend work for this task is done. We don't have a UI for it yet, but the UI work is tracked in T285403, so I'm going to close this one.

Thu, Jan 20, 6:04 PM · cloud-services-team (FY2021/2022-Q3), Patch-For-Review, Toolforge
Andrew added a comment to T285403: Add striker UI features for disabling/enabling/deleting tools,.

This is a bit blocked pending a striker test/dev env and/or someone wanting to work on it. Most of the backend decisions are made though.

Thu, Jan 20, 6:04 PM · Striker, cloud-services-team (Kanban)
Andrew closed T170355: Figure out process for deleting an unused tool, a subtask of T133777: Tools that should get archived/deleted (tracking), as Resolved.
Thu, Jan 20, 6:03 PM · User-bd808, Projects-Cleanup, Tracking-Neverending, Toolforge
Andrew added a comment to T299610: (Need By: TBD) rack/setup/install labstore100[89].

clouddumps100x or clouddatasets100x or just datasets100x

Thu, Jan 20, 5:01 PM · SRE, ops-eqiad, cloud-services-team (Hardware), DC-Ops
Andrew added a comment to T299610: (Need By: TBD) rack/setup/install labstore100[89].

Rack and network looks right to me. We might be renaming these hosts but I'll get the task retitled before the servers show up.

Thu, Jan 20, 12:11 AM · SRE, ops-eqiad, cloud-services-team (Hardware), DC-Ops

Yesterday

Andrew committed rLPRIb1b791881eed: More dummy passwords for eqiad1 cinder backups (authored by Andrew).
More dummy passwords for eqiad1 cinder backups
Wed, Jan 19, 6:18 PM
Andrew committed rLPRIee7228569e15: Another dummy password for eqiad backup services in codfw (authored by Andrew).
Another dummy password for eqiad backup services in codfw
Wed, Jan 19, 3:05 AM
Andrew committed rLPRI3e0624ad3e57: Add dummy password for profile::openstack::eqiad1::cinder::db_pass (authored by Andrew).
Add dummy password for profile::openstack::eqiad1::cinder::db_pass
Wed, Jan 19, 3:03 AM

Fri, Jan 14

Andrew claimed T291405: [NFS] Reduce or eliminate bare-metal NFS servers.
Fri, Jan 14, 9:55 PM · cloud-services-team (Kanban)

Thu, Jan 13

Andrew awarded T299120: cloudvirt1024/Check unit status of backup_vms is CRITICAL a Stroopwafel token.
Thu, Jan 13, 2:51 PM · Patch-For-Review, Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-Alert, cloud-services-team (Kanban), User-dcaro
Andrew claimed T299139: horizon: unable to paste multi line yaml into puppet hiera config.
Thu, Jan 13, 2:40 PM · Horizon

Mon, Jan 10

Andrew added a comment to T297446: Request increased quota for wikiwho Cloud VPS project (volume storage).

Thanks for the feedback!

Mon, Jan 10, 5:31 PM · WikiWho, cloud-services-team (Kanban), Cloud-VPS (Quota-requests)

Fri, Jan 7

Andrew added a comment to T298726: Request to enable XFF headers for wikiwho VPS project.

+1, fine with me!

Fri, Jan 7, 6:49 PM · WikiWho, cloud-services-team (Kanban)
Andrew closed T298681: nova wmfsink not deleting dns records for proxies whose name equals to project name as Resolved.
Fri, Jan 7, 6:06 PM · cloud-services-team (Kanban), Cloud-VPS

Thu, Jan 6

Andrew updated subscribers of T293934: Q2:(Need By: TBD) rack/setup/install cloudbackup100[34].

@Cmjohnson is there not a hw raid controller?

Thu, Jan 6, 6:56 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
Andrew added a comment to T297814: cloudmetrics1003 seizes up under load.

Thanks for your ongoing attention on this! I'm frustrated by how easily I can produce this issue in production but can't find a non-downtime-causing test case that produces the same issue. If we fully run out of other troubleshooting ideas I can spend a few days trying to simulate the workload and see if we can make it fail reliably.

Thu, Jan 6, 6:49 PM · cloud-services-team (Hardware), SRE, ops-eqiad, Patch-For-Review, decommission-hardware
Andrew added a comment to T298681: nova wmfsink not deleting dns records for proxies whose name equals to project name.

parentzone = '.'.join(proxyzone.split('.')[:3]) ?

Thu, Jan 6, 4:47 PM · cloud-services-team (Kanban), Cloud-VPS
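The suggested one-liner keeps only the first three dot-separated labels of `proxyzone`. The sketch below wraps it in a hypothetical `parent_zone` helper to show what that means in practice. The example zone names are made up for illustration and are not taken from the wmfsink code.

```python
def parent_zone(proxyzone: str) -> str:
    """Return the first three dot-separated labels of proxyzone.

    Hypothetical wrapper around the expression suggested in T298681.
    Any trailing root dot, and every label after the third, is discarded.
    """
    return '.'.join(proxyzone.split('.')[:3])


# Hypothetical inputs, for illustration only.
print(parent_zone("myproject.wmcloud.org."))
print(parent_zone("web.myproject.wmcloud.org."))
```

Because the result depends on how many labels the input has, the expression only yields the intended parent when every proxy zone has the same depth. It also drops the trailing dot of a fully qualified name.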
Andrew added a comment to T298681: nova wmfsink not deleting dns records for proxies whose name equals to project name.

parentzone = '.'.join(proxyzone.split('.')[:3]) ?

Thu, Jan 6, 3:25 PM · cloud-services-team (Kanban), Cloud-VPS

Wed, Jan 5

Andrew added a comment to T293934: Q2:(Need By: TBD) rack/setup/install cloudbackup100[34].

@Cmjohnson is there not a hw raid controller?

Wed, Jan 5, 5:39 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
Andrew moved T285668: tiles.wmflabs.org OSM is outdated from Doing to Soon! on the cloud-services-team (Kanban) board.
Wed, Jan 5, 5:28 PM · Patch-For-Review, Data-Services, cloud-services-team (Kanban), Maps
Andrew added a comment to T292195: Unable to create user in Trove postgres DB.

Trove went through a recent refactor that drastically prioritized MySQL/MariaDB and left a lot of features untested for other DBs. IIRC the Postgres implementation is still usable if you work directly through the psql interface.

Wed, Jan 5, 3:34 AM · cloud-services-team (Kanban), Cloud-VPS
Andrew closed T291168: [Debian bullseye image] APT source-list file is not up to date as Resolved.

@Majavah reminded me about this issue and caught me up with the latest. A few points:

Wed, Jan 5, 3:31 AM · Wikidata, cloud-services-team (Kanban), Soweego, Cloud-VPS
Andrew closed T291168: [Debian bullseye image] APT source-list file is not up to date, a subtask of T264311: Prepare for puppetizing /etc/apt/sources.list, as Resolved.
Wed, Jan 5, 3:31 AM · cloud-services-team (Kanban)
Andrew closed T264311: Prepare for puppetizing /etc/apt/sources.list as Resolved.
  • created a sources.list.d/docker.list file on all hosts that had a docker entry in sources.list
  • backed up all existing /etc/apt/sources.list files as /etc/apt/sources.list.prepuppet
  • puppetized sources.list
Wed, Jan 5, 3:19 AM · cloud-services-team (Kanban)
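The first two steps above could look roughly like the following Python sketch. It is not the script that was actually run. The `prepare_sources_list` name, the `root` parameter (added so the sketch can be tried against a scratch directory), and the rule that any non-comment line containing "docker" counts as a docker entry are all assumptions. The third step, puppetizing sources.list, is left to puppet.

```python
import shutil
from pathlib import Path


def prepare_sources_list(root: Path = Path("/")) -> bool:
    """Back up sources.list and split docker entries into their own file.

    Hypothetical sketch: any non-comment line mentioning "docker" is
    treated as a docker entry. Returns True if docker.list was written.
    """
    apt = root / "etc" / "apt"
    sources = apt / "sources.list"
    if not sources.exists():
        return False

    # Keep a copy of the original file before puppet takes it over.
    shutil.copy2(sources, apt / "sources.list.prepuppet")

    lines = sources.read_text().splitlines()
    docker_lines = [
        line for line in lines
        if "docker" in line and not line.lstrip().startswith("#")
    ]
    if not docker_lines:
        return False

    # Move the docker repo into sources.list.d so it survives puppetization.
    list_d = apt / "sources.list.d"
    list_d.mkdir(parents=True, exist_ok=True)
    (list_d / "docker.list").write_text("\n".join(docker_lines) + "\n")
    return True
```

Writing docker entries to a separate file under sources.list.d means puppet can own sources.list outright without dropping the docker repository.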

Tue, Jan 4

Andrew closed T296664: Our need for cloudvirt hypervisors with local disks as Resolved.

cloudvirt1028 is now the third localdisk host. No actions left here until the time comes to refresh 1019, 1020 or 1028.

Tue, Jan 4, 10:07 PM · Toolforge, cloud-services-team (Kanban)
Andrew added a comment to T264311: Prepare for puppetizing /etc/apt/sources.list.

I changed my mind about the announcement. It's likely that either 0 or 1 users are going to be confused by this, so I'll handle it with a comment in the sources.list file.

Tue, Jan 4, 9:31 PM · cloud-services-team (Kanban)
Andrew added a parent task for T291168: [Debian bullseye image] APT source-list file is not up to date: T264311: Prepare for puppetizing /etc/apt/sources.list.
Tue, Jan 4, 9:01 PM · Wikidata, cloud-services-team (Kanban), Soweego, Cloud-VPS
Andrew added a subtask for T264311: Prepare for puppetizing /etc/apt/sources.list: T291168: [Debian bullseye image] APT source-list file is not up to date.
Tue, Jan 4, 9:01 PM · cloud-services-team (Kanban)
Andrew added a comment to T264311: Prepare for puppetizing /etc/apt/sources.list.

Proposed good-enough solution:

Tue, Jan 4, 9:00 PM · cloud-services-team (Kanban)
Andrew claimed T264311: Prepare for puppetizing /etc/apt/sources.list.
Tue, Jan 4, 8:59 PM · cloud-services-team (Kanban)
Andrew added a comment to T291168: [Debian bullseye image] APT source-list file is not up to date.

I've just confirmed that this still happens.

Tue, Jan 4, 8:36 PM · Wikidata, cloud-services-team (Kanban), Soweego, Cloud-VPS
Andrew closed T298466: puppet failure on unknown instance as Resolved.

It looks to me like you have this sorted out. The puppet failure message uses hostname -f so if your host thought it was named something funny then you'd get an email from that funny name.

Tue, Jan 4, 8:29 PM · cloud-services-team (Kanban), Internet-Archive, Cloud-VPS
Andrew closed T298544: should attached volumes automatically mount? as Declined.

Copying the fstab record is fine; I'm pretty sure that wmcs-prepare-cinder-volume would also do the fstab work for you (with a bit of user-friendly interaction).

Tue, Jan 4, 8:25 PM · Cloud-VPS, cloud-services-team (Kanban)
Andrew closed T298291: tools-sgegrid-master needs to be a submit host for tool deletion, a subtask of T170355: Figure out process for deleting an unused tool, as Resolved.
Tue, Jan 4, 8:11 PM · cloud-services-team (FY2021/2022-Q3), Patch-For-Review, Toolforge
Andrew closed T298291: tools-sgegrid-master needs to be a submit host for tool deletion, a subtask of T298285: Check on tool deletion status from 2021-12-23 batch, as Resolved.
Tue, Jan 4, 8:11 PM · cloud-services-team (Kanban), Tools, User-bd808
Andrew closed T298291: tools-sgegrid-master needs to be a submit host for tool deletion as Resolved.
Tue, Jan 4, 8:11 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
Andrew triaged T298531: Horizon User Management: page ldap queries as Medium priority.
Tue, Jan 4, 7:28 PM · cloud-services-team (Kanban)
Andrew committed rCTDTc1cba78103a3: Move _kill_grid_jobs from the grid master to the cron host (authored by Andrew).
Move _kill_grid_jobs from the grid master to the cron host
Tue, Jan 4, 6:53 PM
Andrew added a comment to T298354: Request creation of Teyora VPS project.

+1, sounds good to me

Tue, Jan 4, 4:18 PM · Cloud-VPS (Project-requests)
Andrew added a comment to T298228: Request creation of qrank VPS project.

+1, sounds good to me

Tue, Jan 4, 4:18 PM · Cloud-VPS (Project-requests)
Andrew closed T298428: Requesting additional temporary resources for Cyberbot Project as Resolved.
root@cloudcontrol1003:~# openstack quota set --instances 9 cyberbot
root@cloudcontrol1003:~# openstack quota set --cores 26 cyberbot
root@cloudcontrol1003:~# openstack quota set --ram 53248 cyberbot
Tue, Jan 4, 4:16 PM · Cloud-VPS (Quota-requests), InternetArchiveBot
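The three quota changes above follow one pattern. As a hypothetical illustration, a small Python helper can assemble the same command lines from human-sized numbers. It only builds strings and runs nothing. It assumes the RAM quota is expressed in MB with 1024 MB per GB, which is how 53248 works out to 52 × 1024. `quota_set_commands` is an invented name.

```python
import shlex


def quota_set_commands(project: str, instances: int, cores: int, ram_gib: int) -> list[str]:
    """Build `openstack quota set` command lines like the ones above.

    Hypothetical helper: assembles strings only. The RAM value is assumed
    to be in MB, so GB figures are multiplied by 1024.
    """
    ram_mb = ram_gib * 1024
    settings = [("--instances", instances), ("--cores", cores), ("--ram", ram_mb)]
    return [
        shlex.join(["openstack", "quota", "set", flag, str(value), project])
        for flag, value in settings
    ]


for cmd in quota_set_commands("cyberbot", instances=9, cores=26, ram_gib=52):
    print(cmd)
```

With these arguments the helper reproduces the three commands run on cloudcontrol1003, which makes the 52 GB RAM figure explicit rather than hidden in 53248.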
Andrew added a comment to T298291: tools-sgegrid-master needs to be a submit host for tool deletion.

@Andrew I was wondering if all of the grid bits could be handled from the grid's cron server? It is by necessity a submit host.

Tue, Jan 4, 3:29 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
Andrew created T298531: Horizon User Management: page ldap queries.
Tue, Jan 4, 3:06 PM · cloud-services-team (Kanban)

Fri, Dec 24

Andrew created T298291: tools-sgegrid-master needs to be a submit host for tool deletion.
Fri, Dec 24, 3:04 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge

Dec 20 2021

Andrew added a comment to T295266: wikitech-static down.

I created a tentative (and private) procurement ticket about this issue here: T298052

Dec 20 2021, 10:18 PM · SRE, cloud-services-team (Kanban), wikitech.wikimedia.org
Andrew added a comment to T297814: cloudmetrics1003 seizes up under load.

Moritz says:

Dec 20 2021, 5:38 PM · cloud-services-team (Hardware), SRE, ops-eqiad, Patch-For-Review, decommission-hardware
Andrew assigned T297814: cloudmetrics1003 seizes up under load to wiki_willy.

@wiki_willy, I'm unable to learn much about what's causing this issue, but I strongly suspect a hardware problem, as the same workload on cloudmetrics1004 (purchased at the same time) does not cause the failure.

Dec 20 2021, 3:31 PM · cloud-services-team (Hardware), SRE, ops-eqiad, Patch-For-Review, decommission-hardware
Andrew renamed T297814: cloudmetrics1003 seizes up under load from cloudmetrics1003 so slow as to be useless to cloudmetrics1003 seizes up under load.
Dec 20 2021, 3:28 PM · cloud-services-team (Hardware), SRE, ops-eqiad, Patch-For-Review, decommission-hardware

Dec 17 2021

Andrew reassigned T294664: 2021 Cloud Services Survey from Andrew to komla.

@komla I think you have all that you need to handle this now -- please reach out if it turns out to be unclear how to send this.

Dec 17 2021, 8:49 PM · Developer-Advocacy (Jan-Mar 2022), Surveys, cloud-services-team (Kanban)
Andrew added a comment to T297814: cloudmetrics1003 seizes up under load.
root@cloudmetrics1004:/srv# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=200G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=319MiB/s,w=107MiB/s][r=81.6k,w=27.3k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=22480: Fri Dec 17 04:30:13 2021
  read: IOPS=85.0k, BW=332MiB/s (348MB/s)(150GiB/462536msec)
   bw (  KiB/s): min=73744, max=415664, per=99.99%, avg=340008.05, stdev=69227.29, samples=925
   iops        : min=18436, max=103916, avg=85002.01, stdev=17306.82, samples=925
  write: IOPS=28.3k, BW=111MiB/s (116MB/s)(50.0GiB/462536msec); 0 zone resets
   bw (  KiB/s): min=24200, max=140160, per=99.99%, avg=113349.05, stdev=23122.43, samples=925
   iops        : min= 6050, max=35040, avg=28337.24, stdev=5780.61, samples=925
  cpu          : usr=8.78%, sys=89.47%, ctx=201079, majf=0, minf=5418
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=39320441,13108359,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64
Dec 17 2021, 4:50 AM · cloud-services-team (Hardware), SRE, ops-eqiad, Patch-For-Review, decommission-hardware
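As a sanity check on the fio run above, the issued-I/O counts and runtime can be turned back into IOPS, bandwidth and read share. The constants below are copied from the output; the `summarize` helper is a hypothetical name for this sketch, not part of fio.

```python
# Figures copied from the fio run above (4 KiB blocks, 200G file, 75% reads).
BLOCK_SIZE = 4096                    # --bs=4k
FILE_SIZE = 200 * 1024**3            # --size=200G
RUNTIME_S = 462536 / 1000            # run time in msec, from the read/write lines
READS, WRITES = 39320441, 13108359   # "issued rwts: total=..."


def summarize(reads: int, writes: int, block_size: int, runtime_s: float) -> dict:
    """Derive IOPS, bandwidth (MiB/s) and read share from issued-I/O counts."""
    mib = 1024**2
    return {
        "total_ios": reads + writes,
        "read_share": reads / (reads + writes),
        "read_iops": reads / runtime_s,
        "write_iops": writes / runtime_s,
        "read_mib_s": reads * block_size / mib / runtime_s,
        "write_mib_s": writes * block_size / mib / runtime_s,
    }


stats = summarize(READS, WRITES, BLOCK_SIZE, RUNTIME_S)
# The issued I/O count equals the number of 4 KiB blocks in the 200 GiB file.
assert stats["total_ios"] == FILE_SIZE // BLOCK_SIZE
for key, value in stats.items():
    print(f"{key}: {value:,.2f}")
```

The derived values (about 85.0k read IOPS at roughly 332 MiB/s, about 28.3k write IOPS at roughly 111 MiB/s, and a read share of 0.75) agree with fio's own summary and with `--rwmixread=75`.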

Dec 16 2021

Andrew added a comment to T294429: cinder-backups: figure out automation.

I'm about to merge a timer that does daily backups in codfw1dev; we'll see how it holds up over the next few days.

Dec 16 2021, 10:26 PM · Patch-For-Review, cloud-services-team (Kanban), User-dcaro

Dec 15 2021

Andrew closed T297812: New instance has been stuck in "scheduling" for more than an hour as Resolved.

This issue is surfaced very poorly, but I've found this in the logs:

Dec 15 2021, 10:02 PM · Cloud-VPS
Andrew added a comment to T297812: New instance has been stuck in "scheduling" for more than an hour.

Hello @jsn.sherman -- I'm interested in this issue but won't have time to investigate for a few hours. If your quota permits it, please leave that stuck host as is and go ahead and try to re-schedule. Either it'll work and get you unstuck or it'll fail and I'll have more testing data :)

Dec 15 2021, 7:49 PM · Cloud-VPS
Andrew added a comment to T297814: cloudmetrics1003 seizes up under load.

Just in case it's not a hardware problem, I'm swapping roles between the two hosts to see if the bad behavior follows the role.

Dec 15 2021, 5:50 PM · cloud-services-team (Hardware), SRE, ops-eqiad, Patch-For-Review, decommission-hardware
Andrew created T297814: cloudmetrics1003 seizes up under load.
Dec 15 2021, 5:45 PM · cloud-services-team (Hardware), SRE, ops-eqiad, Patch-For-Review, decommission-hardware

Dec 14 2021

Andrew added a comment to T297446: Request increased quota for wikiwho Cloud VPS project (volume storage).

I'm not able to provide a clear answer about this storage quota just yet. In the meantime, though, can you tell me more about 'migrate WikiWho to WMF production, using a much more efficient storage system through the API platform'? I'm wondering specifically about what makes it more efficient, and why the API platform is an option in prod but not in cloud-vps.

Dec 14 2021, 8:30 PM · WikiWho, cloud-services-team (Kanban), Cloud-VPS (Quota-requests)
Andrew updated the task description for T297712: Migrate cloudmetrics workload from cloudmetrics100[1-2] to cloudmetrics100[3-4].
Dec 14 2021, 5:21 PM · cloud-services-team (Kanban), decommission-hardware
Andrew updated the task description for T297712: Migrate cloudmetrics workload from cloudmetrics100[1-2] to cloudmetrics100[3-4].
Dec 14 2021, 3:57 PM · cloud-services-team (Kanban), decommission-hardware
Andrew created T297712: Migrate cloudmetrics workload from cloudmetrics100[1-2] to cloudmetrics100[3-4].
Dec 14 2021, 3:56 PM · cloud-services-team (Kanban), decommission-hardware

Dec 13 2021

Andrew added a comment to T289888: Q1:(Need By: TBD) rack/setup/install cloudmetrics100[34].eqiad.wmnet.

Update: it seems we aren't ready to run Grafana on Bullseye yet, so I'm rolling these back to Buster.

Dec 13 2021, 9:03 PM · Patch-For-Review, SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
Andrew added a comment to T297563: Request increased quota for mwoffliner Cloud VPS project.

thx @Majavah

Dec 13 2021, 2:56 PM · Cloud-VPS (Quota-requests)

Dec 12 2021

Andrew updated subscribers of T297563: Request increased quota for mwoffliner Cloud VPS project.

+1

Dec 12 2021, 3:55 PM · Cloud-VPS (Quota-requests)

Dec 10 2021

Andrew reassigned T296792: decommission cloudvirt10[2,3,4].eqiad.wmnet from Andrew to mdipietro.

@mdipietro, I suggest that you get these hosts ready for decom, as practice. The steps in this task are pretty clear; to remove the hosts from service you'll want to run the cloudvirt/drain cookbook; that should take the hosts out of service and put them in the 'maintenance' aggregate where they won't get new VMs scheduled.

Dec 10 2021, 4:34 PM · SRE, ops-eqiad, cloud-services-team (Kanban), decommission-hardware

Dec 9 2021

Andrew closed T210360: Setup tests framework, a subtask of T210359: Develop Quarry tests, as Resolved.
Dec 9 2021, 2:10 AM · Quarry, cloud-services-team (FY2021/2022-Q3)
Andrew closed T210360: Setup tests framework as Resolved.
Dec 9 2021, 2:10 AM · Quarry

Dec 8 2021

Andrew moved T288982: Productionize quarry a bit from Soon! to Doing on the cloud-services-team (Kanban) board.
Dec 8 2021, 10:46 PM · Quarry, cloud-services-team (Kanban), Epic
Andrew moved T294664: 2021 Cloud Services Survey from Soon! to Doing on the cloud-services-team (Kanban) board.
Dec 8 2021, 10:45 PM · Developer-Advocacy (Jan-Mar 2022), Surveys, cloud-services-team (Kanban)
Andrew added a comment to T294664: 2021 Cloud Services Survey.

The mailing list for this survey is on mwmaint1002.eqiad.wmnet:/home/andrew/wmcs-survey-emails.txt

Dec 8 2021, 9:22 PM · Developer-Advocacy (Jan-Mar 2022), Surveys, cloud-services-team (Kanban)
Andrew added a comment to T294664: 2021 Cloud Services Survey.

To expand on that:

Dec 8 2021, 7:03 PM · Developer-Advocacy (Jan-Mar 2022), Surveys, cloud-services-team (Kanban)
Andrew added a comment to T294664: 2021 Cloud Services Survey.

Let's get @komla the 'restricted' role so he can log onto a mw server and do the emailing for this.

Dec 8 2021, 6:59 PM · Developer-Advocacy (Jan-Mar 2022), Surveys, cloud-services-team (Kanban)
Andrew claimed T294664: 2021 Cloud Services Survey.
Dec 8 2021, 6:17 PM · Developer-Advocacy (Jan-Mar 2022), Surveys, cloud-services-team (Kanban)
Andrew closed T296906: reimage/pxe boot failing on cloudvirt1028 as Resolved.

fixed!

Dec 8 2021, 4:43 AM · SRE, ops-eqiad, cloud-services-team (Kanban)
Andrew closed T297230: Request creation of ldap-dev VPS project as Resolved.

Created! Please let me know if you run into any trouble.

Dec 8 2021, 3:45 AM · Cloud-VPS (Project-requests)
Andrew added a comment to T296906: reimage/pxe boot failing on cloudvirt1028.

I don't much care about having to click through the partman step, but imaging still fails. Now it stalls on

Dec 8 2021, 3:30 AM · SRE, ops-eqiad, cloud-services-team (Kanban)

Dec 7 2021

Andrew added a comment to T296906: reimage/pxe boot failing on cloudvirt1028.

I am fine with wrangling the disk partitioning pieces if you don't feel like it; IIRC the cloudvirts often prompt for a keypress at some point during install but otherwise succeed.

Dec 7 2021, 8:17 PM · SRE, ops-eqiad, cloud-services-team (Kanban)
Andrew added a comment to T292546: cloud NFS: figure out backups for cinder volumes.

I looked at this more today. A lot of the suddenly-failing jobs seem to be poorly surfaced OOM issues (I'm testing on cloudbackup1001-dev, which only has 4GB of RAM). When I make the buffer size much smaller I get far fewer failures, although I'm unfortunately still seeing occasional jobs stuck in 'creating' forever.

Dec 7 2021, 5:43 AM · Patch-For-Review, cloud-services-team (FY2021/2022-Q3)

Dec 6 2021

Andrew closed T297059: Request increased quota for wikiapiary Cloud VPS project as Resolved.
Dec 6 2021, 9:52 PM · Cloud-VPS (Quota-requests)
Andrew closed T297125: Request increased quota for mwstake Cloud VPS project as Resolved.
Dec 6 2021, 9:40 PM · Cloud-VPS (Quota-requests)
Andrew added a comment to T297059: Request increased quota for wikiapiary Cloud VPS project.

This resulted from a conversation I had with @bd808 about scarce Cloud VPS resources. The MediaWiki Stakeholders' Group would like to be good citizens and find alternate hosting, but we have not been able to do so yet.

Dec 6 2021, 9:40 PM · Cloud-VPS (Quota-requests)
Andrew added a comment to T297059: Request increased quota for wikiapiary Cloud VPS project.

I've doubled your RAM and CPU quota so that you can create a new Buster VM. Note that when creating the new VM you'll be using the g3.cores8.ram16.disk20 flavor. For your big /srv volume you'll then want to create and attach a cinder volume. Directions about how to do that are at

Dec 6 2021, 9:39 PM · Cloud-VPS (Quota-requests)
Andrew added a comment to T297125: Request increased quota for mwstake Cloud VPS project.

I've doubled your RAM and CPU quota so that you can create a new Buster VM. Note that when creating the new VM you'll be using the g3.cores8.ram16.disk20 flavor. For your big /srv volume you'll then want to create and attach a cinder volume. Directions about how to do that are at

Dec 6 2021, 9:38 PM · Cloud-VPS (Quota-requests)
Andrew added a comment to T297059: Request increased quota for wikiapiary Cloud VPS project.

Thank you for the distro upgrades! For tracking purposes I'd appreciate it if you can create a separate ticket for mwstake and retitle this one to refer to just wikiapiary.

Dec 6 2021, 3:47 PM · Cloud-VPS (Quota-requests)
Andrew added a comment to T297022: Reusing VM names appears to break initial puppet run.

The problem is not the race condition, but most likely a leftover cert in the master (since 15th Jun):

root@cloud-puppetmaster-03:~# find /var/lib/puppet/server/ssl/ca/signed -maxdepth 1 -iname wmcz-stats-test02.wmcz-stats.eqiad1.wikimedia.cloud\*  -printf "modification:%TY-%Tb-%Td-%TH:%TM\taccess:%AY-%Ab-%Ad-%AH:%AM\tstatus_change:%CY-%Cb-%Cd-%CH:%CM\t%P\n"
modification:2021-Jun-15-17:17  access:2021-Dec-03-11:18        status_change:2021-Jun-15-17:17 wmcz-stats-test02.wmcz-stats.eqiad1.wikimedia.cloud.pem

You're right! I ran a cleanup of leaked puppet certs last week (after posting my not-very-correct comment here). Why, then, is the cert for stats-test02 still present? I'm checking now.

Dec 6 2021, 3:03 PM · Cloud-VPS
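The `find` command quoted above looks for a signed cert left behind for a reused VM name. A rough Python equivalent is sketched below as a hypothetical helper (`find_signed_certs` is an invented name). It scans only the top level of the signed directory and matches file names case-insensitively by prefix, like `-maxdepth 1 -iname '<fqdn>*'`, but reports only the modification time, not the access and status-change times that the original `-printf` also shows.

```python
import time
from pathlib import Path

SIGNED_DIR = Path("/var/lib/puppet/server/ssl/ca/signed")  # path from the find command


def find_signed_certs(fqdn: str, signed_dir: Path = SIGNED_DIR) -> list[tuple[str, str]]:
    """List entries in signed_dir whose name starts with fqdn, with mtimes.

    Only the top level is scanned and matching ignores case, roughly like
    `find signed_dir -maxdepth 1 -iname 'fqdn*'`.
    """
    prefix = fqdn.lower()
    results = []
    for path in sorted(signed_dir.iterdir()):
        if path.name.lower().startswith(prefix):
            mtime = time.strftime("%Y-%b-%d-%H:%M", time.localtime(path.stat().st_mtime))
            results.append((path.name, mtime))
    return results


if SIGNED_DIR.is_dir():  # only meaningful on a puppetmaster
    for name, mtime in find_signed_certs("wmcz-stats-test02.wmcz-stats.eqiad1.wikimedia.cloud"):
        print(f"modification:{mtime}\t{name}")
```

A cert whose modification date predates the current VM, as in the June timestamp above, suggests it belongs to an earlier VM with the same name.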
Andrew added a comment to T297022: Reusing VM names appears to break initial puppet run.

The problem is not the race condition, but most likely a leftover cert in the master (since 15th Jun):

root@cloud-puppetmaster-03:~# find /var/lib/puppet/server/ssl/ca/signed -maxdepth 1 -iname wmcz-stats-test02.wmcz-stats.eqiad1.wikimedia.cloud\*  -printf "modification:%TY-%Tb-%Td-%TH:%TM\taccess:%AY-%Ab-%Ad-%AH:%AM\tstatus_change:%CY-%Cb-%Cd-%CH:%CM\t%P\n"
modification:2021-Jun-15-17:17  access:2021-Dec-03-11:18        status_change:2021-Jun-15-17:17 wmcz-stats-test02.wmcz-stats.eqiad1.wikimedia.cloud.pem
Dec 6 2021, 3:01 PM · Cloud-VPS
Andrew added a comment to T296963: Annual audit and purge of unused cloud-vps resources, 2021.

The year after the fourth item in the list is 2022, right? Not 2021.

Dec 6 2021, 2:54 PM · cloud-services-team (Kanban)
Andrew updated the task description for T296963: Annual audit and purge of unused cloud-vps resources, 2021.
Dec 6 2021, 2:53 PM · cloud-services-team (Kanban)

Dec 3 2021

Andrew renamed T266043: Openstack Designate: replace use of designate-sink with the designate/neutron integration API from Openstack Designate: replcae use of designate-sink with the designate/neutron integration API to Openstack Designate: replace use of designate-sink with the designate/neutron integration API.
Dec 3 2021, 7:41 PM · Cloud-VPS, cloud-services-team (Kanban)
Andrew added a comment to T297022: Reusing VM names appears to break initial puppet run.

There are race conditions in creation/deletion of DNS records for VMs. They will be somewhat relieved when T266043 is done; until then the only option is to wait a few minutes before recreating a VM with the same name.

Dec 3 2021, 7:28 PM · Cloud-VPS
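Instead of waiting a fixed few minutes, one could poll until the old VM's name stops resolving before recreating it. The Python sketch below is a hypothetical helper, not an existing WMCS tool. It assumes that "name no longer resolves" is a good enough signal that the old record is gone, which a caching resolver may delay. It also does nothing about a leftover signed puppet cert, which the later comments on T297022 identify as the real cause in that case.

```python
import socket
import time
from typing import Callable


def resolves(name: str) -> bool:
    """Return True if name currently resolves to at least one address."""
    try:
        return bool(socket.getaddrinfo(name, None))
    except socket.gaierror:
        return False


def wait_for_dns_cleanup(
    name: str,
    timeout_s: float = 600,
    interval_s: float = 15,
    check: Callable[[str], bool] = resolves,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the old VM's DNS name stops resolving, or time out.

    Hypothetical helper: returns True once check(name) is False, i.e. the
    previous record appears to be gone; returns False after timeout_s.
    """
    waited = 0.0
    while waited <= timeout_s:
        if not check(name):
            return True
        sleep(interval_s)
        waited += interval_s
    return False
```

The `check` and `sleep` parameters are injectable so the polling logic can be exercised without real DNS lookups or real delays.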
Andrew moved T269380: rename/delete/something duplicate accounts for Muhammad Usman from Clinic Duty to Graveyard on the cloud-services-team (Kanban) board.
Dec 3 2021, 7:25 PM · LDAP, cloud-services-team (Kanban)
Andrew created T297026: Automate maintain-views workflow.
Dec 3 2021, 7:19 PM · Data-Services, cloud-services-team (Kanban)
Andrew reassigned T294652: user_properties_anon view not being created/maintained consistently on wikireplicas due to lack of meta_p in all sections from bd808 to AntiCompositeNumber.

I applied this and ran maintain-views everywhere; would love it if someone else confirms this did what was hoped for.

Dec 3 2021, 7:19 PM · User-bd808, Data-Engineering, cloud-services-team (Kanban), Data-Services
Andrew closed T292418: Prepare and check storage layer for pwnwiki, a subtask of T292415: Create Wikipedia Paiwan, as Resolved.
Dec 3 2021, 7:17 PM · Patch-For-Review, MW-1.38-notes (1.38.0-wmf.7; 2021-11-02), User-Urbanecm, Wiki-Setup (Create)
Andrew closed T292418: Prepare and check storage layer for pwnwiki as Resolved.

This should be present in the replicas now, although I wouldn't mind someone else double-checking!

Dec 3 2021, 7:17 PM · cloud-services-team (Kanban), Data-Services, DBA
Andrew closed T292420: Prepare and check storage layer for amiwiki, a subtask of T292414: Create Wikipedia Amis, as Resolved.
Dec 3 2021, 7:17 PM · Patch-For-Review, MW-1.38-notes (1.38.0-wmf.3; 2021-10-05), User-Urbanecm, Wiki-Setup (Create)
Andrew closed T292420: Prepare and check storage layer for amiwiki as Resolved.

This should be present in the replicas now, although I wouldn't mind someone else double-checking!

Dec 3 2021, 7:17 PM · cloud-services-team (Kanban), Data-Services, DBA
Andrew closed T291404: Prepare and check storage layer for lmowiktionary, a subtask of T291390: Create Wiktionary Lombard, as Resolved.
Dec 3 2021, 7:17 PM · Patch-For-Review, MW-1.38-notes (1.38.0-wmf.3; 2021-10-05), User-Urbanecm, Wiki-Setup (Create)
Andrew closed T291404: Prepare and check storage layer for lmowiktionary as Resolved.

This should be present in the replicas now, although I wouldn't mind someone else double-checking!

Dec 3 2021, 7:16 PM · cloud-services-team (Kanban), Data-Services, DBA
Andrew updated subscribers of T296906: reimage/pxe boot failing on cloudvirt1028.

@Dzahn points out that it could be that DHCP is working but preseed is failing.

Dec 3 2021, 6:31 PM · SRE, ops-eqiad, cloud-services-team (Kanban)
Andrew updated subscribers of T296906: reimage/pxe boot failing on cloudvirt1028.

@RobH @wiki_willy sounds like this is back in your court. Cathal and I are out of ideas :(

Dec 3 2021, 5:15 PM · SRE, ops-eqiad, cloud-services-team (Kanban)
Andrew assigned T296963: Annual audit and purge of unused cloud-vps resources, 2021 to komla.
Dec 3 2021, 3:19 PM · cloud-services-team (Kanban)