
Andrew (Andrew Bogott)
User

Projects (11)

Today

  • Clear sailing ahead.

Tomorrow

  • Clear sailing ahead.

Thursday

  • Clear sailing ahead.

User Details

User Since
Nov 2 2014, 11:35 PM (276 w, 1 d)
Availability
Available
IRC Nick
andrewbogott
LDAP User
Unknown
MediaWiki User
Andrewbogott [ Global Accounts ]

Recent Activity

Fri, Feb 14

Andrew added a comment to T243226: Upgrade puppet in deployment-prep (Puppet agent broken in Beta Cluster).

That last issue (the resolution failure) was a side-effect of work I was doing for T229441. That issue is resolved, but now the failure us

Fri, Feb 14, 5:47 PM · Operations, Beta-Cluster-Infrastructure

Thu, Feb 13

Andrew added a comment to T245174: CloudVPS: automatically create per-project subdomain.

I'm not sure this is necessary. I'm currently experimenting with creating a recordset named '<project>.<instance>' in the codfw1dev.wikimedia.cloud zone (which is owned by cloudinfra-codfw1dev), and it works fine. Since sink will only ever act on the wikimedia.cloud zones, that's probably sufficient.

Thu, Feb 13, 11:51 PM · cloud-services-team (Kanban)
Andrew assigned T244222: CloudVPS: hiera refactor to jbond.

The patch to re-organize is:

Thu, Feb 13, 8:13 PM · Patch-For-Review, Epic, cloud-services-team (Kanban)
Andrew merged T244933: Re-organize hiera lookups for cloud-vps instances into T244222: CloudVPS: hiera refactor.
Thu, Feb 13, 8:12 PM · Patch-For-Review, Epic, cloud-services-team (Kanban)
Andrew merged task T244933: Re-organize hiera lookups for cloud-vps instances into T244222: CloudVPS: hiera refactor.
Thu, Feb 13, 8:12 PM · Epic, cloud-services-team (Kanban)
Andrew added a comment to T243536: cloudvirt1022 memory errors causing host to crash.

This server is now drained and ready for whatever.

Thu, Feb 13, 7:45 PM · DC-Ops, ops-eqiad, cloud-services-team (Hardware), Operations
Andrew changed the status of T181375: Revamp first boot process for new VMs from Open to Stalled.

I think this is ready to go -- we can switch to upstream images once we have Cinder working.

Thu, Feb 13, 5:20 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew added a comment to T208405: Check whether huggle project requires NFS or not.

@Petrb I'm going to remove the dumps and scratch mounts now, which shouldn't affect you. If at some point you want to move to local storage, just ping this ticket and I can clean up the other mounts.

Thu, Feb 13, 3:02 PM · Patch-For-Review, cloud-services-team (Kanban), Huggle, Cloud-VPS
Andrew updated subscribers of T208410: Check whether osmit project requires NFS or not.

@Nemo_bis, @CristianCantoro, @akosiaris et al., can you confirm whether or not your project makes use of the Dumps mount on your VMs? If you do, that's fine; we're just trying to clean up unused mounts.

Thu, Feb 13, 3:00 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew added a comment to T208414: Check whether utrs project requires NFS or not.

@DeltaQuad If you have things in the NFS home mount then it's probably fine to keep it; can you respond as to which of the other mounts (dumps/project/scratch) you are using? This will let me clean up unused mounts.

Thu, Feb 13, 2:58 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew added a comment to T208403: Check whether etytree project requires NFS or not.

@Epantaleo Your project is mounting the 'dumps' NFS volume under /mnt/nfs/ -- we're just wondering if that's something that you use or if it can be cleaned up. Either way is fine.

Thu, Feb 13, 2:56 PM · cloud-services-team (Kanban), Cloud-VPS
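Answering these questions starts with knowing which NFS volumes a VM actually has mounted. Below is a minimal Python sketch, assuming a Linux guest where /proc/mounts lists active mounts and NFS entries report the filesystem type nfs or nfs4. The helper nfs_mounts is hypothetical, not part of any WMCS tooling, and it only shows what is mounted, not whether anything is using it.

```python
from typing import Dict


def nfs_mounts(mounts_text: str) -> Dict[str, str]:
    """Return {mount_point: source} for NFS entries in /proc/mounts-style text.

    Each line has the form: source mount_point fstype options dump pass.
    Escaped characters (such as \\040 for a space) are not decoded here.
    """
    found: Dict[str, str] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mount_point, fstype = fields[:3]
        if fstype in ("nfs", "nfs4"):
            found[mount_point] = source
    return found
```

On a VM, passing the contents of /proc/mounts to nfs_mounts lists every NFS mount point (for example, anything under /mnt/nfs/) together with the server export it comes from.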
Andrew added a comment to T208404: Check whether fastcci project requires NFS or not.

@dschwen, can you respond as to which of (dumps, scratch, home, project) you're using in this project? Any of them is fine, we just want to clean up unused mounts. Thanks.

Thu, Feb 13, 2:54 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew closed T208408: Check whether math project requires NFS or not, a subtask of T102240: Audit projects' use of NFS, and remove it where not necessary, as Resolved.
Thu, Feb 13, 5:09 AM · cloud-services-team (Kanban), Wikimedia-Incident, Labs-Sprint-106, Labs-Sprint-105, Labs-Sprint-104, Incident-20150617-LabsNFSOutage, Labs-Sprint-103, Labs-Sprint-102, Cloud-Services
Andrew closed T208408: Check whether math project requires NFS or not as Resolved.

ok! Thanks for confirming.

Thu, Feb 13, 5:09 AM · cloud-services-team (Kanban), Math, Cloud-VPS
Andrew added a comment to T208402: Check whether dumps project requires NFS or not.

Your description sounds like a pretty good use case for 'scratch' -- is that what you're using now, or are you doing your work in /data/project? (It may be that scratch is too slow for this purpose, but it might be worth a try.)

Thu, Feb 13, 5:06 AM · cloud-services-team (Kanban), Cloud-VPS
Andrew closed T208417: Check whether wikidata-primary-source-tool project requires NFS or not, a subtask of T102240: Audit projects' use of NFS, and remove it where not necessary, as Resolved.
Thu, Feb 13, 4:58 AM · cloud-services-team (Kanban), Wikimedia-Incident, Labs-Sprint-106, Labs-Sprint-105, Labs-Sprint-104, Incident-20150617-LabsNFSOutage, Labs-Sprint-103, Labs-Sprint-102, Cloud-Services
Andrew closed T208417: Check whether wikidata-primary-source-tool project requires NFS or not as Resolved.

thanks, done!

Thu, Feb 13, 4:58 AM · Patch-For-Review, cloud-services-team (Kanban), Wikidata, Cloud-VPS

Tue, Feb 11

Andrew updated subscribers of T208408: Check whether math project requires NFS or not.

Hello Math project users! Can someone please respond here about your use of NFS (project, scratch, and dumps) and indicate if it's practical to stop using any of those three mounts?

Tue, Feb 11, 10:22 PM · cloud-services-team (Kanban), Math, Cloud-VPS
Andrew closed T208412: Check whether testlabs project requires NFS or not, a subtask of T102240: Audit projects' use of NFS, and remove it where not necessary, as Resolved.
Tue, Feb 11, 10:17 PM · cloud-services-team (Kanban), Wikimedia-Incident, Labs-Sprint-106, Labs-Sprint-105, Labs-Sprint-104, Incident-20150617-LabsNFSOutage, Labs-Sprint-103, Labs-Sprint-102, Cloud-Services
Andrew closed T208412: Check whether testlabs project requires NFS or not as Resolved.

It's useful to have mounted -- one more thing to see break during VM creation.

Tue, Feb 11, 10:17 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew closed T208415: Check whether wikidumpparse project requires NFS or not, a subtask of T102240: Audit projects' use of NFS, and remove it where not necessary, as Resolved.
Tue, Feb 11, 10:16 PM · cloud-services-team (Kanban), Wikimedia-Incident, Labs-Sprint-106, Labs-Sprint-105, Labs-Sprint-104, Incident-20150617-LabsNFSOutage, Labs-Sprint-103, Labs-Sprint-102, Cloud-Services
Andrew closed T208415: Check whether wikidumpparse project requires NFS or not as Resolved.

@Maximilianklein using NFS is fine, we just wanted to eliminate the cases where it was mounted but unused.

Tue, Feb 11, 10:16 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew added a comment to T208416: Check whether wikidata-dev project requires NFS or not.

@Addshore, can you please follow up on this? I see @LucasWerkmeister creating a fair number of files in /home but not much in /data/project.

Tue, Feb 11, 9:58 PM · Wikidata-Campsite, cloud-services-team (Kanban), wikidata-tech-focus, Wikidata, Cloud-VPS
Andrew updated subscribers of T208417: Check whether wikidata-primary-source-tool project requires NFS or not.

@Hjfocs, can you comment on whether or not the wikidata-primary-source-tool project makes use of the NFS 'dumps' mount that's provided? We're trying to eliminate unnecessary mounts.

Tue, Feb 11, 9:54 PM · Patch-For-Review, cloud-services-team (Kanban), Wikidata, Cloud-VPS
Andrew added a comment to T181375: Revamp first boot process for new VMs.

There are (at least) two remaining things here:

Tue, Feb 11, 9:46 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T241045: cloudmetrics: archive-instances using deprecated keystone lib, a subtask of T229920: WMCS: migrate python2 scripts to python3, as Resolved.
Tue, Feb 11, 9:41 PM · Epic, cloud-services-team (Kanban)
Andrew closed T241045: cloudmetrics: archive-instances using deprecated keystone lib as Resolved.
Tue, Feb 11, 9:41 PM · cloud-services-team (Kanban)
Andrew merged task T210513: Unable to set parameters to types other than String in horizon puppet parameters into T243422: Horizon hiera UI: investigate data type handling.
Tue, Feb 11, 9:40 PM · cloud-services-team (Kanban), Horizon
Andrew merged T210513: Unable to set parameters to types other than String in horizon puppet parameters into T243422: Horizon hiera UI: investigate data type handling.
Tue, Feb 11, 9:40 PM · Release-Engineering-Team (Other / Uncategorized), Release-Engineering-Team-TODO, User-Joe, Beta-Cluster-Infrastructure, Cloud-Services, Puppet
Andrew assigned T244933: Re-organize hiera lookups for cloud-vps instances to jbond.

The patch to re-organize is:

Tue, Feb 11, 9:28 PM · Epic, cloud-services-team (Kanban)
Andrew created T244933: Re-organize hiera lookups for cloud-vps instances.
Tue, Feb 11, 9:24 PM · Epic, cloud-services-team (Kanban)
Andrew claimed T208417: Check whether wikidata-primary-source-tool project requires NFS or not.
Tue, Feb 11, 5:12 PM · Patch-For-Review, cloud-services-team (Kanban), Wikidata, Cloud-VPS
Andrew claimed T208410: Check whether osmit project requires NFS or not.
Tue, Feb 11, 5:12 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew claimed T208404: Check whether fastcci project requires NFS or not.
Tue, Feb 11, 5:12 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew claimed T208403: Check whether etytree project requires NFS or not.
Tue, Feb 11, 5:12 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew claimed T208408: Check whether math project requires NFS or not.
Tue, Feb 11, 5:12 PM · cloud-services-team (Kanban), Math, Cloud-VPS
Andrew claimed T208412: Check whether testlabs project requires NFS or not.
Tue, Feb 11, 5:12 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew claimed T208402: Check whether dumps project requires NFS or not.
Tue, Feb 11, 5:11 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew claimed T208406: Check whether maps project requires NFS or not.
Tue, Feb 11, 5:11 PM · cloud-services-team (Kanban), Maps, Cloud-VPS
Andrew claimed T208414: Check whether utrs project requires NFS or not.
Tue, Feb 11, 5:11 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew claimed T208405: Check whether huggle project requires NFS or not.
Tue, Feb 11, 5:11 PM · Patch-For-Review, cloud-services-team (Kanban), Huggle, Cloud-VPS
Andrew claimed T208415: Check whether wikidumpparse project requires NFS or not.
Tue, Feb 11, 5:11 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew claimed T208416: Check whether wikidata-dev project requires NFS or not.
Tue, Feb 11, 5:11 PM · Wikidata-Campsite, cloud-services-team (Kanban), wikidata-tech-focus, Wikidata, Cloud-VPS

Fri, Feb 7

Andrew closed T243422: Horizon hiera UI: investigate data type handling, a subtask of T161675: Re-think puppet management for deployment-prep, as Resolved.
Fri, Feb 7, 4:18 PM · Release-Engineering-Team (Other / Uncategorized), Release-Engineering-Team-TODO, User-Joe, Beta-Cluster-Infrastructure, Cloud-Services, Puppet
Andrew closed T243422: Horizon hiera UI: investigate data type handling as Resolved.

This is quite a bit better now.

Fri, Feb 7, 4:18 PM · Release-Engineering-Team (Other / Uncategorized), Release-Engineering-Team-TODO, User-Joe, Beta-Cluster-Infrastructure, Cloud-Services, Puppet

Thu, Feb 6

Andrew added a comment to T243422: Horizon hiera UI: investigate data type handling.

This is happening because yaml.safe_dump() (and yaml.dump()) does some weird arbitrary quoting of things:

Thu, Feb 6, 10:03 PM · Release-Engineering-Team (Other / Uncategorized), Release-Engineering-Team-TODO, User-Joe, Beta-Cluster-Infrastructure, Cloud-Services, Puppet
Andrew created P10326 pyyaml mystery.
Thu, Feb 6, 9:48 PM
Andrew added a comment to T243422: Horizon hiera UI: investigate data type handling.

I've confirmed that the behavior with the YAML-based UI is correct. For the guided interface, strings are unquoted and non-string types (numbers, booleans, etc.) are quoted. Weird.

Thu, Feb 6, 8:55 PM · Release-Engineering-Team (Other / Uncategorized), Release-Engineering-Team-TODO, User-Joe, Beta-Cluster-Infrastructure, Cloud-Services, Puppet
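One plausible reading of the two comments above, offered as an assumption rather than a confirmed diagnosis: if the guided form hands every value to yaml.safe_dump() as a Python str, PyYAML quotes strings such as '5' or 'true' so they round-trip as strings, which looks like "non-string types are quoted". A hypothetical fix is to coerce form input to native types before dumping. The sketch below uses only the standard library; coerce_scalar and coerce_form are illustrative names, and the set of coerced forms (decimal ints, simple floats, true/false, null/~) is a deliberate simplification of YAML's resolution rules.

```python
import re
from typing import Any, Dict

# Patterns for the scalar forms this sketch converts; anything else stays a str.
_INT_RE = re.compile(r"[-+]?\d+")
_FLOAT_RE = re.compile(r"[-+]?(\d+\.\d*|\.\d+)([eE][-+]?\d+)?")


def coerce_scalar(raw: str) -> Any:
    """Convert a raw form string into a native Python scalar where it looks like one."""
    value = raw.strip()
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("null", "~"):
        return None
    if _INT_RE.fullmatch(value):
        return int(value)
    if _FLOAT_RE.fullmatch(value):
        return float(value)
    return raw  # not a recognized scalar: keep the original string untouched


def coerce_form(form: Dict[str, str]) -> Dict[str, Any]:
    """Apply coerce_scalar to every value of a flat key/value form."""
    return {key: coerce_scalar(val) for key, val in form.items()}
```

With values coerced this way, the dumper sees an int or bool instead of a number-like string and has no reason to quote it. The trade-off is that a user who really wants the string "5" can no longer express it through the guided form, which is one argument for keeping the YAML-based UI as the escape hatch.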

Wed, Feb 5

Andrew added a comment to T244277: Request creation of Wikidata Realtime Dumps VPS project.

Hello! Can you give us a bit of info about what resources you expect to use (RAM, cores, disk space)? Also, are you hoping to offload storage onto NFS or NFS/scratch? (If the latter, you may be disappointed by the performance.)

Wed, Feb 5, 4:20 PM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)

Tue, Feb 4

Andrew updated the task description for T216195: Move cloudvirt hosts to 10Gb ethernet.
Tue, Feb 4, 5:41 PM · ops-eqiad, DC-Ops, Operations, Epic, cloud-services-team (Kanban)
Andrew changed the status of T243414: relocate/reimage cloudvirt1013 with 10G interfaces, a subtask of T216195: Move cloudvirt hosts to 10Gb ethernet, from Open to Stalled.
Tue, Feb 4, 5:41 PM · ops-eqiad, DC-Ops, Operations, Epic, cloud-services-team (Kanban)
Andrew changed the status of T243414: relocate/reimage cloudvirt1013 with 10G interfaces from Open to Stalled.

Ah, dammit, dc-ops missed this ticket and now 1013 is back in service on 1G. So it's no longer a good time to do this; there's real workload on that host.

Tue, Feb 4, 5:41 PM · cloud-services-team (Hardware), ops-eqiad, DC-Ops, Operations, Epic
Andrew added a comment to T244222: CloudVPS: hiera refactor.

As per https://phabricator.wikimedia.org/T171289, getting $project (and possibly $deployment) from a fact may be problematic.

Tue, Feb 4, 4:21 PM · Patch-For-Review, Epic, cloud-services-team (Kanban)
Andrew added a comment to T244209: Upgrade and restart m5 master (db1133).
  • email sent to wikitech-l and cloud-announce
Tue, Feb 4, 2:40 PM · cloud-services-team, wikitech.wikimedia.org, DBA, Operations
Andrew added a comment to T244209: Upgrade and restart m5 master (db1133).

I'll do it now.

Tue, Feb 4, 2:33 PM · cloud-services-team, wikitech.wikimedia.org, DBA, Operations
Andrew added a comment to T244209: Upgrade and restart m5 master (db1133).

Thank you!
What about Monday 10th at 15:00 UTC?

Tue, Feb 4, 2:27 PM · cloud-services-team, wikitech.wikimedia.org, DBA, Operations
Andrew added a comment to T244209: Upgrade and restart m5 master (db1133).

For an interruption of a few seconds, I wouldn't expect this to be very disruptive. If you schedule it in my morning (e.g. 15:00 UTC), then I can send out notice to users, etc., and be around in case unexpected things happen.

Tue, Feb 4, 2:19 PM · cloud-services-team, wikitech.wikimedia.org, DBA, Operations

Mon, Feb 3

Andrew closed T243355: puppet panel: Can't add new prefixes as Resolved.
Mon, Feb 3, 10:19 PM · Horizon
Andrew added a comment to T243355: puppet panel: Can't add new prefixes.

Yep, I can reproduce this and it's awful.

Mon, Feb 3, 7:30 PM · Horizon

Tue, Jan 28

Andrew closed T243240: Reset of two factor on Wikitech for User:Tpt as Resolved.

@Tpt, sorry for the delay in responding to this -- lots of team travel lately. I've reset 2FA on wikitech for you, so you should be all set now.

Tue, Jan 28, 2:44 PM · Trust-and-Safety, Cloud-Services

Fri, Jan 24

Andrew added a comment to T243556: Fix internal TLD in use in codfw1dev.

I've re-read the code a bit, and it's not obvious to me that we need to specify a tenant to sink (outside of the implicit tenant associated with the zone id from the config). So for starters let's just try swapping in a tenant-owned zone and see if everything just works.

Fri, Jan 24, 6:12 PM · Cloud-VPS, cloud-services-team (Kanban)

Wed, Jan 22

Andrew claimed T243422: Horizon hiera UI: investigate data type handling.
Wed, Jan 22, 3:47 PM · Release-Engineering-Team (Other / Uncategorized), Release-Engineering-Team-TODO, User-Joe, Beta-Cluster-Infrastructure, Cloud-Services, Puppet
Andrew created T243422: Horizon hiera UI: investigate data type handling.
Wed, Jan 22, 3:46 PM · Release-Engineering-Team (Other / Uncategorized), Release-Engineering-Team-TODO, User-Joe, Beta-Cluster-Infrastructure, Cloud-Services, Puppet
Andrew added a comment to T243418: Convert keystone from uuid tokens to fernet tokens.

also possibly: https://www.lbragstad.com/blog/migrating-token-formats-without-downtime

Wed, Jan 22, 2:42 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew created T243418: Convert keystone from uuid tokens to fernet tokens.
Wed, Jan 22, 2:42 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew reassigned T243414: relocate/reimage cloudvirt1013 with 10G interfaces from Andrew to Cmjohnson.
Wed, Jan 22, 2:30 PM · cloud-services-team (Hardware), ops-eqiad, DC-Ops, Operations, Epic
Andrew updated the task description for T216195: Move cloudvirt hosts to 10Gb ethernet.
Wed, Jan 22, 2:00 PM · ops-eqiad, DC-Ops, Operations, Epic, cloud-services-team (Kanban)
Andrew created T243414: relocate/reimage cloudvirt1013 with 10G interfaces.
Wed, Jan 22, 1:58 PM · cloud-services-team (Hardware), ops-eqiad, DC-Ops, Operations, Epic

Tue, Jan 21

Andrew committed rLPRI983d5c4a87f3: labtest-instances: change proxyuser password to be the same as eqiad1 (authored by Andrew).
Tue, Jan 21, 9:07 PM
Andrew updated the task description for T224549: Track remaining jessie systems in production.
Tue, Jan 21, 7:47 PM · Operations
Andrew added a comment to T243329: decommission labstore2001.codfw.wmnet and labstore2002.codfw.wmnet.

@Papaul, please decom associated disk shelves when you pull these servers. Thank you!

Tue, Jan 21, 7:46 PM · ops-codfw, Operations, DC-Ops, decommission
Andrew assigned T243329: decommission labstore2001.codfw.wmnet and labstore2002.codfw.wmnet to Papaul.
Tue, Jan 21, 7:46 PM · ops-codfw, Operations, DC-Ops, decommission
Andrew added a parent task for T243329: decommission labstore2001.codfw.wmnet and labstore2002.codfw.wmnet: T214835: labstore1001,1002,2001,2002: status clarification.
Tue, Jan 21, 7:45 PM · ops-codfw, Operations, DC-Ops, decommission
Andrew added a subtask for T214835: labstore1001,1002,2001,2002: status clarification: T243329: decommission labstore2001.codfw.wmnet and labstore2002.codfw.wmnet.
Tue, Jan 21, 7:45 PM · DC-Ops, cloud-services-team (Kanban)
Andrew updated the task description for T243329: decommission labstore2001.codfw.wmnet and labstore2002.codfw.wmnet.
Tue, Jan 21, 7:39 PM · ops-codfw, Operations, DC-Ops, decommission
Andrew created T243329: decommission labstore2001.codfw.wmnet and labstore2002.codfw.wmnet.
Tue, Jan 21, 7:35 PM · ops-codfw, Operations, DC-Ops, decommission
Andrew updated the task description for T224549: Track remaining jessie systems in production.
Tue, Jan 21, 7:32 PM · Operations
Andrew closed T242332: CloudVPS: wrong nova quota usage for a project detected as Resolved.

"This command group will be removed in 17.0.0 (Queens). The quota_usage_refresh subcommand has been deprecated and is now a no-op since quota usage is counted from resources instead of being tracked separately."

Tue, Jan 21, 7:31 PM · cloud-services-team (Kanban)
Andrew closed T239884: Replace labstore2003/2004 with cloudbackup2001/2002 as Resolved.

The decom task is now in Papaul's hands; everything else is done.

Tue, Jan 21, 7:08 PM · cloud-services-team (Kanban)
Andrew added a comment to T243319: decommission labstore2003.codfw.wmnet and labstore2004.codfw.wmnet.

@Papaul, I'm not positive but I think these servers have disk shelves attached. If so those shelves can also be decom'd at the same time.

Tue, Jan 21, 7:07 PM · ops-codfw, cloud-services-team (Hardware), Operations, DC-Ops, decommission
Andrew reassigned T243319: decommission labstore2003.codfw.wmnet and labstore2004.codfw.wmnet from Andrew to Papaul.
Tue, Jan 21, 7:07 PM · ops-codfw, cloud-services-team (Hardware), Operations, DC-Ops, decommission
Andrew updated the task description for T243319: decommission labstore2003.codfw.wmnet and labstore2004.codfw.wmnet.
Tue, Jan 21, 7:01 PM · ops-codfw, cloud-services-team (Hardware), Operations, DC-Ops, decommission
Andrew updated the task description for T243319: decommission labstore2003.codfw.wmnet and labstore2004.codfw.wmnet.
Tue, Jan 21, 6:56 PM · ops-codfw, cloud-services-team (Hardware), Operations, DC-Ops, decommission
Andrew added a parent task for T243319: decommission labstore2003.codfw.wmnet and labstore2004.codfw.wmnet: T239884: Replace labstore2003/2004 with cloudbackup2001/2002.
Tue, Jan 21, 6:47 PM · ops-codfw, cloud-services-team (Hardware), Operations, DC-Ops, decommission
Andrew added a subtask for T239884: Replace labstore2003/2004 with cloudbackup2001/2002: T243319: decommission labstore2003.codfw.wmnet and labstore2004.codfw.wmnet.
Tue, Jan 21, 6:47 PM · cloud-services-team (Kanban)
Andrew created T243319: decommission labstore2003.codfw.wmnet and labstore2004.codfw.wmnet.
Tue, Jan 21, 6:47 PM · ops-codfw, cloud-services-team (Hardware), Operations, DC-Ops, decommission
bd808 awarded T241347: upgrade cloud-vps openstack to Openstack version 'Pike' a Party Time token.
Tue, Jan 21, 4:56 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew closed T241347: upgrade cloud-vps openstack to Openstack version 'Pike' as Resolved.
Tue, Jan 21, 4:56 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew closed T241868: /usr/local/sbin/make-instance-vg broken on buster as Resolved.

Added a clearer error handler

Tue, Jan 21, 4:54 PM · cloud-services-team (Kanban), Cloud-VPS

Sun, Jan 19

Andrew added a comment to T243161: Unblock port 22 on commons-corruption-checker-main.commons-corruption-checker.

Ran "sudo ufw allow ssh" on the instance.

Sun, Jan 19, 10:47 PM · Cloud-VPS

Jan 17 2020

Andrew closed T238766: openstack: dns_floating_ip_updater mechanism improvements to better handle transient errors as Resolved.

With merged patches, this is still activated by a systemd timer, but has a retry loop. I think that gets us what we want: we'll get the same alert as before, but only if the script fails three times in a row over several minutes.

Jan 17 2020, 4:24 AM · Cloud-Services, cloud-services-team (Kanban)
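The retry-before-alerting behavior described above can be sketched in Python as follows. This is not the actual dns_floating_ip_updater code: run_with_retries is a hypothetical helper, the default of 3 attempts comes from the comment, and the 60-second delay is an assumption. It also assumes the existing alert fires when the timer-triggered unit exits with an error, so only the final failure is allowed to escape.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(action: Callable[[], T], attempts: int = 3,
                     delay_seconds: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call action() up to `attempts` times, sleeping between failed tries.

    Earlier failures are swallowed; only an exception from the last attempt
    propagates, so the caller (and any alert keyed on it) sees a failure only
    when every attempt has failed in a row.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for _ in range(attempts - 1):
        try:
            return action()
        except Exception:
            sleep(delay_seconds)  # assume the error is transient: wait, then retry
    return action()  # last attempt: any exception propagates to the caller
```

Injecting sleep as a parameter keeps the helper easy to test without real delays; the timer still controls how often the whole sequence runs.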
Andrew added a comment to T236526: "design" Cloud VPS project jessie deprecation.

Note that this project is slated for deletion. I failed to notify the admins (or shut down VMs) in a timely fashion, so the deletion is scheduled for 2020-02-15. It's quite likely that this problem will go away then.

Jan 17 2020, 4:17 AM · Cloud-VPS (Debian Jessie Deprecation)

Jan 16 2020

Andrew created T242976: public dns for codfw1dev vms.
Jan 16 2020, 3:05 PM · Epic, cloud-services-team (Kanban)

Jan 15 2020

Andrew placed T236547: "shinken" Cloud VPS project jessie deprecation up for grabs.
Jan 15 2020, 2:01 PM · Cloud-VPS (Debian Jessie Deprecation)
Andrew added a comment to T239884: Replace labstore2003/2004 with cloudbackup2001/2002.

@Bstorm I'm still hoping for your confirmation that this is all working so we can shut down the old servers.

Jan 15 2020, 6:35 AM · cloud-services-team (Kanban)
Andrew added a comment to T242332: CloudVPS: wrong nova quota usage for a project detected.

This looks better to me now:

Jan 15 2020, 6:34 AM · cloud-services-team (Kanban)
Andrew placed T240969: shinken: all puppet reports showing as 'unknown' because data moved to Prometheus up for grabs.

The right fix for this is to build a new monitoring system, which I'm not going to dive into immediately.

Jan 15 2020, 4:44 AM · cloud-services-team (Kanban)

Jan 14 2020

Andrew moved T218139: Develop or expand grid troubleshooting playbook from Important to Inbox on the cloud-services-team (Kanban) board.
Jan 14 2020, 5:55 PM · cloud-services-team (Kanban), Toolforge
Andrew moved T235708: Redesign for wmcs custom puppet settings from Important to Inbox on the cloud-services-team (Kanban) board.
Jan 14 2020, 5:50 PM · MW-1.35-notes (1.35.0-wmf.5; 2019-11-05), cloud-services-team (Kanban), Cloud-VPS
Andrew moved T206793: Look into spots where we can surface user statistics from toolforge and VPS from Important to Inbox on the cloud-services-team (Kanban) board.
Jan 14 2020, 5:49 PM · cloud-services-team (Kanban)