Andrew (Andrew Bogott)
User

User Details

User Since
Nov 2 2014, 11:35 PM (421 w, 20 h)
Availability
Available
IRC Nick
andrewbogott
LDAP User
Unknown
MediaWiki User
Andrewbogott [ Global Accounts ]

Recent Activity

Today

Andrew added a comment to T323159: Shut down osmdb.eqiad.wmnet (clouddb100[3-4])?.

Thanks for doing this research! I guess the remaining question is if all of this traffic can be moved to the in-project database running on maps-osmdb. @dschwen, is it your understanding that the new DB (that you built) contains the same data as this database?

Mon, Nov 28, 6:15 PM · User-Sandra_Fauconnier_WMSE, cloud-services-team (Kanban), Cloud-VPS (Debian Stretch Deprecation)
Andrew added a comment to T276018: Investigate new roles and policies in openstack Xena.

https://docs.openstack.org/nova/latest/configuration/policy-concepts.html

Mon, Nov 28, 5:54 PM · cloud-services-team (Kanban), Cloud-VPS

Fri, Nov 18

Andrew added a comment to T323387: Request creation of netops-clab VPS project.

+1

Fri, Nov 18, 3:58 PM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)
Andrew added a comment to T306200: CRITICAL: Status of the systemd unit backup_cinder_volumes.

I just ran an incremental backup of the maps cinder volume on the CLI so I could see how long it's really taking... it timed out after 48 hours. Our current timeout for scripted backups is 30 hours, so probably part of what's been happening is that the initial full backup of that volume works but then any subsequent (incremental) backup fails.

Fri, Nov 18, 3:48 PM · cloud-services-team (Kanban)
Andrew added a comment to T322755: shut down cloud-vps 'maps' project.

Thank you @TheDJ! I can delete that VM and clean up other pieces sometime soon.

Fri, Nov 18, 3:38 PM · User-Sandra_Fauconnier_WMSE, cloud-services-team (Kanban), Cloud-VPS (Debian Stretch Deprecation)

Thu, Nov 17

Andrew created T323319: OpenStack deprecation hunt.
Thu, Nov 17, 9:51 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T312557: Some WMCS clusters have inconsistent AAAA DNS records for the primary IPv6 of the hosts as Resolved.
Thu, Nov 17, 9:43 PM · Cloud-VPS, cloud-services-team (Kanban), IPv6
Andrew closed T312557: Some WMCS clusters have inconsistent AAAA DNS records for the primary IPv6 of the hosts, a subtask of T253173: Some clusters do not have DNS for IPv6 addresses (TRACKING TASK), as Resolved.
Thu, Nov 17, 9:43 PM · Infrastructure-Foundations, IPv6, User-jbond, netbox
Andrew updated the task description for T312557: Some WMCS clusters have inconsistent AAAA DNS records for the primary IPv6 of the hosts.
Thu, Nov 17, 9:43 PM · Cloud-VPS, cloud-services-team (Kanban), IPv6
Andrew updated the task description for T312557: Some WMCS clusters have inconsistent AAAA DNS records for the primary IPv6 of the hosts.
Thu, Nov 17, 9:23 PM · Cloud-VPS, cloud-services-team (Kanban), IPv6

Wed, Nov 16

Andrew added a comment to T297712: Migrate cloudmetrics workload from cloudmetrics100[1-2] to cloudmetrics100[3-4].

Hm, I merged https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/857763 thinking it was a puppet patch :/

Wed, Nov 16, 11:41 PM · cloud-services-team (Kanban), decommission-hardware
Andrew added a comment to T322865: dependency confusion with latest Bullseye cloud-vps base image.

You can see an example at newimage2.andrewtestproject.codfw1dev.wikimedia.cloud

Wed, Nov 16, 3:17 PM · cloud-services-team (Kanban)
Andrew added a comment to T322756: Decision request - WMCS Kanban board.

Option 3 was the historical norm during most of the Brooke era -- we had periodic meetings where we scrubbed the kanban board and rearranged tickets to reflect the actual present. We lost the habit (even before Brooke left) and I haven't started it up again, mostly because of it not feeling like a high priority. Those meetings were very boring but the system worked OK.

Wed, Nov 16, 3:01 PM · cloud-services-team (Kanban), Cloud Services Proposals
Andrew reassigned T297444: decommission cloudmetrics100[1-2].eqiad.wmnet from Andrew to Jclark-ctr.
Wed, Nov 16, 12:06 AM · SRE, ops-eqiad, cloud-services-team (Hardware), decommission-hardware

Tue, Nov 15

Andrew closed T297712: Migrate cloudmetrics workload from cloudmetrics100[1-2] to cloudmetrics100[3-4] as Resolved.
Tue, Nov 15, 11:53 PM · cloud-services-team (Kanban), decommission-hardware
Andrew closed T297712: Migrate cloudmetrics workload from cloudmetrics100[1-2] to cloudmetrics100[3-4], a subtask of T297444: decommission cloudmetrics100[1-2].eqiad.wmnet, as Resolved.
Tue, Nov 15, 11:52 PM · SRE, ops-eqiad, cloud-services-team (Hardware), decommission-hardware
Andrew closed T319217: decommission labstore100[67].wikimedia.org as Resolved.

No longer.

Tue, Nov 15, 11:50 PM · SRE, ops-eqiad, Patch-For-Review, decommission-hardware, cloud-services-team (Kanban)
Andrew closed T319217: decommission labstore100[67].wikimedia.org, a subtask of T309346: Replace labstore100[67] with clouddumps100[12], as Resolved.
Tue, Nov 15, 11:49 PM · Patch-For-Review, cloud-services-team (Kanban), Infrastructure-Foundations, SRE
Andrew added a comment to T322865: dependency confusion with latest Bullseye cloud-vps base image.

I just checked the prior release (from 20 Oct.) and sources.list looked the same.

Tue, Nov 15, 11:48 PM · cloud-services-team (Kanban)
Andrew closed T321731: wmcs-image-create no longer works as Invalid.

I've used this recently without any trouble. Closing until/unless I can find a real problem.

Tue, Nov 15, 11:32 PM · User-dcaro, cloud-services-team (Kanban)
Andrew renamed T323159: Shut down osmdb.eqiad.wmnet (clouddb100[3-4])? from Shut down osmdb.eqiad.wmnet to Shut down osmdb.eqiad.wmnet?.
Tue, Nov 15, 8:09 PM · User-Sandra_Fauconnier_WMSE, cloud-services-team (Kanban), Cloud-VPS (Debian Stretch Deprecation)
Andrew added a comment to T322755: shut down cloud-vps 'maps' project.

Thank you for the breakdown, @TheDJ! A few questions below...

Tue, Nov 15, 8:08 PM · User-Sandra_Fauconnier_WMSE, cloud-services-team (Kanban), Cloud-VPS (Debian Stretch Deprecation)
Andrew created T323159: Shut down osmdb.eqiad.wmnet (clouddb100[3-4])?.
Tue, Nov 15, 8:05 PM · User-Sandra_Fauconnier_WMSE, cloud-services-team (Kanban), Cloud-VPS (Debian Stretch Deprecation)
Andrew added a comment to T322755: shut down cloud-vps 'maps' project.

I want to provide a little bit of context for this task: WMCS staff has many dozens of projects to think about (currently 175) so it's unlikely that we know much about what's happening within a project, or if that project actually contains unrelated subprojects.

Tue, Nov 15, 8:00 PM · User-Sandra_Fauconnier_WMSE, cloud-services-team (Kanban), Cloud-VPS (Debian Stretch Deprecation)
Andrew added a comment to T301949: ToolsDB upgrade => Bullseye, MariaDB 10.4.

This is promising :)

Tue, Nov 15, 7:42 PM · Patch-For-Review, Data-Persistence (work done), Cloud-VPS (Debian Stretch Deprecation), cloud-services-team (Kanban), Toolforge, Data-Services
Andrew updated the task description for T323086: upgrade cloud-vps openstack to Openstack version 'Zed'.
Tue, Nov 15, 4:11 AM · cloud-services-team (Kanban)
Andrew updated the task description for T323087: The upgrade_openstack_node cookbook doesn't silence everything that needs silencing.
Tue, Nov 15, 3:14 AM · cloud-services-team (Kanban)
Andrew created T323087: The upgrade_openstack_node cookbook doesn't silence everything that needs silencing.
Tue, Nov 15, 3:13 AM · cloud-services-team (Kanban)
Andrew created T323086: upgrade cloud-vps openstack to Openstack version 'Zed'.
Tue, Nov 15, 3:09 AM · cloud-services-team (Kanban)
Andrew closed T305828: upgrade cloud-vps openstack to Openstack version 'Yoga' as Resolved.
Tue, Nov 15, 3:06 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew closed T322359: Upgrade Horizon to Openstack 'zed' as Resolved.
Tue, Nov 15, 3:03 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew closed T322359: Upgrade Horizon to Openstack 'zed', a subtask of T305828: upgrade cloud-vps openstack to Openstack version 'Yoga', as Resolved.
Tue, Nov 15, 3:02 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS

Fri, Nov 11

Susannaanas awarded T322755: shut down cloud-vps 'maps' project a Dislike token.
Fri, Nov 11, 8:14 AM · User-Sandra_Fauconnier_WMSE, cloud-services-team (Kanban), Cloud-VPS (Debian Stretch Deprecation)
mrephabricator awarded T322755: shut down cloud-vps 'maps' project a Dislike token.
Fri, Nov 11, 3:23 AM · User-Sandra_Fauconnier_WMSE, cloud-services-team (Kanban), Cloud-VPS (Debian Stretch Deprecation)

Thu, Nov 10

Andrew updated subscribers of T322865: dependency confusion with latest Bullseye cloud-vps base image.

@MoritzMuehlenhoff this might be of interest to you as well :)

Thu, Nov 10, 4:28 PM · cloud-services-team (Kanban)
Andrew triaged T322865: dependency confusion with latest Bullseye cloud-vps base image as Medium priority.
Thu, Nov 10, 4:24 PM · cloud-services-team (Kanban)
Andrew added a comment to T322865: dependency confusion with latest Bullseye cloud-vps base image.

Here is sources.list in the upstream image:

Thu, Nov 10, 4:21 PM · cloud-services-team (Kanban)
Andrew created T322865: dependency confusion with latest Bullseye cloud-vps base image.
Thu, Nov 10, 4:21 PM · cloud-services-team (Kanban)
Andrew reopened T319217: decommission labstore100[67].wikimedia.org as "Open".

Apparently cumin still thinks that labstore1007 exists.

Thu, Nov 10, 4:02 PM · SRE, ops-eqiad, Patch-For-Review, decommission-hardware, cloud-services-team (Kanban)
Andrew reopened T319217: decommission labstore100[67].wikimedia.org, a subtask of T309346: Replace labstore100[67] with clouddumps100[12], as Open.
Thu, Nov 10, 4:00 PM · Patch-For-Review, cloud-services-team (Kanban), Infrastructure-Foundations, SRE
Andrew added a comment to T322755: shut down cloud-vps 'maps' project.

OK! I think I misunderstood the nature of this project; I understood the other VMs there to be in service of the tileserver. I don't need to necessarily understand all the inner workings as long as you're happy to maintain the pieces that you lay claim to.

Thu, Nov 10, 5:50 AM · User-Sandra_Fauconnier_WMSE, cloud-services-team (Kanban), Cloud-VPS (Debian Stretch Deprecation)
Andrew added a comment to T306073: Cloud VPS "fastcci" project Stretch deprecation.

Ok, I deleted that instance and I'm setting up a new one. Why are the other Bullseye instances also marked as "deprecated"?

Thu, Nov 10, 5:47 AM · Cloud-VPS (Debian Stretch Deprecation)

Wed, Nov 9

Andrew closed T302535: Move backy2 VM backups off of cloudvirts and on to cloudbackup100[34] as Resolved.
Wed, Nov 9, 9:15 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T302535: Move backy2 VM backups off of cloudvirts and on to cloudbackup100[34], a subtask of T293934: Q2:(Need By: TBD) rack/setup/install cloudbackup100[34], as Resolved.
Wed, Nov 9, 9:15 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
Andrew closed T321522: NeutronAgentDown cloudvirt1023 A Neutron agent is down, VMs will have connectivity issues as Resolved.
Wed, Nov 9, 9:15 PM · cloud-services-team (Kanban)
Andrew closed T319042: PXE boot failure on cloudvirt1023 as Resolved.
Wed, Nov 9, 9:14 PM · cloud-services-team (Kanban), SRE, ops-eqiad
Andrew closed T319042: PXE boot failure on cloudvirt1023, a subtask of T319001: Degraded RAID on cloudvirt1023, as Resolved.
Wed, Nov 9, 9:14 PM · cloud-services-team (Kanban), SRE, ops-eqiad
Andrew updated the task description for T187601: Examine replacing tiles.wmflabs.org with production tile server.
Wed, Nov 9, 4:54 PM · Maps, VPS-Projects
Andrew updated the task description for T322755: shut down cloud-vps 'maps' project.
Wed, Nov 9, 3:37 PM · User-Sandra_Fauconnier_WMSE, cloud-services-team (Kanban), Cloud-VPS (Debian Stretch Deprecation)
Andrew created T322755: shut down cloud-vps 'maps' project.
Wed, Nov 9, 3:15 PM · User-Sandra_Fauconnier_WMSE, cloud-services-team (Kanban), Cloud-VPS (Debian Stretch Deprecation)
Andrew added a comment to T306073: Cloud VPS "fastcci" project Stretch deprecation.

The stretch instance 'fastcci-new-master.fastcci.eqiad1.wikimedia.cloud' has been offline (state 'error') for some time now. Does anyone want to speak up in its defense or shall I delete it?

Wed, Nov 9, 3:05 PM · Cloud-VPS (Debian Stretch Deprecation)
Andrew closed T322509: Request creation of ISA VPS project as Resolved.

I've created this project. If you wind up using toolforge instead, just reopen this ticket with a note and I'll clean up.

Wed, Nov 9, 2:43 PM · ISA, User-Sebastian_Berlin-WMSE, Cloud-VPS (Project-requests)
Andrew added a comment to T322688: Request creation of mastodon VPS project.

I don't immediately object to this, but I must not understand what 'closed' means in this context. How is an instance with no accounts useful?

Wed, Nov 9, 2:55 AM · Cloud-VPS (Project-requests)
Andrew added a comment to T322509: Request creation of ISA VPS project.

Just so I understand -- the monitoring is because this is a prototype of a project that you might want to scale up or move to dedicated hardware in the future? If so then cloud-vps might be the right choice, although toolforge would make your life much easier. The resources that you suggest initially don't seem necessarily out of line for toolforge.

Wed, Nov 9, 2:53 AM · ISA, User-Sebastian_Berlin-WMSE, Cloud-VPS (Project-requests)

Mon, Nov 7

Andrew added a comment to T319042: PXE boot failure on cloudvirt1023.

I would not advise moving any of the cloudvirts other than 1023, since they're all likely to be decom'd next year (if not sooner) regardless. We /can/ move 1023 but my preference would be to have it just work in place until its upcoming replacement. @ayounsi can you advise about if/when it will be possible to rebuild that server, and lift the curse from the other hosts in that rack?

Mon, Nov 7, 9:34 PM · cloud-services-team (Kanban), SRE, ops-eqiad
Andrew added a comment to T321798: Decision request - gitlab cloud repos merge strategy.

I am 100% fine with 'encourage'

Mon, Nov 7, 8:04 PM · cloud-services-team (Kanban), Cloud Services Proposals
Andrew added a comment to T321220: Openstack Magnum network setup.

I think magnum reports the k8s API as down because it is not accessible from the internet (from the magnum API itself).

Mon, Nov 7, 7:42 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T320845: SystemdUnitDownForLong clouddumps1002:9100 Unit dumps-fetch-phabdumps.service on node clouddumps1002 has been down for long. as Resolved.
Mon, Nov 7, 3:08 AM · cloud-services-team (Kanban)
Andrew closed T320846: SystemdUnitDownForLong clouddumps1001:9100 Unit dumps-fetch-phabdumps.service on node clouddumps1001 has been down for long. as Resolved.
Mon, Nov 7, 3:08 AM · cloud-services-team (Kanban)

Fri, Nov 4

Andrew added a comment to T321839: toolforge NFS file cleanup.

I'm about to run these in labstore1004:/srv/tools/shared/tools:

Fri, Nov 4, 1:58 PM · cloud-services-team (Kanban)

Thu, Nov 3

Andrew created T322359: Upgrade Horizon to Openstack 'zed'.
Thu, Nov 3, 6:15 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew claimed T297444: decommission cloudmetrics100[1-2].eqiad.wmnet.
Thu, Nov 3, 3:20 PM · SRE, ops-eqiad, cloud-services-team (Hardware), decommission-hardware
Andrew added a comment to T322328: The service unit analytics-dumps-fetch-pageview_complete_dumps.service is in failed status on host clouddumps1002..

root@clouddumps1002:/usr# systemctl status analytics-dumps-fetch-pageview_complete_dumps.service
● analytics-dumps-fetch-pageview_complete_dumps.service - Copy pageview_complete_dumps files from Hadoop HDFS.
     Loaded: loaded (/lib/systemd/system/analytics-dumps-fetch-pageview_complete_dumps.service; static)
     Active: failed (Result: exit-code) since Thu 2022-11-03 05:00:40 UTC; 8h ago
TriggeredBy: ● analytics-dumps-fetch-pageview_complete_dumps.timer
    Process: 296219 ExecStart=/usr/local/bin/systemd-timer-mail-wrapper -T data-engineering-alerts@lists.wikimedia.org --only-on-error /usr/local/bin/kerberos-run-command dumpsgen>
   Main PID: 296219 (code=exited, status=1/FAILURE)
        CPU: 57.019s

Thu, Nov 3, 2:00 PM · Dumps-Generation, cloud-services-team (Kanban)
Andrew created T322328: The service unit analytics-dumps-fetch-pageview_complete_dumps.service is in failed status on host clouddumps1002..
Thu, Nov 3, 1:51 PM · Dumps-Generation, cloud-services-team (Kanban)

Wed, Nov 2

Andrew added a comment to T322221: The service unit dumps-fetch-phabdumps.service is in failed status on host clouddumps1002.

thanks!

Wed, Nov 2, 6:38 PM · Phabricator, serviceops-collab, Dumps-Generation, cloud-services-team (Kanban)
Andrew added a comment to T321798: Decision request - gitlab cloud repos merge strategy.

After a bit of discussion about how gitlab workflows will differ from gerrit workflows, I'm now leaning towards Squash + fast-forward, for the same reason that I mentioned above: it encourages users to think in terms of single feature patches or branches rather than just a free-flowing branch containing whatever. Of course there could also be best practices docs someplace to explain this.

Wed, Nov 2, 4:54 PM · cloud-services-team (Kanban), Cloud Services Proposals
Andrew updated the task description for T322221: The service unit dumps-fetch-phabdumps.service is in failed status on host clouddumps1002.
Wed, Nov 2, 2:45 PM · Phabricator, serviceops-collab, Dumps-Generation, cloud-services-team (Kanban)
Andrew created T322221: The service unit dumps-fetch-phabdumps.service is in failed status on host clouddumps1002.
Wed, Nov 2, 2:43 PM · Phabricator, serviceops-collab, Dumps-Generation, cloud-services-team (Kanban)
Andrew added a comment to T322149: The service unit dumps-fetch-media-contestwinners.service is in failed status on host clouddumps1002..

thx!

Wed, Nov 2, 2:38 PM · cloud-services-team (Kanban), Dumps-Generation
Andrew created T322219: Refactor tool deletion code for nfs-on-cinder.
Wed, Nov 2, 2:36 PM · cloud-services-team (Kanban)

Tue, Nov 1

valerio.bozzolan awarded T318191: Evaluate opening the readonly Wiki Replicas to the WAN (since we already have user authentication) a Baby Tequila token.
Tue, Nov 1, 9:56 PM · cloud-services-team (Kanban), Data-Services
Andrew closed T306075: Cloud VPS "gratitude" project Stretch deprecation as Resolved.

Thank you!

Tue, Nov 1, 4:52 PM · Cloud-VPS (Debian Stretch Deprecation)
Andrew reassigned T322149: The service unit dumps-fetch-media-contestwinners.service is in failed status on host clouddumps1002. from Andrew to jbond.

This is likely a consequence of:

Tue, Nov 1, 3:55 PM · cloud-services-team (Kanban), Dumps-Generation
Andrew added a comment to T322149: The service unit dumps-fetch-media-contestwinners.service is in failed status on host clouddumps1002..

/usr/bin/rsync -rt --chmod=go-w stat1007.eqiad.wmnet::srv/dumps/media/contest_winners/ /srv/dumps/xmldatadumps/public/other/media/contest_winners

Tue, Nov 1, 3:43 PM · cloud-services-team (Kanban), Dumps-Generation
Andrew created T322149: The service unit dumps-fetch-media-contestwinners.service is in failed status on host clouddumps1002..
Tue, Nov 1, 3:42 PM · cloud-services-team (Kanban), Dumps-Generation

Mon, Oct 31

Andrew updated the task description for T306068: Cloud VPS "deployment-prep" project Stretch deprecation.
Mon, Oct 31, 3:54 PM · Beta-Cluster-Infrastructure, Cloud-VPS (Debian Stretch Deprecation)
Andrew updated the task description for T306068: Cloud VPS "deployment-prep" project Stretch deprecation.
Mon, Oct 31, 3:53 PM · Beta-Cluster-Infrastructure, Cloud-VPS (Debian Stretch Deprecation)
Andrew added a comment to T306098: Cloud VPS "swift" project Stretch deprecation.

?

Mon, Oct 31, 3:28 PM · Cloud-VPS (Debian Stretch Deprecation)
Andrew added a comment to T306075: Cloud VPS "gratitude" project Stretch deprecation.

The Stretch VM in this project is still present, many months later. Can you please delete it if it's no longer in use?

Mon, Oct 31, 3:27 PM · Cloud-VPS (Debian Stretch Deprecation)
Andrew added a comment to T306066: Cloud VPS "cvn" project Stretch deprecation.

Can I get a progress update on this? I'm hoping to delete some more Stretch VMs tomorrow.

Mon, Oct 31, 3:26 PM · Cloud-VPS (Debian Stretch Deprecation)

Oct 28 2022

Andrew raised the priority of T288108: Figure out how to deal with security groups when rolling out metricsinfra scraping from Medium to High.

I am re-reading this task and think I should amend my 'little bit concerned' comment with "but monitoring sounds great so let's do it."

Oct 28 2022, 2:41 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew closed T318191: Evaluate opening the readonly Wiki Replicas to the WAN (since we already have user authentication) as Declined.

I agree that I'm not much worried about access to the data itself. Rather, I'm concerned about other security issues -- denial of service is the obvious one, but there's also the risk of any possible future exploits in our bespoke mariadb version.

Oct 28 2022, 2:35 PM · cloud-services-team (Kanban), Data-Services
Andrew added a comment to T321731: wmcs-image-create no longer works.

Pretty sure this should be closed as 'invalid' but I will retest everything once the new bullseye build shows up on https://cloud.debian.org/images/cloud/bullseye/ (probably late next week).

Oct 28 2022, 2:32 PM · User-dcaro, cloud-services-team (Kanban)

Oct 27 2022

Andrew added a comment to T319217: decommission labstore100[67].wikimedia.org.

btw, I believe each of these servers is attached to an external disk shelf -- those shelves should also be decom'd.

Oct 27 2022, 11:04 PM · SRE, ops-eqiad, Patch-For-Review, decommission-hardware, cloud-services-team (Kanban)
Andrew reassigned T319217: decommission labstore100[67].wikimedia.org from Andrew to Jclark-ctr.
Oct 27 2022, 11:04 PM · SRE, ops-eqiad, Patch-For-Review, decommission-hardware, cloud-services-team (Kanban)
Andrew closed T309346: Replace labstore100[67] with clouddumps100[12], a subtask of T302981: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts, as Resolved.
Oct 27 2022, 10:37 PM · Patch-For-Review, Infrastructure-Foundations, SRE, ops-eqiad, netops, cloud-services-team (Hardware), DC-Ops
Andrew closed T309346: Replace labstore100[67] with clouddumps100[12] as Resolved.
Oct 27 2022, 10:37 PM · Patch-For-Review, cloud-services-team (Kanban), Infrastructure-Foundations, SRE
Andrew closed T316429: SystemdUnitDownForLong labstore1006:9100 Unit kiwix-mirror-update.service on node labstore1006 has been down for long. as Resolved.

This was due to upstream outages on kiwix hosts. That issue seems to be resolved now, and the alerts have cleared.

Oct 27 2022, 10:36 PM · cloud-services-team (Kanban)
Andrew closed T316445: SystemdUnitDownForLong labstore1007:9100 Unit kiwix-mirror-update.service on node labstore1007 has been down for long. as Resolved.

This was due to upstream outages on kiwix hosts. That issue seems to be resolved now, and the alerts have cleared.

Oct 27 2022, 10:35 PM · cloud-services-team (Kanban)
Andrew closed T316614: designate-sink lock ups (was: cloudcontrol1005/nova instance creation test is CRITICAL) as Invalid.

I haven't seen this happen for ages. Maybe it was a one-off?

Oct 27 2022, 10:35 PM · cloud-services-team (Kanban)
Andrew closed T317593: SystemdUnitDownForLong clouddumps1002:9100 Unit kiwix-mirror-update.service on node clouddumps1002 has been down for long. as Resolved.

This was due to upstream outages on kiwix hosts. That issue seems to be resolved now, and the alerts have cleared.

Oct 27 2022, 10:34 PM · cloud-services-team (Kanban)
Andrew closed T317597: SystemdUnitDownForLong clouddumps1001:9100 Unit kiwix-mirror-update.service on node clouddumps1001 has been down for long. as Resolved.

This was due to upstream outages on kiwix hosts. That issue seems to be resolved now, and the alerts have cleared.

Oct 27 2022, 10:34 PM · cloud-services-team (Kanban)
Andrew closed T318853: SystemdUnitDownForLong labstore1007:9100 Unit analytics-dumps-fetch-unique_devices.service on node labstore1007 has been down for long. as Resolved.

This seems to have cleared during the last run of this job.

Oct 27 2022, 10:33 PM · cloud-services-team (Kanban)
Andrew closed T318852: SystemdUnitDownForLong labstore1006:9100 Unit analytics-dumps-fetch-unique_devices.service on node labstore1006 has been down for long. as Resolved.

This seems to have cleared during the last run of this job.

Oct 27 2022, 10:32 PM · cloud-services-team (Kanban)
Andrew closed T319312: Open Openstack APIs to the public internet as Resolved.

All done. There's at least one remaining API to open up: the puppet ENC. That's tracked in the subtasks of T317478.

Oct 27 2022, 10:30 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T319312: Open Openstack APIs to the public internet, a subtask of T316436: Cloud VPS Terraform support, as Resolved.
Oct 27 2022, 10:29 PM · Epic, Cloud-VPS, cloud-services-team (Kanban)
Andrew updated subscribers of T321280: SystemdUnitDownForLong clouddumps1001:9100 Unit download_enterprise_htmldumps.service on node clouddumps1001 has been down for long..

Everyone (but mostly @ArielGlenn ) suspects that this will clear on the next run, which should be on the first.

Oct 27 2022, 10:28 PM · cloud-services-team (Kanban)
Andrew added a comment to T321731: wmcs-image-create no longer works.

I retried after manually creating the mountpoint and it worked. I haven't tested at length but it may be as simple as adding a sleep between creating the mountpoint and mounting.

Oct 27 2022, 6:42 PM · User-dcaro, cloud-services-team (Kanban)
Andrew created T321839: toolforge NFS file cleanup.
Oct 27 2022, 4:34 PM · cloud-services-team (Kanban)
Andrew added a comment to T321220: Openstack Magnum network setup.

Can someone catch me up about why we're talking about tenant subnets? Is this something we need in eqiad that we didn't need in codfw1dev? Or /are/ tenant subnets enabled in codfw1dev?

Oct 27 2022, 1:29 PM · Patch-For-Review, cloud-services-team (Kanban)