
Fri, Apr 19

Bstorm updated subscribers of T209527: Set up scratch and maps NFS services on cloudstore1008/9.
Fri, Apr 19, 7:39 PM · Patch-For-Review, cloud-services-team (Kanban)

Wed, Apr 17

Bstorm closed T204530: cloudvps: tools and toolsbeta trusty deprecation as Resolved.
Wed, Apr 17, 9:58 PM · Cloud-VPS (Ubuntu Trusty Deprecation), cloud-services-team (Kanban), Goal
Bstorm closed T219817: Update grid-configurator to keep tools-checker nodes submit nodes as Resolved.
Wed, Apr 17, 9:57 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
Bstorm closed T219817: Update grid-configurator to keep tools-checker nodes submit nodes, a subtask of T219243: Migrate tools-checker system to Stretch, as Resolved.
Wed, Apr 17, 9:57 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
Bstorm triaged T221301: Toolschecker webservice checks get out of sync likely from timeouts as High priority.
Wed, Apr 17, 9:56 PM · Toolforge, cloud-services-team (Kanban)
Bstorm awarded T221205: Toolforge: deploy sssd to tools-sgewebgrid* nodes a Love token.
Wed, Apr 17, 3:44 PM · cloud-services-team (Kanban), Cloud-VPS, LDAP, Toolforge

Sat, Apr 13

Bstorm added a comment to T220853: VMs on cloudvirt1015 crashing.

From dmesg:

[ 6872.646908] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[ 6872.646911] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[ 6872.646913] {1}[Hardware Error]: event severity: corrected
[ 6872.646915] {1}[Hardware Error]:  Error 0, type: corrected
[ 6872.646917] {1}[Hardware Error]:  fru_text: B3
[ 6872.646918] {1}[Hardware Error]:   section_type: memory error
[ 6872.646920] {1}[Hardware Error]:   error_status: 0x0000000000000400
[ 6872.646921] {1}[Hardware Error]:   physical_address: 0x00000077b4ea7800
[ 6872.646925] {1}[Hardware Error]:   node: 1 card: 2 module: 0 rank: 0 bank: 2 row: 45906 column: 992
[ 6872.646927] {1}[Hardware Error]:   error_type: 2, single-bit ECC
[ 6872.646949] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 6872.646951] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
[ 6872.646952] EDAC sbridge MC0: TSC eb28091adcc
[ 6872.646954] EDAC sbridge MC0: ADDR 77b4ea7800
[ 6872.646956] EDAC sbridge MC0: MISC 0
[ 6872.646958] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1555188932 SOCKET 0 APIC 0
[ 6872.646982] EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#1_Chan#0_DIMM#0 (channel:4 slot:0 page:0x77b4ea7 offset:0x800 grain:32 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 ha:1 channel_mask:1 rank:0)
[ 7164.888310] mce: [Hardware Error]: Machine check events logged
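A minimal sketch for tracking this kind of thing over time: it tallies the EDAC corrected-error (CE) lines per memory controller and DIMM label in dmesg text. The regex is an assumption based only on the `EDAC MC1: ... CE ... on <label> (...)` line above, so other kernel or driver versions may format these lines differently. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed pattern, modeled on the line above:
#   EDAC MC1: 0 CE memory read error on CPU_SrcID#1_Ha#1_Chan#0_DIMM#0 (channel:4 ...)
# "EDAC sbridge MC0: ..." lines do not match because "MC" must follow "EDAC " directly.
EDAC_CE = re.compile(r"EDAC (MC\d+): \d+ CE .*? on (\S+) \(")


def count_ce_by_dimm(dmesg_text: str) -> Counter:
    """Count corrected-error (CE) lines per (memory controller, DIMM label).

    Each matching line counts as one event; the numeric field after the
    controller name is not summed.
    """
    counts: Counter = Counter()
    for line in dmesg_text.splitlines():
        match = EDAC_CE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[ 6872.646982] EDAC MC1: 0 CE memory read error on "
        "CPU_SrcID#1_Ha#1_Chan#0_DIMM#0 (channel:4 slot:0 page:0x77b4ea7)\n"
        "[ 7164.888310] mce: [Hardware Error]: Machine check events logged\n"
    )
    for (mc, dimm), n in count_ce_by_dimm(sample).items():
        print(f"{mc} {dimm}: {n} corrected error(s)")
```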
Sat, Apr 13, 9:45 PM · Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban)
Bstorm added a comment to T220853: VMs on cloudvirt1015 crashing.

A more complete version of what I copied out of the console before I rebooted can be found at P8394

Sat, Apr 13, 9:16 PM · Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban)
Bstorm created P8394 cloudvirt1015 misbehaving.
Sat, Apr 13, 9:16 PM
Bstorm added a comment to T220853: VMs on cloudvirt1015 crashing.

After forcing a power cycle from the console, I have been evacuating the more important things. Once the last three finish, what remains will be some k8s workers (one of them a PAWS worker) and a bunch of toolsbeta stuff. I'll leave it at that for testing and such, but the tools-workers are depooled at this time (most are also still down after the reboot). The PAWS worker is off as well.

Sat, Apr 13, 9:14 PM · Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban)

Thu, Apr 11

Bstorm added a comment to T166949: Homedir/UID info breaks after a while in Tools Kubernetes (can't read replica.my.cnf).

Restarting the kubernetes webservice process seems to have fixed the app. It looks like the PHP get_current_user() function started returning an empty string. get_current_user() checks the ownership of the script, so maybe this was caused by some sort of NFS communication error.

Possibly an LDAP issue as well?
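A rough Python analogue of the failure mode (not PHP's actual implementation): finding a script's owner takes a `stat()` of the file, which for tool homes goes over NFS, followed by a uid-to-name lookup through NSS, which on Toolforge we assume is backed by LDAP. If either step fails, this sketch returns an empty string, matching the symptom described above. The function name is hypothetical.

```python
import os
import pwd


def script_owner_name(path: str) -> str:
    """Return the owner name of `path`, or "" if it cannot be determined.

    Two independent lookups are involved, and either can fail:
      1. stat() on the file to get the owning uid (NFS for tool homes);
      2. uid -> name via NSS (LDAP-backed on hosts configured that way).
    """
    try:
        uid = os.stat(path).st_uid
    except OSError:
        return ""  # file metadata unavailable (e.g. NFS trouble)
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return ""  # uid has no passwd entry (e.g. LDAP/NSS lookup failed)
```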

Thu, Apr 11, 6:24 PM · Tool-Global-user-contributions, cloud-services-team (Kanban), Kubernetes, Toolforge, Cloud-VPS
Bstorm added a comment to T167086: Consider moving PAWS to its own Cloud VPS project, rather than using instances inside Toolforge.

That said, I can get NFS up soon within the project (maybe next week, since I am off tomorrow).

Thu, Apr 11, 3:24 PM · cloud-services-team (Kanban), Kubernetes, Patch-For-Review, PAWS, Cloud-Services
Bstorm added a comment to T167086: Consider moving PAWS to its own Cloud VPS project, rather than using instances inside Toolforge.

The NFS bit is partly about moving to a new project (as it is staged now), which is still blocked by a bug from cert manager and by a lot of puppetization. So even if we complete the NFS part, it won't show up in PAWS for a while anyway. The NFS piece is the easy bit. At this point, I don't think there is any way I'll have time to do the rest of it by then. Anything can happen, but I don't think we should plan on it with a hard timeline.

Thu, Apr 11, 3:22 PM · cloud-services-team (Kanban), Kubernetes, Patch-For-Review, PAWS, Cloud-Services

Wed, Apr 10

Bstorm closed T213516: Elasticsearch credential request for refill-api as Resolved.

Credentials created. Sorry for the wait!

Wed, Apr 10, 10:29 PM · cloud-services-team (Kanban), Toolforge
Bstorm renamed T220650: tools-manifest - webservicemonitor needs a longer timeout from tools-manifest - webservicemonitor is misbehaving somehow to tools-manifest - webservicemonitor needs a longer timeout.
Wed, Apr 10, 9:52 PM · cloud-services-team (Kanban), Toolforge
Bstorm lowered the priority of T220650: tools-manifest - webservicemonitor needs a longer timeout from High to Low.

Looking back through the log yet again, I think I've got this sorted out.

Wed, Apr 10, 9:50 PM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T220650: tools-manifest - webservicemonitor needs a longer timeout.

Yeah, I'm fairly sure mediaviews-api needs a venv rebuild, and in some ways the monitor is handling that case correctly. What baffled me earlier was that it complained about being unable to restart things that are actually up and working. It might have nothing to do with webservice, but it seemed strange enough to go down the rabbit hole.

Wed, Apr 10, 9:39 PM · cloud-services-team (Kanban), Toolforge
Bstorm lowered the priority of T220650: tools-manifest - webservicemonitor needs a longer timeout from Unbreak Now! to High.

OK. After playing with it, I can now confirm that it is mostly working at this time. There is something weird about its behavior when it tries to restart a webservice, though.

Wed, Apr 10, 9:35 PM · cloud-services-team (Kanban), Toolforge
Bstorm updated the task description for T220650: tools-manifest - webservicemonitor needs a longer timeout.
Wed, Apr 10, 9:23 PM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T220650: tools-manifest - webservicemonitor needs a longer timeout.

It also appears to fail completely at actually restarting anything. Since it is failing, it may not actually be terribly bad that it's broken. Hard to tell. It is definitely growing log sizes for tools, though.

Wed, Apr 10, 8:16 PM · cloud-services-team (Kanban), Toolforge
Bstorm raised the priority of T220650: tools-manifest - webservicemonitor needs a longer timeout from High to Unbreak Now!.

This is misbehaving and trying to restart things that aren't broken. I'll try to sort it out.
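A minimal sketch of the behavior the task title asks for, assuming a plain HTTP check: probe with a generous timeout and only report a service as down after several consecutive failures, so slow-but-working webservices are not restarted. This is not webservicemonitor's actual code; both function names and the default values are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 30.0) -> bool:
    """Return True if `url` answers with a non-5xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError (it is a subclass);
        # a 4xx still means something is listening and answering.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        return False  # refused, unreachable, or timed out


def should_restart(probe: Callable[[], bool], attempts: int = 3, delay: float = 10.0) -> bool:
    """Declare a service down only after `attempts` consecutive failed probes."""
    for attempt in range(attempts):
        if probe():
            return False
        if attempt < attempts - 1:
            time.sleep(delay)  # give a slow service time to recover between probes
    return True
```

Usage would be along the lines of `should_restart(lambda: http_probe(url))`, with `url` being the tool's public URL.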

Wed, Apr 10, 8:07 PM · cloud-services-team (Kanban), Toolforge
Bstorm closed T220646: Tool dplbot not running correctly on the stretch grid as Invalid.

Looking through, this is actually incorrect because of an unrelated error in tooling. My apologies.

Wed, Apr 10, 8:06 PM · Toolforge
Bstorm triaged T220650: tools-manifest - webservicemonitor needs a longer timeout as High priority.
Wed, Apr 10, 7:58 PM · cloud-services-team (Kanban), Toolforge
Bstorm triaged T220646: Tool dplbot not running correctly on the stretch grid as Normal priority.
Wed, Apr 10, 7:22 PM · Toolforge
Bstorm closed T220201: Request creation of LTA-Tracker VPS project as Resolved.
Wed, Apr 10, 7:04 PM · cloud-services-team (Kanban), User-Zppix, Cloud-VPS (Project-requests)
Bstorm added a comment to T220201: Request creation of LTA-Tracker VPS project.

@Zppix The project should be good to go in Horizon now.

Wed, Apr 10, 6:23 PM · cloud-services-team (Kanban), User-Zppix, Cloud-VPS (Project-requests)
Bstorm closed T212625: Prepare and check storage layer for hywwiki, a subtask of T212597: Create Wikipedia Western Armenian, as Resolved.
Wed, Apr 10, 6:04 PM · User-Ladsgroup, User-Urbanecm, Patch-For-Review, Wiki-Setup (Create), Wikimedia-Language-setup
Bstorm closed T212625: Prepare and check storage layer for hywwiki as Resolved.

All set from WMCS end. Confirmed I can run queries from Toolforge.

Wed, Apr 10, 6:04 PM · cloud-services-team (Kanban), Data-Services, DBA, Operations
Bstorm claimed T212625: Prepare and check storage layer for hywwiki.
Wed, Apr 10, 8:53 AM · cloud-services-team (Kanban), Data-Services, DBA, Operations

Tue, Apr 9

Bstorm triaged T220543: montage-beta tool seems to be running up load and hanging in uninterruptible sleep as Normal priority.
Tue, Apr 9, 7:35 PM · cloud-services-team (Kanban), Toolforge
Bstorm updated the task description for T220531: Get the clouddb-services systems into Shinken and possibly icinga.
Tue, Apr 9, 7:15 PM · Patch-For-Review, cloud-services-team (Kanban), Data-Services
Bstorm updated subscribers of T220530: Ensure clouddb1001 is monitored appropriately from the tendril/prometheus side.

Removing some subscribers to not spam the world if not needed.

Tue, Apr 9, 7:07 PM · Data-Services, cloud-services-team (Kanban)
Bstorm updated subscribers of T220530: Ensure clouddb1001 is monitored appropriately from the tendril/prometheus side.

Just talked to @chasemp, who informed me that the original discussion included the idea of placing a floating IP on there that was iptables-restricted entirely to just tendril. We can certainly move to that, if that is the expectation and considered the best solution. That would mean we still don't have cloud clients connecting over that IP. I don't like it much because of the additional, not-so-well-monitored layer of depending on ferm, but I wanted to mention that here as an option that was previously discussed and can be done if it all comes down to that.

Tue, Apr 9, 7:06 PM · Data-Services, cloud-services-team (Kanban)
Bstorm updated subscribers of T220530: Ensure clouddb1001 is monitored appropriately from the tendril/prometheus side.

I confirmed with @ayounsi that it's blocked by network restrictions going into the cloud. The issue would be that return traffic from these instances to the tendril server would have to be allowed. This leads me to three questions for the DBAs:

  1. Would we ever want to do that?
  2. How bad is it to just not monitor it with tendril in the case of toolsdb?
  3. Would it make sense to establish an instance of tendril (or similar) inside Cloud VPS for this and any future cases of virtualized DBs?
Tue, Apr 9, 6:56 PM · Data-Services, cloud-services-team (Kanban)
Bstorm added a comment to T220530: Ensure clouddb1001 is monitored appropriately from the tendril/prometheus side.

I'm fully expecting that the prometheus stuff is going to be on the labs side. That's mostly about making sure we set up the right dashboards for decent monitoring.

Tue, Apr 9, 6:47 PM · Data-Services, cloud-services-team (Kanban)
Bstorm added a comment to T220530: Ensure clouddb1001 is monitored appropriately from the tendril/prometheus side.

The other instance for this is 172.16.5.119, the secondary/slave.

Tue, Apr 9, 6:44 PM · Data-Services, cloud-services-team (Kanban)
Bstorm added a comment to T220530: Ensure clouddb1001 is monitored appropriately from the tendril/prometheus side.

I can confirm that the firewalling isn't at the instance/security group level.

Tue, Apr 9, 6:40 PM · Data-Services, cloud-services-team (Kanban)
Bstorm added a comment to T220530: Ensure clouddb1001 is monitored appropriately from the tendril/prometheus side.

Ping in general likely won't work into the cloud from anything outside it. However, I'm poking around at getting telnet 172.16.7.153 3306 working.
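Since ICMP is unlikely to get through, a TCP handshake against the MySQL port is the more useful test. A minimal scriptable equivalent of the telnet check, assuming only the address and port from the comment above (the function name is hypothetical):

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Scriptable stand-in for `telnet HOST PORT`: True if a TCP handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, filtered (timed out), or unreachable
        return False
```

`tcp_port_open("172.16.7.153", 3306)` only returns True from a host the network actually lets through; a False can mean refused, filtered or unreachable, which this check does not distinguish.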

Tue, Apr 9, 6:37 PM · Data-Services, cloud-services-team (Kanban)
Bstorm added a comment to T220530: Ensure clouddb1001 is monitored appropriately from the tendril/prometheus side.

Do you remember why we decided not to have floating IPs on those instances?

Tue, Apr 9, 6:32 PM · Data-Services, cloud-services-team (Kanban)
Bstorm moved T220020: Action items and work for retro 20190403 from Inbox to Epics on the cloud-services-team (Kanban) board.
Tue, Apr 9, 5:10 PM · Epic, cloud-services-team (Kanban)
Bstorm updated subscribers of T220531: Get the clouddb-services systems into Shinken and possibly icinga.

@aborrero mentioned https://gerrit.wikimedia.org/r/c/operations/puppet/+/499516 in today's meeting, but I think (especially since that may not be complete) it should be built on in the course of this work rather than being a blocker for getting this into Shinken. Unmonitored DBs are very bad.

Tue, Apr 9, 5:07 PM · Patch-For-Review, cloud-services-team (Kanban), Data-Services
Bstorm added a comment to T220530: Ensure clouddb1001 is monitored appropriately from the tendril/prometheus side.

The full name is clouddb1001.clouddb-services.eqiad.wmflabs, but you almost certainly need to use 172.16.7.153 instead unless we can resolve the DNS outside cloud in the future.
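A minimal sketch of the "use the IP unless the name resolves" advice, assuming a client that can take either form. The constants come from the comment above; the function name is hypothetical.

```python
import socket

# Values from the comment above; the fallback is only needed while the
# *.wmflabs name is not resolvable from outside the cloud.
CLOUDDB_FQDN = "clouddb1001.clouddb-services.eqiad.wmflabs"
CLOUDDB_IP = "172.16.7.153"


def resolve_or_fallback(name: str, fallback_ip: str) -> str:
    """Return an IPv4 address for `name`, or `fallback_ip` if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror: name not resolvable from this network
        return fallback_ip
```

Calling `resolve_or_fallback(CLOUDDB_FQDN, CLOUDDB_IP)` keeps working unchanged if the name becomes resolvable outside the cloud later.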

Tue, Apr 9, 4:55 PM · Data-Services, cloud-services-team (Kanban)
Bstorm moved T220531: Get the clouddb-services systems into Shinken and possibly icinga from Inbox to Doing on the cloud-services-team (Kanban) board.
Tue, Apr 9, 4:53 PM · Patch-For-Review, cloud-services-team (Kanban), Data-Services
Bstorm triaged T220531: Get the clouddb-services systems into Shinken and possibly icinga as High priority.
Tue, Apr 9, 4:53 PM · Patch-For-Review, cloud-services-team (Kanban), Data-Services
Bstorm added a comment to T220530: Ensure clouddb1001 is monitored appropriately from the tendril/prometheus side.

That's part of the thing to figure out :)

Tue, Apr 9, 4:51 PM · Data-Services, cloud-services-team (Kanban)
Bstorm moved T220530: Ensure clouddb1001 is monitored appropriately from the tendril/prometheus side from Inbox to Doing on the cloud-services-team (Kanban) board.
Tue, Apr 9, 4:47 PM · Data-Services, cloud-services-team (Kanban)
Bstorm added projects to T220530: Ensure clouddb1001 is monitored appropriately from the tendril/prometheus side: cloud-services-team (Kanban), Data-Services.
Tue, Apr 9, 4:47 PM · Data-Services, cloud-services-team (Kanban)
Bstorm triaged T220530: Ensure clouddb1001 is monitored appropriately from the tendril/prometheus side as High priority.
Tue, Apr 9, 4:47 PM · Data-Services, cloud-services-team (Kanban)

Mon, Apr 8

Bstorm added a comment to T207590: Research CephFS as a replacement for NFS.

Also: because of weaknesses in the old system, we probably will want to start with Luminous or Mimic--especially since no other releases are actually supported. Bluestore-as-default is one of the biggest benefits.

Mon, Apr 8, 5:12 PM · Data-Services, cloud-services-team (Kanban)
Bstorm added a comment to T207590: Research CephFS as a replacement for NFS.

Some notes that also will apply to T90364:

  • Ceph is a very high-latency system without lots of grooming and love when used as anything but a straight-up object store like Swift. With proper tuning it can be somewhat faster than NFSv4 in sync mode (which we use).
  • 10G network on all nodes and clients should be viewed as a requirement except where we basically don't care at all about speed.
  • If OSD nodes are large with lots of disks, a failure and rebuild could collapse the system by overstraining the OSDs and creating instability. Smaller, more numerous nodes allow for more resiliency and higher availability and performance.
  • The better the disk, the faster the processor needs to be, and for Ceph a single-socket motherboard can outperform a dual-socket one with the same processors; multiple cores are good.
  • Ceph doesn't do comprehensive testing and development on Debian, though they do package for it with a basic install and check test. They also recommend upgrading stock Debian kernels. They provide full, comprehensive support on CentOS and Ubuntu. No Mimic packages are available for Debian until Buster because Mimic needs GCC 8.
  • IO throughput requirement testing needs to be done so that we can tune things. I'm digging around in Prometheus to find good metrics to watch and compare.
  • Reviewing the network architecture around Ceph is recommended to avoid collapsing the system through network changes and non-recommended configuration.
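A back-of-the-envelope sketch of why smaller, more numerous OSD nodes and 10G links matter during a rebuild. It assumes data is spread evenly across nodes and recovery work is shared evenly among the survivors, and it ignores replication factor, disk speed and protocol overhead. The function names and the 120 TB figure below are made up for illustration.

```python
from typing import Tuple


def rebuild_load(total_tb: float, nodes: int) -> Tuple[float, float]:
    """Return (TB to re-replicate, TB per surviving node) after one node fails.

    Assumes data is spread evenly over `nodes` and recovery is spread evenly
    over the `nodes - 1` survivors.
    """
    if nodes < 2:
        raise ValueError("need at least two nodes to recover from a failure")
    lost = total_tb / nodes
    return lost, lost / (nodes - 1)


def hours_at_line_rate(tb: float, gbit_per_s: float) -> float:
    """Hours to move `tb` terabytes at `gbit_per_s`, ignoring all overhead."""
    bits = tb * 1e12 * 8
    return bits / (gbit_per_s * 1e9) / 3600
```

For example, with 120 TB stored, `rebuild_load(120, 4)` gives `(30.0, 10.0)`: each of the three survivors absorbs 10 TB, which `hours_at_line_rate(10, 10)` puts at about 2.2 hours on an otherwise idle 10G link and about 22 hours at 1G. With 12 nodes, each survivor absorbs under 1 TB.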
Mon, Apr 8, 5:09 PM · Data-Services, cloud-services-team (Kanban)

Fri, Apr 5

Bstorm added a comment to T160113: Move PAWS nfs onto its own share.

Apr 5 19:12:11 labstore1005 nfs-exportd[7797]: exportfs: Failed to stat /exp/project/paws: No such file or directory
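A small pre-check sketch that would flag this before exportfs runs. It is a hypothetical helper, not part of nfs-exportd: it lists configured export paths that do not exist as directories, which is the condition the log line above reports for /exp/project/paws.

```python
import os
from typing import Iterable, List


def missing_export_paths(paths: Iterable[str]) -> List[str]:
    """Return the export paths that do not exist as directories.

    exportfs reports "Failed to stat ..." for a missing path, as in the log
    line above, so checking first gives an earlier and clearer error.
    """
    return [p for p in paths if not os.path.isdir(p)]
```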

Fri, Apr 5, 7:17 PM · cloud-services-team (Kanban), Cloud-Services, PAWS
Bstorm added a comment to T220164: osm4wiki generating around 300 perl processes wherever it runs, which overloads the server for purposes of gridengine.

Thanks for the context, @Kolossos. You were the only easily found person in Phabricator who has access to the project.
Is the project effectively abandoned and in need of a maintainer? I see plenz is on there, but I am not sure who they are in Phabricator, if they are on here.

Fri, Apr 5, 5:41 PM · Tools
Bstorm closed T219652: Final migration of osmdb.eqiad.wmnet into Cloud VPS instances as Resolved.
Fri, Apr 5, 4:46 PM · Patch-For-Review, Maps, cloud-services-team (Kanban), Cloud-VPS
Bstorm closed T219652: Final migration of osmdb.eqiad.wmnet into Cloud VPS instances, a subtask of T193264: Replace labsdb100[4567] with instances on cloudvirt1019 and cloudvirt1020, as Resolved.
Fri, Apr 5, 4:46 PM · Scoring-platform-team, Wikilabels, cloud-services-team (Kanban), Patch-For-Review, Epic, Cloud-VPS
Bstorm updated the task description for T220144: Decommission labsdb1006.eqiad.wmnet and labsdb1007.eqiad.wmnet.
Fri, Apr 5, 12:40 AM · Patch-For-Review, Operations, decommission, Data-Services, cloud-services-team (Kanban)
Bstorm triaged T220164: osm4wiki generating around 300 perl processes wherever it runs, which overloads the server for purposes of gridengine as Normal priority.
Fri, Apr 5, 12:39 AM · Tools
Bstorm updated the task description for T220144: Decommission labsdb1006.eqiad.wmnet and labsdb1007.eqiad.wmnet.
Fri, Apr 5, 12:21 AM · Patch-For-Review, Operations, decommission, Data-Services, cloud-services-team (Kanban)
Bstorm updated the task description for T220144: Decommission labsdb1006.eqiad.wmnet and labsdb1007.eqiad.wmnet.
Fri, Apr 5, 12:10 AM · Patch-For-Review, Operations, decommission, Data-Services, cloud-services-team (Kanban)

Thu, Apr 4

Bstorm added a comment to T219428: Decide how to handle encapi database.

I know I'm very late to the game on this. I'm just curious. If we think what I'm suggesting is good, I can shuffle things.

Thu, Apr 4, 7:09 PM · Cloud-VPS, cloud-services-team (Kanban)
Bstorm added a comment to T219428: Decide how to handle encapi database.

Do we want this stuff under mariadb profile or under wmcs more like modules/profile/manifests/wmcs/services/toolsdb_primary.pp (honest question)? Like modules/profile/wmcs/cloudinfra/ or something?

Thu, Apr 4, 7:07 PM · Cloud-VPS, cloud-services-team (Kanban)
Bstorm updated the task description for T220144: Decommission labsdb1006.eqiad.wmnet and labsdb1007.eqiad.wmnet.
Thu, Apr 4, 6:55 PM · Patch-For-Review, Operations, decommission, Data-Services, cloud-services-team (Kanban)
Bstorm triaged T220144: Decommission labsdb1006.eqiad.wmnet and labsdb1007.eqiad.wmnet as Normal priority.
Thu, Apr 4, 6:54 PM · Patch-For-Review, Operations, decommission, Data-Services, cloud-services-team (Kanban)
Bstorm added a comment to T193264: Replace labsdb100[4567] with instances on cloudvirt1019 and cloudvirt1020.

osmdb is now on VMs.

Thu, Apr 4, 6:49 PM · Scoring-platform-team, Wikilabels, cloud-services-team (Kanban), Patch-For-Review, Epic, Cloud-VPS
Bstorm added a comment to T219652: Final migration of osmdb.eqiad.wmnet into Cloud VPS instances.
NOTE: the crash was caused by read-only permissions on recovery.conf: when running su postgres -c 'pg_ctl promote -D /srv/postgres/9.6/main', the postgres user could not rename the file to recovery.done as it normally would.
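A minimal pre-flight sketch for future failovers, assuming it is run as the postgres user against the data directory from the note (/srv/postgres/9.6/main) before promoting. It checks that recovery.conf is owned by and writable for that user and that the directory allows the rename (POSIX rename needs write and search permission on the containing directory). The function name is hypothetical and this is not part of the puppet code.

```python
import os
from typing import List


def promote_preflight(data_dir: str) -> List[str]:
    """List problems that could stop recovery.conf being renamed on promotion.

    Meant to be run as the database user (e.g. via `su postgres`), so the
    os.access() checks reflect that user's permissions.
    """
    problems: List[str] = []
    conf = os.path.join(data_dir, "recovery.conf")
    if not os.path.exists(conf):
        problems.append(f"{conf} not found")
        return problems
    if os.stat(conf).st_uid != os.geteuid():
        problems.append(f"{conf} is not owned by the current user")
    if not os.access(conf, os.W_OK):
        problems.append(f"{conf} is not writable by the current user")
    # Renaming a file needs write and search permission on its directory.
    if not os.access(data_dir, os.W_OK | os.X_OK):
        problems.append(f"{data_dir} is not writable by the current user")
    return problems
```

An empty list from `promote_preflight("/srv/postgres/9.6/main")` means none of these checks found a problem.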
Thu, Apr 4, 6:04 PM · Patch-For-Review, Maps, cloud-services-team (Kanban), Cloud-VPS
Bstorm added a comment to T219652: Final migration of osmdb.eqiad.wmnet into Cloud VPS instances.

@TheDJ and others, sorry the database crashed at one point because of permissions on a failover file. I'll see about fixing that in puppet and documenting a warning for future failovers. Some maps services may need a restart.

Thu, Apr 4, 6:00 PM · Patch-For-Review, Maps, cloud-services-team (Kanban), Cloud-VPS
Bstorm added a comment to T219652: Final migration of osmdb.eqiad.wmnet into Cloud VPS instances.

Now we wait at least 5 minutes.

Thu, Apr 4, 5:32 PM · Patch-For-Review, Maps, cloud-services-team (Kanban), Cloud-VPS
Bstorm added a comment to T219652: Final migration of osmdb.eqiad.wmnet into Cloud VPS instances.

That took a bit because merging required some local rebase fussing.

Thu, Apr 4, 5:32 PM · Patch-For-Review, Maps, cloud-services-team (Kanban), Cloud-VPS
Bstorm added a comment to T219652: Final migration of osmdb.eqiad.wmnet into Cloud VPS instances.

Starting on the DNS change

Thu, Apr 4, 5:12 PM · Patch-For-Review, Maps, cloud-services-team (Kanban), Cloud-VPS
Bstorm updated subscribers of T123978: Bring back abuse_filter_history view.
Thu, Apr 4, 2:53 PM · Security-Team, Patch-For-Review, Data-Services

Wed, Apr 3

Bstorm added a parent task for T90364: Test Ceph for instance storage: T220020: Action items and work for retro 20190403.
Wed, Apr 3, 11:15 PM · Wikimedia-Incident, cloud-services-team (Kanban), Cloud-Services
Bstorm added subtasks for T220020: Action items and work for retro 20190403: T90364: Test Ceph for instance storage, T207590: Research CephFS as a replacement for NFS.
Wed, Apr 3, 11:15 PM · Epic, cloud-services-team (Kanban)
Bstorm added a parent task for T207590: Research CephFS as a replacement for NFS: T220020: Action items and work for retro 20190403.
Wed, Apr 3, 11:15 PM · Data-Services, cloud-services-team (Kanban)
Bstorm triaged T220054: Consider an "apprentice" type program for getting folks introduced to an area by watching a mentor with a path to taking over responsibility as Normal priority.
Wed, Apr 3, 11:12 PM · cloud-services-team (Kanban)
Bstorm triaged T220053: Build a dev/testing environment for webservice that would make it easier to get people involved in fixes as Normal priority.
Wed, Apr 3, 11:06 PM · Toolforge, cloud-services-team (Kanban)
Bstorm triaged T220052: Create a couple of technical tasks available for new technical contributors and external folks as Normal priority.
Wed, Apr 3, 11:02 PM · cloud-services-team (Kanban)
Bstorm assigned T220051: Puppet cleanup around OpenStack to Andrew.

I'm leaving this more or less for @aborrero and @Andrew to edit :)

Wed, Apr 3, 11:01 PM · cloud-services-team (Kanban)
Bstorm triaged T220051: Puppet cleanup around OpenStack as Normal priority.
Wed, Apr 3, 11:00 PM · cloud-services-team (Kanban)
Bstorm added a comment to T220050: shell user conflict in cloud realm.

Sent email to the user registered at that address in LDAP about the use of the account to see if it is needed, etc.

Wed, Apr 3, 10:56 PM · cloud-services-team, LDAP
Bstorm updated the task description for T220020: Action items and work for retro 20190403.
Wed, Apr 3, 5:31 PM · Epic, cloud-services-team (Kanban)
Bstorm triaged T220020: Action items and work for retro 20190403 as Normal priority.
Wed, Apr 3, 5:27 PM · Epic, cloud-services-team (Kanban)
MusikAnimal awarded T215445: comment and actor view challenges for Cloud Services a Cup of Joe token.
Wed, Apr 3, 5:12 PM · cloud-services-team (Kanban), Data-Services

Tue, Apr 2

Bstorm added a comment to T219563: Add a DNS alias for the wikilabels database (wikilabels.db.svc.eqiad.wmflabs).

@Halfak, I've set up the alias. So you should be able to connect to wikilabels.db.svc.eqiad.wmflabs instead (which is pointed at clouddb1002). This will hopefully make life a tad easier in the future.

Tue, Apr 2, 10:59 PM · Scoring-platform-team (Current), Patch-For-Review, Data-Services, cloud-services-team (Kanban), Wikilabels, Cloud-VPS

Mon, Apr 1

Bstorm updated subscribers of T209527: Set up scratch and maps NFS services on cloudstore1008/9.
Mon, Apr 1, 11:51 PM · Patch-For-Review, cloud-services-team (Kanban)
Bstorm claimed T209527: Set up scratch and maps NFS services on cloudstore1008/9.
Mon, Apr 1, 11:51 PM · Patch-For-Review, cloud-services-team (Kanban)
Bstorm updated the task description for T219652: Final migration of osmdb.eqiad.wmnet into Cloud VPS instances.
Mon, Apr 1, 8:20 PM · Patch-For-Review, Maps, cloud-services-team (Kanban), Cloud-VPS
Bstorm updated the task description for T219652: Final migration of osmdb.eqiad.wmnet into Cloud VPS instances.
Mon, Apr 1, 8:15 PM · Patch-For-Review, Maps, cloud-services-team (Kanban), Cloud-VPS
Bstorm added a comment to T219652: Final migration of osmdb.eqiad.wmnet into Cloud VPS instances.

Setting schedule for change on Thursday with announcement going out today, then.

Mon, Apr 1, 8:15 PM · Patch-For-Review, Maps, cloud-services-team (Kanban), Cloud-VPS
Bstorm added a comment to T167086: Consider moving PAWS to its own Cloud VPS project, rather than using instances inside Toolforge.

How would they be exposed, exactly? The same as Toolforge (two NFS shares mounted, with one of them pointed to by a symbolic link)? I.e., will the hack defined in T192214 still work?

Mon, Apr 1, 8:06 PM · cloud-services-team (Kanban), Kubernetes, Patch-For-Review, PAWS, Cloud-Services
Bstorm added a comment to T219652: Final migration of osmdb.eqiad.wmnet into Cloud VPS instances.

Great! Thanks.

Mon, Apr 1, 7:31 PM · Patch-For-Review, Maps, cloud-services-team (Kanban), Cloud-VPS
Bstorm added a comment to T219817: Update grid-configurator to keep tools-checker nodes submit nodes.

So I'm going to revert and then resubmit once tools-checker-01 and -02 are down.

Mon, Apr 1, 7:07 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T219817: Update grid-configurator to keep tools-checker nodes submit nodes.

This should work now.
Before:

Mon, Apr 1, 7:07 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
Bstorm triaged T219817: Update grid-configurator to keep tools-checker nodes submit nodes as High priority.
Mon, Apr 1, 5:25 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T216749: Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet.

@Marostegui it was supposed to be down. It needed a kill -9.

Mon, Apr 1, 3:24 PM · ops-eqiad, Operations, decommission, Data-Services, cloud-services-team (Kanban)
Bstorm updated the task description for T216749: Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet.
Mon, Apr 1, 3:23 PM · ops-eqiad, Operations, decommission, Data-Services, cloud-services-team (Kanban)
Bstorm reassigned T216749: Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet from Bstorm to RobH.
Mon, Apr 1, 3:23 PM · ops-eqiad, Operations, decommission, Data-Services, cloud-services-team (Kanban)
Bstorm updated the task description for T216749: Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet.
Mon, Apr 1, 3:20 PM · ops-eqiad, Operations, decommission, Data-Services, cloud-services-team (Kanban)

Fri, Mar 29

Bstorm renamed T216749: Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet from Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet as soon as they are ready to Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet.
Fri, Mar 29, 10:40 PM · ops-eqiad, Operations, decommission, Data-Services, cloud-services-team (Kanban)
Bstorm renamed T216749: Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet from Reclaim/Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet as soon as they are ready to Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet as soon as they are ready.
Fri, Mar 29, 10:09 PM · ops-eqiad, Operations, decommission, Data-Services, cloud-services-team (Kanban)
Bstorm added a comment to T216749: Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet.

Database services (postgres and mariadb) are now shut off, and the spare role is applied.

Fri, Mar 29, 10:07 PM · ops-eqiad, Operations, decommission, Data-Services, cloud-services-team (Kanban)
Bstorm added a comment to T219563: Add a DNS alias for the wikilabels database (wikilabels.db.svc.eqiad.wmflabs).

Good, I'll remove that...carefully.

Fri, Mar 29, 6:15 PM · Scoring-platform-team (Current), Patch-For-Review, Data-Services, cloud-services-team (Kanban), Wikilabels, Cloud-VPS