Today

Bstorm awarded T272430: mixnmatch microsync process has a large error file a Like token.
Wed, Jan 20, 2:59 PM · Tools

Yesterday

Bstorm added a comment to T272435: cluebotng using a high storage on NFS.

Thanks!

Tue, Jan 19, 11:34 PM · Tools
Bstorm added a comment to T272247: 2021-01-17: tools NFS share cleanup.

That brings us down to /dev/drbd4 8.0T 5.6T 2.1T 73% /srv/tools. The user tickets should bring things well into the safe zone when their cleanups are done.
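
As a rough cross-check of the 73% figure: GNU df appears to derive its Use% column from used / (used + available), rounded up, rather than from the total size. The sketch below works under that assumption. It uses the one-decimal values printed above, so small discrepancies against df's own output are possible, and `use_percent` is a hypothetical helper name.

```python
import math


def use_percent(used, available):
    """Approximate df's Use% column: used / (used + available), rounded up."""
    return math.ceil(100 * used / (used + available))


# Values in TiB as printed by df above (already rounded to one decimal place).
print(use_percent(5.6, 2.1))
```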

Tue, Jan 19, 11:33 PM · cloud-services-team (Kanban)
Bstorm removed a project from T272434: Toolforge tool 'archive-things-4' using very high disk space: cloud-services-team (Kanban).
Tue, Jan 19, 11:27 PM · Tools
Bstorm removed a project from T272435: cluebotng using a high storage on NFS: cloud-services-team (Kanban).
Tue, Jan 19, 11:26 PM · Tools
Bstorm created T272436: wmr-bot home directory using high NFS storage.
Tue, Jan 19, 11:26 PM · Tools
Bstorm created T272435: cluebotng using a high storage on NFS.
Tue, Jan 19, 11:20 PM · Tools
Bstorm created T272434: Toolforge tool 'archive-things-4' using very high disk space.
Tue, Jan 19, 11:13 PM · Tools
Bstorm added a comment to T272247: 2021-01-17: tools NFS share cleanup.

That was enough to get a recovery. However, it seems like a good idea to see what users can clean up since there are projects taking up quite significant space.

Tue, Jan 19, 11:04 PM · cloud-services-team (Kanban)
Bstorm added a parent task for T272430: mixnmatch microsync process has a large error file: T272247: 2021-01-17: tools NFS share cleanup.
Tue, Jan 19, 10:33 PM · Tools
Bstorm added a subtask for T272247: 2021-01-17: tools NFS share cleanup: T272430: mixnmatch microsync process has a large error file.
Tue, Jan 19, 10:33 PM · cloud-services-team (Kanban)
Bstorm triaged T272430: mixnmatch microsync process has a large error file as Medium priority.
Tue, Jan 19, 10:33 PM · Tools
Bstorm added a comment to T272247: 2021-01-17: tools NFS share cleanup.

The bigger files:

19749772 KB /srv/tools/shared/tools/project/request/error.log
21072788 KB /srv/tools/shared/tools/project/mediawiki-feeds/error.log
22473872 KB /srv/tools/shared/tools/project/wikidata-primary-sources/error.log
22900348 KB /srv/tools/shared/tools/project/khanamalumat/qaus.err
23343528 KB /srv/tools/shared/tools/project/cluebotng/logs/relay_irc.log
24260512 KB /srv/tools/shared/tools/project/fiwiki-tools/logs/seulojabot2.log
24343364 KB /srv/tools/shared/tools/project/ifttt/www/python/src/ifttt.log
26970304 KB /srv/tools/shared/tools/project/mix-n-match/error.log
27890700 KB /srv/tools/shared/tools/project/img-usage/public_html/wikidata-20170130-all.json
31437236 KB /srv/tools/shared/tools/project/freebase/freebase-rdf-latest.gz
31811904 KB /srv/tools/shared/tools/project/wdumps/dumpfiles/generated/wdump-1107.nt.gz
31811908 KB /srv/tools/shared/tools/project/wdumps/dumpfiles/generated/wdump-1104.nt.gz
32818048 KB /srv/tools/shared/tools/project/khanamalumat/purawiki.err
34621292 KB /srv/tools/shared/tools/project/verification-pages/verification-pages/log/production.log.1
34792852 KB /srv/tools/shared/tools/project/geohack/error.log
35880272 KB /srv/tools/shared/tools/project/wdumps/dumpfiles/generated/wdump-1097.nt.gz
36023964 KB /srv/tools/shared/tools/project/ping08bot/mybot.out
36285016 KB /srv/tools/shared/tools/project/wiki2prop/prediction_ranked_Wiki2PropDEPLOY_year2018_embedding300LG_DEPLOY.h5
49303704 KB /srv/tools/shared/tools/project/splinetools/dumps/enwiki-20141106-pages-articles.xml
64778744 KB /srv/tools/shared/tools/project/wikidata-analysis/public_html_tmp/dumpfiles/json-20191125/20191125.json.gz
78643272 KB /srv/tools/shared/tools/project/robokobot/virgule.err
89133980 KB /srv/tools/shared/tools/project/.shared/dumps/20201221.json.gz
89481636 KB /srv/tools/shared/tools/project/.shared/dumps/20210104.json.gz
101857128 KB /srv/tools/shared/tools/project/magnus-toolserver/error.log
107005676 KB /srv/tools/shared/tools/project/meetbot/meetbot.out
107035912 KB /srv/tools/shared/tools/project/meetbot/logs/messages.log
194101748 KB /srv/tools/shared/tools/project/mix-n-match/mnm-microsync.err

A few of those are easy enough to just clean up myself.
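
The listing above follows a `<size> KB <path>` layout, so per-tool totals can be tallied with a short script. This is a minimal sketch, not the tooling actually used on this task. The function name `usage_by_tool` is hypothetical, and it assumes the tool name is the path component right after `project`.

```python
from collections import defaultdict


def usage_by_tool(lines):
    """Sum sizes (in KB) per tool from lines shaped like '<size> KB <path>'."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.strip().split(" ", 2)
        if len(fields) != 3 or fields[1] != "KB":
            continue  # skip blank lines and anything not in the expected format
        size_kb, _, path = fields
        parts = path.split("/")
        # Assumption: the tool name is the component right after "project".
        if "project" not in parts:
            continue
        idx = parts.index("project")
        if idx + 1 >= len(parts):
            continue
        totals[parts[idx + 1]] += int(size_kb)
    # Largest consumers first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


sample = [
    "26970304 KB /srv/tools/shared/tools/project/mix-n-match/error.log",
    "194101748 KB /srv/tools/shared/tools/project/mix-n-match/mnm-microsync.err",
    "107005676 KB /srv/tools/shared/tools/project/meetbot/meetbot.out",
    "107035912 KB /srv/tools/shared/tools/project/meetbot/logs/messages.log",
]
for tool, kb in usage_by_tool(sample):
    print(f"{tool}: {kb / 1024 ** 2:.1f} GiB")
```

Grouping by tool rather than by file makes it easier to see which maintainers to contact, since one tool can own several large files.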

Tue, Jan 19, 10:26 PM · cloud-services-team (Kanban)
Bstorm added a comment to T272247: 2021-01-17: tools NFS share cleanup.

Running ionice -c 3 nice -19 find /srv/tools -type f -size +100M -printf "%k KB %p\n" > tools_large_files_20210119.txt

Tue, Jan 19, 4:51 PM · cloud-services-team (Kanban)
Bstorm added a comment to T272247: 2021-01-17: tools NFS share cleanup.

Got some clear heavy users here:

Tue, Jan 19, 4:45 PM · cloud-services-team (Kanban)
Bstorm added a comment to T272247: 2021-01-17: tools NFS share cleanup.

It's nice to see the alert being accurate these days.
/dev/drbd4 8.0T 6.3T 1.4T 83% /srv/tools

Tue, Jan 19, 4:35 PM · cloud-services-team (Kanban)
Bstorm awarded T272303: [ceph] Upgrade to 14.2.16 from 14.2.5 a Yellow Medal token.
Tue, Jan 19, 3:10 PM · cloud-services-team (Kanban)

Fri, Jan 15

Bstorm closed T270410: Prepare and check storage layer for niawiktionary as Resolved.

This is done.

Fri, Jan 15, 11:17 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm closed T270410: Prepare and check storage layer for niawiktionary, a subtask of T270409: Create Wiktionary Nias, as Resolved.
Fri, Jan 15, 11:16 PM · MW-1.36-notes (1.36.0-wmf.25; 2021-01-05), Patch-For-Review, User-Urbanecm, Wikimedia-Language-setup, Wiki-Setup (Create)
Bstorm closed T270414: Prepare and check storage layer for niawiki as Resolved.

This is done.

Fri, Jan 15, 11:05 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm closed T270414: Prepare and check storage layer for niawiki, a subtask of T270408: Create Wikipedia Nias, as Resolved.
Fri, Jan 15, 11:05 PM · MW-1.36-notes (1.36.0-wmf.25; 2021-01-05), Patch-For-Review, Wikimedia-Language-setup, User-Urbanecm, Wiki-Setup (Create)
Bstorm closed T269432: Prepare and check storage layer for wawikisource as Resolved.

This is done.

Fri, Jan 15, 10:58 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm closed T269432: Prepare and check storage layer for wawikisource, a subtask of T269431: Create Wikisource Walloon, as Resolved.
Fri, Jan 15, 10:58 PM · Patch-For-Review, MW-1.36-notes (1.36.0-wmf.21; 2020-12-08), Wiki-Setup (Create), User-Urbanecm
Bstorm added a comment to T271952: Request creation of "maps-experiments" VPS project.

The project is now created with default quotas, which should be able to get you started.

Fri, Jan 15, 10:40 PM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)
Bstorm closed T270280: Prepare and check storage layer for bclwiktionary as Resolved.

This is done.

Fri, Jan 15, 10:21 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm closed T270280: Prepare and check storage layer for bclwiktionary, a subtask of T270274: Create Wiktionary Bikol, as Resolved.
Fri, Jan 15, 10:20 PM · MW-1.36-notes (1.36.0-wmf.25; 2021-01-05), User-Urbanecm, Wiki-Setup (Create)
Bstorm claimed T234615: Re-create views for abuse_filter_log including two new columns.
Fri, Jan 15, 10:13 PM · cloud-services-team (Kanban), Data-Services
Bstorm closed T270276: Prepare and check storage layer for diqwiktionary as Resolved.

All set

Fri, Jan 15, 10:11 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm closed T270276: Prepare and check storage layer for diqwiktionary, a subtask of T270275: Create Wiktionary Zazaki, as Resolved.
Fri, Jan 15, 10:11 PM · Patch-For-Review, MW-1.36-notes (1.36.0-wmf.25; 2021-01-05), User-Urbanecm, Wiki-Setup (Create)
Bstorm closed T268458: Prepare and check storage layer for skrwiktionary as Resolved.

This one is all set.

Fri, Jan 15, 9:59 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm closed T268458: Prepare and check storage layer for skrwiktionary, a subtask of T268448: Create Wiktionary Saraiki, as Resolved.
Fri, Jan 15, 9:58 PM · MW-1.36-notes (1.36.0-wmf.20; 2020-12-01), User-Urbanecm, Wiki-Setup (Create)
Bstorm added a comment to T272161: Toolforge: reFill has not processed two days.

@Missvain Please check now. This service doesn't always recover well after an outage and needs a kick. The network failed for a while during T261134: upgrade cloud-vps openstack to Openstack version 'Stein', which would quite likely cause this.

Fri, Jan 15, 4:43 PM · Tool-refill
Bstorm closed T272127: 2021-01-15: ** PROBLEM alert - labstore1004/Ensure mysql credential creation for tools users is running is CRITICAL **, a subtask of T272125: Memory errors on clouddb1019, as Resolved.
Fri, Jan 15, 4:19 PM · DBA, ops-eqiad, SRE
Bstorm closed T272127: 2021-01-15: ** PROBLEM alert - labstore1004/Ensure mysql credential creation for tools users is running is CRITICAL ** as Resolved.
Fri, Jan 15, 4:19 PM · cloud-services-team (Kanban)
Bstorm moved T270276: Prepare and check storage layer for diqwiktionary from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Fri, Jan 15, 4:13 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm added a project to T270276: Prepare and check storage layer for diqwiktionary: cloud-services-team (Kanban).
Fri, Jan 15, 4:12 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm moved T270280: Prepare and check storage layer for bclwiktionary from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Fri, Jan 15, 4:11 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm moved T270280: Prepare and check storage layer for bclwiktionary from Backlog to Wiki replicas on the Data-Services board.
Fri, Jan 15, 4:11 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm added a project to T270280: Prepare and check storage layer for bclwiktionary: cloud-services-team (Kanban).
Fri, Jan 15, 4:11 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm added a comment to T272127: 2021-01-15: ** PROBLEM alert - labstore1004/Ensure mysql credential creation for tools users is running is CRITICAL **.

That fixed it:

Fri, Jan 15, 4:03 PM · cloud-services-team (Kanban)
Bstorm moved T268458: Prepare and check storage layer for skrwiktionary from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Fri, Jan 15, 4:00 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm moved T269432: Prepare and check storage layer for wawikisource from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Fri, Jan 15, 4:00 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm moved T270410: Prepare and check storage layer for niawiktionary from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Fri, Jan 15, 4:00 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm moved T270414: Prepare and check storage layer for niawiki from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Fri, Jan 15, 4:00 PM · cloud-services-team (Kanban), Data-Services, DBA
Ariutta awarded T271875: Request creation of wikipathways VPS project a Party Time token.
Fri, Jan 15, 4:45 AM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)

Thu, Jan 14

Bstorm closed T271382: Request creation of annotation VPS project as Resolved.

Project is created with default quotas. You should have access via Horizon now.

Thu, Jan 14, 11:49 PM · cloud-services-team (Kanban), Wikidata, Abstract Wikipedia, Wikidata Lexicographical data, Cloud-VPS (Project-requests)
Bstorm closed T271442: Resource request for cyberbot project as Resolved.

Done. You should be good to go. Let me know if you don't have access to the IP or something.

Thu, Jan 14, 11:42 PM · cloud-services-team (Kanban), Cloud-VPS (Quota-requests)
Bstorm closed T271875: Request creation of wikipathways VPS project as Resolved.

This project is created with default quotas. Please try it out at https://horizon.wikimedia.org

Thu, Jan 14, 11:37 PM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)
Bstorm added a comment to T218338: labstore: Re-evaluate traffic shaping settings.

The write throttle is unchanged partly because we haven't upgraded the DRBD network yet. At the very least, NFS reads should no longer feel like you are mounting it over a cell phone network.

Thu, Jan 14, 10:34 PM · Patch-For-Review, cloud-services-team (Kanban)
Bstorm added a comment to T218338: labstore: Re-evaluate traffic shaping settings.

The read throttle on bastions is now much higher.

Thu, Jan 14, 10:32 PM · Patch-For-Review, cloud-services-team (Kanban)
Bstorm added a comment to T218338: labstore: Re-evaluate traffic shaping settings.

This task should be linked to this patch. Oops: https://gerrit.wikimedia.org/r/c/operations/puppet/+/655952

Thu, Jan 14, 10:31 PM · Patch-For-Review, cloud-services-team (Kanban)
Bstorm added a project to T266915: "Unable to query nbdime API" error: cloud-services-team (Kanban).

I haven't been able to come back around to this yet. It seems like it could be added to the extensions in the singleuser image in https://github.com/toolforge/paws/blob/master/images/singleuser/install-extensions

Thu, Jan 14, 4:52 PM · cloud-services-team (Kanban), PAWS
Bstorm added a project to T271997: New install of mw-vagrant with striker role fails on phab_setup_db provisioning step: MediaWiki-Vagrant.
Thu, Jan 14, 12:30 AM · MediaWiki-Vagrant
Bstorm created T271997: New install of mw-vagrant with striker role fails on phab_setup_db provisioning step.
Thu, Jan 14, 12:30 AM · MediaWiki-Vagrant

Wed, Jan 13

Bstorm moved T271442: Resource request for cyberbot project from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Wed, Jan 13, 5:03 PM · cloud-services-team (Kanban), Cloud-VPS (Quota-requests)
Bstorm claimed T271442: Resource request for cyberbot project.

Approved in weekly meeting.

Wed, Jan 13, 5:02 PM · cloud-services-team (Kanban), Cloud-VPS (Quota-requests)
Bstorm moved T271952: Request creation of "maps-experiments" VPS project from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Wed, Jan 13, 4:59 PM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)
Bstorm claimed T271952: Request creation of "maps-experiments" VPS project.
Wed, Jan 13, 4:58 PM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)
Bstorm moved T271875: Request creation of wikipathways VPS project from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Wed, Jan 13, 4:52 PM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)
Bstorm claimed T271875: Request creation of wikipathways VPS project.
Wed, Jan 13, 4:52 PM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)
Bstorm moved T271382: Request creation of annotation VPS project from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Wed, Jan 13, 4:50 PM · cloud-services-team (Kanban), Wikidata, Abstract Wikipedia, Wikidata Lexicographical data, Cloud-VPS (Project-requests)
Bstorm added a project to T271382: Request creation of annotation VPS project: cloud-services-team (Kanban).
Wed, Jan 13, 4:50 PM · cloud-services-team (Kanban), Wikidata, Abstract Wikipedia, Wikidata Lexicographical data, Cloud-VPS (Project-requests)
Bstorm claimed T271382: Request creation of annotation VPS project.

Approved in weekly meeting

Wed, Jan 13, 4:48 PM · cloud-services-team (Kanban), Wikidata, Abstract Wikipedia, Wikidata Lexicographical data, Cloud-VPS (Project-requests)

Tue, Jan 12

Bstorm triaged T271847: Improve cleanup behavior on failure for maintain-kubeusers as High priority.
Tue, Jan 12, 6:27 PM · Patch-For-Review, cloud-services-team (Kanban), Kubernetes, Toolforge
Bstorm closed T271842: maintain-kubeusers broken in Toolforge as Resolved.
$ kubectl -n maintain-kubeusers logs maintain-kubeusers-7f7b44754c-mgrjj
starting a run
Homedir already exists for /data/project/adhs-wde
Wrote config in /data/project/adhs-wde/.kube/config
Provisioned creds for user adhs-wde
finished run, wrote 1 new accounts

That fixed it. This was likely caused by a latency issue in etcd slowing down the cleanup of a failed request. Until we can make etcd more performant (T267966) we are going to see issues around that, so I think I need to teach this service how to clean up after itself (will create subtask).
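
One common way to make a provisioning run clean up after itself is to register an undo action after each successful step and run those actions in reverse order if a later step fails. The sketch below illustrates that general pattern with Python's `contextlib.ExitStack`. It is not the maintain-kubeusers code, and the step names and the `provision` helper are hypothetical.

```python
from contextlib import ExitStack


def provision(user, steps):
    """Run (do, undo) steps in order; undo completed steps if one fails."""
    with ExitStack() as stack:
        for do, undo in steps:
            do(user)
            # Register the undo only after the step has succeeded.
            stack.callback(undo, user)
        # Everything succeeded: detach the undo callbacks so they never run.
        stack.pop_all()


log = []


def fail(user):
    # Stand-in for a request that times out partway through a run.
    raise RuntimeError("simulated API timeout")


# Hypothetical steps; a real service would talk to the cluster API here.
steps = [
    (lambda u: log.append(f"create step A for {u}"),
     lambda u: log.append(f"undo step A for {u}")),
    (lambda u: log.append(f"create step B for {u}"),
     lambda u: log.append(f"undo step B for {u}")),
    (fail, lambda u: None),
]

try:
    provision("example-tool", steps)
except RuntimeError:
    pass

print(log)
```

Because the undo callbacks run in reverse order of registration, later steps are rolled back before the steps they depend on.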

Tue, Jan 12, 6:22 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
Bstorm updated the task description for T271842: maintain-kubeusers broken in Toolforge.
Tue, Jan 12, 6:14 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
Bstorm renamed T271842: maintain-kubeusers broken in Toolforge from webservice --backend=kubernetes python3.7 shell fails for new tool to maintain-kubeusers broken in Toolforge.
Tue, Jan 12, 6:14 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
Bstorm triaged T271842: maintain-kubeusers broken in Toolforge as Unbreak Now! priority.

maintain-kubeusers-7f7b44754c-kkm76 0/1 CrashLoopBackOff 1513 32d
Unfortunately, the problem is the latter.

Tue, Jan 12, 6:11 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
Bstorm claimed T271842: maintain-kubeusers broken in Toolforge.

This suggests your tool does not have authentication credentials created. That means either you beat the service that creates them, or the service is broken.

Tue, Jan 12, 6:08 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
Bstorm added a comment to T269620: maintain-dbusers doesn't close connections right on harvest-replicas.

That's interesting...
I'll check that out. The script starts up every minute, but that's clearly not right.

Tue, Jan 12, 2:00 PM · Patch-For-Review, Data-Services, cloud-services-team (Kanban)
jcrespo awarded T260511: Parametrize wmf-pt-kill so it can connect to different sockets a Love token.
Tue, Jan 12, 9:11 AM · Data-Services, User-Kormat, DBA, cloud-services-team (Kanban)

Mon, Jan 11

Bstorm added a comment to T260511: Parametrize wmf-pt-kill so it can connect to different sockets.

I believe all replicas pass puppet now (after creating that grant). @Marostegui if you can check that the software is doing what it should be doing now, I think this can be closed.

Mon, Jan 11, 10:42 PM · Data-Services, User-Kormat, DBA, cloud-services-team (Kanban)
Bstorm added a comment to T260511: Parametrize wmf-pt-kill so it can connect to different sockets.

Yep, the user is not created. Creating the grant using the info in modules/role/templates/mariadb/grants/wiki-replicas.sql since that appears to be what is in the existing replicas.

Mon, Jan 11, 10:27 PM · Data-Services, User-Kormat, DBA, cloud-services-team (Kanban)
Bstorm added a comment to T260511: Parametrize wmf-pt-kill so it can connect to different sockets.

That patch was safe on the old servers (no change). On the multi-instance I see the error: Access denied for user 'wmf-pt-kill'@'localhost'. That sounds like it is pretty close to working.

Mon, Jan 11, 10:21 PM · Data-Services, User-Kormat, DBA, cloud-services-team (Kanban)
Bstorm updated the task description for T271476: Iron out issues in the proxy structure for multi-instance wikireplicas.
Mon, Jan 11, 10:15 PM · Patch-For-Review, Data-Services, cloud-services-team (Kanban)

Sat, Jan 9

Bstorm added a comment to T271058: cloudnet1004/cloudnet1003: network hiccups because broadcom driver/firmware problem.

We confirmed this is the standby, so it won't impact the cloud during this nonsense (and thus isn't an "unbreak now" or real outage).
I just checked the web console, and apparently the network adapter's status is "unknown".


That might just be this version of iLO being Helpful, though. On Monday, if this is under warranty, we could parse the active health log, possibly (if it is enabled).

Sat, Jan 9, 3:04 AM · SRE, ops-eqiad, cloud-services-team (Kanban)
Bstorm triaged T271058: cloudnet1004/cloudnet1003: network hiccups because broadcom driver/firmware problem as High priority.
Sat, Jan 9, 2:46 AM · SRE, ops-eqiad, cloud-services-team (Kanban)
Bstorm added a comment to T271058: cloudnet1004/cloudnet1003: network hiccups because broadcom driver/firmware problem.

That last one is obviously from much earlier, but that's kinda weird.

Sat, Jan 9, 2:42 AM · SRE, ops-eqiad, cloud-services-team (Kanban)
Bstorm added a comment to T271058: cloudnet1004/cloudnet1003: network hiccups because broadcom driver/firmware problem.

Does it have broken hardware or something? This is from dmesg:

Sat, Jan 9, 2:42 AM · SRE, ops-eqiad, cloud-services-team (Kanban)
Bstorm added a comment to T271058: cloudnet1004/cloudnet1003: network hiccups because broadcom driver/firmware problem.

That error is happening a fair bit. Dunno if that is related.

Sat, Jan 9, 2:39 AM · SRE, ops-eqiad, cloud-services-team (Kanban)
Bstorm added a comment to T271058: cloudnet1004/cloudnet1003: network hiccups because broadcom driver/firmware problem.

I was about to make another ticket for it until I saw your comment

Sat, Jan 9, 2:38 AM · SRE, ops-eqiad, cloud-services-team (Kanban)

Fri, Jan 8

Bstorm updated subscribers of T268280: labstore1006 spontaneous reboot.

So at this point, this has been in a failover state for a couple months. The last time this happened we gave up and failed back (and it happened again). I believe the warranty expired in 2020, so the opportunity to fix this on the last round of sudden reboots is already gone. That might not leave us with much. The system is strained while in a failover state, but it has no automatic HA.

Fri, Jan 8, 5:33 PM · cloud-services-team (Hardware)
Bstorm updated subscribers of T268280: labstore1006 spontaneous reboot.

According to T268285: update RAID controller firmware on labstore1006, 1007, we are already on recent firmware with regard to this issue. I'd briefly discussed involving HPE to get a fix with @Jclark-ctr back on that ticket, but I'm not sure that was done or if we have a service agreement/warranty either way.

Fri, Jan 8, 5:23 PM · cloud-services-team (Hardware)
Bstorm closed T268285: update RAID controller firmware on labstore1006, 1007, a subtask of T268280: labstore1006 spontaneous reboot, as Resolved.
Fri, Jan 8, 5:22 PM · cloud-services-team (Hardware)
Bstorm closed T268285: update RAID controller firmware on labstore1006, 1007 as Resolved.
Fri, Jan 8, 5:22 PM · ops-eqiad, cloud-services-team (Kanban), SRE
Bstorm added a comment to T268280: labstore1006 spontaneous reboot.
Fri, Jan 8, 5:21 PM · cloud-services-team (Hardware)
Bstorm added a comment to T271561: dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 .

I suspect this is related to https://lists.wikimedia.org/pipermail/wikitech-l/2020-November/094044.html

Fri, Jan 8, 5:16 PM · Dumps-Generation, Analytics-Kanban, Analytics, cloud-services-team (Kanban), Wikimedia-Portals
Bstorm added a project to T271561: dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 : Dumps-Generation.
Fri, Jan 8, 5:07 PM · Dumps-Generation, Analytics-Kanban, Analytics, cloud-services-team (Kanban), Wikimedia-Portals
Bstorm closed T271509: [wmcs][haproxy][puppet] Puppet failing on clouddb-wikireplicas-proxy as Resolved.

Looks fine. Thanks for finding the bug.

Fri, Jan 8, 4:34 PM · cloud-services-team (Kanban)
Bstorm added a project to T271554: k8splay project has broken puppet because of incorrect FQDNs: Cloud-VPS.
Fri, Jan 8, 4:27 PM · Cloud-VPS
Bstorm triaged T271554: k8splay project has broken puppet because of incorrect FQDNs as Medium priority.
Fri, Jan 8, 4:27 PM · Cloud-VPS

Thu, Jan 7

Bstorm updated subscribers of T271476: Iron out issues in the proxy structure for multi-instance wikireplicas.

@aborrero I need to sync up with you on the naming and IPVS stuff here when you have time. I'll suggest a scheduled time if I miss you tomorrow.

Thu, Jan 7, 10:55 PM · Patch-For-Review, Data-Services, cloud-services-team (Kanban)
Bstorm added a comment to T271476: Iron out issues in the proxy structure for multi-instance wikireplicas.

At this point, each proxy has the capability to route to the new replicas as well as the old ones, but it only routes to each instance's primary. I presume we want the analytics replica to be the standby for "web" and vice versa, right @Marostegui? That seems better than requiring manual intervention if something happens.
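
The mutual-standby idea can be sketched as a small selection function: route to the requested role if it is healthy, otherwise fall back to the other role. This only illustrates the routing logic, not the actual proxy configuration. `STANDBY_FOR` and `pick_replica` are hypothetical names, and health is assumed to arrive as a simple dictionary.

```python
# Hypothetical mapping: each role names the other as its standby.
STANDBY_FOR = {"web": "analytics", "analytics": "web"}


def pick_replica(role, healthy):
    """Return the replica to route to for `role`, or None if neither is up."""
    if role not in STANDBY_FOR:
        raise ValueError(f"unknown role: {role}")
    if healthy.get(role, False):
        return role
    standby = STANDBY_FOR[role]
    return standby if healthy.get(standby, False) else None


print(pick_replica("web", {"web": False, "analytics": True}))
```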

Thu, Jan 7, 10:27 PM · Patch-For-Review, Data-Services, cloud-services-team (Kanban)
Bstorm raised the priority of T271476: Iron out issues in the proxy structure for multi-instance wikireplicas from Medium to High.
Thu, Jan 7, 10:23 PM · Patch-For-Review, Data-Services, cloud-services-team (Kanban)
Bstorm updated the task description for T271476: Iron out issues in the proxy structure for multi-instance wikireplicas.
Thu, Jan 7, 10:23 PM · Patch-For-Review, Data-Services, cloud-services-team (Kanban)
Bstorm triaged T271476: Iron out issues in the proxy structure for multi-instance wikireplicas as Medium priority.
Thu, Jan 7, 10:18 PM · Patch-For-Review, Data-Services, cloud-services-team (Kanban)
Bstorm created T271476: Iron out issues in the proxy structure for multi-instance wikireplicas.
Thu, Jan 7, 10:18 PM · Patch-For-Review, Data-Services, cloud-services-team (Kanban)
Bstorm closed T270820: Some public PAWS files give 504 errors as Resolved.
Thu, Jan 7, 10:10 PM · cloud-services-team (Kanban), PAWS
Bstorm added a comment to T270820: Some public PAWS files give 504 errors.

That did it https://public.paws.wmcloud.org/12410844/100days/100days-Day-088.ipynb

Thu, Jan 7, 10:09 PM · cloud-services-team (Kanban), PAWS
Bstorm committed rPAWSb53261dcf705: nbserve timeouts: add timeout and hoist all of them to server block (authored by Bstorm).
Thu, Jan 7, 6:48 PM