Bstorm (Brooke)
Ops Witch -- Wikimedia Cloud Services Team

User Details

User Since
Jan 22 2018, 10:09 PM (43 w, 1 d)
Availability
Available
IRC Nick
bstorm_
LDAP User
Bstorm
MediaWiki User
BStorm (WMF)

On the wikis, I'm BStorm (WMF); I'm bstorm_ on IRC and Bstorm on Gerrit and Wikitech.

I work for or provide services to the Wikimedia Foundation, but this is my only Phabricator account. Edits, statements, or other contributions made from this account are my own, and may not reflect the views of the Foundation.

Recent Activity

Yesterday

Bstorm added a comment to T209517: Upgrade/reboot labsdb* servers.

Note: labsdb1004's remote serial terminal seems broken. labsdb1006 looked bad, but recovered after reboot.

Tue, Nov 20, 6:30 PM · User-Banyek, Patch-For-Review, Data-Services, cloud-services-team (Kanban), DBA
Bstorm awarded T209956: Adopt Puppet data types a Love token.
Tue, Nov 20, 3:58 PM · cloud-services-team (Kanban)

Mon, Nov 19

Bstorm added a project to T204422: Install OCRmyPDF dependencies on Tools: cloud-services-team (Kanban).
Mon, Nov 19, 9:36 PM · cloud-services-team (Kanban), Toolforge, Wikisource
Bstorm closed T206238: Quiet the logging from maintain-dbusers as Resolved.

It's quiet as a mouse now. It should still spit out logs when it actually does something.

Mon, Nov 19, 9:21 PM · Patch-For-Review, cloud-services-team (Kanban)
Bstorm claimed T205713: Prepare and check storage layer for liwikinews.
Mon, Nov 19, 7:57 PM · cloud-services-team (Kanban), Cloud-Services, DBA
Bstorm moved T205713: Prepare and check storage layer for liwikinews from Inbox to Doing on the cloud-services-team (Kanban) board.
Mon, Nov 19, 7:57 PM · cloud-services-team (Kanban), Cloud-Services, DBA
Bstorm added a project to T205713: Prepare and check storage layer for liwikinews: cloud-services-team (Kanban).
Mon, Nov 19, 7:57 PM · cloud-services-team (Kanban), Cloud-Services, DBA
Bstorm moved T206238: Quiet the logging from maintain-dbusers from Inbox to Doing on the cloud-services-team (Kanban) board.
Mon, Nov 19, 6:49 PM · Patch-For-Review, cloud-services-team (Kanban)
Bstorm moved T191491: Adjust bandwidth/connection limits, memory settings on labstore1006,7 as appropriate from Inbox to Doing on the cloud-services-team (Kanban) board.
Mon, Nov 19, 6:48 PM · cloud-services-team (Kanban), User-ArielGlenn, Cloud-Services, Datasets-General-or-Unknown, Operations
Bstorm added a comment to T206239: 2018-10-04: tools and NFS share cleanup (high usage).

Unless we want to wait for the subtasks.

Mon, Nov 19, 6:47 PM · cloud-services-team (Kanban)
Bstorm added a comment to T206239: 2018-10-04: tools and NFS share cleanup (high usage).

For us that's pretty good. We could probably just close this one for now.

Mon, Nov 19, 6:47 PM · cloud-services-team (Kanban)
Bstorm updated the task description for T209517: Upgrade/reboot labsdb* servers.
Mon, Nov 19, 4:57 PM · User-Banyek, Patch-For-Review, Data-Services, cloud-services-team (Kanban), DBA
Bstorm updated the task description for T209517: Upgrade/reboot labsdb* servers.
Mon, Nov 19, 4:48 PM · User-Banyek, Patch-For-Review, Data-Services, cloud-services-team (Kanban), DBA
Bstorm added a comment to T209517: Upgrade/reboot labsdb* servers.

@Banyek I think as long as it works for you, and they are all on different days, it's fine for the wiki replicas.

Mon, Nov 19, 3:14 PM · User-Banyek, Patch-For-Review, Data-Services, cloud-services-team (Kanban), DBA

Sat, Nov 17

Bstorm added a comment to T209627: Kubernetes and docker packages for stretch are needed for toolforge bastions.

@aborrero I dare say you can be. We will probably both need to mirror updated k8s stretch packages and docker-ce stretch packages into tools aptly and then hack some puppet around them so that our setup can be maintained. To unblock the grid upgrade, all we need is a kubernetes-client with all it needs to get by. Part of that is likely a flannel package (which we'd have to invent) or flannel installed via kubeadm. I'll have to dig deeper to be sure exactly what is required to just get a bastion talking to both existing k8s and sonofgridengine with minimal tech debt for the next phases of k8s upgrades.

Sat, Nov 17, 10:24 PM · Cloud-VPS (Ubuntu Trusty Deprecation), Toolforge, cloud-services-team (Kanban)

Fri, Nov 16

Bstorm added a comment to T209517: Upgrade/reboot labsdb* servers.

@Banyek Just looking to confirm that you will be available during the Toolsdb primary and secondary reboots as support to verify things are working correctly and help if not for 11/20 @ 17:15 for labsdb1004 and 11/21 @ 17:15 for labsdb1005.

Fri, Nov 16, 10:36 PM · User-Banyek, Patch-For-Review, Data-Services, cloud-services-team (Kanban), DBA
Bstorm added a comment to T209517: Upgrade/reboot labsdb* servers.

We stopped supporting mariadb on jessie some months ago, so I am not sure you will have packages to upgrade to.

Fri, Nov 16, 3:11 PM · User-Banyek, Patch-For-Review, Data-Services, cloud-services-team (Kanban), DBA

Thu, Nov 15

Bstorm closed T209075: Add some exec nodes to the grid as Declined.

Honestly, we've had no queue waiters for a while since we enabled all disabled exec hosts. I'm going to reject this task for now, pending the new grid build.

Thu, Nov 15, 11:05 PM · cloud-services-team (Kanban), Toolforge
Bstorm closed T209075: Add some exec nodes to the grid, a subtask of T208940: Grid slow on Toolforge, as Declined.
Thu, Nov 15, 11:05 PM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T209031: Not able to scoop comment table in labs for mediawiki reconstruction process.

I'm aiming to write tests for this script shortly because it is too complex to not have them. Overloading the script's functionality with something it wasn't written for makes me a bit nervous. It is already very easy to introduce mistakes requiring very careful review and manual QA in test dbs when I make updates.

Thu, Nov 15, 11:03 PM · Analytics-Kanban, DBA, Data-Services, Analytics
Bstorm added a comment to T209517: Upgrade/reboot labsdb* servers.

Thanks @awight. Does 11/20 @ 17:15 UTC sound good? I can work on that reboot while @aborrero does 1006.
@Banyek, will you be around for mysql upgrades or whatever?

Thu, Nov 15, 8:40 PM · User-Banyek, Patch-For-Review, Data-Services, cloud-services-team (Kanban), DBA
Bstorm removed a project from T209627: Kubernetes and docker packages for stretch are needed for toolforge bastions: Patch-For-Review.
Thu, Nov 15, 7:41 PM · Cloud-VPS (Ubuntu Trusty Deprecation), Toolforge, cloud-services-team (Kanban)
Bstorm triaged T209627: Kubernetes and docker packages for stretch are needed for toolforge bastions as High priority.
Thu, Nov 15, 6:55 PM · Cloud-VPS (Ubuntu Trusty Deprecation), Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T209517: Upgrade/reboot labsdb* servers.

So far it looks like replication is picking up where it left off nicely on labsdb1007 (done).

Thu, Nov 15, 4:44 PM · User-Banyek, Patch-For-Review, Data-Services, cloud-services-team (Kanban), DBA
Bstorm updated the task description for T209517: Upgrade/reboot labsdb* servers.
Thu, Nov 15, 4:43 PM · User-Banyek, Patch-For-Review, Data-Services, cloud-services-team (Kanban), DBA
Bstorm added a comment to T209517: Upgrade/reboot labsdb* servers.

@aborrero I'd say it's worthy of notifying users for toolsdb/wikilabels (labsdb1004/5) and possibly osmdb (labsdb1006/7) masters but not the wiki replicas or the secondaries (except wikilabels). The users won't see any significant issue on the replicas.

Thu, Nov 15, 3:34 PM · User-Banyek, Patch-For-Review, Data-Services, cloud-services-team (Kanban), DBA
Bstorm added a comment to T205713: Prepare and check storage layer for liwikinews.

That and create the _p db. Silly bugs.

Thu, Nov 15, 3:10 PM · cloud-services-team (Kanban), Cloud-Services, DBA

Wed, Nov 14

Bstorm added a comment to T176757: CamelCase vs. VPS instance naming.

Fun :)

Wed, Nov 14, 9:57 PM · cloud-services-team (Kanban)
Bstorm closed T189158: Change `image` view to properly expose the new `img_description_id` field as Resolved.
Wed, Nov 14, 7:58 PM · cloud-services-team (Kanban), Patch-For-Review, Data-Services
Bstorm closed T189158: Change `image` view to properly expose the new `img_description_id` field, a subtask of T188132: Merge image_comment_temp table into the image table, as Resolved.
Wed, Nov 14, 7:58 PM · MW-1.32-notes, Patch-For-Review, MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), Core Platform Team Kanban (Blocked Externally), Core Platform Team ( Code Health (TEC13)), MW-1.31-release-notes (WMF-deploy-2018-04-03 (1.31.0-wmf.28)), MediaWiki-Database
Bstorm triaged T209530: Build user data backup service based on remote sync rather than NFS as Normal priority.
Wed, Nov 14, 7:55 PM · cloud-services-team (Kanban)
Bstorm added a comment to T209527: Set up scratch and maps NFS services on cloudstore1008/9.

Done so far:

Wed, Nov 14, 7:48 PM · Patch-For-Review, cloud-services-team (Kanban)
Bstorm added a comment to T189158: Change `image` view to properly expose the new `img_description_id` field.

The patch is deployed throughout. Should that be it for this task @Anomie ?

Wed, Nov 14, 7:45 PM · cloud-services-team (Kanban), Patch-For-Review, Data-Services
Bstorm moved T209527: Set up scratch and maps NFS services on cloudstore1008/9 from Inbox to Doing on the cloud-services-team (Kanban) board.
Wed, Nov 14, 7:43 PM · Patch-For-Review, cloud-services-team (Kanban)
Bstorm triaged T209527: Set up scratch and maps NFS services on cloudstore1008/9 as Normal priority.
Wed, Nov 14, 7:43 PM · Patch-For-Review, cloud-services-team (Kanban)
Bstorm closed T193655: rack/setup/install cloudstore1008 & cloudstore1009 as Resolved.
Wed, Nov 14, 7:37 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations
Bstorm updated the task description for T193655: rack/setup/install cloudstore1008 & cloudstore1009.
Wed, Nov 14, 7:37 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations
Bstorm updated subscribers of T209517: Upgrade/reboot labsdb* servers.

@Halfak labsdb1004/5 would affect wikilabels. We may just do reboots in place like last time due to the tables that don't replicate per: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#ToolsDB_Backups_and_Replication

  • labsdb1004 is the replica for most tables on 1005, but it is the only server for wikilabels (just so that information is out there).
Wed, Nov 14, 5:58 PM · User-Banyek, Patch-For-Review, Data-Services, cloud-services-team (Kanban), DBA
Bstorm updated subscribers of T209517: Upgrade/reboot labsdb* servers.
Wed, Nov 14, 5:44 PM · User-Banyek, Patch-For-Review, Data-Services, cloud-services-team (Kanban), DBA
Bstorm added a project to T209517: Upgrade/reboot labsdb* servers: Data-Services.
Wed, Nov 14, 5:42 PM · User-Banyek, Patch-For-Review, Data-Services, cloud-services-team (Kanban), DBA
Bstorm added a comment to T193655: rack/setup/install cloudstore1008 & cloudstore1009.

This was just stuck at a prompt. Stupid mistake, the output after that stage of boot was redirected to the other console. Proceeding.

Wed, Nov 14, 5:35 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations
Bstorm added a comment to T193655: rack/setup/install cloudstore1008 & cloudstore1009.

Nothing. I guess this is just more digging, then, unless both systems are somehow broken.

Wed, Nov 14, 4:23 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations
Bstorm added a comment to T193655: rack/setup/install cloudstore1008 & cloudstore1009.

Redirection settings are confirmed correct. Looking around other settings in the docs.

Wed, Nov 14, 4:18 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations
Bstorm added a comment to T209480: labnet1001/labstore1004 combined alert on 2018-11-14.

There is a cap on the user connections (the user that eats connections being OpenStack aka Cloud VPS). It just has burst capabilities and can briefly go over what we have set. I suspect that with the limits we have in place, it cannot go all that much higher. A slightly higher limit would help us get through Neutron migrations (which is almost certainly what is causing the bursts in connections).

Wed, Nov 14, 3:37 PM · cloud-services-team (Kanban), DBA

Tue, Nov 13

Bstorm added a comment to T209031: Not able to scoop comment table in labs for mediawiki reconstruction process.

Note: I'm not done reading back yet--but yeah, that's what I was thinking of. There's a lot here.

Tue, Nov 13, 6:04 PM · Analytics-Kanban, DBA, Data-Services, Analytics
Bstorm added a comment to T209031: Not able to scoop comment table in labs for mediawiki reconstruction process.
  • Specialized views - Views for comments from each of revision, archive, and logging, separately. We have to test whether or not sqooping from these views would be fast enough, but it seems they would be useful for cloud db users in general.
  • Access to underlying tables - We could query the underlying tables, and that would bypass any performance problems we have with the views. We would duplicate the sanitizing logic from the views, and maintain it to be always the same as it is in cloud db. This would require special permissions to the cloud db.
  • Materialized views - This sounds like the best choice, as suggested by @Anomie. We thought they were discouraged by DBAs due to the implied slow-downs in replication. But if that's not a concern, let's do it!
Tue, Nov 13, 5:51 PM · Analytics-Kanban, DBA, Data-Services, Analytics
Bstorm added a comment to T207970: toolforge: add misctools and jobutils packages to stretch.

Works now!

Tue, Nov 13, 5:39 PM · Patch-For-Review, Cloud-VPS (Ubuntu Trusty Deprecation), Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T207970: toolforge: add misctools and jobutils packages to stretch.

No, misctools is a latest install. It looks like it is trying to downgrade?
apt-get purging it :)

Tue, Nov 13, 5:39 PM · Patch-For-Review, Cloud-VPS (Ubuntu Trusty Deprecation), Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T207970: toolforge: add misctools and jobutils packages to stretch.

I wonder if this is some buried pinning thing. Checking that.

Tue, Nov 13, 5:34 PM · Patch-For-Review, Cloud-VPS (Ubuntu Trusty Deprecation), Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T207970: toolforge: add misctools and jobutils packages to stretch.
The following packages have unmet dependencies:
 misctools : Depends: mariadb-client-core-5.5 but it is not installable
E: Unable to correct problems, you have held broken packages.
Error: /Stage[main]/Profile::Toolforge::Grid::Exec_environ/Package[misctools]/ensure: change from 1.32 to 1.31 failed: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install misctools' returned 100: Reading package lists...
Tue, Nov 13, 5:27 PM · Patch-For-Review, Cloud-VPS (Ubuntu Trusty Deprecation), Toolforge, cloud-services-team (Kanban)
Bstorm closed T209381: labels.wmflabs.org is down as Resolved.

I think we are good on this. Please re-open if I am wrong. More growing pains with the new region.

Tue, Nov 13, 5:24 PM · cloud-services-team (Kanban), Wikilabels, Scoring-platform-team (Current)
Bstorm triaged T209396: postgresql on labsdb1004 needs some kind of puppet management of pg_hba.conf as Normal priority.
Tue, Nov 13, 5:18 PM · Scoring-platform-team, Data-Services, cloud-services-team (Kanban), Wikilabels
Bstorm added a comment to T209381: labels.wmflabs.org is down.

Looks like it's working again.

Tue, Nov 13, 5:15 PM · cloud-services-team (Kanban), Wikilabels, Scoring-platform-team (Current)
Bstorm added a comment to T209381: labels.wmflabs.org is down.

Added the CIDR to pg_hba.conf, which is not overridden by puppet. Reloaded postgres.
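
The fix above can be sanity-checked with a short script. This is a minimal sketch, not the check that was actually run: it assumes only the simple five-field `host DATABASE USER CIDR METHOD` form of pg_hba.conf, and the function name `covering_entries` and any CIDR ranges passed to it are hypothetical (the actual CIDR that was added is not stated here).

```python
import ipaddress


def covering_entries(hba_text, client_ip, database, user):
    """Return 'host' lines from pg_hba.conf text whose CIDR contains client_ip.

    Simplified on purpose: comma-separated lists, keywords such as
    "sameuser" or "samehost", and the separate IP + netmask form are ignored.
    """
    client = ipaddress.ip_address(client_ip)
    matches = []
    for raw in hba_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) < 5 or fields[0] != "host":
            continue
        db, usr, address = fields[1], fields[2], fields[3]
        if db not in ("all", database) or usr not in ("all", user):
            continue
        try:
            network = ipaddress.ip_network(address, strict=False)
        except ValueError:
            continue  # not a CIDR (for example a hostname); skipped in this sketch
        if client in network:
            matches.append(line)
    return matches
```

An empty result for the client address in the FATAL log line would be consistent with the "no pg_hba.conf entry" error; a non-empty result after the edit suggests the new range covers that host.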

Tue, Nov 13, 5:14 PM · cloud-services-team (Kanban), Wikilabels, Scoring-platform-team (Current)
Bstorm added a comment to T209381: labels.wmflabs.org is down.

2018-11-13 16:34:14 GMT FATAL: no pg_hba.conf entry for host "172.16.4.244", user "u_wikilabels", database "u_wikilabels", SSL off

Tue, Nov 13, 4:38 PM · cloud-services-team (Kanban), Wikilabels, Scoring-platform-team (Current)
Bstorm added a comment to T207970: toolforge: add misctools and jobutils packages to stretch.

Yeah, the repo was fine. The issue was the dependency declaration for the one package. Jobutils works I think.

Tue, Nov 13, 4:18 PM · Patch-For-Review, Cloud-VPS (Ubuntu Trusty Deprecation), Toolforge, cloud-services-team (Kanban)

Fri, Nov 9

Bstorm added a subtask for T209189: Revisit and update python testing in puppet: T208783: Migrate tests from nose to pytest.
Fri, Nov 9, 9:38 PM · Operations, cloud-services-team (Kanban), Puppet, Proposal
Bstorm added a parent task for T208783: Migrate tests from nose to pytest: T209189: Revisit and update python testing in puppet.
Fri, Nov 9, 9:38 PM · Operations
Bstorm edited projects for T209189: Revisit and update python testing in puppet, added: cloud-services-team (Kanban); removed cloud-services-team.
Fri, Nov 9, 9:38 PM · Operations, cloud-services-team (Kanban), Puppet, Proposal
Bstorm added projects to T209189: Revisit and update python testing in puppet: Proposal, Puppet, cloud-services-team.
Fri, Nov 9, 9:37 PM · Operations, cloud-services-team (Kanban), Puppet, Proposal
Bstorm created T209189: Revisit and update python testing in puppet.
Fri, Nov 9, 9:36 PM · Operations, cloud-services-team (Kanban), Puppet, Proposal
Bstorm added a comment to T208783: Migrate tests from nose to pytest.

I'm now poking around also at what it would look like if all the python my team uses ended up in separate packages (debs etc), and I don't hate it...🤔

Fri, Nov 9, 12:11 AM · Operations

Thu, Nov 8

Bstorm added a comment to T209117: Large MySQL query to commonswiki.labsdb dies with `ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query`.

@dschwen Do you have some idea what time it gets killed at? Can you set a timer in your script (if you haven't already)?
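
One way to set such a timer, as a minimal sketch: wrap whatever runs the query in a helper that records elapsed time even when the call fails. The name `timed_call` and the `run_query` callable are hypothetical stand-ins for the real script.

```python
import time


def timed_call(run_query):
    """Call run_query() and return (result, error, elapsed_seconds).

    The elapsed time is measured with a monotonic clock, so it is still
    reported when the query raises (for example on a lost connection).
    """
    start = time.monotonic()
    try:
        result = run_query()
        return result, None, time.monotonic() - start
    except Exception as exc:  # broad on purpose: we only want the timing
        return None, exc, time.monotonic() - start
```

If the elapsed time at failure is roughly the same on every run, that would point toward a fixed time limit rather than a random disconnect.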

Thu, Nov 8, 10:23 PM · cloud-services-team (Kanban), Data-Services
Bstorm added a comment to T209117: Large MySQL query to commonswiki.labsdb dies with `ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query`.

@Banyek This appears to be affected by the query killer wmf-pt-kill. What is the current limit placed on things? I wonder if this just goes too long or if it should be changed/tuned to allow longer queries?

Thu, Nov 8, 10:20 PM · cloud-services-team (Kanban), Data-Services
Bstorm edited projects for T209117: Large MySQL query to commonswiki.labsdb dies with `ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query`, added: Data-Services, cloud-services-team (Kanban); removed Cloud-VPS.
Thu, Nov 8, 10:19 PM · cloud-services-team (Kanban), Data-Services
Bstorm updated subscribers of T209117: Large MySQL query to commonswiki.labsdb dies with `ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query`.
Thu, Nov 8, 10:18 PM · cloud-services-team (Kanban), Data-Services
Bstorm added a comment to T207970: toolforge: add misctools and jobutils packages to stretch.

toolsbeta-sgebastion-03 is actually able to complete puppet runs (on the other hand)! Exec nodes are held up by misctools deps being incorrect, but I thought I'd add the happy note.

Thu, Nov 8, 8:52 PM · Patch-For-Review, Cloud-VPS (Ubuntu Trusty Deprecation), Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T207970: toolforge: add misctools and jobutils packages to stretch.

Fixed the network access issue cross-region. However:

Thu, Nov 8, 7:53 PM · Patch-For-Review, Cloud-VPS (Ubuntu Trusty Deprecation), Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T208783: Migrate tests from nose to pytest.

To put it a different way, I'm uncomfortable deploying untested python to the environment in cloud and prefer to test infrastructure code in general (rspec/unittest or whatever). At least within the scope of cloud materials in the repo, I'm trying to bring it all into a testable form (even if just linting). I got python3 in the containers, and the python3 tests work great where they are (only in my tests right now), and I ensured that the tests are conditionally run. (Note, the tests I put up are in the sonofgridengine module, if you are curious and have thoughts.)

Thu, Nov 8, 6:59 PM · Operations
Bstorm added a comment to T208783: Migrate tests from nose to pytest.

I recently added python tests to one of my modules with a significant python script in it. I honestly don't see how a python script that isn't tested that does something complicated enough to merit being in python should be in puppet (on the flip side of this). I have been on a personal crusade to start implementing testing discipline for code managed by the cloud team so that taking over existing projects is safer and more consistent.

Thu, Nov 8, 6:52 PM · Operations
Bstorm moved T209075: Add some exec nodes to the grid from Inbox to Doing on the cloud-services-team (Kanban) board.
Thu, Nov 8, 4:58 PM · cloud-services-team (Kanban), Toolforge
Bstorm claimed T209075: Add some exec nodes to the grid.
Thu, Nov 8, 4:57 PM · cloud-services-team (Kanban), Toolforge
Bstorm updated the task description for T209075: Add some exec nodes to the grid.
Thu, Nov 8, 4:56 PM · cloud-services-team (Kanban), Toolforge
Bstorm closed T208940: Grid slow on Toolforge as Resolved.
Thu, Nov 8, 4:55 PM · cloud-services-team (Kanban), Toolforge
Bstorm triaged T209075: Add some exec nodes to the grid as Normal priority.
Thu, Nov 8, 4:52 PM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T208940: Grid slow on Toolforge.

I think we may want to add some nodes in general as well. The number of waiting jobs has dropped to a much more reasonable level, but it's been a while since new nodes joined. Let's make a subtask for that. Tasks in qw are now down to 2.

Thu, Nov 8, 4:49 PM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T208940: Grid slow on Toolforge.

I'll enable the other disabled nodes, either way. :)

Thu, Nov 8, 3:54 PM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T208940: Grid slow on Toolforge.

Is -mem 8g the setting it has always had? I think that's the total actual RAM on an exec node. That would mean it would need a totally quiet node to run on.

Thu, Nov 8, 3:54 PM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T208940: Grid slow on Toolforge.

I can say that my own submissions (which are not CPU heavy, but they are very RAM heavy) are going through fine once a day, so it is flowing at least.

Thu, Nov 8, 3:52 PM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T208940: Grid slow on Toolforge.

We can use this:
$ qstat -u "*" | grep qw | wc -l
174
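
The same count can be taken from the state column alone, so that a job or user name containing "qw" is not counted the way a plain grep would count it. A minimal sketch, assuming the usual `qstat` row layout (job-ID, prior, name, user, state, ...) and job names without spaces; the function name `count_waiting` is hypothetical.

```python
def count_waiting(qstat_output: str) -> int:
    """Count jobs whose state contains "qw" in `qstat -u "*"` text output."""
    waiting = 0
    for line in qstat_output.splitlines():
        fields = line.split()
        # Job rows start with a numeric job ID; header and separator rows do not.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        # Assumed layout: the state is the fifth column. Matching on "qw"
        # also counts held or errored pending states such as hqw and Eqw.
        if "qw" in fields[4]:
            waiting += 1
    return waiting
```

The input text could come from `subprocess.run(["qstat", "-u", "*"], capture_output=True, text=True).stdout` on a host where `qstat` is available.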

Thu, Nov 8, 3:35 PM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T208940: Grid slow on Toolforge.

There are several exec nodes disabled, probably from forgotten rebalancing efforts. I've just enabled one of them and can enable more, though I'm poking around in case I can find anything that's really busted.

Thu, Nov 8, 3:29 PM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T208940: Grid slow on Toolforge.

Can you give me the commands you are using for this so I'm doing a straight one-to-one comparison? Also what jobs are yours? I'm poking around this.

Thu, Nov 8, 3:22 PM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T208579: tools-mail: Migrate to Stretch.

Do we know if email submissions are tagged with the queue "mailq" by any of the scripts or settings that you are stripping out?

Thu, Nov 8, 3:06 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS (Ubuntu Trusty Deprecation)

Wed, Nov 7

Bstorm added a comment to T208221: tools-service: Build missing packages on Stretch.

I still get errors for toollabs-webservice as well on stretch.

Wed, Nov 7, 10:25 PM · Cloud-VPS (Ubuntu Trusty Deprecation), cloud-services-team (Kanban)
Bstorm added a comment to T208221: tools-service: Build missing packages on Stretch.

Oh yes, bastions need misctools as well (confirmed)

Wed, Nov 7, 10:18 PM · Cloud-VPS (Ubuntu Trusty Deprecation), cloud-services-team (Kanban)
Bstorm reopened T208221: tools-service: Build missing packages on Stretch, a subtask of T207591: tools-service: Document current services and try them on Stretch, as Open.
Wed, Nov 7, 10:14 PM · Patch-For-Review, Cloud-VPS (Ubuntu Trusty Deprecation), cloud-services-team (Kanban)
Bstorm reopened T208221: tools-service: Build missing packages on Stretch as "Open".

E: Unable to locate package jobutils
Error: /Stage[main]/Gridengine::Submit_host/Package[jobutils]/ensure: change from purged to present failed: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install jobutils' returned 100: Reading package lists...
Building dependency tree...

Wed, Nov 7, 10:14 PM · Cloud-VPS (Ubuntu Trusty Deprecation), cloud-services-team (Kanban)
Bstorm added a comment to T208873: Add python3 to the containers that run in CI for puppet.

Thank you!!!

Wed, Nov 7, 5:51 PM · Patch-For-Review, Continuous-Integration-Infrastructure, cloud-services-team (Kanban)
Bstorm edited projects for T208916: cloudvps: neutron issue with split brain, added: cloud-services-team (Kanban); removed cloud-services-team.

After a puppet change earlier today, the OpenStack Neutron servers were both brought up running as masters at the same time (split brain). We resolved this by simply rebooting one of them so it took on the standby role, but removing the puppet service restart is in order until we can make it better handle the failover setup.

Wed, Nov 7, 12:59 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS

Tue, Nov 6

Bstorm triaged T208873: Add python3 to the containers that run in CI for puppet as Normal priority.
Tue, Nov 6, 6:34 PM · Patch-For-Review, Continuous-Integration-Infrastructure, cloud-services-team (Kanban)
Bstorm added a comment to T203254: labstore1004 and labstore1005 high load issues following upgrades.

labstore1001/2 are the walking dead, FYI. They are blank spares to be decommissioned when 8/9 replace 1003. 1003 is only still there because we don't have replacements up yet. 1003 is scratch and misc, which is more transient data in some cases.

Tue, Nov 6, 3:35 PM · Patch-For-Review, cloud-services-team (Kanban)
Bstorm added a comment to T193655: rack/setup/install cloudstore1008 & cloudstore1009.

Oh thanks! I'll take a look at that. I figure it must either be a BIOS config or possibly kernel option issue.

Tue, Nov 6, 3:10 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations

Thu, Nov 1

Bstorm moved T189158: Change `image` view to properly expose the new `img_description_id` field from Inbox to Doing on the cloud-services-team (Kanban) board.
Thu, Nov 1, 11:00 PM · cloud-services-team (Kanban), Patch-For-Review, Data-Services
Bstorm added a project to T189158: Change `image` view to properly expose the new `img_description_id` field: cloud-services-team (Kanban).
Thu, Nov 1, 11:00 PM · cloud-services-team (Kanban), Patch-For-Review, Data-Services
Bstorm claimed T189158: Change `image` view to properly expose the new `img_description_id` field.
Thu, Nov 1, 10:59 PM · cloud-services-team (Kanban), Patch-For-Review, Data-Services
Bstorm added a comment to T189158: Change `image` view to properly expose the new `img_description_id` field.

So would the replicated stuff on the replica servers be ready for me to run the scripts? Just double-checking before I go on with depooling and regenerating the views.

Thu, Nov 1, 10:59 PM · cloud-services-team (Kanban), Patch-For-Review, Data-Services

Wed, Oct 31

Bstorm added a comment to T193655: rack/setup/install cloudstore1008 & cloudstore1009.

Aaaand same freeze when installing on 4.14. That's fun. I can try the kernel on cloudstore1009 as well, but cloudstore1009 so far behaves the same as 08 in general.

Wed, Oct 31, 6:05 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations
Bstorm added a comment to T193655: rack/setup/install cloudstore1008 & cloudstore1009.

Yes, the timestamps for GET requests for the right scripts on install1002 are there when I reset one of the servers.

Wed, Oct 31, 4:26 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations
Bstorm added a comment to T208357: toolforge - Deprecate BigBrother in Grid Engine.

Bigbrother seems like a false sense of security in some ways because it doesn't trigger alerts for reboot loops and things like that (which I've seen it doing before). So, I'm not sure it is providing very good service in the first place.

Wed, Oct 31, 3:18 PM · Toolforge, cloud-services-team (Kanban)
jijiki awarded T204033: Request creation of k8splay VPS project a Like token.
Wed, Oct 31, 7:22 AM · cloud-services-team (Kanban), User-jijiki, Cloud-VPS (Project-requests)