Andrew (Andrew Bogott)
User

Projects (9)

User Details

User Since
Nov 2 2014, 11:35 PM (145 w, 5 d)
Availability
Available
IRC Nick
andrewbogott
LDAP User
Unknown
MediaWiki User
Andrewbogott

Recent Activity

Wed, Aug 9

Andrew added a comment to T167556: Define a metric to track OpenStack system availability.

Here are some user-facing things that I'd like to have metrics for:

Wed, Aug 9, 7:18 PM · Goal, cloud-services-team (FY2017-18)

Tue, Aug 8

Andrew closed T171606: nova-compute process check flapping as Resolved.
Tue, Aug 8, 7:38 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T170828: Build VPS base images as Resolved.
Tue, Aug 8, 7:37 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew closed T120683: logrotate/disk space on silver for nutcracker log as Resolved.

When all is well, silver is fine -- if something breaks that increases the logging rate, then I get alerts.

Tue, Aug 8, 1:28 PM · Cloud-VPS, Operations, Cloud-Services

Mon, Aug 7

Andrew added a comment to T169350: Build new tools puppetmaster.

clush is back up and running on a new host. The only action-item remaining here is the deletion of the old tools-puppetmaster-02.

Mon, Aug 7, 8:19 PM · cloud-services-team (Kanban), Cloud-Services, Toolforge
Andrew added a comment to T161554: Provide large disk space to WikiBrain for memory-mapped file.

OK, after a quick chat with Aaron, I've created two big VMs for you:

Mon, Aug 7, 3:30 PM · Cloud-VPS (Project-requests), artificial-intelligence
Andrew renamed T171606: nova-compute process check flapping from nova-compute process monitoring: check twice to nova-compute process check flapping.
Mon, Aug 7, 2:57 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T168962: wikitech-static sync check shouldn't happen so often as Resolved.
Mon, Aug 7, 2:41 PM · Patch-For-Review, Cloud-Services, Operations
Andrew closed T167157: rack/setup/install labtestpuppetmaster2001 as Resolved.

This is up and working.

Mon, Aug 7, 2:26 PM · Patch-For-Review, Cloud-VPS, Operations
Andrew updated the task description for T171786: Switch to new labs puppetmasters.
Mon, Aug 7, 2:22 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew closed T171641: Change nova fullstack test back to testing all hosts as Resolved.

Resolved by https://gerrit.wikimedia.org/r/#/c/368454/

Mon, Aug 7, 1:48 PM · Patch-For-Review, cloud-services-team

Sat, Aug 5

Andrew moved T170828: Build VPS base images from Doing to Done on the cloud-services-team (Kanban) board.
Sat, Aug 5, 1:59 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew moved T168110: Puppet CA: virt1000.wikimedia.org' will expire on 2017-08-15 from Doing to Done on the cloud-services-team (Kanban) board.
Sat, Aug 5, 1:59 PM · cloud-services-team (Kanban), Operations, Cloud-VPS, Cloud-Services

Fri, Aug 4

Andrew updated the task description for T171786: Switch to new labs puppetmasters.
Fri, Aug 4, 8:13 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew updated the task description for T171786: Switch to new labs puppetmasters.
Fri, Aug 4, 3:42 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew updated the task description for T171786: Switch to new labs puppetmasters.
Fri, Aug 4, 2:51 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS

Thu, Aug 3

Andrew closed T167905: rack/setup/install labpuppetmaster100[12].wikimedia.org as Resolved.
Thu, Aug 3, 9:00 PM · Cloud-VPS, Operations
Andrew closed T167905: rack/setup/install labpuppetmaster100[12].wikimedia.org, a subtask of T168110: Puppet CA: virt1000.wikimedia.org' will expire on 2017-08-15, as Resolved.
Thu, Aug 3, 9:00 PM · cloud-services-team (Kanban), Operations, Cloud-VPS, Cloud-Services
Andrew closed T172064: New Trusty base images hang on boot as Resolved.
Thu, Aug 3, 8:59 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew closed T172064: New Trusty base images hang on boot, a subtask of T170828: Build VPS base images, as Resolved.
Thu, Aug 3, 8:59 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew added a comment to T172344: Can't create or manage domains in the PAWS project.

As far as I can tell, paws.wmflabs.org (the record) can only be in paws.wmflabs.org (the domain). So having the stand-alone record be in a different project from the domain-with-subrecords won't work. That means that #2 is probably impossible to do within tools if we want *.paws.wmflabs.org to be owned by the paws project.

Thu, Aug 3, 8:55 PM · Cloud-Services
Andrew closed T171061: puppet::self may be unused as Invalid.

so, ok, clearly not unused.

Thu, Aug 3, 8:37 PM · Cloud-VPS
Andrew updated the task description for T171786: Switch to new labs puppetmasters.
Thu, Aug 3, 8:34 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew updated the task description for T171786: Switch to new labs puppetmasters.
Thu, Aug 3, 4:35 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew added a comment to T172064: New Trusty base images hang on boot.

btw, it turns out the init script does not observe the START=no setting in /etc/default/puppet, which I tried ages ago and which is why I decided that this wasn't a puppet problem :(

Thu, Aug 3, 3:36 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew added a comment to T172064: New Trusty base images hang on boot.
  • Old images use puppet 3.4.3
  • New images use puppet 3.8.5
Thu, Aug 3, 3:35 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew added a comment to T172064: New Trusty base images hang on boot.

Bisect blames 2b18741526dff42582d84c25e0a8a7fddec080f0 which is very surprising! I need to test more but it seems to be right.

Thu, Aug 3, 12:00 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS

Wed, Aug 2

Andrew added a comment to T172064: New Trusty base images hang on boot.

I am able to build booting images if I roll back to puppet patch ffdfa2821bca02a0ec013d1e618d4d9690f7ec7d

Wed, Aug 2, 6:53 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS

Mon, Jul 31

Andrew added a comment to T172064: New Trusty base images hang on boot.
  • removing all custom-installed packages
  • building on a fresh build instance
Mon, Jul 31, 10:39 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew moved T170828: Build VPS base images from Inbox to Doing on the cloud-services-team (Kanban) board.
Mon, Jul 31, 6:04 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew moved T172064: New Trusty base images hang on boot from Inbox to Doing on the cloud-services-team (Kanban) board.
Mon, Jul 31, 6:04 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew closed T171959: instance root passwords vs. multiple puppetmasters as Resolved.

Due to various concerns we're going to just disable these passwords for now.

Mon, Jul 31, 6:04 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew closed T171959: instance root passwords vs. multiple puppetmasters, a subtask of T171786: Switch to new labs puppetmasters, as Resolved.
Mon, Jul 31, 6:04 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew closed T172111: Fix fqdn for promethium as Resolved.

that was easy

Mon, Jul 31, 1:47 PM · cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew closed T172111: Fix fqdn for promethium, a subtask of T171786: Switch to new labs puppetmasters, as Resolved.
Mon, Jul 31, 1:47 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew created T172111: Fix fqdn for promethium.
Mon, Jul 31, 1:37 PM · cloud-services-team (Kanban), Operations, Cloud-VPS

Sun, Jul 30

Andrew added a comment to T172064: New Trusty base images hang on boot.

Here are similar log snippets from an old, working image.

Sun, Jul 30, 9:13 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew added a comment to T172064: New Trusty base images hang on boot.
  • building with a 4.x kernel instead of the default trusty kernel
  • rearranging the disk volumes to be /sda rather than /vda
Sun, Jul 30, 8:40 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew created T172077: removing a user from projectadmin on wikitech produces a blank page.
Sun, Jul 30, 8:20 PM · MW-1.30-release-notes (WMF-deploy-2017-07-25_(1.30.0-wmf.11)), User-bd808, Patch-For-Review, cloud-services-team (Kanban), wikitech.wikimedia.org
Andrew added a comment to T172064: New Trusty base images hang on boot.

There are quite a few Google hits about hangs after that 'random: nonblocking pool is initialized' message. Things I've tried:

Sun, Jul 30, 4:29 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew updated the task description for T172064: New Trusty base images hang on boot.
Sun, Jul 30, 4:29 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew created T172064: New Trusty base images hang on boot.
Sun, Jul 30, 4:25 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS

Fri, Jul 28

Andrew updated the task description for T171786: Switch to new labs puppetmasters.
Fri, Jul 28, 3:08 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew added a subtask for T171786: Switch to new labs puppetmasters: Unknown Object (Task).
Fri, Jul 28, 3:07 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew created T171959: instance root passwords vs. multiple puppetmasters.
Fri, Jul 28, 2:20 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS

Thu, Jul 27

Andrew updated the task description for T171786: Switch to new labs puppetmasters.
Thu, Jul 27, 10:59 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew created T171880: Add AAAA records for labpuppetmaster1001 and 1002.
Thu, Jul 27, 6:00 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew updated the task description for T171786: Switch to new labs puppetmasters.
Thu, Jul 27, 4:11 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS
Andrew closed T171313: novaadmin removed from many keystone projects as Resolved.

Attached patch should resolve the ultimate cause.

Thu, Jul 27, 2:57 PM · cloud-services-team (Kanban), MW-1.30-release-notes (WMF-deploy-2017-07-25_(1.30.0-wmf.11)), wikitech.wikimedia.org
Andrew closed T171313: novaadmin removed from many keystone projects, a subtask of T171280: wikitech api list=novainstances not returning list of instances, as Resolved.
Thu, Jul 27, 2:56 PM · Operations, Cloud-Services

Wed, Jul 26

Andrew created T171786: Switch to new labs puppetmasters.
Wed, Jul 26, 8:48 PM · Patch-For-Review, cloud-services-team (Kanban), Operations, Cloud-VPS

Tue, Jul 25

Andrew created T171641: Change nova fullstack test back to testing all hosts.
Tue, Jul 25, 7:23 PM · Patch-For-Review, cloud-services-team
Andrew created T171606: nova-compute process check flapping.
Tue, Jul 25, 3:51 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T169811: Request increase quota for ores-staging to 52GB RAM as Resolved.
Tue, Jul 25, 2:18 PM · User-bd808, Cloud-VPS (Quota-requests), ORES, Scoring-platform-team
Andrew closed T169811: Request increase quota for ores-staging to 52GB RAM, a subtask of T140904: Existing Labs project quota increase requests (Tracking), as Resolved.
Tue, Jul 25, 2:18 PM · User-bd808, Tracking, Cloud-Services
Andrew closed T169811: Request increase quota for ores-staging to 52GB RAM, a subtask of T169809: Set up larger ores-compute instance, as Resolved.
Tue, Jul 25, 2:18 PM · ORES, Scoring-platform-team

Mon, Jul 24

Andrew added a comment to T171188: Move the main WMCS puppetmaster into the Labs realm.

Here are some things that need to be thought about/figured out before we can go forward:

Mon, Jul 24, 3:58 PM · Puppet, Cloud-VPS, Operations
Andrew added a comment to T171188: Move the main WMCS puppetmaster into the Labs realm.

I'm pretty sure that #1 is moot -- at least, anytime we discuss it we conclude that the 'labs-support' vlan isn't really a useful concept and should be eliminated.

Mon, Jul 24, 3:56 PM · Puppet, Cloud-VPS, Operations
Andrew triaged T164847: Striker gives fatal error when a SUL account already in use tries to attach to a second LDAP account as Normal priority.
Mon, Jul 24, 3:35 PM · Patch-For-Review, cloud-services-team (Kanban), Striker, User-bd808
Andrew added a comment to T166712: Remove logging from labs for schema https://meta.wikimedia.org/wiki/Schema:CommandInvocation.

Is this tagged with cloud-services-team in error, or is there something you need from us?

Mon, Jul 24, 3:32 PM · Analytics, cloud-services-team
Andrew triaged T170447: Set good availability-zone defaults for nova users as Normal priority.
Mon, Jul 24, 3:25 PM · Cloud-VPS, cloud-services-team (Kanban)
Andrew closed T138809: Clicking on a project name in Identity logs you out of Horizon as Resolved.
Mon, Jul 24, 3:15 PM · Upstream, Cloud-Services, Horizon
Andrew added a comment to T171473: labvirt1015 crashes.

Thank you, Chris! This is new hardware and we can live without it... can we leave this in your hands to follow up with Dell? Is there any additional info you need?

Mon, Jul 24, 2:10 PM · cloud-services-team (Kanban), DC-Ops, ops-eqiad, Operations
Andrew added a comment to T171473: labvirt1015 crashes.

(I should note that there's no data of interest on that box -- reimaging is just fine)

Mon, Jul 24, 1:56 PM · cloud-services-team (Kanban), DC-Ops, ops-eqiad, Operations
Andrew created T171473: labvirt1015 crashes.
Mon, Jul 24, 1:52 PM · cloud-services-team (Kanban), DC-Ops, ops-eqiad, Operations

Fri, Jul 21

Andrew closed T171280: wikitech api list=novainstances not returning list of instances as Resolved.

I have a fix to prevent this from happening again... in the meantime I've added novaadmin back to everything.

Fri, Jul 21, 10:16 PM · Operations, Cloud-Services
Andrew added a subtask for T171280: wikitech api list=novainstances not returning list of instances: T171313: novaadmin removed from many keystone projects.
Fri, Jul 21, 9:52 PM · Operations, Cloud-Services
Andrew added a parent task for T171313: novaadmin removed from many keystone projects: T171280: wikitech api list=novainstances not returning list of instances.
Fri, Jul 21, 9:52 PM · cloud-services-team (Kanban), MW-1.30-release-notes (WMF-deploy-2017-07-25_(1.30.0-wmf.11)), wikitech.wikimedia.org
Andrew created P5782 big slice of labtestwiki ldap logs.
Fri, Jul 21, 8:02 PM
Andrew added a comment to T171313: novaadmin removed from many keystone projects.

So currently I think this was caused by a misfire in OpenStackManager's removeUserFromBastionProject():

Fri, Jul 21, 4:18 PM · cloud-services-team (Kanban), MW-1.30-release-notes (WMF-deploy-2017-07-25_(1.30.0-wmf.11)), wikitech.wikimedia.org
Andrew added a comment to T171313: novaadmin removed from many keystone projects.

The first sign of trouble in the keystone log is:

Fri, Jul 21, 3:35 PM · cloud-services-team (Kanban), MW-1.30-release-notes (WMF-deploy-2017-07-25_(1.30.0-wmf.11)), wikitech.wikimedia.org
Andrew added a comment to T171313: novaadmin removed from many keystone projects.

In total it was removed from 53 projects. I'm now checking to see if any one user (other than novaadmin) is in all of those projects.

Fri, Jul 21, 3:23 PM · cloud-services-team (Kanban), MW-1.30-release-notes (WMF-deploy-2017-07-25_(1.30.0-wmf.11)), wikitech.wikimedia.org
Andrew added a comment to T171313: novaadmin removed from many keystone projects.

We've replaced novaadmin in deployment-prep; now it's missing from the following:

Fri, Jul 21, 3:06 PM · cloud-services-team (Kanban), MW-1.30-release-notes (WMF-deploy-2017-07-25_(1.30.0-wmf.11)), wikitech.wikimedia.org
Andrew created T171313: novaadmin removed from many keystone projects.
Fri, Jul 21, 3:01 PM · cloud-services-team (Kanban), MW-1.30-release-notes (WMF-deploy-2017-07-25_(1.30.0-wmf.11)), wikitech.wikimedia.org
Andrew added a comment to T171280: wikitech api list=novainstances not returning list of instances.

I just can't think of any reason why those roles would've been removed :( investigating

Fri, Jul 21, 2:49 PM · Operations, Cloud-Services
Andrew added a comment to T171280: wikitech api list=novainstances not returning list of instances.

There was a brief period when novaadmin couldn't log in; is it possible you just caught it at a bad moment? The above curl seems ok to me now.

Fri, Jul 21, 2:26 PM · Operations, Cloud-Services

Thu, Jul 20

Andrew closed T171069: Cannot login/change password to MABot@wikitech as Resolved.

Yep, all looks good to me.

Thu, Jul 20, 8:29 PM · cloud-services-team, wikitech.wikimedia.org
Andrew added a comment to T171069: Cannot login/change password to MABot@wikitech.

Ok, I renamed MABot to 'MABot former'. I think when you retry you should log out entirely and create the account as though you are a new user -- that's the path that is most tested.

Thu, Jul 20, 7:23 PM · cloud-services-team, wikitech.wikimedia.org
Andrew added a comment to T171005: Deploy TemplateStyles for wikitech-static.

The extension is now installed and loading on wikitech-static. It looks terrible, for now -- waiting to see if a re-sync fixes things.

Thu, Jul 20, 6:43 PM · wikitech.wikimedia.org
Andrew closed T170854: Update mediawiki on wikitech-static as Resolved.

Now running 1.29.0 (52abe24)

Thu, Jul 20, 6:42 PM · wikitech.wikimedia.org, Operations

Jul 20 2017

Andrew closed T171116: Build updated labvirt-star cert as Resolved.
Jul 20 2017, 2:15 PM · Patch-For-Review, cloud-services-team
Andrew closed T171158: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded as Resolved.

I resolved this by running the query in https://ask.openstack.org/en/question/494/how-to-reset-incorrect-quota-count/

Jul 20 2017, 1:56 PM · Release-Engineering-Team (Kanban), Wikimedia-Incident, Continuous-Integration-Infrastructure, Cloud-VPS
Andrew created T171136: dss keys disabled prematurely.
Jul 20 2017, 1:58 AM · Cloud-VPS, cloud-services-team

Jul 19 2017

Andrew added a comment to T171069: Cannot login/change password to MABot@wikitech.

I see the MABot account in the wikitech user table but don't see an ldap record. It might be that the creation process you followed just doesn't work right, or it might be that there was some kind of unreported collision during creation (possibly due to the re-used email address, although that would surprise me).

Jul 19 2017, 11:25 PM · cloud-services-team, wikitech.wikimedia.org
Andrew added a comment to T171069: Cannot login/change password to MABot@wikitech.

It's possible that you were unlucky and hit us in the middle of an ldap outage... does the same happen if you try now?

Jul 19 2017, 10:35 PM · cloud-services-team, wikitech.wikimedia.org
Andrew updated the task description for T171116: Build updated labvirt-star cert.
Jul 19 2017, 10:11 PM · Patch-For-Review, cloud-services-team
Andrew created T171116: Build updated labvirt-star cert.
Jul 19 2017, 10:10 PM · Patch-For-Review, cloud-services-team
Andrew added a comment to T170828: Build VPS base images.

Jessie and Stretch are updated. There are unexpected issues with the Trusty build which I'm working on.

Jul 19 2017, 6:18 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
Andrew created T171061: puppet::self may be unused.
Jul 19 2017, 3:13 PM · Cloud-VPS

Jul 18 2017

Andrew added projects to T170944: RFC: What to do about wikitech per-project puppet config?: cloud-services-team, Cloud-VPS.
Jul 18 2017, 3:49 PM · Cloud-VPS, cloud-services-team
Andrew renamed T170944: RFC: What to do about wikitech per-project puppet config? from What to do about wikitech per-project puppet config? to RFC: What to do about wikitech per-project puppet config?.
Jul 18 2017, 3:40 PM · Cloud-VPS, cloud-services-team
Andrew created T170944: RFC: What to do about wikitech per-project puppet config?.
Jul 18 2017, 3:37 PM · Cloud-VPS, cloud-services-team
Andrew added a comment to T127771: Make labs wikitech role aware.

Is this still meaningful now that there's no puppet config on wikitech? Does Horizon have the same issue?

Jul 18 2017, 3:32 PM · Cloud-VPS, Cloud-Services
Andrew committed R2073:3161e42d5289: Add hierakey template for hiera key browsing (authored by Andrew).
Jul 18 2017, 4:01 AM
Andrew committed R2073:e5dad00088fe: Add some hiera key browsing (authored by Andrew).
Jul 18 2017, 3:58 AM
Andrew committed R2073:f01a8ba86b0d: split redundant functions out of classes() and hiera() (authored by Andrew).
Jul 18 2017, 3:58 AM
Andrew committed R2073:38d819165322: add hiera config to server display (authored by Andrew).
Jul 18 2017, 2:03 AM

Jul 17 2017

Andrew committed R2073:b79d793f7471: Add a dsh endpoint for all servers in all projects (authored by Andrew).
Jul 17 2017, 5:54 PM
Andrew created T170854: Update mediawiki on wikitech-static.
Jul 17 2017, 5:37 PM · wikitech.wikimedia.org, Operations
Andrew committed R2073:e1ea36b019e0: Add a 'servers' tab and a servers page (authored by Andrew).
Jul 17 2017, 5:11 PM