Andrew (Andrew Bogott)
User

Projects (10)

User Details

User Since
Nov 2 2014, 11:35 PM (265 w, 4 d)
Availability
Available
IRC Nick
andrewbogott
LDAP User
Unknown
MediaWiki User
Andrewbogott

Recent Activity

Today

Andrew committed rLPRI3e3415b772f8: remove a dangling comma (authored by Andrew).
remove a dangling comma
Fri, Dec 6, 9:24 AM
Andrew added a comment to T145703: Horizon loses credentials every day.

This might be related to this:

Fri, Dec 6, 8:17 AM · Security, cloud-services-team (Kanban), Horizon
Andrew created T239974: Upgrade Horizon to version 'train'.
Fri, Dec 6, 6:13 AM · cloud-services-team

Yesterday

Andrew added a comment to T237749: Upgrade wmcs OpenStack version to Ocata.

codfw1-dev is running Ocata, and I've scheduled an upgrade window for eqiad1. Approximate steps are...

This sounds right to me. Thanks!
One question: a few new nova services were introduced, but I don't see any reference to them in the steps. Are no steps required? I'm thinking of the placement and cell stuff. Just double-checking.

Thu, Dec 5, 12:51 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew added a comment to T237749: Upgrade wmcs OpenStack version to Ocata.

codfw1-dev is running Ocata, and I've scheduled an upgrade window for eqiad1. Approximate steps are...

Thu, Dec 5, 12:21 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T236570: "wikimetrics" Cloud VPS project jessie deprecation as Resolved.

This project has been deleted.

Thu, Dec 5, 6:54 AM · Cloud-VPS (Debian Jessie Deprecation)
Andrew added a comment to T235685: (No Need By Date Provided) rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet.

*bump*

Thu, Dec 5, 3:54 AM · Operations, ops-eqiad
Andrew added a comment to T239884: Replace labstore2003/2004 with cloudbackup2001/2002.

The new cloudbackup hosts are now up and running. The tools backup happens on Thursdays and the misc projects backup happens on Tuesdays.

Thu, Dec 5, 3:51 AM · cloud-services-team (Kanban)
Andrew moved T239884: Replace labstore2003/2004 with cloudbackup2001/2002 from Inbox to Doing on the cloud-services-team (Kanban) board.
Thu, Dec 5, 3:48 AM · cloud-services-team (Kanban)
Andrew created T239884: Replace labstore2003/2004 with cloudbackup2001/2002.
Thu, Dec 5, 3:48 AM · cloud-services-team (Kanban)
Andrew closed T238226: cloudbackup2002 doesn't boot, times out in enabling /dev/mapper/cloudbackup--vg-data as Resolved.
Thu, Dec 5, 3:42 AM · cloud-services-team
Andrew closed T236580: "wikifactmine" Cloud VPS project jessie deprecation as Resolved.

All of these VMs have been deleted and the project is pending deletion at the end of the year.

Thu, Dec 5, 3:24 AM · Cloud-VPS (Debian Jessie Deprecation)
Andrew updated the task description for T238181: 2019 Cloud Services annual survey.
Thu, Dec 5, 3:18 AM · cloud-services-team (Kanban), Cloud-Services
Andrew updated the task description for T238181: 2019 Cloud Services annual survey.
Thu, Dec 5, 3:17 AM · cloud-services-team (Kanban), Cloud-Services

Wed, Dec 4

Andrew added a comment to T212573: Request creation of indico VPS project.

As per https://wikitech.wikimedia.org/wiki/News/Cloud_VPS_2019_Purge, this project is now a candidate for deletion since no one has claimed it on wiki or responded to my emails. It's not too late to indicate otherwise on that page if it's still of use to someone.

Wed, Dec 4, 3:13 AM · Cloud-VPS (Project-requests)

Mon, Dec 2

Andrew added a comment to T239569: cloudstore1008 crash - Memory error.

I see nothing at all in the syslog that would explain this crash -- just an empty spot

Mon, Dec 2, 8:43 AM · Operations, ops-eqiad, cloud-services-team (Kanban), Cloud-Services
Andrew created T239569: cloudstore1008 crash - Memory error.
Mon, Dec 2, 7:19 AM · Operations, ops-eqiad, cloud-services-team (Kanban), Cloud-Services
Andrew closed T239160: nova: set up cell and host mappings, a subtask of T237749: Upgrade wmcs OpenStack version to Ocata, as Resolved.
Mon, Dec 2, 6:35 AM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T239160: nova: set up cell and host mappings as Resolved.

I ran

Mon, Dec 2, 6:35 AM · cloud-services-team (Kanban)

Wed, Nov 27

Andrew closed T239360: determine status of cloudweb2001-dev, a subtask of T220426: labtestnet2002: repurpose as cloudweb2001-dev.wikimedia.org, as Resolved.
Wed, Nov 27, 9:21 PM · ops-codfw, DC-Ops, Cloud-VPS, cloud-services-team (Kanban), Operations
Andrew closed T239360: determine status of cloudweb2001-dev as Resolved.

This is an actively used server. It hasn't been reporting to puppet because of:

Wed, Nov 27, 9:21 PM · DC-Ops, cloud-services-team
Andrew closed T239161: nova: start using the placement service as Resolved.

this is now deployed on eqiad1-r

Wed, Nov 27, 5:37 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T239161: nova: start using the placement service, a subtask of T237749: Upgrade wmcs OpenStack version to Ocata, as Resolved.
Wed, Nov 27, 5:37 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew updated the task description for T239161: nova: start using the placement service.
Wed, Nov 27, 5:28 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew added a comment to T196797: When changes to a Designate zone occur (e.g. record creations/deletions), there is a brief period in which the entire zone is NXDOMAIN.

@Krenair when you have time can you retest this? I suspect that it was fixed by various upgrades.

Wed, Nov 27, 4:49 PM · cloud-services-team (Kanban), Beta-Cluster-reproducible, Cloud-VPS
Andrew renamed T170370: Investigate use of Puppet "environments" for per-project Puppet manifests from Invesitgate use of Puppet "environments" for per-project Puppet manifests to Investigate use of Puppet "environments" for per-project Puppet manifests.
Wed, Nov 27, 4:45 PM · cloud-services-team (Kanban), Puppet, Cloud-VPS
Andrew closed T180916: Puppet flapping on mounting /mnt/nfs/labstore-secondary-project failures ("Device busy or already mounted") as Resolved.

closing until someone reports this happening again

Wed, Nov 27, 4:45 PM · cloud-services-team (Kanban), Data-Services, Cloud-VPS
Andrew closed T42022: Add icinga checks for all nova, glance, and keystone related services as Resolved.

I think we have as many of these as we need now :)

Wed, Nov 27, 4:42 PM · cloud-services-team (Kanban), Patch-For-Review, observability, Cloud-VPS
Andrew closed T206787: Neutron migrate is causing hosts to show up in grafana as <host>-<neutron-ip> as Resolved.

I think this is moot now since the migration is long since finished.

Wed, Nov 27, 4:38 PM · cloud-services-team (Kanban), Cloud-Services
Andrew closed T201082: labtestweb2001 is sending updates to a read-only db host: db2037 as Resolved.

using a new, local-to-codfw1dev database now

Wed, Nov 27, 4:36 PM · cloud-services-team (Kanban), wikitech.wikimedia.org, Wikimedia-production-error
Andrew closed T159165: GPU resources for Labs as Declined.

we don't have any near-term plans to support this.

Wed, Nov 27, 4:33 PM · cloud-services-team (Kanban), Scoring-platform-team, artificial-intelligence, Cloud-VPS
Andrew closed T153036: Horizon has no logging or watchlist for changes to Puppet/Hiera data as Resolved.

resolved with instance-puppet git repo

Wed, Nov 27, 4:32 PM · cloud-services-team (Kanban), Horizon
Andrew closed T90542: Make sure that toollabs can function fully even with one virt* host fully down, a subtask of T90534: Make toollabs reliable enough (tracking), as Resolved.
Wed, Nov 27, 4:31 PM · Epic, cloud-services-team (Kanban), Tracking-Neverending, Toolforge
Andrew closed T90542: Make sure that toollabs can function fully even with one virt* host fully down, a subtask of T91068: Set up a schedule for doing failover exercises for toollabs, as Resolved.
Wed, Nov 27, 4:31 PM · cloud-services-team (Kanban), Toolforge
Andrew closed T90542: Make sure that toollabs can function fully even with one virt* host fully down as Resolved.

I think this is largely resolved -- we have monitoring that keeps us from having too many eggs in one basket.

Wed, Nov 27, 4:31 PM · cloud-services-team (Kanban), Goal, ToolLabs-Goals-Q4, Cloud-Services
Andrew assigned T88711: Fully puppetize Grid Engine to Bstorm.

this can be closed, can't it?

Wed, Nov 27, 4:30 PM · cloud-services-team (Kanban), Goal, Puppet, Toolforge
Andrew closed T46720: Only list LDAP servers location in the same datacenter in the nslcd configuration as Resolved.
Wed, Nov 27, 4:29 PM · LDAP, cloud-services-team (Kanban), Cloud-VPS
Andrew closed T210215: Quota usage not being counted properly in new region as Resolved.
Wed, Nov 27, 4:28 PM · cloud-services-team (Kanban), Patch-For-Review, Cloud-VPS
Andrew closed T178409: Applied puppet classes not appearing in horizon for integration-slave-docker-c2-m4-d40-1005.integration.eqiad.wmflabs as Resolved.

I'm going to close this pending a re-appearance of the issue

Wed, Nov 27, 4:25 PM · cloud-services-team (Kanban), User-Addshore, Horizon
Andrew closed T170944: RFC: What to do about wikitech per-project puppet config? as Resolved.

this is resolved with the new git backend for horizon puppet config.

Wed, Nov 27, 4:25 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew claimed T153608: Migrate references from $instance.eqiad.wmflabs to $instance.$project.eqiad.wmflabs.
Wed, Nov 27, 4:23 PM · cloud-services-team (Kanban), Puppet, Cloud-Services
Andrew closed T153279: labnet/ labtestnet2001 - disk space - nova-api.log needs rotation as Invalid.

this hasn't been an issue lately.

Wed, Nov 27, 4:23 PM · cloud-services-team (Kanban), Operations, Cloud-Services
Andrew closed T149589: Puppet tab in Horizon unusably slow as Resolved.

now the yaml backend is the default.

Wed, Nov 27, 4:22 PM · cloud-services-team (Kanban), Patch-For-Review, Operations, Puppet, Cloud-Services
Andrew closed T123817: Allocate vlan and IPs for labtest VMs, a subtask of T120293: Create labtest cluster, as Invalid.
Wed, Nov 27, 4:22 PM · Goal, Cloud-Services
Andrew closed T123817: Allocate vlan and IPs for labtest VMs as Invalid.
Wed, Nov 27, 4:21 PM · cloud-services-team (Kanban), Cloud-Services
Andrew claimed T91619: Clean out unused security groups on toollabs.
Wed, Nov 27, 4:21 PM · cloud-services-team (Kanban), Toolforge
Andrew added a comment to T176891: DNS resolution chosing IPv6 addrs on hosts with only link-local IPv6 addresses.

I think this is still happening:

Wed, Nov 27, 4:20 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew closed T210328: puppet-enc causing puppet intermittent failures as Resolved.

this seems to be fixed.

Wed, Nov 27, 4:13 PM · cloud-services-team (Kanban), Cloud-Services
Andrew closed T181551: Puppet 4.x breaks the role/profile filters on Horizon, a subtask of T235708: Redesign for wmcs custom puppet settings, as Resolved.
Wed, Nov 27, 4:11 PM · MW-1.35-notes (1.35.0-wmf.5; 2019-11-05), cloud-services-team (Kanban), Cloud-VPS
Andrew closed T181551: Puppet 4.x breaks the role/profile filters on Horizon as Resolved.

I have mournfully ripped out the code that manages class documentation :(

Wed, Nov 27, 4:11 PM · cloud-services-team (Kanban), Horizon
Andrew closed T177880: Automatically run maintain-views and and maintain-meta_p when config changes on cloud replicas as Declined.

Because this requires coordination with DBAs and pooling/depooling it's not straightforward to automate.

Wed, Nov 27, 4:11 PM · cloud-services-team (Kanban), Data-Services
Andrew closed T237058: /usr/sbin/ssh-key-ldap-lookup misconfigured in codfw1dev as Invalid.

this is just a symptom of T239347, so closing in favor of that

Wed, Nov 27, 4:08 PM · cloud-services-team (Kanban)
Andrew created T239347: create a 'normal' network for codf1dev neutron w/public IPs.
Wed, Nov 27, 4:07 PM · cloud-services-team (Kanban)
Andrew moved T239160: nova: set up cell and host mappings from Inbox to Doing on the cloud-services-team (Kanban) board.
Wed, Nov 27, 4:04 PM · cloud-services-team (Kanban)
Andrew moved T239161: nova: start using the placement service from Inbox to Doing on the cloud-services-team (Kanban) board.
Wed, Nov 27, 4:04 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew triaged T235708: Redesign for wmcs custom puppet settings as Medium priority.
Wed, Nov 27, 4:03 PM · MW-1.35-notes (1.35.0-wmf.5; 2019-11-05), cloud-services-team (Kanban), Cloud-VPS
Andrew moved T238181: 2019 Cloud Services annual survey from Important to Doing on the cloud-services-team (Kanban) board.
Wed, Nov 27, 4:03 PM · cloud-services-team (Kanban), Cloud-Services
Andrew updated the task description for T239170: Create a new nova database on m5 named 'nova_cell0'.
Wed, Nov 27, 3:27 PM · DBA, cloud-services-team (Kanban)
Andrew added a comment to T239170: Create a new nova database on m5 named 'nova_cell0'.

Also, can we maybe be more specific about that grant and use certain IPs instead of using %? For both nova and the new nova_cell0.

Wed, Nov 27, 2:39 PM · DBA, cloud-services-team (Kanban)

Tue, Nov 26

Andrew closed T238708: Delete instance-puppet entries for deleted VMs as Resolved.
Tue, Nov 26, 10:59 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew closed T238708: Delete instance-puppet entries for deleted VMs, a subtask of T235708: Redesign for wmcs custom puppet settings, as Resolved.
Tue, Nov 26, 10:59 PM · MW-1.35-notes (1.35.0-wmf.5; 2019-11-05), cloud-services-team (Kanban), Cloud-VPS
Andrew updated the task description for T239161: nova: start using the placement service.
Tue, Nov 26, 6:04 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew added a comment to T239160: nova: set up cell and host mappings.

yep, tagged you by mistake.

Tue, Nov 26, 3:55 PM · cloud-services-team (Kanban)
Andrew added a comment to T239170: Create a new nova database on m5 named 'nova_cell0'.

Those steps sound right to me. Backups would be nice -- I think the nova db is backed up but I'm not positive.

Tue, Nov 26, 3:54 PM · DBA, cloud-services-team (Kanban)
Andrew reopened T238708: Delete instance-puppet entries for deleted VMs, a subtask of T235708: Redesign for wmcs custom puppet settings, as Open.
Tue, Nov 26, 3:03 PM · MW-1.35-notes (1.35.0-wmf.5; 2019-11-05), cloud-services-team (Kanban), Cloud-VPS
Andrew reopened T238708: Delete instance-puppet entries for deleted VMs as "Open".

I need to alter this so it doesn't make a commit if there was nothing to delete.

Tue, Nov 26, 3:03 PM · cloud-services-team (Kanban), Cloud-VPS

Mon, Nov 25

Andrew renamed T239170: Create a new nova database on m5 named 'nova_cell0' from new nova database on m5 to Create a new nova database on m5 named 'nova_cell0'.
Mon, Nov 25, 9:49 PM · DBA, cloud-services-team (Kanban)
Andrew added a subtask for T239160: nova: set up cell and host mappings: T239170: Create a new nova database on m5 named 'nova_cell0'.
Mon, Nov 25, 9:49 PM · cloud-services-team (Kanban)
Andrew added a parent task for T239170: Create a new nova database on m5 named 'nova_cell0': T239160: nova: set up cell and host mappings.
Mon, Nov 25, 9:49 PM · DBA, cloud-services-team (Kanban)
Andrew added a project to T239160: nova: set up cell and host mappings: DBA.
Mon, Nov 25, 9:47 PM · cloud-services-team (Kanban)
Andrew created T239170: Create a new nova database on m5 named 'nova_cell0'.
Mon, Nov 25, 9:47 PM · DBA, cloud-services-team (Kanban)
Andrew created T239168: Increase m5 database connection limit for 'nova' database.
Mon, Nov 25, 9:45 PM · cloud-services-team (Kanban)
Andrew updated the task description for T239161: nova: start using the placement service.
Mon, Nov 25, 8:59 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew updated the task description for T239161: nova: start using the placement service.
Mon, Nov 25, 8:40 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew updated the task description for T239161: nova: start using the placement service.
Mon, Nov 25, 8:39 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew updated the task description for T239161: nova: start using the placement service.
Mon, Nov 25, 8:38 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew created T239161: nova: start using the placement service.
Mon, Nov 25, 8:13 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew created T239160: nova: set up cell and host mappings.
Mon, Nov 25, 8:12 PM · cloud-services-team (Kanban)
Andrew added a comment to T238181: 2019 Cloud Services annual survey.

I sent the initial survey invitation this morning.

Mon, Nov 25, 8:09 PM · cloud-services-team (Kanban), Cloud-Services
Andrew closed T238708: Delete instance-puppet entries for deleted VMs, a subtask of T235708: Redesign for wmcs custom puppet settings, as Resolved.
Mon, Nov 25, 6:21 PM · MW-1.35-notes (1.35.0-wmf.5; 2019-11-05), cloud-services-team (Kanban), Cloud-VPS
Andrew closed T238708: Delete instance-puppet entries for deleted VMs as Resolved.
Mon, Nov 25, 6:21 PM · cloud-services-team (Kanban), Cloud-VPS
Andrew created T239146: Mitigate race conditions from horizon writing to instance-puppet git.
Mon, Nov 25, 6:20 PM · MW-1.35-notes (1.35.0-wmf.5; 2019-11-05), cloud-services-team (Kanban), Cloud-VPS

Fri, Nov 22

Andrew added a comment to T145703: Horizon loses credentials every day.

I just ran an experiment forcing my traffic from one labweb to the other, and my session persisted. So it's not a split-brain issue, or at least not an obvious one.

Fri, Nov 22, 5:38 PM · Security, cloud-services-team (Kanban), Horizon
Andrew closed T227411: prometheus-pdns-exporter log noise about unexpected metrics as Resolved.

done -- logs are nice and quiet now.

Fri, Nov 22, 4:03 PM · Operations, observability
Andrew added a comment to T227411: prometheus-pdns-exporter log noise about unexpected metrics.

That patch seems to quiet the alerts; I'll see about building and deploying

Fri, Nov 22, 3:39 PM · Operations, observability
Andrew added a comment to T227411: prometheus-pdns-exporter log noise about unexpected metrics.

@Andrew : I created an (untested) patch which should fix this, can you take it from here?

Fri, Nov 22, 3:00 PM · Operations, observability
Andrew added a comment to T145703: Horizon loses credentials every day.

I wiped arturo's tokens from the keystone database.

Fri, Nov 22, 2:18 AM · Security, cloud-services-team (Kanban), Horizon

Thu, Nov 21

Andrew updated subscribers of T227411: prometheus-pdns-exporter log noise about unexpected metrics.

I upgraded pdns to version 4 yesterday and now there's a lot more of this. I don't see the metrics being complained about defined in prometheus-pdns-exporter so I'm not sure how to address this -- @MoritzMuehlenhoff if you want to point me in the right direction I'm happy to do the coding.

Thu, Nov 21, 8:15 PM · Operations, observability
Andrew merged task T238859: prometheus-pdns-exporter complaining about many unknown metrics into T227411: prometheus-pdns-exporter log noise about unexpected metrics.
Thu, Nov 21, 8:11 PM · cloud-services-team (Kanban)
Andrew merged T238859: prometheus-pdns-exporter complaining about many unknown metrics into T227411: prometheus-pdns-exporter log noise about unexpected metrics.
Thu, Nov 21, 8:11 PM · Operations, observability
Andrew added a comment to T237749: Upgrade wmcs OpenStack version to Ocata.

eqiad1 (cloudservices1003/1004) now running ocata Designate.

Thu, Nov 21, 8:04 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew created T238859: prometheus-pdns-exporter complaining about many unknown metrics.
Thu, Nov 21, 7:39 PM · cloud-services-team (Kanban)
Andrew closed T238338: Import packages for Openstack Ocata as Resolved.
Thu, Nov 21, 6:58 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T238338: Import packages for Openstack Ocata, a subtask of T237749: Upgrade wmcs OpenStack version to Ocata, as Resolved.
Thu, Nov 21, 6:58 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T210715: cloudvps: PDNS 3.x vs 4.x, a subtask of T237749: Upgrade wmcs OpenStack version to Ocata, as Resolved.
Thu, Nov 21, 3:00 PM · Patch-For-Review, cloud-services-team (Kanban)
Andrew closed T210715: cloudvps: PDNS 3.x vs 4.x as Resolved.
Thu, Nov 21, 3:00 PM · cloud-services-team (Kanban)
Andrew added a comment to T145703: Horizon loses credentials every day.

I reproduced what Arturo is seeing -- the session cookie is present /until/ I visit horizon, at which point it's cleared. So Horizon definitely thinks that we're not allowed. It also looks to me like the keystone tokens are created correctly (with 7-day lifespan) so I'm not sure who is making the decision that our access has expired.

Thu, Nov 21, 3:00 PM · Security, cloud-services-team (Kanban), Horizon

Wed, Nov 20

Andrew added a comment to T145703: Horizon loses credentials every day.

I'm having trouble reproducing this reliably enough to debug. If this happens to someone else, please paste the contents of your sessionid cookie here before logging in again so I can try to track things down.

Wed, Nov 20, 7:44 PM · Security, cloud-services-team (Kanban), Horizon
Andrew added a comment to T236309: define requirements for new devtools cloud vps project to replace "git", "gerrit" and "phabricator".

For the new project, please file a ticket here: https://phabricator.wikimedia.org/project/profile/2875/

Wed, Nov 20, 3:15 PM · Phabricator, Gerrit
Andrew added a comment to T224528: rack/setup codfw: cloudbackup2001.codfw.wmnet and cloudbackup2002.codfw.wmnet.

Thanks! Marked as active.

Wed, Nov 20, 12:27 AM · Cloud-Services, ops-codfw, Operations