chasemp (Chase) Administrator
Engineering Manager & Engineer (WMF)

Projects (27)

User Details

User Since
Sep 16 2014, 11:39 AM (131 w, 3 d)
Roles
Administrator
Availability
Available
IRC Nick
chasemp
LDAP User
Rush
MediaWiki User
CPettet (WMF)

“Be bold and mighty forces will come to your aid”
― Johann Wolfgang von Goethe

Local changes.
for upgrades

Recent Activity

Today

chasemp added a comment to T158913: Move labstore1002 and labstore1002-array1 and labstore1002-array2 to different rack (currently in C3).

It's similar to what we are doing with labstore1004/1005, or same model. I believe those are in neighboring racks.

Fri, Mar 24, 4:30 PM · DC-Ops, Labs, Operations
chasemp added a comment to T160908: Instance creation fails before first puppet run around 1% of the time.

Took about a day and a half to leak 7 instances.

Fri, Mar 24, 4:00 PM · Patch-For-Review, Operations, Labs

Wed, Mar 22

chasemp added a comment to T161159: Cannot access the database: Can't connect to MySQL server on '10.192.48.41' (111) (10.192.48.41).

seems that way, I didn't see that sal and texted @Marostegui to ask (sorry buddy!)

Wed, Mar 22, 9:39 PM · DBA

Tue, Mar 21

chasemp added a comment to T160908: Instance creation fails before first puppet run around 1% of the time.

https://gerrit.wikimedia.org/r/#/c/343636/

Tue, Mar 21, 7:22 PM · Patch-For-Review, Operations, Labs
chasemp triaged T160205: Add interstitial to wikidata-externalid-url as "Normal" priority.
Tue, Mar 21, 6:35 PM · Wikidata, Labs, Tool-Labs
chasemp changed the status of T158204: Eqiad: (2) hardware access request for labnet1003/1004 from "Open" to "Stalled".

Of the three pending requests, let's hold this one for last. I want to do some more review of the CPU specs, since the existing hardware is such a hodgepodge and our model is in flux.

Tue, Mar 21, 12:40 PM · hardware-requests, Labs, Operations
chasemp closed T154860: User dschwen unable to log into horizon.wikimedia.org (An error occurred authenticating. Please try again later.) as "Resolved".
Tue, Mar 21, 12:05 PM · Horizon, Labs
chasemp added a comment to T143349: Deprecate precise instances in Labs by 2017-03-31.

I have finally deleted all three Precise instances from the integration labs project and updated the task detail to reflect it. The sub task T158652 is still open pending puppet patches, but that is not a concern for this task.

Tue, Mar 21, 12:03 PM · Patch-For-Review, Labs-Infrastructure, Labs
chasemp added a comment to T143349: Deprecate precise instances in Labs by 2017-03-31.

A note that the appointed time grows nigh, and this is quickly becoming the most mysterious item left on the list:

wildcat dannyb no Andrew working with Danny on migration

ping @Danny_B

Tue, Mar 21, 12:03 PM · Patch-For-Review, Labs-Infrastructure, Labs
chasemp added a comment to T160884: Request creation of getstarted labs project.

Thanks for the overview @Freddy2001. We'll get to this within the week.

Tue, Mar 21, 12:01 PM · Labs
chasemp added a comment to T118154: determine hardware needs for dumps in eqiad and codfw.

So @chasemp, can we get with Rob and get these boxes ordered?

Tue, Mar 21, 12:00 PM · Operations, Dumps-Generation
chasemp added a comment to T158204: Eqiad: (2) hardware access request for labnet1003/1004.

Is there a specific CPU speed we have to stick to? 24 cores without HT is dual 12-core CPUs. Is anything between 2 and 2.6 GHz OK?

Tue, Mar 21, 11:57 AM · hardware-requests, Labs, Operations
chasemp added a comment to T158207: Eqiad: (2) hardware access request for labcontrol1003/1004.

Is there a specific CPU speed we have to stick to? 24 cores without HT is dual 12-core CPUs. Is anything between 2 and 2.6 GHz OK?

Tue, Mar 21, 11:54 AM · hardware-requests, Operations, Labs
chasemp edited the description of T154706: Codfw: (1) hardware access request for labtest.
Tue, Mar 21, 11:50 AM · hardware-requests, Labs, Operations
chasemp added a comment to T154706: Codfw: (1) hardware access request for labtest.

@chasemp:

Is there a specific existing server that meets this requirement to base a new spec off of?

Tue, Mar 21, 11:50 AM · hardware-requests, Labs, Operations

Mon, Mar 20

chasemp changed the destination URL U9 tools jobs by source from https://graphite-labs.wikimedia.org/render?title='Tools OS Breakdown 30d'&yMin=0&width=800&height=400&target=cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools-*-12*.job_count),%27precise%27))&target=cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools-*-14*.job_count),%27trusty%27))&target=cactiStyle(alias(tools.tools-k8s-master-01.KubernetesCollector.namespaces.active,%27k8s%27))&from=-180days to https://graphite-labs.wikimedia.org/render?title='Tools OS Breakdown 30d'&yMin=0&width=800&height=400&target=cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools-*-12*.job_count),%27precise%27))&target=cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools-*-14*.job_count),%27trusty%27))&target=cactiStyle(alias(tools.tools-k8s-master-01.KubernetesCollector.namespaces.active,%27k8s%27))&from=-30days.
Mon, Mar 20, 4:25 PM
chasemp changed the destination URL U9 tools jobs by source from https://graphite-labs.wikimedia.org/render?title=Tools&yMin=0&width=800&height=400&target=cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools-*-12*.job_count),%27precise%27))&target=cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools-*-14*.job_count),%27trusty%27))&target=cactiStyle(alias(tools.tools-k8s-master-01.KubernetesCollector.namespaces.active,%27k8s%27))&from=-180days to https://graphite-labs.wikimedia.org/render?title='Tools OS Breakdown 30d'&yMin=0&width=800&height=400&target=cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools-*-12*.job_count),%27precise%27))&target=cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools-*-14*.job_count),%27trusty%27))&target=cactiStyle(alias(tools.tools-k8s-master-01.KubernetesCollector.namespaces.active,%27k8s%27))&from=-180days.
Mon, Mar 20, 4:25 PM
chasemp changed the destination URL U9 tools jobs by source from https://graphite-labs.wikimedia.org/render?title=Tools&yMin=0&width=800&height=400&target=cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools-*12*.job_count),%27precise%27))&target=cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools*14*.job_count),%27trusty%27))&target=cactiStyle(alias(tools.tools-k8s-master-01.KubernetesCollector.namespaces.active,%27k8s%27))&from=-180days to https://graphite-labs.wikimedia.org/render?title=Tools&yMin=0&width=800&height=400&target=cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools-*-12*.job_count),%27precise%27))&target=cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools-*-14*.job_count),%27trusty%27))&target=cactiStyle(alias(tools.tools-k8s-master-01.KubernetesCollector.namespaces.active,%27k8s%27))&from=-180days.
Mon, Mar 20, 4:19 PM
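
The three U9 edits above change only the chart title, the host glob patterns, and the `from` window of an otherwise identical Graphite render URL. As a minimal sketch (not how the short URL was actually maintained), such a URL could be composed from its parts. The function name `render_url` and its parameters are hypothetical; the base URL, metric paths, and aliases are copied from the entries above, and the title is passed without the literal quote characters the original URL carries.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "https://graphite-labs.wikimedia.org/render"

# Metric expressions copied from the feed entries above.
TARGETS = [
    "cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools-*-12*.job_count),'precise'))",
    "cactiStyle(alias(sumSeries(tools.tools-services-01.sge.hosts.tools-*-14*.job_count),'trusty'))",
    "cactiStyle(alias(tools.tools-k8s-master-01.KubernetesCollector.namespaces.active,'k8s'))",
]


def render_url(title, window):
    """Build a render URL; `window` is a relative time such as '-30days'."""
    params = [("title", title), ("yMin", "0"), ("width", "800"), ("height", "400")]
    # Graphite accepts one `target` parameter per series.
    params += [("target", t) for t in TARGETS]
    params.append(("from", window))
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    url = render_url("Tools OS Breakdown 30d", "-30days")
    print(url)
    # Round-trip the query string to check the parts survived encoding.
    print(parse_qs(urlsplit(url).query)["from"])
```

Keeping the window as a parameter makes it harder for the title ("30d") and the `from` value ("-180days") to drift apart, which is the mismatch the last edit corrected.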
chasemp added a comment to T160884: Request creation of getstarted labs project.

+1

Mon, Mar 20, 3:38 PM · Labs
chasemp added a comment to T87224: Weird state of /data/project for dumps (semi-missing files).

ignores permissions and does not reproduce root@dumps-stats:/data/project# ls -lh wikistats

Mon, Mar 20, 2:44 PM · Labs, Labs-Infrastructure
chasemp edited the description of T160908: Instance creation fails before first puppet run around 1% of the time.
Mon, Mar 20, 1:04 PM · Patch-For-Review, Operations, Labs
chasemp renamed T160908: Instance creation fails before first puppet run around 1% of the time from "Instance creation stalls before first puppet run around 1% of the time" to "Instance creation fails before first puppet run around 1% of the time".
Mon, Mar 20, 12:59 PM · Patch-For-Review, Operations, Labs
chasemp assigned T160908: Instance creation fails before first puppet run around 1% of the time to Andrew.
Mon, Mar 20, 12:59 PM · Patch-For-Review, Operations, Labs
chasemp triaged T160908: Instance creation fails before first puppet run around 1% of the time as "High" priority.
Mon, Mar 20, 12:58 PM · Patch-For-Review, Operations, Labs
chasemp created T160908: Instance creation fails before first puppet run around 1% of the time.
Mon, Mar 20, 12:58 PM · Patch-For-Review, Operations, Labs
chasemp renamed T159721: labvirt1001 and 1002 cannot launch new VMs from "labvirt1001 can't launch new VMs" to "labvirt1001 and 1002 cannot launch new VMs".
Mon, Mar 20, 12:54 PM · Labs-Infrastructure, Labs
chasemp closed T159459: openstack instance creation sometimes takes >480s as "Resolved".

This is still super important, but the immediate issue in this task seems to be resolved; follow-up continues in T159721: labvirt1001 and 1002 cannot launch new VMs

Mon, Mar 20, 12:54 PM · Operations, Labs
chasemp removed a parent task for T104733: Set up A-based SPF for tools.wmflabs.org: T97574: Provision and test tools-mailrelay-02.
Mon, Mar 20, 12:34 PM · Mail, Labs-Team-Backlog, Labs, Tool-Labs
chasemp removed a subtask for T97574: Provision and test tools-mailrelay-02: T104733: Set up A-based SPF for tools.wmflabs.org.
Mon, Mar 20, 12:34 PM · Patch-For-Review, Labs, Tool-Labs
chasemp edited the description of T143349: Deprecate precise instances in Labs by 2017-03-31.
Mon, Mar 20, 12:30 PM · Patch-For-Review, Labs-Infrastructure, Labs
chasemp added a comment to T160838: Add monitoring for nfs-exportd on active labstore specifically.

https://gerrit.wikimedia.org/r/#/c/343624/

Mon, Mar 20, 12:26 PM · Labs, Operations

Sat, Mar 18

chasemp created T160838: Add monitoring for nfs-exportd on active labstore specifically.
Sat, Mar 18, 8:20 PM · Labs, Operations
chasemp created P5070 (An Untitled Masterwork).
Sat, Mar 18, 7:38 PM

Fri, Mar 17

chasemp added a comment to T151296: Cannot access replica databases - access denied.

This is a known issue and T158420 will resolve it, but at present there is no mechanism for per-user replica credentials for maintainers, only per-tool credentials. It's in progress though.

Fri, Mar 17, 5:44 PM · Labs, Tool-Labs
chasemp assigned T158420: Make maintain-dbusers.py create replica.my.cnf files for user accounts as well to madhuvishy.
Fri, Mar 17, 5:44 PM · Patch-For-Review, Tool-Labs, Labs, Tracking
chasemp added a comment to T159407: Requesting /data/project NFS share for Nova_Resource:Twl.

I'm not sure this is the right solution; it's almost certainly not a good one. How large are the backups expected to be?

Fri, Mar 17, 5:42 PM · The-Wikipedia-Library, Labs
chasemp closed T95107: Labs: Could not find dependency File[/usr/lib/ganglia/python_modules] for File[/usr/lib/ganglia/python_modules/gmond_memcached.py] as "Resolved".

closing due to age and activity (seems fixed?)

Fri, Mar 17, 2:21 PM · Patch-For-Review, Puppet, Labs
chasemp closed T70508: Report when an instance has finished its initial Puppet run as "Declined".
Fri, Mar 17, 2:17 PM · Labs, Wikimedia-Labs-General
chasemp closed T45526: Invalidate the nscd group cache of instances in a project when a user is added or removed as "Declined".

closing due to age and activity, I don't think this has been an issue

Fri, Mar 17, 2:15 PM · Labs, Labs-Infrastructure
chasemp closed T45028: Nagios checks needed for labs-ns0/labs-ns1 as "Resolved".

this exists

Fri, Mar 17, 2:05 PM · Labs, Labs-Infrastructure
chasemp closed T56702: add Central Logging Service documentation as "Invalid".

no longer accurate

Fri, Mar 17, 2:01 PM · Labs, Documentation, Labs-Infrastructure
chasemp closed T56702: add Central Logging Service documentation, a subtask of T2001: Documentation is out of date, incomplete (tracking), as "Invalid".
Fri, Mar 17, 2:01 PM · Documentation, Tracking, MediaWiki-Documentation
chasemp closed T71326: Database is slow. Load times abnormally high at times. as "Invalid".

closing due to age and activity

Fri, Mar 17, 1:59 PM · Labs, Labs-Infrastructure
chasemp closed T87224: Weird state of /data/project for dumps (semi-missing files) as "Invalid".

closing this due to age and activity

Fri, Mar 17, 1:52 PM · Labs, Labs-Infrastructure
chasemp closed T100108: Puppet errors on newly created instances as "Resolved".

this is long since old I believe

Fri, Mar 17, 1:50 PM · Labs
chasemp closed T101661: Provide all labs users with username / passwords for the Postgres database as "Declined".

this seems to not be an issue and I'm not inclined to worry about it with the current demand

Fri, Mar 17, 1:47 PM · Labs-Sprint-101, Labs
chasemp closed T103058: enable hba on tools-precise-dev as "Invalid".

precise is no longer supported

Fri, Mar 17, 1:42 PM · Labs, Tool-Labs
chasemp closed T69884: Track labsdb stats on Labs Graphite as "Resolved".

Since these are production servers, it seems most appropriate that they appear in prod Graphite/Prometheus.

Fri, Mar 17, 1:39 PM · Labs, Tool-Labs
chasemp closed T69884: Track labsdb stats on Labs Graphite, a subtask of T69879: Useful graphite metrics to be tracked for Tool labs (tracking), as "Resolved".
Fri, Mar 17, 1:39 PM · Tracking, Labs, Tool-Labs
chasemp closed T104416: High load on idle machines as "Declined".

I'm closing for age and lack of activity

Fri, Mar 17, 1:33 PM · Labs
chasemp closed T107094: Rewrite the meta_p table populating code to python and have it run on a cron as "Resolved".

as of efcac33f8a5d00427a0593e9e7b6e8a020c86f40 this is hopefully at least viable and further work should be tracked in specific issues

Fri, Mar 17, 1:30 PM · Tool-Labs, Labs
chasemp closed T110556: Ironic on Labs as "Declined".

https://wikitech.wikimedia.org/wiki/Labs_labs_labs/Bare_Metal

Fri, Mar 17, 1:26 PM · Labs-Sprint-114, Labs-Infrastructure, Labs
chasemp closed T111602: Install python-enum34 on toollabs as "Resolved".

So it is available on trusty now, but not precise.
This should be fairly easy to package for precise. Is there a guide for how to get a package into the WMF repo?

Fri, Mar 17, 1:24 PM · Pywikibot-core, Tool-Labs, Labs
chasemp closed T111602: Install python-enum34 on toollabs, a subtask of T55704: Packages to be added to toollabs puppet, as "Resolved".
Fri, Mar 17, 1:24 PM · Labs, Tracking, Tool-Labs
chasemp added a comment to T95094: Some grid jobs are in odd state.

There are still some jobs in that state; I have changed http://tools.wmflabs.org/?status to display "n/a" for jobs with no information, so this is an easy way to find those.

Fri, Mar 17, 1:21 PM · Labs, Tool-Labs
chasemp closed T113646: Instances spontaneously suspended as "Declined".

considering age and no activity I'm bouncing this task

Fri, Mar 17, 1:19 PM · Labs
chasemp closed T99130: Investigate alternatives to dedicated exec node for gifti's tools as "Declined".

Eventually this workload moves to k8s with all others but for now I'm marking this declined with https://phabricator.wikimedia.org/T156981#3077562

Fri, Mar 17, 1:16 PM · Labs, Tool-Labs
chasemp updated subscribers of T94500: bigbrother doesn't stop.

Change 330265 merged by Andrew Bogott:
toollabs: bigbrother: stop tracking jobs when rcfile is deleted

https://gerrit.wikimedia.org/r/330265

Fri, Mar 17, 1:14 PM · Patch-For-Review, Labs, Tool-Labs
chasemp closed T124731: proxylistener does not verify that request comes from Tools project as "Invalid".

I don't know why I did not try a simple curl :-). When I webservice php5.6 shell and then curl http://tools-proxy-01/does-not-exist, this hits the proxy as:

10.68.23.240 - - [31/Jan/2017:21:56:27 +0000] "GET /does-not-exist HTTP/1.1" 404 1409 "-" "curl/7.38.0"

10.68.23.240 is the IP of tools-worker-1003.tools.eqiad.wmflabs, i.e. proxylistener would contact the trustworthy ident server on tools-worker-1003 and not one on the user-controlled containers. Therefore I think Kubernetes does not offer any attack vector.

So this task can be un-securitied and then closed as invalid. Thanks and sorry for the confusion!
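
The reasoning in this comment rests on reading the client IP out of the proxy's access-log line. As an illustrative sketch only (the proxy's real log configuration is not shown here), the line above can be parsed on the assumption that it follows the common "combined" access-log layout. The names `LOG_RE`, `SAMPLE`, and `parse_access_line` are hypothetical.

```python
import re

# Assumes the "combined" access-log layout:
# ip ident user [time] "method path protocol" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The log line quoted in the comment above.
SAMPLE = (
    '10.68.23.240 - - [31/Jan/2017:21:56:27 +0000] '
    '"GET /does-not-exist HTTP/1.1" 404 1409 "-" "curl/7.38.0"'
)


def parse_access_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    fields = parse_access_line(SAMPLE)
    # The source address is the worker node, not the user's container.
    print(fields["ip"], fields["status"], fields["path"])
```

The extracted `ip` field is the address that proxylistener would query for ident, which is why the worker node's address appearing here matters to the argument.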

Fri, Mar 17, 1:13 PM · Security, Tool-Labs, Labs
chasemp changed the visibility for T124731: proxylistener does not verify that request comes from Tools project.
Fri, Mar 17, 1:13 PM · Security, Tool-Labs, Labs
chasemp closed T127698: virt host reboots sometimes breaks puppet on instances as "Declined".

Since this is >1yr old and we haven't updated it at all I'm going to close in favor of resurfacing the issue if needed

Fri, Mar 17, 1:11 PM · Labs-Infrastructure, Labs
chasemp closed T69881: Track gridengine stats on Graphite as "Resolved".

We have the basics of this now:

Fri, Mar 17, 1:06 PM · Labs, Tool-Labs
chasemp closed T69881: Track gridengine stats on Graphite, a subtask of T69879: Useful graphite metrics to be tracked for Tool labs (tracking), as "Resolved".
Fri, Mar 17, 1:06 PM · Tracking, Labs, Tool-Labs
chasemp closed T106871: paramiko (python SSH implementation) needs older hashes for host authentication as "Invalid".

We removed paramiko from the backup pipeline

Fri, Mar 17, 1:02 PM · Labs, Operations
chasemp closed T106871: paramiko (python SSH implementation) needs older hashes for host authentication, a subtask of T106474: Make continuous backups of NFS data to codfw, as "Invalid".
Fri, Mar 17, 1:02 PM · Labs-Sprint-108, Patch-For-Review, Labs-Sprint-107, Labs
chasemp closed T123590: Monitor labs new instance creation as "Resolved".

We have this running as of 99ef86ae0e2b74370e543d3fe22a46e8b0928df3 and have found several issues from the normalized and ongoing testing

Fri, Mar 17, 12:59 PM · Wikimedia-Incident, Labs
chasemp added a comment to T143349: Deprecate precise instances in Labs by 2017-03-31.

A note that the appointed time grows nigh, and this is quickly becoming the most mysterious item left on the list:

Fri, Mar 17, 12:54 PM · Patch-For-Review, Labs-Infrastructure, Labs
chasemp added a comment to T159846: Wikmaps Warper - Migrate / Upgrade maps-warper from Precise to Trusty.

Okay, I think the new instance should be working fine now. I'll announce it to the main users and see if any issues crop up. If all's good, we can turn off the old instance in the next couple of days.

Fri, Mar 17, 12:51 PM · wikimaps-warper
chasemp closed T160686: ug_expiry column of the user_groups table is not present on Labs as "Resolved".

should be good to go, let me know if not

Fri, Mar 17, 12:47 PM · DBA, Labs-Infrastructure, Labs
chasemp assigned T136712: Virtualenvs slow on tool labs NFS to madhuvishy.

We want to experiment with enabling lookupcache=all everywhere. This is currently set on the k8s-workers and the bastions afaict. Passing to @madhuvishy from conversations yesterday. It appears the historical reasons for disabling lookupcache are not well understood, and we should look at how changing this affects an active mount (remount?) and consider how to roll it out to consolidate. Not only are bastions different from trusty exec nodes atm, but also from k8s workers. That's the sort of inconsistency we'll spend no end of time fighting.

Fri, Mar 17, 12:40 PM · Labs, Tool-Labs
chasemp triaged T159835: Labvirt1001 has insanely slow IO as "High" priority.
Fri, Mar 17, 12:36 PM · ops-eqiad, Operations, Labs-Infrastructure, Labs
chasemp triaged T159721: labvirt1001 and 1002 cannot launch new VMs as "High" priority.
Fri, Mar 17, 12:36 PM · Labs-Infrastructure, Labs
chasemp added a comment to T144025: Clean up data in /data/scratch/mwoffliner.

@chasemp I'll clean it piece by piece to see if it works fine without. Until now I have only verified other aspects (than storage) of the full dumping.

Fri, Mar 17, 12:34 PM · Labs
chasemp closed T156586: Mount /public/dumps for osmit project as "Resolved".

labstore1003.eqiad.wmnet:/dumps nfs4 28T 18T 11T 64% /public/dumps

Fri, Mar 17, 12:25 PM · Operations, Labs

Thu, Mar 16

chasemp added a comment to T156586: Mount /public/dumps for osmit project.

It should appear on a Puppet run sometime in the next hour or so.

Thu, Mar 16, 9:10 PM · Operations, Labs
chasemp added a subtask for T135931: Tool Labs users missing replica.my.cnf (tracking): T134074: Create replica.my.cnf for bkeegan on tools.
Thu, Mar 16, 8:23 PM · Tool-Labs, Labs, Tracking
chasemp added a parent task for T134074: Create replica.my.cnf for bkeegan on tools: T135931: Tool Labs users missing replica.my.cnf (tracking).
Thu, Mar 16, 8:23 PM · Labs
chasemp closed T140099: Creating a instance with precise fails as "Invalid".

this is no longer allowed

Thu, Mar 16, 8:18 PM · Labs
chasemp added a comment to T154860: User dschwen unable to log into horizon.wikimedia.org (An error occurred authenticating. Please try again later.).

@dschwen is 2fa working for you via wikitech currently? Can you disable and re-enable 2fa and see if it works in both venues?

Thu, Mar 16, 8:15 PM · Horizon, Labs
chasemp triaged T160686: ug_expiry column of the user_groups table is not present on Labs as "Normal" priority.

I'll attempt to run the view generation when I can in the next few days

Thu, Mar 16, 8:14 PM · DBA, Labs-Infrastructure, Labs
chasemp closed T146065: eqiad: 2 hardware access request for research labsdbs as "Declined".

My understanding is this has been put on hold until TBD. I don't want to leave this request for hardware open and create confusion, and I'm not sure what the specs and needs will be when things come back around.

Thu, Mar 16, 8:10 PM · Research-and-Data-Backlog, Labs, hardware-requests, Operations
chasemp closed T149750: Can't ssh into xenon.rcm.eqiad.wmflabs as "Resolved".

I'm working from the assumption this issue is fine now.

Thu, Mar 16, 8:06 PM · Labs-project-other, Labs-Infrastructure, Labs
chasemp closed Unknown Object (Task), a subtask of T150767: LabsDB replica service for tools and labs - issues and missing available views (tracking), as "Resolved".
Thu, Mar 16, 8:04 PM · DBA, Labs-Infrastructure, Labs
chasemp added a parent task for T151296: Cannot access replica databases - access denied: T135931: Tool Labs users missing replica.my.cnf (tracking).
Thu, Mar 16, 8:04 PM · Labs, Tool-Labs
chasemp added a subtask for T135931: Tool Labs users missing replica.my.cnf (tracking): T151296: Cannot access replica databases - access denied.
Thu, Mar 16, 8:03 PM · Tool-Labs, Labs, Tracking
chasemp updated subscribers of T154860: User dschwen unable to log into horizon.wikimedia.org (An error occurred authenticating. Please try again later.).

@dschwen still happening? @Andrew ping if so

Thu, Mar 16, 7:54 PM · Horizon, Labs
chasemp added a parent task for T154933: Several users and tools have invalid credentials in replica.my.cnf: T135931: Tool Labs users missing replica.my.cnf (tracking).
Thu, Mar 16, 7:53 PM · Labs, Tool-Labs
chasemp added a subtask for T135931: Tool Labs users missing replica.my.cnf (tracking): T154933: Several users and tools have invalid credentials in replica.my.cnf.
Thu, Mar 16, 7:53 PM · Tool-Labs, Labs, Tracking
chasemp triaged T156586: Mount /public/dumps for osmit project as "Normal" priority.

@Sabas88 we could configure /public/dumps to be available on all instances in this project. Is that acceptable?

Thu, Mar 16, 7:52 PM · Operations, Labs
chasemp closed T156636: Labs instance ci-jessie-wikimedia-498353 can not be deleted as "Resolved".

marking resolved as this may be an artifact of bad config gone by but let's track new happenings in new tickets

Thu, Mar 16, 7:50 PM · Continuous-Integration-Infrastructure, Labs-Infrastructure, Labs
chasemp closed T158685: paws returns 502 bad gateway as "Resolved".
Thu, Mar 16, 7:40 PM · PAWS, Labs
chasemp triaged T160264: Shut down "cewbot" as "Normal" priority.

@Kanashimi can you speak to this?

Thu, Mar 16, 7:38 PM · Labs, Tool-Labs
chasemp closed T76971: Investigate enabling host-based auth to all hosts from bastions as "Invalid".

I spoke with @yuvipanda and in his words 'this idea should die in a fire'

Thu, Mar 16, 7:12 PM · Labs
chasemp closed T155820: Puppet fails on integration instances: nfs_mount[home-on-labstoresvc]: umount: /home: not mounted as "Resolved".
Thu, Mar 16, 7:04 PM · Patch-For-Review, Continuous-Integration-Infrastructure, Labs-Infrastructure, Labs
chasemp added a comment to T152399: Reassign service/pod IP ranges for kubernetes on tool labs.

raw etherpad script for posterity

Thu, Mar 16, 6:23 PM · Tools-Kubernetes, Labs, Tool-Labs
chasemp closed T152399: Reassign service/pod IP ranges for kubernetes on tool labs as "Resolved".

seems like this is sorted, we'll reopen if issues surface

Thu, Mar 16, 6:20 PM · Tools-Kubernetes, Labs, Tool-Labs
chasemp added a comment to T154355: page_lang column of the page table is not replicated to Labs.

metawiki_p.page now contains the page_lang column; however, the user_groups view still does not contain the ug_expiry column. Shall I open a new task for that?

Thu, Mar 16, 6:19 PM · DBA, Labs
chasemp added a comment to T157359: labsdb1006/1007 (postgresql) maintenance.

@aude @MaxSem @Kolossos Can you verify your applications (e.g. by restarting them) and check that they work as expected, to be 100% sure the maintenance and upgrade did not cause any issues? This is now PostgreSQL 9.4.

Thu, Mar 16, 6:16 PM · Patch-For-Review, DBA, Labs-Infrastructure, Labs, Operations

Wed, Mar 15

chasemp added a comment to T152399: Reassign service/pod IP ranges for kubernetes on tool labs.

This maintenance lasted just shy of an hour. All Kubernetes services
should have been back at around 50 minutes in. This was longer than the
expected 30 due to extra time for initial depooling of existing Pods. At
the moment all Kubernetes services seem to be functioning as expected.
Non-Kubernetes Tool Labs functions were not impacted.

Wed, Mar 15, 8:37 PM · Tools-Kubernetes, Labs, Tool-Labs

Tue, Mar 14

chasemp added a comment to T152399: Reassign service/pod IP ranges for kubernetes on tool labs.

https://etherpad.wikimedia.org/p/T152399

Tue, Mar 14, 10:40 PM · Tools-Kubernetes, Labs, Tool-Labs