chasemp (Chase) Administrator
Lead Operations Engineer (Wikimedia Cloud Services)

User Details

User Since
Sep 16 2014, 11:39 AM (160 w, 6 d)
Roles
Administrator
Availability
Available
IRC Nick
chasemp
LDAP User
Rush
MediaWiki User
CPettet (WMF)

Recent Activity

Yesterday

chasemp added a comment to T171508: Investigate and implement alternative for showmount based check at instance boot time.

Replied on CR :)

Mon, Oct 16, 9:25 PM · cloud-services-team (Kanban), Patch-For-Review, Cloud-Services
chasemp closed T175077: nova compute hosts disk space alert does not page as Resolved.
Mon, Oct 16, 8:02 PM · Patch-For-Review, Cloud-VPS
chasemp added a comment to T175077: nova compute hosts disk space alert does not page.
Resources modified
Mon, Oct 16, 8:00 PM · Patch-For-Review, Cloud-VPS
chasemp added a comment to T169774: Cleanup: 2017-07-02 Toolforge data loss for permissive data.

marked stalled so we remember to remove the temp restore data

Mon, Oct 16, 7:45 PM · Wikimedia-Incident, Data-Services, Toolforge, cloud-services-team (Kanban)

Thu, Oct 12

chasemp updated subscribers of T177914: Switch labstore servers to default SSH configuration.

I believe paramiko is no longer in use. I know it's been removed for all the backup components that have been redone, but I'm unsure if there are components that have not been reimplemented. @madhuvishy any perspective here?

Thu, Oct 12, 8:14 PM · cloud-services-team (Kanban), Data-Services, Operations

Wed, Oct 11

chasemp closed T177103: Catchpoint tests failing under Toolforge availability product as Resolved.

looks good now to me, thank you!

Wed, Oct 11, 10:02 PM · Patch-For-Review, Toolforge
chasemp added a comment to T178008: ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks).

thanks @Dzahn

Wed, Oct 11, 9:55 PM · Patch-For-Review, monitoring, Operations
chasemp triaged T178008: ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks) as Normal priority.
Wed, Oct 11, 9:55 PM · Patch-For-Review, monitoring, Operations
chasemp assigned T177944: Alert for 'All k8s worker nodes are healthy on checker.tools.wmflabs.org' to Andrew.
Wed, Oct 11, 2:37 PM · Toolforge
chasemp triaged T177944: Alert for 'All k8s worker nodes are healthy on checker.tools.wmflabs.org' as Normal priority.
Wed, Oct 11, 2:27 PM · Toolforge
chasemp created T177944: Alert for 'All k8s worker nodes are healthy on checker.tools.wmflabs.org'.
Wed, Oct 11, 2:27 PM · Toolforge

Fri, Oct 6

chasemp triaged T165136: Ferm rules for labstore NFS hosts as Normal priority.
Fri, Oct 6, 2:48 PM · Patch-For-Review, Cloud-VPS, Operations
chasemp triaged T177570: Request to increase active connection quota for user s51187 on analytics.db.svc.eqiad.wmflabs as Normal priority.

We did profile things and put in per-user limits on the new setup for sanity, as one of the patterns we see is a rogue Tool spawning huge numbers of connections and flooding things. The limit of 10 was based on the patterns seen at the time. We talked then about the need for a per-Tool appeal, but I'm not sure what the status of that mechanism is. @madhuvishy, is it possible to specify a per-Tool override of the default connection limit ceiling atm?

Fri, Oct 6, 2:12 PM · Data-Services, XTools, DBA
chasemp closed T177484: tools-mail queue length alert from prometheus cron as Resolved.

@chasemp I bet that's a side effect of T166561: Rollout prometheus-node-exporter 0.14 in labs, is it persisting or has been transient during package upgrades?

Fri, Oct 6, 2:10 PM · cloud-services-team (Kanban), Toolforge, User-bd808

Thu, Oct 5

chasemp added a comment to T177427: Remove non-interactive bots from #wikimedia-cloud.

I personally am for it. I don't mind the idea of unbreak now still showing up via wikibugs.

Thu, Oct 5, 4:56 PM · Patch-For-Review, Cloud-Services, cloud-services-team (Kanban)
chasemp added a comment to T176926: Request creation of IIAB VPS project.

@zhuyifei1999 said what I would suggest. Storage is not quota'd in the way you are imagining. Instances are created with a "flavor" that has a certain amount of storage associated, but we don't format it all by default, so you need to follow those instructions to do so. We would not be able to allocate more storage for an existing instance right now and, FYI, quota allocated above the default project allocation would need a separate quota increase task for posterity and tracking on our end if we did.

Thu, Oct 5, 12:36 PM · User-bd808, Cloud-VPS (Project-requests)
chasemp added a comment to T177484: tools-mail queue length alert from prometheus cron.
shinken shinken@shinken-01.shinken.eqiad.wmflabs via wmflabs.org 
7:32 AM (0 minutes ago)
Thu, Oct 5, 12:33 PM · cloud-services-team (Kanban), Toolforge, User-bd808
chasemp renamed T177484: tools-mail queue length alert from prometheus cron from tools-mail queue length alert to tools-mail queue length alert from prometheus cron.
Thu, Oct 5, 12:27 PM · cloud-services-team (Kanban), Toolforge, User-bd808
chasemp updated the task description for T177484: tools-mail queue length alert from prometheus cron.
Thu, Oct 5, 12:20 PM · cloud-services-team (Kanban), Toolforge, User-bd808
chasemp added a comment to T177484: tools-mail queue length alert from prometheus cron.

I think the cron may be ok now so I'm purging frozen messages with

Thu, Oct 5, 12:20 PM · cloud-services-team (Kanban), Toolforge, User-bd808
chasemp triaged T177484: tools-mail queue length alert from prometheus cron as Normal priority.
Thu, Oct 5, 12:18 PM · cloud-services-team (Kanban), Toolforge, User-bd808
chasemp created T177484: tools-mail queue length alert from prometheus cron.
Thu, Oct 5, 12:18 PM · cloud-services-team (Kanban), Toolforge, User-bd808

Tue, Oct 3

chasemp added a comment to T176361: Run performance tests on VPS/Labs using local proxy.

I took care of T177279 but can someone sync up with me on IRC to see what underlying hosts the instances you are using to test live on?

Tue, Oct 3, 6:52 PM · Performance-Team
chasemp closed T177279: Request increased quota for webperf labs project as Resolved.
Tue, Oct 3, 6:51 PM · Performance-Team (Radar), Cloud-VPS (Quota-requests)
chasemp triaged T177279: Request increased quota for webperf labs project as Normal priority.
Tue, Oct 3, 6:46 PM · Performance-Team (Radar), Cloud-VPS (Quota-requests)
chasemp added a comment to T177279: Request increased quota for webperf labs project.

I am going to bump this project to handle 1 more xlarge instance, but I'm wondering what Cloud instance(s) are you using for perf? I was hoping to ensure they were on the best underlying candidate host, and to do the same for this new xlarge.

Tue, Oct 3, 6:44 PM · Performance-Team (Radar), Cloud-VPS (Quota-requests)
chasemp added a comment to T176926: Request creation of IIAB VPS project.

+1

Tue, Oct 3, 3:33 PM · User-bd808, Cloud-VPS (Project-requests)
chasemp assigned T161899: End self-service new Trusty instance creation in Cloud VPS; standardize on Debian base images to bd808.
Tue, Oct 3, 3:30 PM · cloud-services-team (Kanban), Cloud-VPS, User-bd808, Operations

Mon, Oct 2

chasemp added a comment to T171473: labvirt1015 crashes.

Final status of etherpad we were using to coordinate migrations off labvirt1015 for posterity

Mon, Oct 2, 12:59 PM · cloud-services-team (Kanban), DC-Ops, ops-eqiad, Operations

Sun, Oct 1

chasemp added a comment to T171473: labvirt1015 crashes.

Entirety of labvirt1015 console during crash https://usercontent.irccloud-cdn.com/file/mwxQTBO0/Screen%20Shot%202017-10-01%20at%202.26.59%20PM.png

Sun, Oct 1, 10:05 PM · cloud-services-team (Kanban), DC-Ops, ops-eqiad, Operations
chasemp added a comment to T177164: puppet-phabricator and gerrit-test3 have gone down.

yes, this labvirt has crashed and we are attempting to recover these instances. Apologies for the inconvenience, appreciate the patience :)

Sun, Oct 1, 10:02 PM · Cloud-VPS
chasemp added a comment to T171473: labvirt1015 crashes.

Note to self: fix cold-migrate to handle already shut down instances

Sun, Oct 1, 9:48 PM · cloud-services-team (Kanban), DC-Ops, ops-eqiad, Operations

Sat, Sep 30

chasemp added a comment to T177145: Huggle development environment - portable virtual box.

We need to avoid users downloading this off an NFS filesystem.

Sat, Sep 30, 9:41 PM · Huggle, Cloud-Services

Fri, Sep 29

chasemp assigned T177103: Catchpoint tests failing under Toolforge availability product to madhuvishy.
Fri, Sep 29, 6:19 PM · Patch-For-Review, Toolforge
chasemp triaged T176891: DNS resolution chosing IPv6 addrs on hosts with only link-local IPv6 addresses as Normal priority.
Fri, Sep 29, 6:16 PM · cloud-services-team (Kanban), Cloud-VPS
chasemp added a comment to T176891: DNS resolution chosing IPv6 addrs on hosts with only link-local IPv6 addresses.

I'm kind of thinking at least all of Toolforge would make sense but let's see if it has the effect we think in this case before we worry about the logistics of rolling it out wider /proposal

Fri, Sep 29, 6:15 PM · cloud-services-team (Kanban), Cloud-VPS
chasemp added a comment to T177103: Catchpoint tests failing under Toolforge availability product.

I deactivated the failing tests for now until we debug them.

Fri, Sep 29, 6:13 PM · Patch-For-Review, Toolforge
chasemp updated subscribers of T177103: Catchpoint tests failing under Toolforge availability product.

@madhuvishy is it possible the labsdb* failing tests relates back to the rewrite for account handling? I'm wondering if these checks use creds that got clobbered.

Fri, Sep 29, 6:12 PM · Patch-For-Review, Toolforge
chasemp triaged T177103: Catchpoint tests failing under Toolforge availability product as Normal priority.
Fri, Sep 29, 6:11 PM · Patch-For-Review, Toolforge
chasemp created T177103: Catchpoint tests failing under Toolforge availability product.
Fri, Sep 29, 6:11 PM · Patch-For-Review, Toolforge
chasemp added a comment to T176891: DNS resolution chosing IPv6 addrs on hosts with only link-local IPv6 addresses.

Let's start with "Disable IPv6 entirely on the VM using /etc/sysctl.conf" and see how it goes? Let's revert https://gerrit.wikimedia.org/r/#/c/380318/ too to cleanup.
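
As an illustration of the "disable IPv6 via /etc/sysctl.conf" option, here is a small sketch that renders the relevant settings. It assumes the standard Linux kernel keys net.ipv6.conf.<interface>.disable_ipv6; the function name render_disable_ipv6 is hypothetical, and the change actually applied for this task is not shown in the comment.

```python
def render_disable_ipv6(interfaces=("all", "default")):
    """Return sysctl.conf lines that turn IPv6 off for the given interface scopes.

    "all" covers existing interfaces and "default" covers ones created later.
    This is an assumed fragment, not the change made on the task.
    """
    lines = [f"net.ipv6.conf.{iface}.disable_ipv6 = 1" for iface in interfaces]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the fragment so it can be reviewed before it is placed anywhere.
    print(render_disable_ipv6(), end="")
```

Lines like these would typically be loaded with sysctl -p after being added to the file.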

Fri, Sep 29, 5:52 PM · cloud-services-team (Kanban), Cloud-VPS

Tue, Sep 26

chasemp added a comment to T176597: Request creation of webperf VPS project.

+1'd

Tue, Sep 26, 4:08 PM · Cloud-VPS (Project-requests)
chasemp closed T175002: db1009 (m5, used primarily for cloud services) unresponsive for minutes as Resolved.

What should we do with this task? is it all good now?

Tue, Sep 26, 1:58 PM · Patch-For-Review, DBA, cloud-services-team

Tue, Sep 19

chasemp created P6025 (An Untitled Masterwork).
Tue, Sep 19, 5:18 PM
chasemp updated subscribers of T176044: Replace kernel and reboot labvirt1015, 1016, 1017, 1018.

A summary from the IRC conversation I had with @RobH on 2017-09-12

Tue, Sep 19, 2:59 PM · Patch-For-Review, cloud-services-team (Kanban)

Mon, Sep 18

chasemp assigned T176044: Replace kernel and reboot labvirt1015, 1016, 1017, 1018 to Andrew.
Mon, Sep 18, 2:42 PM · Patch-For-Review, cloud-services-team (Kanban)
chasemp triaged T176024: Namespaces not found (pykube, kubernetes) as Normal priority.

IIUC from reading on IRC this is due to k8s not understanding namespaces with an underscore. What ended up happening here?
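
A minimal sketch of why an underscore would be rejected, assuming (as the Kubernetes documentation states) that namespace names must be valid RFC 1123 DNS labels: lowercase alphanumerics and '-', starting and ending with an alphanumeric, at most 63 characters. The function name is_valid_namespace and the example names are hypothetical and not taken from the task.

```python
import re

# RFC 1123 label: lowercase letters, digits and '-', alphanumeric at both ends.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_valid_namespace(name: str) -> bool:
    """Return True if name would pass a DNS-label check for a namespace."""
    return len(name) <= 63 and _DNS_LABEL.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("my-tool", "my_tool"):
        print(candidate, is_valid_namespace(candidate))
```

Under this check a name such as "my_tool" fails while "my-tool" passes, which would match the behaviour described on IRC if the diagnosis is correct.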

Mon, Sep 18, 2:27 PM · Toolforge
chasemp added a comment to T176044: Replace kernel and reboot labvirt1015, 1016, 1017, 1018.
Linux labvirt1001 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1002 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1003 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1004 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1005 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1006 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1007 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1008 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1009 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1010 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1011 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1012 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1013 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1014 4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1015 4.4.0-93-generic #116~14.04.1-Ubuntu SMP Mon Aug 14 16:07:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1016 4.4.0-83-generic #106~14.04.1-Ubuntu SMP Mon Jun 26 18:10:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1017 4.4.0-93-generic #116~14.04.1-Ubuntu SMP Mon Aug 14 16:07:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux labvirt1018 4.4.0-83-generic #106~14.04.1-Ubuntu SMP Mon Jun 26 18:10:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Mon, Sep 18, 2:08 PM · Patch-For-Review, cloud-services-team (Kanban)
chasemp added a comment to T174860: Define naming scheme for connecting to new wiki replica cluster.

That seems really great

Mon, Sep 18, 2:01 PM · Patch-For-Review, cloud-services-team (Kanban), User-bd808, Data-Services
chasemp triaged T176110: Toolforge tool.heritage webservice keeps crashing as Normal priority.
Mon, Sep 18, 2:00 PM · Toolforge, Wiki-Loves-Monuments-Database

Sep 14 2017

chasemp closed T175196: Request increased quota for wikidata-query labs project as Resolved.

Great done, I upped RAM to 72500 for good measure. Keep the quota, we'll come back to it if it's ever a real contention.

Sep 14 2017, 2:26 PM · Discovery, Wikidata, Wikidata-Query-Service, Cloud-VPS (Quota-requests)
chasemp closed T175567: Create a project for Wikimedia Armenia as Resolved.

Created with Ladsgroup as a project admin. Best of luck!

Sep 14 2017, 2:18 PM · Cloud-VPS (Project-requests), cloud-services-team
chasemp claimed T175567: Create a project for Wikimedia Armenia.
Sep 14 2017, 2:17 PM · Cloud-VPS (Project-requests), cloud-services-team
chasemp added a comment to T175712: Install cumin in the WMCS infrastructure.

As a side effect, Beta-Cluster-Infrastructure and Continuous-Integration-Infrastructure would need a way to have a per project cumin master. We don't have access to the WMCS salt master.

The instances are:

deployment-salt02.deployment-prep.eqiad.wmflabs
integration-saltmaster.integration.eqiad.wmflabs

Sep 14 2017, 1:40 PM · Cloud-VPS, Operations-Software-Development

Sep 13 2017

chasemp triaged T175698: Need easier tool for working on redundancy than "Inhalte übernommen" Template Tool (german WP) as Normal priority.
Sep 13 2017, 6:28 PM · Tools
chasemp triaged T175768: Improvements for the Toolforge 'webservice' command as Normal priority.
Sep 13 2017, 6:28 PM · Outreachy (Round-15), Toolforge
chasemp triaged T175774: Rename Tool labs to Toolforge in Persönliche Bekanntschaften tool as Normal priority.
Sep 13 2017, 6:28 PM · Tools, I18n
chasemp triaged T175846: Request creation of Zppix-Wiki-AI VPS project as Normal priority.

We will discuss in our weekly meeting next tuesday :)

Sep 13 2017, 6:28 PM · User-Zppix, Cloud-VPS (Project-requests)
chasemp triaged T161675: Re-think puppet management for deployment-prep as Normal priority.
Sep 13 2017, 2:28 PM · Release-Engineering-Team (Next), User-Joe, Beta-Cluster-Infrastructure, Cloud-Services, Puppet

Sep 12 2017

chasemp created P5997 labservices1001 designate refactor.
Sep 12 2017, 6:19 PM
chasemp created P5996 labservices1002 designate refactor.
Sep 12 2017, 5:59 PM

Sep 7 2017

chasemp added a comment to T174860: Define naming scheme for connecting to new wiki replica cluster.

Looks ok to me. I was worried if underscores would be allowed on dns entries (which some wikis sadly have, which are also wildcards for mysql), but it seems to be accepted (it is only frowned upon on hostnames).

Are you (cloud) going to take care of changing the dns every time a wiki is added?

Sep 7 2017, 2:40 PM · Patch-For-Review, cloud-services-team (Kanban), User-bd808, Data-Services

Sep 6 2017

chasemp created P5966 (An Untitled Masterwork).
Sep 6 2017, 9:26 PM
chasemp created P5965 (An Untitled Masterwork).
Sep 6 2017, 7:28 PM

Sep 5 2017

chasemp created T175077: nova compute hosts disk space alert does not page.
Sep 5 2017, 8:54 PM · Patch-For-Review, Cloud-VPS
chasemp added a comment to T175029: rabbitmq: Consume and log messages sent to notifications.error.

should we try to get this into logstash?

Sep 5 2017, 7:49 PM · Patch-For-Review, cloud-services-team (Kanban), Release-Engineering-Team (Watching / External), Nodepool, Cloud-VPS, Continuous-Integration-Infrastructure
chasemp added a comment to T174860: Define naming scheme for connecting to new wiki replica cluster.

I really dislike foo.labsdb. It is trading all sanity for conciseness I think. I think wikireplica-web.eqiad.wmnet and wikireplica-analytics.eqiad.wmnet are service urls and we should follow some service url standard here with a respective FQDN so that our sanity is preserved. We have a few other use cases (services that should be using service urls) that should fall in line here but this is probably the most painful to change longterm.

Sep 5 2017, 7:48 PM · Patch-For-Review, cloud-services-team (Kanban), User-bd808, Data-Services
chasemp added a comment to T169133: WDQS testing setup platform sizing.

+1'd to grant access to the 300G flavor via meeting

Sep 5 2017, 3:52 PM · cloud-services-team (Kanban), Cloud-VPS (Quota-requests), Discovery, Wikidata-Query-Service, Wikidata
chasemp added a comment to T174618: Request creation of project-smtp VPS project.

+1 -- no name preference

Sep 5 2017, 3:46 PM · User-bd808, cloud-services-team (Kanban), Cloud-VPS (Project-requests)
chasemp added a comment to T169133: WDQS testing setup platform sizing.

I'm making a note to discuss it today in our meeting ;)

Sep 5 2017, 1:07 PM · cloud-services-team (Kanban), Cloud-VPS (Quota-requests), Discovery, Wikidata-Query-Service, Wikidata
chasemp updated subscribers of T175002: db1009 (m5, used primarily for cloud services) unresponsive for minutes.

I'm wondering if this is related:

Sep 5 2017, 12:47 PM · Patch-For-Review, DBA, cloud-services-team

Sep 1 2017

chasemp added a comment to T169133: WDQS testing setup platform sizing.

We could bump up the quota (on the project) temporarily to allow a rebuild of those instances with a flavor that has a larger disk. We already have a flavor with 300G of disk that we have granted selectively elsewhere, so that would work out. I believe this is a 20G root partition, with the /srv extension puppet manifest used to mount the remainder at /srv. FYI, we don't have the ability to sanely extend the disk of an existing instance, so changing instance sizes means a rebuild.

Sep 1 2017, 1:20 PM · cloud-services-team (Kanban), Cloud-VPS (Quota-requests), Discovery, Wikidata-Query-Service, Wikidata

Aug 31 2017

chasemp changed the destination URL U16 yuvisignal from https://phab.wmfusercontent.org/file/data/krma24kz3esbdmad5a7w/PHID-FILE-fbcs3esuhpjcnsltxvto/yuvi_signal to https://i.imgur.com/8RkQA60.jpg.
Aug 31 2017, 9:34 PM
chasemp created U16 yuvisignal.
Aug 31 2017, 9:33 PM

Aug 30 2017

chasemp added a comment to T166845: monitor some things on all Cloud instances (discussion).

SSH availability via whatever new and fancy Cumin things get setup seems ideal

Aug 30 2017, 8:29 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
chasemp added a comment to T160996: Creation of a Program Committee for Wikimedia developer events (a Wikimania Hackathon session).

Yes, the topics listed in the heads up are just examples of tech topics that are relevant to the strategy. A Program Committee is being formed and will be announced shortly. The PC will select the position papers and structure them into sessions.

Aug 30 2017, 8:28 PM · Wikimania-Hackathon-2017, Developer-Relations (Jul-Sep 2017)
chasemp added a comment to T160996: Creation of a Program Committee for Wikimedia developer events (a Wikimania Hackathon session).

Will there still be a Program Committee for sussing out proposals and such then?

Aug 30 2017, 8:08 PM · Wikimania-Hackathon-2017, Developer-Relations (Jul-Sep 2017)
chasemp updated subscribers of T167357: Determine need and replacement for dmz_cidr configuration in nova-network.

@ayounsi and I spoke about this for a few minutes today. General agreement that the allowances here are too broad and, even if we wanted to keep them, incomplete. A better mid-term plan seems to be to reduce this to the actual hosts that need to preserve source IP to function (NFS, etc.), and to reduce hosts in that category (outside of the labnet boundary) to 0 long term.

Aug 30 2017, 7:50 PM · Cloud-Services
chasemp added a comment to T168584: Labsdb* servers need to be rebooted.

No definite date has been set as we are working on T173511 as the precursor to moving over quarry (and probably PAWS). I think we are talking months as long as we can do it gracefully.

Aug 30 2017, 4:42 PM · Scoring-platform-team (Current), DBA, cloud-services-team, Operations
chasemp added a comment to T41785: Create a labs SMTP smarthost.

That Toolforge mail server is a real mess. It may be the remaining holdover from the early days of un-puppetized things and we have been kicking that can down the road for a good long while. Any chance at improvement there is very welcome.

Aug 30 2017, 4:17 PM · Operations, Cloud-Services, Mail

Aug 29 2017

chasemp assigned T168584: Labsdb* servers need to be rebooted to madhuvishy.
Aug 29 2017, 4:12 PM · Scoring-platform-team (Current), DBA, cloud-services-team, Operations
chasemp added a comment to T174306: Request creation of suggestbot VPS project.

For now using a local DB hosting this outside of Toolforge seems best. Good luck!

Aug 29 2017, 3:52 PM · Cloud-VPS (Project-requests)
chasemp triaged T174306: Request creation of suggestbot VPS project as Normal priority.
Aug 29 2017, 3:40 PM · Cloud-VPS (Project-requests)
chasemp added a comment to T168765: Create Wikiversity Hindi.

Was the maintain-views step not completely performed?

MariaDB [hiwikiversity_p]> show tables;
 Empty set (0.00 sec)

Looks like it is missing on 1001 and 1003 (I assumed Andrew did it there). I will do it in a sec

Done on 1001 and 1003. show tables now works

Aug 29 2017, 2:55 PM · Wiki-Setup (Create), Analytics, Analytics-Wikistats, MW-1.30-release-notes (WMF-deploy-2017-08-08_(1.30.0-wmf.13)), Wikidata, User-Urbanecm, Patch-For-Review, Hindi-Sites, Wikimedia-Language-setup, Operations

Aug 28 2017

chasemp added a comment to T156869: Design a method for keeping user-created tables in sync across labsDBs.

@Halfak +1. It was great to get this all on-task though as we will no doubt reference it in the future.

Aug 28 2017, 7:33 PM · Data-Services, DBA

Aug 25 2017

chasemp triaged T172421: Request creation of deep-learning-services VPS project as Normal priority.
Aug 25 2017, 1:06 PM · User-bd808, cloud-services-team (Kanban), Cloud-VPS (Project-requests)
chasemp edited projects for T172421: Request creation of deep-learning-services VPS project, added: Cloud-VPS (Project-requests); removed VPS-project-Wikipedia-Requests.
Aug 25 2017, 1:06 PM · User-bd808, cloud-services-team (Kanban), Cloud-VPS (Project-requests)

Aug 24 2017

chasemp added a comment to T170492: figure out if nodepool is overwhelming rabbitmq and/or nova.

We have been having rabbitmq and/or timeout issues with operations this afternoon. Prior to the first rabbitmq restart here is what I saw.

Aug 24 2017, 7:55 PM · cloud-services-team (Kanban), Release-Engineering-Team (Watching / External), Nodepool, Cloud-VPS, Continuous-Integration-Infrastructure, Patch-For-Review

Aug 22 2017

chasemp added a comment to T165779: rack/setup/install labnet100[34].

Thanks @RobH

Aug 22 2017, 11:50 PM · Cloud-Services, Operations

Aug 21 2017

chasemp added a comment to T169133: WDQS testing setup platform sizing.

How many VMs at that spec (count)?

Aug 21 2017, 3:46 PM · cloud-services-team (Kanban), Cloud-VPS (Quota-requests), Discovery, Wikidata, Wikidata-Query-Service
chasemp added a comment to T173511: Implement technical details and process for "datasets_p" on wikireplica hosts.

@Halfak and I spoke a bit about this this morning. We talked about a SIG for wikireplica things and how this could relate, with him being an interested party. I was planning on bringing it up at the cloud-services-team meeting tomorrow as a general point of interest, since the near-term work and future of the wikireplica setup has a lot of moving parts, and then talking with the DBA crew about their thoughts. Two kinds of things here: a one-off consideration for a specific dataset, and then more general thinking on datasets living alongside the replica data.

Aug 21 2017, 3:44 PM · Data-Services, DBA, cloud-services-team (Kanban), Analytics, Research
chasemp added a comment to T152235: Simple logrotate service for users of Tools as stopgap before central logging.

Mostly silly troubleshooting on my part here :) When truncating the file for lighttpd, which is not using O_APPEND, the seek position remains, since the execution environment process still has the file handle open. Stopping the lighttpd process post truncation makes this clear. We have done cleanup of this variety a bunch of times and it has never really been sussed out, probably because the environment was not stable enough for us to notice the long-term effects, and because the cleanup is effective when the most serious offenders are scheduled jobs or jobs that do not maintain long-running file handles on the log file.
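
The truncation behaviour described in this comment can be reproduced with a short sketch. It assumes a POSIX system; the function name truncate_under_writer, the file names, and the byte counts are hypothetical, and lighttpd itself is not involved. A writer that opened the file without O_APPEND keeps its old offset after an external truncate, so its next write lands past a hole of NUL bytes; with O_APPEND the write goes to the new end of the file.

```python
import os
import tempfile


def truncate_under_writer(path, first=b"a" * 1000, second=b"b" * 10, append=False):
    """Write, truncate the file from outside, write again.

    Returns (apparent file size, number of leading NUL bytes).
    """
    flags = os.O_WRONLY | os.O_CREAT
    if append:
        flags |= os.O_APPEND
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, first)    # the long-running process has logged some data
        os.truncate(path, 0)   # external cleanup while the handle stays open
        os.write(fd, second)   # the writer carries on with its old file handle
    finally:
        os.close(fd)
    with open(path, "rb") as fh:
        data = fh.read()
    leading_nuls = len(data) - len(data.lstrip(b"\x00"))
    return len(data), leading_nuls


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    print(truncate_under_writer(os.path.join(workdir, "plain.log")))
    print(truncate_under_writer(os.path.join(workdir, "append.log"), append=True))
```

In the non-append case the file reappears at its old apparent size plus the new write, which is why truncation alone does not give lasting relief for long-running writers.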

Aug 21 2017, 3:35 PM · Patch-For-Review, Cloud-Services
chasemp added a comment to T173708: Cannot log into Tool Labs local DB via 'sql' command.

This was one of the first tools on Tool Labs. Maybe some legacy issue?

Aug 21 2017, 1:44 PM · Toolforge

Aug 17 2017

chasemp added a comment to T172911: Add @Niedzielski to the reading-web-staging group in horizon.

Are you a project member or a project admin? I think only admin users can create instances.

Aug 17 2017, 6:43 PM · Horizon, Marvin
chasemp added a comment to T172650: "last" command on WMF Labs/Tools allows users to view IPs of other toolforge users.

I think we should remove the restrictive security ACL here. This is a policy and notification issue rather than a sensitive security issue.

Aug 17 2017, 6:40 PM · User-bd808, Privacy, cloud-services-team (Kanban), Cloud-Services, Security
chasemp triaged T172650: "last" command on WMF Labs/Tools allows users to view IPs of other toolforge users as Normal priority.
Aug 17 2017, 6:39 PM · User-bd808, Privacy, cloud-services-team (Kanban), Cloud-Services, Security
chasemp added a comment to T172911: Add @Niedzielski to the reading-web-staging group in horizon.

floating IP is an unrelated thing to instance creation.

Aug 17 2017, 6:30 PM · Horizon, Marvin
chasemp added a comment to T172650: "last" command on WMF Labs/Tools allows users to view IPs of other toolforge users.

A general notice about the nature of shared platforms that includes the ability for other users to gather connected IP addresses as well as determine status of users and such seems worthwhile.

Aug 17 2017, 6:19 PM · User-bd808, Privacy, cloud-services-team (Kanban), Cloud-Services, Security
chasemp added a comment to T173526: Toolforge intermittent Puppet failures for puppet-enc.

@Andrew I'm wondering if this could be related to the new puppet masters in any way? Roughly, it seems I didn't notice this until all that shook out.

Aug 17 2017, 5:54 PM · Cloud-Services
chasemp triaged T173526: Toolforge intermittent Puppet failures for puppet-enc as Normal priority.
Aug 17 2017, 5:53 PM · Cloud-Services