chasemp (Chase) · Administrator
Lead Operations Engineer (Wikimedia Cloud Services)

User Details

User Since
Sep 16 2014, 11:39 AM (152 w, 4 d)
Roles
Administrator
Availability
Available
IRC Nick
chasemp
LDAP User
Rush
MediaWiki User
CPettet (WMF)

Recent Activity

Thu, Aug 17

chasemp added a comment to T172911: Add @Niedzielski to the reading-web-staging group in horizon.

Are you a project member or a project admin? I think only admin users can create instances.

Thu, Aug 17, 6:43 PM · Horizon, Marvin
chasemp added a comment to T172650: "last" command on WMF Labs/Tools allows users to view IPs of other toolforge users.

I think we should remove the restrictive security ACL here. This is a policy and notification issue rather than a sensitive security issue.

Thu, Aug 17, 6:40 PM · Privacy, cloud-services-team (Kanban), Cloud-Services, Security
chasemp triaged T172650: "last" command on WMF Labs/Tools allows users to view IPs of other toolforge users as Normal priority.
Thu, Aug 17, 6:39 PM · Privacy, cloud-services-team (Kanban), Cloud-Services, Security
chasemp added a comment to T172911: Add @Niedzielski to the reading-web-staging group in horizon.

floating IP is an unrelated thing to instance creation.

Thu, Aug 17, 6:30 PM · Horizon, Marvin
chasemp added a comment to T172650: "last" command on WMF Labs/Tools allows users to view IPs of other toolforge users.

A general notice about the nature of shared platforms that includes the ability for other users to gather connected IP addresses as well as determine status of users and such seems worthwhile.

Thu, Aug 17, 6:19 PM · Privacy, cloud-services-team (Kanban), Cloud-Services, Security
chasemp added a comment to T173526: Toolforge intermittent Puppet failures for puppet-enc.

@Andrew I'm wondering if this could be related to the new puppet masters in any way? Roughly speaking, I didn't notice this until all that shook out.

Thu, Aug 17, 5:54 PM · Cloud-Services
chasemp triaged T173526: Toolforge intermittent Puppet failures for puppet-enc as Normal priority.
Thu, Aug 17, 5:53 PM · Cloud-Services
chasemp created T173526: Toolforge intermittent Puppet failures for puppet-enc.
Thu, Aug 17, 5:53 PM · Cloud-Services
chasemp added a comment to T127524: Phabricator data dump hasn't run automatically since the Feb 17 2016 upgrade..

Thank you very much chasemp. The goal is just to silence the cron job one way or another then.

Thu, Aug 17, 4:15 PM · Patch-For-Review, Release-Engineering-Team (Kanban), Phabricator
chasemp triaged T171473: labvirt1015 crashes as High priority.
Thu, Aug 17, 4:10 PM · cloud-services-team (Kanban), DC-Ops, ops-eqiad, Operations
chasemp moved T166845: monitor some things on all Cloud instances (discussion) from Inbox to Needs discussion on the cloud-services-team (Kanban) board.
Thu, Aug 17, 3:57 PM · cloud-services-team (Kanban), Cloud-Services, Cloud-VPS
chasemp closed T169820: Add `wikitech-grep` to puppet as Resolved.
Thu, Aug 17, 3:57 PM · Patch-For-Review, cloud-services-team (Kanban), wikitech.wikimedia.org
chasemp moved T171473: labvirt1015 crashes from Inbox to Needs discussion on the cloud-services-team (Kanban) board.
Thu, Aug 17, 3:57 PM · cloud-services-team (Kanban), DC-Ops, ops-eqiad, Operations
chasemp edited projects for T171473: labvirt1015 crashes, added: cloud-services-team (Kanban); removed cloud-services-team.
Thu, Aug 17, 3:56 PM · cloud-services-team (Kanban), DC-Ops, ops-eqiad, Operations
chasemp added a comment to T171473: labvirt1015 crashes.

@Cmjohnson seems like a definite hardware failure to me man, we haven't even put this back in service. Next steps?

Thu, Aug 17, 3:55 PM · cloud-services-team (Kanban), DC-Ops, ops-eqiad, Operations
chasemp triaged T173511: Implement technical details and process for "datasets_p" on wikireplica hosts as Normal priority.
Thu, Aug 17, 3:44 PM · cloud-services-team (Kanban), Analytics, Research
chasemp moved T173511: Implement technical details and process for "datasets_p" on wikireplica hosts from Inbox to Needs discussion on the cloud-services-team (Kanban) board.
Thu, Aug 17, 3:43 PM · cloud-services-team (Kanban), Analytics, Research
chasemp edited projects for T173511: Implement technical details and process for "datasets_p" on wikireplica hosts, added: cloud-services-team (Kanban); removed cloud-services-team.
Thu, Aug 17, 3:42 PM · cloud-services-team (Kanban), Analytics, Research

Wed, Aug 16

chasemp added a comment to T127524: Phabricator data dump hasn't run automatically since the Feb 17 2016 upgrade..

@chasemp ^ In https://github.com/wikimedia/phabricator-tools/blob/master/wmfphablib/rtlib.py can't find rtppl in "from rtppl import ppl as users" . Do you know where rtppl is?

Wed, Aug 16, 8:08 PM · Patch-For-Review, Release-Engineering-Team (Kanban), Phabricator
chasemp added a comment to T170977: Post-migration issues and priorities.

@jsn.sherman I've been noticing your communication style and efforts coordinated through Phabricator lately. Really impressed. As part of the cloud-services-team, thank you for the great work you are doing on the platform. Let us know if we can ever help. That's all :)

Wed, Aug 16, 3:55 PM · Library-Card-Platform
chasemp added a comment to T173402: labsdb1003 BBU failing.

Thanks man, buys us a bit more time

Wed, Aug 16, 1:32 PM · cloud-services-team

Sun, Aug 13

chasemp added a comment to T171829: Prepare and check storage layer for hi.wikiversity.

We are all either at wikimania or on vacation, I expect this will be addressed in the coming week. :)

Sun, Aug 13, 1:28 PM · Data-Services, DBA

Sat, Aug 12

chasemp added a comment to T152235: Simple logrotate service for users of Tools as stopgap before central logging.

> lighttpd 26283 tools.admin 5w REG 0,30 599081 80122951 /mnt/nfs/labstore-secondary-tools-project/admin/access.log (nfs-tools-project.svc.eqiad.wmnet:/project/tools/project)

Sat, Aug 12, 7:31 PM · Patch-For-Review, Cloud-Services
chasemp added a comment to T152235: Simple logrotate service for users of Tools as stopgap before central logging.

https://gerrit.wikimedia.org/r/#/c/346177/

Sat, Aug 12, 7:10 PM · Patch-For-Review, Cloud-Services
chasemp added a comment to T152235: Simple logrotate service for users of Tools as stopgap before central logging.
root@labstore1004:~# du -b /srv/tools/shared/tools/project/admin/access.log
287995	/srv/tools/shared/tools/project/admin/access.log
root@labstore1004:~# truncate --size 0 /srv/tools/shared/tools/project/admin/access.log
root@labstore1004:~# du -b /srv/tools/shared/tools/project/admin/access.log
0	/srv/tools/shared/tools/project/admin/access.log
root@labstore1004:~# du -b /srv/tools/shared/tools/project/admin/access.log
307427	/srv/tools/shared/tools/project/admin/access.log
Sat, Aug 12, 7:09 PM · Patch-For-Review, Cloud-Services
chasemp added a comment to T152235: Simple logrotate service for users of Tools as stopgap before central logging.
tools.admin@tools-bastion-03:~$ du -b access.log
2288823046	access.log
tools.admin@tools-bastion-03:~$ ls -i access.log
80122951 access.log
tools.admin@tools-bastion-03:~$ webservice stop
Stopping webservice.
tools.admin@tools-bastion-03:~$ > acc
access.log    access.log.5
tools.admin@tools-bastion-03:~$ > access.log
tools.admin@tools-bastion-03:~$ webservice start
Starting webservice.
tools.admin@tools-bastion-03:~$ du -b access.log && sleep 60
0	access.log
tools.admin@tools-bastion-03:~$ du -b access.log
39680	access.log
Sat, Aug 12, 6:54 PM · Patch-For-Review, Cloud-Services
chasemp added a comment to T152235: Simple logrotate service for users of Tools as stopgap before central logging.
tools.admin@tools-bastion-03:~$ truncate --size 0 access.log
tools.admin@tools-bastion-03:~$ du -b access.log
0	access.log
tools.admin@tools-bastion-03:~$ du -b access.log
2288733107	access.log
Sat, Aug 12, 6:49 PM · Patch-For-Review, Cloud-Services
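
Note on the truncate output above: the thread does not state a cause, but one explanation consistent with the numbers is a writer that holds the file open without O_APPEND. Such a writer keeps its old file offset, so its next write after an external truncate lands at that offset and the apparent size (what `du -b` reports) jumps straight back. A minimal sketch of that effect follows; the function name and byte counts are illustrative assumptions, not anything taken from lighttpd or the task.

```python
import os
import tempfile


def size_after_truncate(append: bool) -> int:
    """Write 1000 bytes, truncate the file from outside, then write 10 more.

    Returns the apparent file size afterwards. Hypothetical helper for
    illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    flags = os.O_WRONLY | (os.O_APPEND if append else 0)
    writer = os.open(path, flags)      # stands in for the long-running writer
    try:
        os.write(writer, b"x" * 1000)  # writer's offset is now 1000
        os.truncate(path, 0)           # stands in for `truncate --size 0`
        os.write(writer, b"y" * 10)    # the next log line
        return os.stat(path).st_size
    finally:
        os.close(writer)
        os.remove(path)


if __name__ == "__main__":
    print("without O_APPEND:", size_after_truncate(append=False))
    print("with O_APPEND:   ", size_after_truncate(append=True))
```

Without O_APPEND the file ends up as a hole followed by the new bytes; with O_APPEND every write goes to the current end of file, so the truncate sticks.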
chasemp added a comment to T152235: Simple logrotate service for users of Tools as stopgap before central logging.

access.log

tools.admin@tools-bastion-03:~$ du -b access.log
0 access.log

Sat, Aug 12, 6:41 PM · Patch-For-Review, Cloud-Services
chasemp added a comment to T152235: Simple logrotate service for users of Tools as stopgap before central logging.

du -b access.log
2288534827 access.log

Sat, Aug 12, 6:39 PM · Patch-For-Review, Cloud-Services
chasemp added a comment to T152235: Simple logrotate service for users of Tools as stopgap before central logging.

So python still thinks the original access.log is the 2.2G from pre rotation

Sat, Aug 12, 6:17 PM · Patch-For-Review, Cloud-Services
chasemp added a comment to T152235: Simple logrotate service for users of Tools as stopgap before central logging.

2017-08-12 18:01:08,214 DEBUG /srv/tools/shared/tools/project/admin/access.log
2017-08-12 18:01:08,214 DEBUG /srv/tools/shared/tools/project/admin/access.log is 2287903326 bytes
2017-08-12 18:01:08,214 WARNING /srv/tools/shared/tools/project/admin/access.log is larger than 10000000

Sat, Aug 12, 6:10 PM · Patch-For-Review, Cloud-Services
chasemp added a comment to T152235: Simple logrotate service for users of Tools as stopgap before central logging.
2017-08-12 17:56:41,618 DEBUG Namespace(config='/etc/logcleanup-config.yaml', debug=True, dir=None, end_with=None, max_copytruncate=None, min_rotate_size=None, rotation_day=None, tail_lines=None)
2017-08-12 17:56:41,621 DEBUG {'rotation_day': 'wednesday', 'end_with': ['log', 'err', 'out'], 'max_copytruncate': 10000000, 'min_rotate_size': 1000, 'debug': True, 'config': '/etc/logcleanup-config.yaml', 'tail_lines': 10000, 'dir': ['/srv/tools/shared/tools/project/admin']}
2017-08-12 17:56:41,621 DEBUG found 1 valid paths
2017-08-12 17:56:41,621 DEBUG Found 5 valid files from
2017-08-12 17:56:41,621 DEBUG /srv/tools/shared/tools/project/admin/toolhistory.err
2017-08-12 17:56:41,621 DEBUG /srv/tools/shared/tools/project/admin/toolhistory.err is 488563 bytes
2017-08-12 17:56:41,622 DEBUG /srv/tools/shared/tools/project/admin/error.log
2017-08-12 17:56:41,622 DEBUG /srv/tools/shared/tools/project/admin/error.log is 51244464 bytes
2017-08-12 17:56:41,622 WARNING /srv/tools/shared/tools/project/admin/error.log is larger than 10000000
2017-08-12 17:56:42,872 DEBUG /srv/tools/shared/tools/project/admin/error.log tailed to /srv/tools/shared/tools/project/admin/error.log.1
2017-08-12 17:56:42,976 INFO truncate /srv/tools/shared/tools/project/admin/error.log
2017-08-12 17:56:42,977 DEBUG /srv/tools/shared/tools/project/admin/toolhistory.out
2017-08-12 17:56:42,977 DEBUG /srv/tools/shared/tools/project/admin/toolhistory.out is 2235922 bytes
2017-08-12 17:56:42,977 DEBUG /srv/tools/shared/tools/project/admin/access.log
2017-08-12 17:56:42,977 DEBUG /srv/tools/shared/tools/project/admin/access.log is 2287822262 bytes
2017-08-12 17:56:42,978 WARNING /srv/tools/shared/tools/project/admin/access.log is larger than 10000000
2017-08-12 17:56:43,031 DEBUG /srv/tools/shared/tools/project/admin/access.log tailed to /srv/tools/shared/tools/project/admin/access.log.1
2017-08-12 17:56:44,262 INFO truncate /srv/tools/shared/tools/project/admin/access.log
2017-08-12 17:56:44,263 DEBUG /srv/tools/shared/tools/project/admin/service.log
2017-08-12 17:56:44,264 DEBUG /srv/tools/shared/tools/project/admin/service.log is 332 bytes
2017-08-12 17:56:44,264 DEBUG /srv/tools/shared/tools/project/admin/service.log is too small to rotate
2017-08-12 17:56:44,264 DEBUG processed 5 logs
Sat, Aug 12, 5:57 PM · Patch-For-Review, Cloud-Services
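
Note on the debug log above: the per-file decisions it shows can be read as a small copy-then-truncate routine. The sketch below is a guess reconstructed from the log lines only, not the script under review in Gerrit. The config keys come from the DEBUG dump, while `clean_log`, its return strings, and the choice to leave mid-sized files alone are assumptions; `rotation_day` is not modeled.

```python
import os
from collections import deque

# Values mirror the DEBUG config dump above; everything else is hypothetical.
CONFIG = {
    "end_with": ["log", "err", "out"],
    "min_rotate_size": 1000,
    "max_copytruncate": 10000000,
    "tail_lines": 10000,
}


def clean_log(path, config=CONFIG):
    """Return the action taken: 'ignored', 'too small', 'kept' or 'truncated'."""
    if not any(path.endswith(suffix) for suffix in config["end_with"]):
        return "ignored"
    size = os.path.getsize(path)
    if size < config["min_rotate_size"]:
        return "too small"
    if size <= config["max_copytruncate"]:
        # Assumption: outside the rotation day, mid-sized files are left alone.
        return "kept"
    # Keep only the last tail_lines lines in <path>.1 ...
    with open(path, "rb") as src:
        tail = deque(src, maxlen=config["tail_lines"])
    with open(path + ".1", "wb") as dst:
        dst.writelines(tail)
    # ... then empty the original in place so its inode stays the same.
    os.truncate(path, 0)
    return "truncated"
```

Truncating in place keeps the inode that the writer has open, which is why this style of rotation is sensitive to how the writer opened the file.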
chasemp added a comment to T152235: Simple logrotate service for users of Tools as stopgap before central logging.
2.2G	/srv/tools/shared/tools/project/admin/access.log
0	/srv/tools/shared/tools/project/admin/admin
123M	/srv/tools/shared/tools/project/admin/archived-packages
12K	/srv/tools/shared/tools/project/admin/bin
49M	/srv/tools/shared/tools/project/admin/error.log
4.0K	/srv/tools/shared/tools/project/admin/logs
0	/srv/tools/shared/tools/project/admin/public_html
4.0K	/srv/tools/shared/tools/project/admin/replica.my.cnf
4.0K	/srv/tools/shared/tools/project/admin/service.log
4.0K	/srv/tools/shared/tools/project/admin/service.manifest
12M	/srv/tools/shared/tools/project/admin/tool-admin-web
484K	/srv/tools/shared/tools/project/admin/toolhistory.err
2.2M	/srv/tools/shared/tools/project/admin/toolhistory.out
4.0K	/srv/tools/shared/tools/project/admin/toolinfo.json
60M	/srv/tools/shared/tools/project/admin/var
Sat, Aug 12, 3:15 PM · Patch-For-Review, Cloud-Services

Wed, Aug 9

chasemp triaged T172899: Require a Phabricator account as a prerequisite to getting Toolforge access as Normal priority.
Wed, Aug 9, 3:23 PM · Striker

Mon, Aug 7

chasemp added a comment to T171746: Determining the plan for the maps-test cluster.

Let's chat folks! We can make special disk arrangements potentially and maybe work something out. I'm not sure how much is already budgeted here that we would reallocate for a "Cloud" solution but there are options.

Mon, Aug 7, 8:00 PM · Maps-Sprint, Discovery, Maps
chasemp added a comment to T172650: "last" command on WMF Labs/Tools allows users to view IPs of other toolforge users.

This probably needs some more serious discussion about the nature of tenants within Toolforge. Offhand we could set perms on utilities of this kind that do not allow everyone to run them, but roots/admins obviously have need. But there is a long tail of information available about Tool owners to other Tool owners within the ecosystem and I'm not sure our comfort level with that has ever been well defined.

Mon, Aug 7, 7:13 PM · Privacy, cloud-services-team (Kanban), Cloud-Services, Security
chasemp added a comment to T168751: Wikimania Hack Volunteer Group: Help Desk.

Telegram seems ok to me, unless someone from https://docs.google.com/spreadsheets/d/1Eo4cAGxw6bjy2fXbrF8UJE_8SL3yiVJ_-2DpBKlnbqo/edit#gid=0 objects. I'm ok with @Rfarrand just telling us what to use also so gchat works for me :)

Mon, Aug 7, 3:33 PM · Wikimania-Hackathon-2017
chasemp added a comment to T168751: Wikimania Hack Volunteer Group: Help Desk.

I mixed up signal and telegram :)

Mon, Aug 7, 3:28 PM · Wikimania-Hackathon-2017
chasemp added a comment to T168751: Wikimania Hack Volunteer Group: Help Desk.

I haven't used signal yet, but if it's otherwise an official medium for the event it probably makes sense. I'm open to anything but in quick succession:

Mon, Aug 7, 3:05 PM · Wikimania-Hackathon-2017
chasemp added a comment to T168751: Wikimania Hack Volunteer Group: Help Desk.

Maybe a specific irc, hangout, or signals chat group for folks who are interested in help desk coverage? Esp for those of us in the random category it would help to keep tabs on asks for coverage and would help ease an irc/phab/gchat/hangout split.

Mon, Aug 7, 2:49 PM · Wikimania-Hackathon-2017

Sun, Aug 6

chasemp added a comment to T172628: conf2002 etcdmirror-conftool-eqiad-wmnet died.

I think the relevant portion is probably "...or if the lag is large enough that we're losing etcd events"

Sun, Aug 6, 1:44 AM · Operations
chasemp raised the priority of T172628: conf2002 etcdmirror-conftool-eqiad-wmnet died from Normal to Unbreak Now!.
Sun, Aug 6, 1:38 AM · Operations
chasemp added a comment to T172628: conf2002 etcdmirror-conftool-eqiad-wmnet died.

Seems like this is dying really soon post restart

Sun, Aug 6, 1:37 AM · Operations
chasemp triaged T172628: conf2002 etcdmirror-conftool-eqiad-wmnet died as Normal priority.
Sun, Aug 6, 1:32 AM · Operations
chasemp created T172628: conf2002 etcdmirror-conftool-eqiad-wmnet died.
Sun, Aug 6, 1:31 AM · Operations

Fri, Aug 4

chasemp triaged T172567: Data missing from labs replica of enwiki.imagelinks as Normal priority.

@Reedy, any difference if you hit the in-progress new labsdb cluster @ labsdb-web.eqiad.wmnet?

Fri, Aug 4, 9:07 PM · Data-Services
chasemp awarded T160996: Creation of a Program Committee for Wikimedia developer events (a Wikimania Hackathon session) an Orange Medal token.
Fri, Aug 4, 12:29 PM · Wikimania-Hackathon-2017, Developer-Relations (Jul-Sep 2017)

Thu, Aug 3

chasemp updated subscribers of T171494: Refactor OpenStack Puppet to account for Neutron.

Issues from earlier and resolutions:

Thu, Aug 3, 9:43 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)
chasemp added a comment to T171494: Refactor OpenStack Puppet to account for Neutron.
  • labstores suddenly got mitaka clientlibs?
    • they were set to mitaka since fa97c633fd8 but were not getting cloudrepo to match prior (I think)
  • shinken role wants to see mitaka version
    • modules/shinken/manifests/shinkengen.pp
  • tools-checker-01/02 have openstack::clientlib… from where?
  • openstack2::cloudrepo pulls from openstack module
  • novaproxy-02.project-proxy.eqiad.wmflabs has openstack::clientlib
    • novaproxy-01.project-proxy.eqiad.wmflabs does /not/
  • util-abogott.testlabs.eqiad.wmflabs has openstack::clientlib
Thu, Aug 3, 2:30 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)

Wed, Aug 2

chasemp added a comment to T106937: Monitor [[Special:ListFiles]] for non 200 HTTP statuses in thumbnails.

@fgiunchedi ack, we don't have a ton of checks running w/ concurrency but @15m intervals seems sane. Let's let it ride for awhile and check up with ongoing points usage.

Wed, Aug 2, 6:26 PM · User-fgiunchedi, media-storage, Commons, Operations, monitoring
chasemp added a comment to T170492: figure out if nodepool is overwhelming rabbitmq and/or nova.

Also, not a terrible idea as we start forcing rabbit back into physical RAM for us to consider giving this its own server.

Wed, Aug 2, 6:23 PM · cloud-services-team (Kanban), Release-Engineering-Team (Watching / External), Nodepool, Cloud-VPS, Continuous-Integration-Infrastructure, Patch-For-Review
chasemp added a comment to T170492: figure out if nodepool is overwhelming rabbitmq and/or nova.

A few thoughts on this phenom. I'm not sure if rabbit components swapping is really a total antipattern here but it's worth talking about for a few reasons. 1) clearly rabbit falls over sick at various points 2) rabbit has java-esque memory behaviors 3) it's the most common thread I've seen that, at least to this point, may correlate. We haven't been watching it very well though.

Wed, Aug 2, 6:21 PM · cloud-services-team (Kanban), Release-Engineering-Team (Watching / External), Nodepool, Cloud-VPS, Continuous-Integration-Infrastructure, Patch-For-Review
chasemp added a comment to T167984: rack/setup/install labstore100[67].wikimedia.org.

ping @madhuvishy hopefully will have some time to read up on the manuals :)

Wed, Aug 2, 4:24 PM · Patch-For-Review, ops-eqiad, Operations, Cloud-Services
chasemp awarded Blog Post: Toolforge provides proxied mirrors of cdnjs and now fontcdn, for your usage and user-privacy a Love token.
Wed, Aug 2, 2:56 PM · Toolforge-standards-committee, Cloud-VPS, Toolforge
chasemp added a comment to T170492: figure out if nodepool is overwhelming rabbitmq and/or nova.
/home/rush# sudo bash swap_stat.sh
inet_gethost (10199) 68 kB
inet_gethost (10200) 56 kB
screen (16619) 4 kB
cat: /proc/17560/smaps: No such file or directory
rabbitmq-server (9838) 88 kB
beam.smp (9849) 316992 kB
Wed, Aug 2, 2:55 PM · cloud-services-team (Kanban), Release-Engineering-Team (Watching / External), Nodepool, Cloud-VPS, Continuous-Integration-Infrastructure, Patch-For-Review
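
Note on the swap output above: swap_stat.sh itself is not shown in the task. A per-process swap listing of this shape can be produced by summing the `Swap:` fields in /proc/<pid>/smaps, and the sketch below does that under this assumption. The function names are hypothetical. Processes that exit mid-scan are skipped, which would correspond to the "No such file or directory" line in the paste.

```python
import os
import re

# Matches "Swap:   68 kB" but not "SwapPss:   68 kB".
SWAP_LINE = re.compile(r"^Swap:\s+(\d+) kB$", re.MULTILINE)


def swap_kb(smaps_text):
    """Sum every 'Swap:' entry in the text of a smaps file, in kB."""
    return sum(int(kb) for kb in SWAP_LINE.findall(smaps_text))


def swap_by_process(proc_root="/proc"):
    """Yield (name, pid, swapped_kb) for each process with swap in use."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
            with open(os.path.join(proc_root, entry, "smaps")) as f:
                kb = swap_kb(f.read())
        except OSError:
            # Process exited or is unreadable; skip it.
            continue
        if kb:
            yield name, int(entry), kb


if __name__ == "__main__":
    for name, pid, kb in sorted(swap_by_process(), key=lambda row: row[2]):
        print(f"{name} ({pid}) {kb} kB")
```

Reading other users' smaps files needs root, as with the `sudo bash` invocation in the paste.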
chasemp added a comment to T167984: rack/setup/install labstore100[67].wikimedia.org.

@chasemp Do you know the RAID cfg you want? The server has (12) 3.5" 6TB disks and (2) 2.5" disks, the disk shelf has (12) 3.5" 6TB disks. I would think the 2 smaller disks are RAID 1 and then RAID 10 for the other 2. Please confirm.

Wed, Aug 2, 2:45 PM · Patch-For-Review, ops-eqiad, Operations, Cloud-Services

Tue, Aug 1

chasemp added a comment to T170492: figure out if nodepool is overwhelming rabbitmq and/or nova.

I did a small bit of poking today. It seems we are using swap on labcontrol1001 and it's either a common cause or a root symptom for at least the rabbitmq portion I'm guessing.

Tue, Aug 1, 11:04 PM · cloud-services-team (Kanban), Release-Engineering-Team (Watching / External), Nodepool, Cloud-VPS, Continuous-Integration-Infrastructure, Patch-For-Review
chasemp added a comment to T172186: Increased quota for analytics project in Cloud.

+1

Tue, Aug 1, 3:24 PM · Cloud-VPS (Quota-requests)
chasemp added a comment to T172034: Request creation of pluggableauth VPS project.

+1

Tue, Aug 1, 3:23 PM · Cloud-VPS (Project-requests)
chasemp triaged T172034: Request creation of pluggableauth VPS project as Normal priority.
Tue, Aug 1, 3:23 PM · Cloud-VPS (Project-requests)
chasemp renamed T172186: Increased quota for analytics project in Cloud from Increase quota for analytics project in Cloud to Temporary increased quota for analytics project in Cloud.
Tue, Aug 1, 3:20 PM · Cloud-VPS (Quota-requests)
chasemp added a comment to T172186: Increased quota for analytics project in Cloud.

I'm reading this as requesting total capacity increase on top of existing for 4 medium sized instances.

Tue, Aug 1, 3:16 PM · Cloud-VPS (Quota-requests)
chasemp triaged T172186: Increased quota for analytics project in Cloud as Normal priority.
Tue, Aug 1, 3:15 PM · Cloud-VPS (Quota-requests)
chasemp edited Description on Cloud-VPS.
Tue, Aug 1, 3:14 PM
chasemp edited Description on Cloud-VPS.
Tue, Aug 1, 3:13 PM

Mon, Jul 31

chasemp added a comment to T171494: Refactor OpenStack Puppet to account for Neutron.

Comments from gerrit @Andrew and I talked about on IRC:

Mon, Jul 31, 2:06 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)

Fri, Jul 28

chasemp renamed T171494: Refactor OpenStack Puppet to account for Neutron from Refactor openstack Puppet to account for Neutron to Refactor OpenStack Puppet to account for Neutron.
Fri, Jul 28, 10:41 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)
chasemp added a comment to T171494: Refactor OpenStack Puppet to account for Neutron.

Deployed https://gerrit.wikimedia.org/r/#/c/368321/ and rolled it back yesterday, nothing blew up but I figured out I had been testing for some portion of my PoC with the labs hiera tree set. I originally tested value lookup for labs as some params are shared between labs and prod and I was looking at the tree overlap. This was misleading however for a few reasons: common has expand_path in prod but not labs creating completely different lookup behavior, and though we do put values in common.yaml at the top of the hiera tree for prod now it is only looked at via hiera_hash() lookups scattered throughout the wmcs puppet code for openstack. That means common.yaml is seen for prod purposes via hiera_hash() lookups and from labs instances via normal lookups. That has the potential to be very confusing and misleading since anything falling under the role() function will not inspect this file in the same way using the nuyaml backend.

Fri, Jul 28, 6:13 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)
chasemp created P5813 (An Untitled Masterwork).
Fri, Jul 28, 12:18 AM

Thu, Jul 27

chasemp added a comment to T167559: Create a detailed migration plan for implementing Neutron as our OpenStack SDN layer.

groupings at the moment

Thu, Jul 27, 10:54 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)
chasemp updated the task description for T171494: Refactor OpenStack Puppet to account for Neutron.
Thu, Jul 27, 9:12 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)
chasemp added a comment to T171494: Refactor OpenStack Puppet to account for Neutron.

Prod puppet backends and tree:

Thu, Jul 27, 6:25 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)

Tue, Jul 25

chasemp added a comment to T171538: Degraded RAID on labsdb1001.

thank you @Cmjohnson

Tue, Jul 25, 7:19 PM · Patch-For-Review, cloud-services-team (Kanban), Data-Services, ops-eqiad, Operations
chasemp added a comment to T171618: Create a "state of the cloud" monthly report.

/me waves to @Jprorama

Tue, Jul 25, 5:52 PM · Cloud-Services
chasemp added a comment to T171596: edge MASK lightning.
- F8848088 (PhabricatorFile) F8848088: EDGE MASK edge lighting rounded corners of S8_v1.45_apkpure.com.apk
Tue, Jul 25, 3:00 PM · Trash
chasemp closed T171591: Requesting access to Toolforge for fajr18 as Resolved.

@chasemp I've found https://toolsadmin.wikimedia.org/tools/membership/status/51 and IIRC I saw this account registering yesterday on Wikitech. Maybe this is a duplicate of the toolsadmin request?

Tue, Jul 25, 2:55 PM · Toolforge
chasemp renamed T171591: Requesting access to Toolforge for fajr18 from Requesting access to RESOURCE for fajr18 to Requesting access to Toolforge for fajr18.
Tue, Jul 25, 2:25 PM · Toolforge
chasemp triaged T171591: Requesting access to Toolforge for fajr18 as Normal priority.

I notice this account already exists in LDAP:

Tue, Jul 25, 2:24 PM · Toolforge

Mon, Jul 24

chasemp added a comment to T171538: Degraded RAID on labsdb1001.
# cat /proc/mdstat
Personalities :
unused devices: <none>
Mon, Jul 24, 11:22 PM · Patch-For-Review, cloud-services-team (Kanban), Data-Services, ops-eqiad, Operations
chasemp added a project to T171538: Degraded RAID on labsdb1001: cloud-services-team (Kanban).
Mon, Jul 24, 11:15 PM · Patch-For-Review, cloud-services-team (Kanban), Data-Services, ops-eqiad, Operations
chasemp assigned T171538: Degraded RAID on labsdb1001 to Cmjohnson.

I think this must be one of the two RAID1 drives for the OS itself rather than a drive in the RAID0 data array. We should really get this changed out tomorrow if at all possible then. We are on borrowed time :)

Mon, Jul 24, 11:13 PM · Patch-For-Review, cloud-services-team (Kanban), Data-Services, ops-eqiad, Operations
chasemp added a comment to T171494: Refactor OpenStack Puppet to account for Neutron.

Due to the intermixed nature of some inherent dependencies (our model of include for openstack::repo at the module level, and the role level indiscriminately as an example) there will be periods of bad state for certain host roles. i.e. we need to pull out the dependencies and consolidate and then refactor up some things from module to profile level and use require or some such. I have communicated this to the Cloud-Services team but wanted to make a note here explicitly. I will try to keep a narrative on task of current state.

Mon, Jul 24, 6:01 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)
chasemp created T171494: Refactor OpenStack Puppet to account for Neutron.
Mon, Jul 24, 5:39 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)
chasemp closed T168892: rack/setup/install labtestservices2002.wikimedia.org as Resolved.

ongoing implementation tracked in T167559

Mon, Jul 24, 5:37 PM · Patch-For-Review, ops-codfw, Cloud-VPS, Operations
chasemp added a comment to T171188: Move the main WMCS puppetmaster into the Labs realm.

My understanding of this is we are looking at #1 as the current compromise short of moving services into the Labs realm directly, though I believe in this case making the masters themselves instances is the best eventual outcome. A few decent sized unknowns for me are: we have one base image that expects an external puppetmaster (even for project masters) and would need to figure out some special bootstrap process for the masters themselves (and feel really sure it's not going to be broken in the large intervals we come back around to it), and we haven't thought through managing this puppetmaster within the context of an instance at all. I don't feel like we have the bandwidth to bite this off directly right now. My vote is pursuing the course of action in-flight to decouple the puppetmaster from labcontrol, put in the public VLAN with the new hardware, firewall off from non-instances, and make notes for portions of this process that would affect a next-stage of converting to an instance. I think most of the in progress work here needs to be done for either outcome.

Mon, Jul 24, 5:32 PM · Puppet, Cloud-VPS, Operations
chasemp closed T168894: rack/setup/install labtestcontrol2003.wikimedia.org as Resolved.

closing this as further implementation will be tracked in other tasks

Mon, Jul 24, 3:07 PM · Patch-For-Review, ops-codfw, Cloud-VPS, Operations
chasemp closed T168893: rack/setup/install labtestservices2003.wikimedia.org as Resolved.

closing this as further implementation will be tracked in other tasks

Mon, Jul 24, 3:07 PM · Patch-For-Review, ops-codfw, Cloud-VPS, Operations
chasemp added a comment to T106937: Monitor [[Special:ListFiles]] for non 200 HTTP statuses in thumbnails.

@fgiunchedi it depends on what we want to watch move. We already have a number of emulated/chrome checks that could double as thumbnail canaries. If there is a particular page(s) that would demonstrate this failure early then an additional check(s) makes sense to me. Right now it's more or less all project homepages. One good note there is we do run cached and uncached checks.

Mon, Jul 24, 2:04 PM · User-fgiunchedi, media-storage, Commons, Operations, monitoring

Jul 19 2017

chasemp created P5766 (An Untitled Masterwork).
Jul 19 2017, 8:58 PM
chasemp added a comment to T170843: Determine where to host zim files for the Android app.

I'm confused on if this is archival content or something actively used by the app?

Jul 19 2017, 8:54 PM · Operations, Traffic, Reading-Infrastructure-Team-Backlog (Kanban), Wikipedia-Android-App-Backlog, Android-app-feature-Compilations

Jul 17 2017

chasemp updated the task description for T167559: Create a detailed migration plan for implementing Neutron as our OpenStack SDN layer.
Jul 17 2017, 5:07 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)
chasemp updated the task description for T167559: Create a detailed migration plan for implementing Neutron as our OpenStack SDN layer.
Jul 17 2017, 5:05 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)
chasemp added a comment to T167559: Create a detailed migration plan for implementing Neutron as our OpenStack SDN layer.

As part of the larger T167293 we know we need to move instances ultimately to a different model that is not compatible with our current setup. Separating nova-compute and nova-api from nova-network means configurations that are mutually exclusive within nova.conf such as network_api_class = nova.network.neutronv2.api.API. Neutron itself has a separate model where ports, subnets, networks, metadata-proxy, dhcpd, and tenant routers are all first class objects and independent instead of loosely attached to a tenant.

Jul 17 2017, 2:49 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)
chasemp updated the task description for T167559: Create a detailed migration plan for implementing Neutron as our OpenStack SDN layer.
Jul 17 2017, 1:39 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)
chasemp updated the task description for T167559: Create a detailed migration plan for implementing Neutron as our OpenStack SDN layer.
Jul 17 2017, 1:38 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)
chasemp updated the task description for T167559: Create a detailed migration plan for implementing Neutron as our OpenStack SDN layer.
Jul 17 2017, 1:37 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)
chasemp closed T153099: Initial OpenStack Neutron PoC deployment in Labtest as Resolved.

Resolving in favor of T167559 which will have more details and hopefully some authoritative plan. I don't want to split commentary across the two tasks at this point.

Jul 17 2017, 1:35 PM · cloud-services-team (Kanban), Cloud-Services, Operations
chasemp closed T153099: Initial OpenStack Neutron PoC deployment in Labtest, a subtask of T167293: Nova-network to Neutron migration, as Resolved.
Jul 17 2017, 1:35 PM · Cloud-VPS, Epic, Cloud-Services
chasemp added a subtask for T167293: Nova-network to Neutron migration: T167559: Create a detailed migration plan for implementing Neutron as our OpenStack SDN layer.
Jul 17 2017, 1:34 PM · Cloud-VPS, Epic, Cloud-Services
chasemp added a parent task for T167559: Create a detailed migration plan for implementing Neutron as our OpenStack SDN layer: T167293: Nova-network to Neutron migration.
Jul 17 2017, 1:34 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18)