chasemp (Chase) Administrator
Lead Operations Engineer (Wikimedia Cloud Services)

User Details

User Since
Sep 16 2014, 11:39 AM (169 w, 4 d)
Roles
Administrator
Availability
Available
IRC Nick
chasemp
LDAP User
Rush
MediaWiki User
CPettet (WMF)

Recent Activity

Yesterday

chasemp updated the task description for T183035: labtestvirt2003 does not survive reboot on normal labvirt kernel of 4.4.0-81-generic.
Fri, Dec 15, 9:12 PM · Cloud-VPS, cloud-services-team
chasemp updated subscribers of T183035: labtestvirt2003 does not survive reboot on normal labvirt kernel of 4.4.0-81-generic.

Handing to @Andrew to follow in my footsteps to repro so as to prove I'm not crazy :)

Fri, Dec 15, 9:08 PM · Cloud-VPS, cloud-services-team
chasemp added a comment to T183035: labtestvirt2003 does not survive reboot on normal labvirt kernel of 4.4.0-81-generic.

I rebooted labtestvirt2002, which is on 4.4.0-81-generic, and it came back fine. However, racktables says this is totally and completely different hardware. Labtestvirt2003 is actually fairly new (T166237).

Fri, Dec 15, 9:08 PM · Cloud-VPS, cloud-services-team
chasemp updated the task description for T183035: labtestvirt2003 does not survive reboot on normal labvirt kernel of 4.4.0-81-generic.
Fri, Dec 15, 9:01 PM · Cloud-VPS, cloud-services-team
chasemp triaged T183035: labtestvirt2003 does not survive reboot on normal labvirt kernel of 4.4.0-81-generic as High priority.
Fri, Dec 15, 9:00 PM · Cloud-VPS, cloud-services-team
chasemp created T183035: labtestvirt2003 does not survive reboot on normal labvirt kernel of 4.4.0-81-generic.
Fri, Dec 15, 9:00 PM · Cloud-VPS, cloud-services-team
chasemp triaged T182997: Heavy queries from s51187, unlikely to finish? as Normal priority.
Fri, Dec 15, 7:48 PM · XTools
chasemp created P6475 (An Untitled Masterwork).
Fri, Dec 15, 7:45 PM
chasemp added a comment to T182867: "Login to Wikidata as QuickStatementsBot from a computer you have not recently used".

I started getting similar messages at the same time for a bot for which I am the contact on enwiki. I disabled the notification itself in preferences. Possibly the default changed? The k8s incident T182722: DNS resolution failing from webservices running on Kubernetes seems in no way related to this.

Fri, Dec 15, 2:40 PM · Community-Tech, User-notice, MediaWiki-Email, MediaWiki-User-login-and-signup, Toolforge
chasemp created P6474 (An Untitled Masterwork).
Fri, Dec 15, 2:01 PM

Thu, Dec 14

chasemp added a comment to T150850: Decouple roles from mariadb.pp into their own file.

This should be fully fixed now? No cloud issues?

Thu, Dec 14, 3:12 PM · Patch-For-Review, Epic, DBA
chasemp added a comment to T180809: tools cluster: pending linux kernel upgrades.

We recently fought with https://phabricator.wikimedia.org/T182722, which involved rebooting workers. We had been sitting on pending kernel updates for Debian instances in T180809 because WMF unattended pulled in new kernels. At the moment the workers are sitting on 4.9.0-0.bpo.4-amd64 and all other Debian instances in Tools are sitting on 4.4.0-3-amd64. Considering the historical virtio issues and the nightmare of debugging, I feel like this reinforces our strategy outlined in https://phabricator.wikimedia.org/T181647 to make managing updates explicit and ongoing for Toolforge (and novaproxy or other WMCS managed resources).

Thu, Dec 14, 2:23 PM · Tools, cloud-services-team
chasemp updated the task description for T181647: create 'attended' upgrade workflow for cloud with Toolforge as canonical case.
Thu, Dec 14, 2:08 PM · Patch-For-Review, Toolforge, cloud-services-team
chasemp added a comment to T181647: create 'attended' upgrade workflow for cloud with Toolforge as canonical case.

Change 394200 merged by Rush:
[operations/puppet@production] cloud: setup for attended upgrade process

https://gerrit.wikimedia.org/r/394200

Thu, Dec 14, 2:07 PM · Patch-For-Review, Toolforge, cloud-services-team
chasemp added a comment to P6464 attended workflow in tools change.

root@shinken-01:~# puppet agent --test

Thu, Dec 14, 2:07 PM
chasemp added a comment to P6464 attended workflow in tools change.

root@novaproxy-01:~# puppet agent --test

Thu, Dec 14, 2:06 PM
chasemp created P6464 attended workflow in tools change.
Thu, Dec 14, 2:06 PM
chasemp added a comment to T150850: Decouple roles from mariadb.pp into their own file.

The powerdns authoritative servers have database backends

Thu, Dec 14, 1:50 PM · Patch-For-Review, Epic, DBA

Wed, Dec 13

chasemp closed T182781: Puppet broken on labstore1004 as Resolved.
Wed, Dec 13, 11:22 PM · Operations, cloud-services-team
chasemp reopened T60865: [Epic] Toolserver.org tools that have not been migrated as "Open".
Wed, Dec 13, 11:11 PM · Epic, Tools
chasemp reopened T60865: [Epic] Toolserver.org tools that have not been migrated, a subtask of T60788: Toolserver migration to Tools (tracking), as Open.
Wed, Dec 13, 11:11 PM · User-bd808, Cloud-Services, Tracking, Toolforge
chasemp triaged T182820: Please upgrade python on the newest version on cloud vps as Low priority.

I think it's possible you mean on Toolforge itself since that is the Toolforge bastion rather than all of VPS. It's unclear what you intend as the 'newest' version of Python, as such it's unclear if you should use /usr/bin/python3 as your interpreter.

Wed, Dec 13, 8:48 PM · Cloud-VPS
chasemp added a comment to T176361: Run performance tests using local proxy.

@chasemp do you expect Google Cloud to have the same properties as AWS in that respect? Because I'm benchmarking GCE at the moment for this since I got a bunch of free credit there.

Wed, Dec 13, 5:58 PM · Performance-Team
chasemp added a comment to T176361: Run performance tests using local proxy.

@chasemp When you say "due to reservations", are you referring to underlying resource reservations (eg, core pinning on the underlying CPUs)? Or are you referring to reservations in the AWS reserved instance sense?

Wed, Dec 13, 3:35 PM · Performance-Team
chasemp added a comment to T182781: Puppet broken on labstore1004.
root@labstore1005:~# aptitude search runit
p   r-cran-runit                                  - GNU R package providing unit testing framework
p   runit                                         - system-wide service supervision
root@labstore1005:~# aptitude search vblade-persist
p   vblade-persist                                - create/manage supervised AoE exports
Wed, Dec 13, 3:24 PM · Operations, cloud-services-team
chasemp updated subscribers of T182781: Puppet broken on labstore1004.

I don't understand why this behavior started recently, as it appears runit would have been installed since Sept 6 (thanks @akosiaris)

Wed, Dec 13, 3:24 PM · Operations, cloud-services-team
chasemp added a comment to T182781: Puppet broken on labstore1004.
root@labstore1004:~# apt-get remove --purge runit
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libapr1 libconfuse-common libconfuse0 libpgm-5.1-0 libsodium13 libzmq3 python-dateutil python-jinja2
  python-m2crypto python-markupsafe python-zmq vblade
Use 'apt-get autoremove' to remove them.
The following packages will be REMOVED:
  runit*
0 upgraded, 0 newly installed, 1 to remove and 54 not upgraded.
After this operation, 393 kB disk space will be freed.
Do you want to continue? [Y/n] Y
(Reading database ... 64240 files and directories currently installed.)
Removing runit (2.1.2-3) ...
Removing SV inittab entry...
Purging configuration files for runit (2.1.2-3) ...
Processing triggers for man-db (2.7.0.2-5) ...
Wed, Dec 13, 3:21 PM · Operations, cloud-services-team
chasemp added a comment to T182781: Puppet broken on labstore1004.
root@labstore1004:~# apt-get remove --purge vblade-persist
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libapr1 libconfuse-common libconfuse0 libpgm-5.1-0 libsodium13 libzmq3 python-dateutil python-jinja2
  python-m2crypto python-markupsafe python-zmq runit vblade
Use 'apt-get autoremove' to remove them.
The following packages will be REMOVED:
  vblade-persist*
0 upgraded, 0 newly installed, 1 to remove and 54 not upgraded.
After this operation, 98.3 kB disk space will be freed.
Do you want to continue? [Y/n] Y
(Reading database ... 64253 files and directories currently installed.)
Removing vblade-persist (0.6-2) ...
Processing triggers for man-db (2.7.0.2-5) ...
Wed, Dec 13, 3:20 PM · Operations, cloud-services-team
chasemp added a comment to T182781: Puppet broken on labstore1004.

root@labstore1004:~# aptitude why runit
i vblade-persist Depends runit (>= 1.8.0-2)

Wed, Dec 13, 3:12 PM · Operations, cloud-services-team
chasemp updated the task description for T182781: Puppet broken on labstore1004.
Wed, Dec 13, 3:10 PM · Operations, cloud-services-team
chasemp triaged T182781: Puppet broken on labstore1004 as High priority.
Wed, Dec 13, 3:09 PM · Operations, cloud-services-team
chasemp created T182781: Puppet broken on labstore1004.
Wed, Dec 13, 3:09 PM · Operations, cloud-services-team
chasemp renamed T182779: labcontrol1002 puppet broken due to prometheus rabbit exporter from labcontrol* puppet broken due to prometheus rabbit exporter to labcontrol1002 puppet broken due to prometheus rabbit exporter.
Wed, Dec 13, 3:06 PM · cloud-services-team, Operations
chasemp triaged T182779: labcontrol1002 puppet broken due to prometheus rabbit exporter as Normal priority.
Wed, Dec 13, 3:05 PM · cloud-services-team, Operations
chasemp added a comment to T176361: Run performance tests using local proxy.

I'm wondering about the 95% cases in all three: cloud vps, aws, and metal. I thought about it for a while and the best guess I have is that the median is more consistent in the AWS case due to reservations. If this is the case, I would potentially expect metal to be the most performant and potentially variable, cloud vps to be the least performant and probably wildly variable atm, and aws to be middling in performance but the most consistent. On reflection I can understand why median and consistency are the most important baseline here. We could explore running the test cases with the same intentional resource limits in all three cases and see if that brings results in line.

Wed, Dec 13, 2:51 PM · Performance-Team
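
The comparison in the T176361 comment above (median versus the 95% case across cloud vps, aws, and metal) could be checked with something like the following. This is only a rough sketch: the timings are made up for illustration, and `summarize` and `tail_ratio` are hypothetical names, not part of any existing test harness.

```python
import statistics


def summarize(samples_ms):
    """Return median, 95th percentile, and tail ratio for one environment."""
    ordered = sorted(samples_ms)
    median = statistics.median(ordered)
    # Nearest-rank 95th percentile: rank = ceil(0.95 * n), computed with integers.
    rank = max(1, -(-95 * len(ordered) // 100))
    p95 = ordered[rank - 1]
    # tail_ratio close to 1.0 means the slow runs sit close to the median.
    return {"median": median, "p95": p95, "tail_ratio": p95 / median}


# Made-up timings in milliseconds, for illustration only (not measured data).
runs = {
    "aws": [500, 502, 498, 501, 499, 503, 500, 497, 502, 500],
    "metal": [400, 450, 390, 520, 410, 405, 600, 395, 430, 415],
    "cloud_vps": [700, 900, 650, 1200, 800, 750, 1500, 680, 950, 720],
}

for name, samples in runs.items():
    print(name, summarize(samples))
```

With the same resource limits applied in all three environments, the interesting question would be whether the tail ratios move closer together.
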
chasemp closed T182663: labtestwiki memcache errors, lots of requests from parsoid to labtestweb2001 as Resolved.
Wed, Dec 13, 2:47 PM · Parsoid, cloud-services-team, MediaWiki-General-or-Unknown
chasemp updated subscribers of T182722: DNS resolution failing from webservices running on Kubernetes.
Wed, Dec 13, 2:44 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
chasemp added a comment to T182722: DNS resolution failing from webservices running on Kubernetes.

Note ferm had been broken in Toolforge for a good long while looking for IPv6 addresses that cannot be fulfilled in that context but can in prod. https://phabricator.wikimedia.org/T179955#3831513 is relevant as well

Wed, Dec 13, 2:34 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
chasemp lowered the priority of T182722: DNS resolution failing from webservices running on Kubernetes from Unbreak Now! to Normal.
Wed, Dec 13, 2:30 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
chasemp updated the task description for T182722: DNS resolution failing from webservices running on Kubernetes.
Wed, Dec 13, 2:30 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
chasemp added a comment to T182722: DNS resolution failing from webservices running on Kubernetes.
  • https://gerrit.wikimedia.org/r/#/c/397879/ is applied
  • tools.wmflabs.org seems to be going down intermittently
  • several Tools report connectivity issues
  • investigation turns up that some nodes are in a bad state completely
  • diagnosis: a reboot or an ordered service restart fixes it
Wed, Dec 13, 2:23 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
chasemp closed T182769: tools-mail is failing to run puppet for more than a day as Resolved.
root@tools-mail:~# puppet agent --test
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for tools-mail.tools.eqiad.wmflabs
Info: Applying configuration version '1513173558'
Notice: Finished catalog run in 9.23 seconds
root@tools-mail:~#
Wed, Dec 13, 2:01 PM · Toolforge
chasemp added a comment to T182769: tools-mail is failing to run puppet for more than a day.
1392  2017-12-13 13:52:32 puppet agent --test
1393  2017-12-13 13:53:24 htop
1394  2017-12-13 13:53:48 service nslcd stop
1395  2017-12-13 13:53:50 htop
1396  2017-12-13 13:53:58 service nslcd start
1397  2017-12-13 13:54:00 htop
1398  2017-12-13 13:54:14 puppet agent --test
1399  2017-12-13 13:54:55 htop
1400  2017-12-13 13:58:00 service nscd stop
1401  2017-12-13 13:58:03 man nscd
1402  2017-12-13 13:58:10 nscd -i host
1403  2017-12-13 13:58:13 nscd -i user
1404* 2017-12-13 13:58:23 service nslcd start
1405  2017-12-13 13:58:28 service nscd status
1406  2017-12-13 13:58:34 service nscd start
Wed, Dec 13, 2:00 PM · Toolforge
chasemp triaged T182769: tools-mail is failing to run puppet for more than a day as Normal priority.
Wed, Dec 13, 1:57 PM · Toolforge

Tue, Dec 12

chasemp added a comment to T181647: create 'attended' upgrade workflow for cloud with Toolforge as canonical case.

Can you move that to be under https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin? 😄

Tue, Dec 12, 4:26 PM · Patch-For-Review, Toolforge, cloud-services-team
chasemp triaged T182142: Diffusion repository creation fails via toolsadmin as Normal priority.
Tue, Dec 12, 3:24 PM · Patch-For-Review, Diffusion, Striker
chasemp assigned T182663: labtestwiki memcache errors, lots of requests from parsoid to labtestweb2001 to Andrew.
Tue, Dec 12, 12:41 PM · Parsoid, cloud-services-team, MediaWiki-General-or-Unknown

Mon, Dec 11

chasemp added a comment to T166845: monitor some things on all Cloud instances (discussion).

Note: use cumin from labpuppetmaster*

Mon, Dec 11, 6:22 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
chasemp triaged T182604: tools-static is throwing space warnings due to cdnjs git repo size as High priority.
Mon, Dec 11, 4:34 PM · Toolforge
chasemp added a comment to T176361: Run performance tests using local proxy.

We ran this on spare bare metal servers Ops sourced for us (one without SSDs, one with SSDs) and it still didn't give us test stability anywhere near AWS.

Mon, Dec 11, 3:11 PM · Performance-Team
chasemp added a comment to T176361: Run performance tests using local proxy.

@chasemp I think we need your help here to guide us in the right direction. Let me do a summary:

The long term goal for us is to run performance tests on commits (open a browser, access a URL, collect metrics). The current solution we have uses a Docker container with Chrome, FFMPEG, which records the browser screen so it can be analyzed (for example to find when the first pixel is painted on the screen), and WebPageReplay, which records the page and replays it to the browser so we get the same content across multiple runs; we then add a network filter to slow it down. We then do X runs and take the median of each metric. When we do this on AWS the metrics are pretty constant. For example, testing our desktop site while recording video at 30 fps gives a diff of 33 ms on a c4.large instance. For mobile (smaller screen) we can use 60 fps to get even better numbers.

If we do the same on VPS or bare metal servers we don't get the same stable metrics. Running on AWS is fine for replacing the testing we already do today with more stable metrics, but for the long-term goal we should be able to move out of AWS.

Mon, Dec 11, 1:29 PM · Performance-Team
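
The approach summarized in the T176361 comment above (X runs, median of each metric, video recorded at 30 or 60 fps) can be sketched roughly as below. The metric names and values are hypothetical, and reading the quoted 33 ms diff as one frame interval at 30 fps (1000 / 30 ≈ 33 ms) is an assumption rather than something the comment states.

```python
import statistics


def frame_interval_ms(fps):
    """Smallest time step a video recorded at `fps` can resolve, in milliseconds."""
    return 1000.0 / fps


def median_per_metric(runs):
    """Take the median of each metric across runs (each run maps metric -> ms)."""
    metrics = runs[0].keys()
    return {m: statistics.median(run[m] for run in runs) for m in metrics}


# Hypothetical metric names and values, for illustration only.
runs = [
    {"first_visual_change": 633, "speed_index": 900},
    {"first_visual_change": 667, "speed_index": 933},
    {"first_visual_change": 633, "speed_index": 967},
]

print(round(frame_interval_ms(30), 1))  # one frame at 30 fps
print(round(frame_interval_ms(60), 1))  # one frame at 60 fps
print(median_per_metric(runs))
```

Under that reading, a run-to-run diff of about one frame is the floor the video analysis can report, which would be why 60 fps gives finer numbers.
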
chasemp added a comment to T177196: Port non-deprecated Diamond collectors to Prometheus.

I talked @fgiunchedi into enabling the client collector on tools-bastion-03 in https://gerrit.wikimedia.org/r/#/c/394571/ so we could see this work out on an instance with more data to expose. The tools worker in question is not busy at all and it's hard to get a great picture from what I'm seeing.

Mon, Dec 11, 1:06 PM · cloud-services-team (Kanban), Patch-For-Review, User-fgiunchedi, Goal, Operations

Tue, Dec 5

chasemp added a comment to T182070: tools-webgrid-lighttpd have ~ 90 procs stuck at 100% CPU time (mostly tools.jembot).

I stopped the webservice running under jembot for now, as I'm unsure whether this is an issue with this Tool or something else, but it had indeed leaked procs all over the webgrid nodes. I then purged processes running as the tools.jembot user.

Tue, Dec 5, 3:40 PM · Cloud-VPS, Toolforge

Fri, Dec 1

chasemp added a member for monitoring: chasemp.
Fri, Dec 1, 3:05 PM
chasemp added a comment to T177196: Port non-deprecated Diamond collectors to Prometheus.

I talked @fgiunchedi into enabling the client collector on tools-bastion-03 in https://gerrit.wikimedia.org/r/#/c/394571/ so we could see this work out on an instance with more data to expose. The tools worker in question is not busy at all and it's hard to get a great picture from what I'm seeing.

Fri, Dec 1, 2:22 PM · cloud-services-team (Kanban), Patch-For-Review, User-fgiunchedi, Goal, Operations

Thu, Nov 30

chasemp added a comment to P6408 labtest wth(eck).

root@labtestv1:~# puppet agent --test
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
debug2: channel 0: window 999399 sent adjust 49177
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: {"message":"Server Error: Evaluation Error: Error while evaluating a Function Call, hiera() can only be called using the 4.x function API. See Scope#call_function at /etc/puppet/modules/profile/manifests/base/certificates.pp:20:23 on node labtestv1.labtestproject.codfw.labtest","issue_kind":"RUNTIME_ERROR","stacktrace":["Warning: The 'stacktrace' property is deprecated and will be removed in a future version of Puppet. For security reasons, stacktraces are not returned with Puppet HTTP Error responses."]}
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
root@labtestv1:~#

Thu, Nov 30, 6:44 PM
chasemp created P6408 labtest wth(eck).
Thu, Nov 30, 6:28 PM
chasemp updated the task description for T181647: create 'attended' upgrade workflow for cloud with Toolforge as canonical case.
Thu, Nov 30, 6:15 PM · Patch-For-Review, Toolforge, cloud-services-team

Wed, Nov 29

chasemp added a comment to P6397 (An Untitled Masterwork).
root@puppet-compiler-other:~# CHANGE=392063 NODE=tools-puppetmaster-01.tools.eqiad.wmflabs BUILD_NUMBER=1002 puppet-compiler
[ 2017-11-29T22:23:20 ] INFO: Working on change 392063
[ 2017-11-29T22:23:20 ] INFO: No host list provided, generating the nodes list
[ 2017-11-29T22:23:20 ] INFO: Walking dir /var/lib/catalog-differ/puppet/yaml/facts
[ 2017-11-29T22:23:20 ] INFO: Refreshing the common repos from upstream if needed
[ 2017-11-29T22:23:22 ] INFO: Creating directories under /srv/jenkins-workspace/puppet-compiler
M	modules/cdh
Note: checking out 'FETCH_HEAD'.
Wed, Nov 29, 10:23 PM
chasemp created P6397 (An Untitled Masterwork).
Wed, Nov 29, 10:20 PM
chasemp added a comment to T181551: Puppet w/environments breaks the role/profile filters on Horizon.

Just a note to talk about this arena at our offsite next week :)

Wed, Nov 29, 9:18 PM · cloud-services-team (Kanban), Horizon
chasemp added a comment to T177880: Automatically run maintain-views and and maintain-meta_p when config changes on cloud replicas.

I don't think this is such a good idea for now because of issues where view management collides with table locking from actual users, and the per wiki grants needed and race conditions therein. If we were to automate this, I think it would have to be sanely integrated into an automatic pool/depool of labsdb backends at the haproxy layer, and have some backoff check for grants first that treats missing grants as an expected condition.

Wed, Nov 29, 8:58 PM · cloud-services-team (Kanban), Data-Services
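
For discussion only, the shape described in the T177880 comment above (depool a backend, back off until grants are in place, run the maintenance, repool) might look roughly like this. Every name here is hypothetical: `depool`, `repool`, `grants_ready`, and `maintain` are injected placeholders and do not correspond to existing haproxy or maintain-views tooling.

```python
import time


def run_with_depool(backend, depool, repool, grants_ready, maintain,
                    attempts=5, base_delay=1.0, sleep=time.sleep):
    """Depool a backend, wait with exponential backoff for grants, maintain, repool."""
    depool(backend)
    try:
        for attempt in range(attempts):
            if grants_ready(backend):
                maintain(backend)
                return True
            # Missing grants are treated as an expected, retryable condition.
            sleep(base_delay * 2 ** attempt)
        return False
    finally:
        # Always return the backend to the pool, even if maintenance fails.
        repool(backend)


# Dry run with fakes: grants become available on the third check.
events = []
state = {"checks": 0}


def fake_grants_ready(backend):
    state["checks"] += 1
    return state["checks"] >= 3


ok = run_with_depool(
    "labsdb-example",
    depool=lambda b: events.append(("depool", b)),
    repool=lambda b: events.append(("repool", b)),
    grants_ready=fake_grants_ready,
    maintain=lambda b: events.append(("maintain", b)),
    sleep=lambda s: events.append(("sleep", s)),
)
print(ok, events)
```

Keeping the repool in a `finally` block is the point of the sketch: a backend should not stay out of rotation because a view update failed or grants never appeared.
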
chasemp updated the task description for T181647: create 'attended' upgrade workflow for cloud with Toolforge as canonical case.
Wed, Nov 29, 7:53 PM · Patch-For-Review, Toolforge, cloud-services-team
chasemp updated the task description for T181647: create 'attended' upgrade workflow for cloud with Toolforge as canonical case.
Wed, Nov 29, 5:12 PM · Patch-For-Review, Toolforge, cloud-services-team
chasemp updated the task description for T181647: create 'attended' upgrade workflow for cloud with Toolforge as canonical case.
Wed, Nov 29, 5:00 PM · Patch-For-Review, Toolforge, cloud-services-team
chasemp triaged T181647: create 'attended' upgrade workflow for cloud with Toolforge as canonical case as High priority.
Wed, Nov 29, 4:59 PM · Patch-For-Review, Toolforge, cloud-services-team
chasemp created T181647: create 'attended' upgrade workflow for cloud with Toolforge as canonical case.
Wed, Nov 29, 4:58 PM · Patch-For-Review, Toolforge, cloud-services-team

Tue, Nov 28

chasemp added a comment to T161554: Provide large disk space to WikiBrain for memory-mapped file.

Hi all, I am reopening this. Hooray :)

I have done the work in T174796 to prepare the dataset and am hoping to use these VMs, but I am having some problems. I think the first one is that when I look at the Wikibrain Openstack Dashboard I don't see myself listed as an Admin, so I can't launch an instance. How should I go about doing this?

Thanks!

Tue, Nov 28, 7:47 PM · Cloud-VPS (Project-requests), artificial-intelligence
chasemp added a comment to T181369: Request creation of collaborate VPS project.

Can we sort out some specific use cases for this?

Tue, Nov 28, 4:41 PM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)
chasemp created T181523: labtest puppetmaster is not working for clients.
Tue, Nov 28, 4:22 PM · Cloud-VPS, Epic
chasemp added a comment to T171494: Refactor OpenStack Puppet to account for Neutron.

This is mostly at a stopping point. There is more hiera cleanup left behind (for instance labstore* and ldap things), but that is not blocking further work, and at the moment it feels more urgent to move forward. A big remaining piece here is to test failover for our three scenarios (labcontrol, labservices, labnet), make sure it functions as we expect, and update the docs at https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Fail-over. I will sync up with @Andrew for this.

Tue, Nov 28, 4:16 PM · cloud-services-team (Kanban), Cloud-VPS, Patch-For-Review
chasemp added a comment to T168486: Begin migrating customer-facing Dumps endpoints to Cloud Services.

Small note: new ports appeared accessible from instances, which is intended but was noticed :)

Tue, Nov 28, 4:08 PM · Datasets-General-or-Unknown, cloud-services-team (FY2017-18), Goal
chasemp added a comment to T179628: Consider granting `CREATE TEMPORARY TABLES` to labsdbuser.

Thanks @jcrespo for explaining. The meta point I see here, which we keep coming back to and any variance from which has created large hurdles, is that Replica databases are supposed to be read only. +1

Tue, Nov 28, 1:48 PM · DBA, Data-Services
chasemp added a comment to T180916: Puppet flapping on mounting /mnt/nfs/labstore-secondary-project failures ("Device busy or already mounted").

We have a similar or the same issue in Toolforge. I don't have an answer right now, but historically we thought it was due to load on exec/bastions causing local load and NFS unavailability outright. I don't believe that explains things anymore. I have seen this happen on Toolforge instances with roles that don't see direct impact from user load. I don't think the mounting-already-mounted is the entire story. It's true that it surfaces in some of the failures (for us), but AFAICT it is a legit response from a potentially false negative catalyst when the mount appears unhealthy. I don't know if it's the checking code or the actual state of things. That is one aspect. We also see it where Puppet runs its own verification that it is working on an NFS mount and not local storage (it looks for specific content that exists only on NFS), and those checks sometimes flake out inconsistently, causing similar flapping.

Tue, Nov 28, 1:38 PM · Data-Services, Cloud-VPS

Mon, Nov 27

chasemp added a comment to T155678: Provide an easy to use support system for contributors to ask technical questions .

There are many orgs that use Discourse as a Q&A system and to help new developers navigate documentation/code, and there are many open source projects using this platform. Here are some of the boards that jumped out to me, if people would like to see how other orgs are using this:

Mozilla: https://discourse.mozilla.org/
Twitter dev: https://twittercommunity.com/
Glowforge: https://community.glowforge.com/
Discourse: https://meta.discourse.org/
Atom: https://discuss.atom.io/

More: https://www.discourse.org/customers

Mon, Nov 27, 3:53 PM · User-Tgr, Wikimedia-General-or-Unknown, Developer-Relations (Oct-Dec 2017), TCB-Team

Sun, Nov 26

chasemp added a comment to T168142: Cleanup phabricator.wikimedia.org uploaded files, WP zero abuse.

I banned them but had not had a chance to purge the files. Thanks.

Sun, Nov 26, 6:51 PM · Patch-For-Review, Wikimedia-Incident, Wikimedia-Site-requests, Phabricator

Wed, Nov 22

chasemp closed T171473: labvirt1015 crashes as Resolved.

https://gerrit.wikimedia.org/r/#/c/392514/

Wed, Nov 22, 2:58 PM · cloud-services-team (Kanban), DC-Ops, Operations, ops-eqiad
chasemp created U17 cloudclinic.
Wed, Nov 22, 2:43 PM

Tue, Nov 21

chasemp reassigned T178661: Drop wb_entity_per_page views in Wiki Replicas from chasemp to Andrew.
Tue, Nov 21, 9:21 PM · cloud-services-team (Kanban), Data-Services, Wikidata-Former-Sprint-Board, Wikidata
chasemp assigned T180421: Request creation of PartnerMetrics VPS project to Andrew.
Tue, Nov 21, 9:19 PM · Cloud-VPS (Project-requests)

Mon, Nov 20

chasemp added a comment to T156174: Rewrite /usr/local/bin/crontab in python; fix bugs.

Note: crons backed up on tools-cron-01 prior to merge at tools-cron-01:/var/spool/cron/crontabs# ls /root/20112017/crontabs/. Old crontab script is backed up locally on tools-bastion-03 at tools-bastion-03:~# cp /usr/local/bin/crontab /home/rush

Mon, Nov 20, 9:01 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
chasemp closed T180992: test as Invalid.
Mon, Nov 20, 8:27 PM · Phabricator
chasemp added a comment to T180992: test.

'sworkin?

Mon, Nov 20, 8:27 PM · Phabricator
chasemp closed T180993: testing task creation via emailv2 as Invalid.
Mon, Nov 20, 8:26 PM
chasemp added a comment to T180993: testing task creation via emailv2.

ackv2

Mon, Nov 20, 8:25 PM
chasemp added a comment to T180993: testing task creation via emailv2.

synv2

Mon, Nov 20, 8:25 PM
chasemp added a comment to T180993: testing task creation via emailv2.

ack

Mon, Nov 20, 8:12 PM
chasemp added a comment to T180993: testing task creation via emailv2.

ping

Mon, Nov 20, 8:12 PM
chasemp created T180993: testing task creation via emailv2.
Mon, Nov 20, 8:09 PM
chasemp added a comment to T180992: test.

Email in may have a problem in some cases atm. Debugging.

Mon, Nov 20, 8:06 PM · Phabricator
chasemp added a comment to T180992: test.

Only testing email :)

Mon, Nov 20, 8:04 PM · Phabricator
chasemp created T180992: test.
Mon, Nov 20, 8:01 PM · Phabricator
chasemp created P6354 (An Untitled Masterwork).
Mon, Nov 20, 3:45 PM
chasemp reopened T178661: Drop wb_entity_per_page views in Wiki Replicas, a subtask of T177601: Deploy dropping wb_entity_per_page table, as Open.
Mon, Nov 20, 1:58 PM · Blocked-on-schema-change, MW-1.31-release-notes (WMF-deploy-2017-10-10 (1.31.0-wmf.3)), Patch-For-Review, DBA, Wikidata-Former-Sprint-Board, Wikidata, MediaWiki-extensions-WikibaseRepository
chasemp reopened T178661: Drop wb_entity_per_page views in Wiki Replicas as "Open".

Ah, dang. I did so before seeing the last comment.

Mon, Nov 20, 1:58 PM · cloud-services-team (Kanban), Data-Services, Wikidata-Former-Sprint-Board, Wikidata
chasemp closed T178661: Drop wb_entity_per_page views in Wiki Replicas as Resolved.

Thank you @Marostegui

Mon, Nov 20, 1:58 PM · cloud-services-team (Kanban), Data-Services, Wikidata-Former-Sprint-Board, Wikidata
chasemp closed T178661: Drop wb_entity_per_page views in Wiki Replicas, a subtask of T177601: Deploy dropping wb_entity_per_page table, as Resolved.
Mon, Nov 20, 1:58 PM · Blocked-on-schema-change, MW-1.31-release-notes (WMF-deploy-2017-10-10 (1.31.0-wmf.3)), Patch-For-Review, DBA, Wikidata-Former-Sprint-Board, Wikidata, MediaWiki-extensions-WikibaseRepository
chasemp closed T180564: Execution of maintain-view can create pileups due to metadata locking (was: Queries to wikidatawiki_p.wb_items_per_site on *.web.db.svc.eqiad.wmflabs are timing out) as Resolved.
Mon, Nov 20, 1:46 PM · Patch-For-Review, Data-Services
chasemp added a comment to T178661: Drop wb_entity_per_page views in Wiki Replicas.

New wikireplica's:

Mon, Nov 20, 1:46 PM · cloud-services-team (Kanban), Data-Services, Wikidata-Former-Sprint-Board, Wikidata

Fri, Nov 17

chasemp added a comment to P6347 heira demo.
./hiera_lookup --site=eqiad --fqdn=acamar.wikimedia.org --roles=dnsrecursor profile::bird::neighbors_list -v
DEBUG: 2017-11-17 14:20:52 -0600: Using Hiera 1.x backend API to access instance of class Hiera::Backend::Nuyaml_backend. Lookup recursion will not be detected
DEBUG: 2017-11-17 14:20:52 -0600: Looking up profile::bird::neighbors_list
DEBUG: 2017-11-17 14:20:52 -0600: Loading info from hosts/acamar for profile::bird::neighbors_list
DEBUG: 2017-11-17 14:20:52 -0600: The source is: hosts/acamar
DEBUG: 2017-11-17 14:20:52 -0600: Cannot find datafile /Users/cpettet/git/wmf/puppet/hieradata/hosts/acamar.yaml, skipping
DEBUG: 2017-11-17 14:20:52 -0600: Searching for profile::bird::neighbors_list in
DEBUG: 2017-11-17 14:20:52 -0600: Loading info from regex/acamar.wikimedia.org for profile::bird::neighbors_list
DEBUG: 2017-11-17 14:20:52 -0600: Regex match going on - using regex.yaml
DEBUG: 2017-11-17 14:20:52 -0600: Searching for profile::bird::neighbors_list in /Users/cpettet/git/wmf/puppet/hieradata/regex.yaml
DEBUG: 2017-11-17 14:20:52 -0600: Loading file /Users/cpettet/git/wmf/puppet/hieradata/regex.yaml
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_eqiad_17_to_31 for matches to '(?-mix:^elastic10(1[789]|2[0-9]|30|31)\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_eqiad_32_to_52 for matches to '(?-mix:^elastic10(3[2-9]|4[0-9]|5[012])\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_codfw_rack_a5 for matches to '(?-mix:^elastic20(01|02|03|25)\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_codfw_rack_a8 for matches to '(?-mix:^elastic20(04|05|06|26|27)\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_codfw_rack_b5 for matches to '(?-mix:^elastic20(07|08|09|28)\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_codfw_rack_b8 for matches to '(?-mix:^elastic20(10|11|12|29|30)\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_codfw_rack_c1 for matches to '(?-mix:^elastic20(13|14|15|31)\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_codfw_rack_c5 for matches to '(?-mix:^elastic20(16|17|18|32|33)\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_codfw_rack_d1 for matches to '(?-mix:^elastic20(19|20|21|34)\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_codfw_rack_d5 for matches to '(?-mix:^elastic20(22|23|24|35|36)\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_eqiad_rack_a3 for matches to '(?-mix:^(elastic10(30|31|32|33|34|35)|relforge1001)\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_eqiad_rack_a6 for matches to '(?-mix:^elastic10(44|45|48)\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_eqiad_rack_b3 for matches to '(?-mix:^elastic10(36|37|38|39)\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_eqiad_rack_b4 for matches to '(?-mix:^elastic10(49|50)\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_eqiad_rack_b6 for matches to '(?-mix:^elastic10(28|46|47)\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_eqiad_rack_c4 for matches to '(?-mix:^elastic10(21|22|29)\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_eqiad_rack_c5 for matches to '(?-mix:^(elastic10(40|41|42|43)|relforge1002)\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_eqiad_rack_c7 for matches to '(?-mix:^elastic10(51|52)\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_eqiad_rack_d3 for matches to '(?-mix:^elastic10(17|18|19|20)\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label es_eqiad_rack_d4 for matches to '(?-mix:^elastic10(23|24|25|26|27)\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label mysql_eqiad for matches to '(?-mix:^(db1[01][0-9][0-9]|dbstore100[1-2]|dbproxy10[01][0-9]|es101[1-9]|pc100[4-6]|labsdb10[01][0-9])\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label mysql_codfw for matches to '(?-mix:^(db2[01][0-9][0-9]|dbstore200[1-2]|es200[1-4]|pc200[4-6]|es201[1-9])\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label labvirt for matches to '(?-mix:^labvirt10[0-9][0-9]\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label labstore for matches to '(?-mix:^labstore.*\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label labs for matches to '(?-mix:^lab((net|nodepool|mon)100[1-9]\.eqiad\.wmnet|(services|control)100[1-9]\.wikimedia\.org)$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label labtestvirt for matches to '(?-mix:^labtestvirt20[0-9][0-9]\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label labtest for matches to '(?-mix:^labtest)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label swift_be_codfw_dell for matches to '(?-mix:^ms-be201[3-5]\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label swift_be_codfw_hp for matches to '(?-mix:^ms-be20(1[6-9]|2[0-9]|3[0-9])\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label swift_be_eqiad_dell for matches to '(?-mix:^ms-be101[3-5]\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label swift_be_eqiad_hp for matches to '(?-mix:^ms-be10(1[6-9]|2[0-9]|3[0-9])\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label lvs_eqiad_backup for matches to '(?-mix:^lvs100[4-6]\.wikimedia\.org$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label lvs_codfw_backup for matches to '(?-mix:^lvs200[4-6]\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label lvs_esams_backup for matches to '(?-mix:^lvs300[34]\.esams\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label lvs_ulsfo_backup for matches to '(?-mix:^lvs400[34567]\.ulsfo\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label lvs_temp_bgp_disable for matches to '(?-mix:^lvs10(0[7-9]|1[0-2])\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label lvs_temp_bgp_disable_lvs4 for matches to '(?-mix:^lvs400[567]\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label maps_test for matches to '(?-mix:^maps-test200[1-4]\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label db_like_raid_policy for matches to '(?-mix:^(db|dbstore|es|pc|labsdb)[12]\d\d\d\.(eqiad|codfw)\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label db_without_role for matches to '(?-mix:^(db|dbstore|es|pc|labsdb)[12]\d\d\d\.(eqiad|codfw)\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label analytics_no_monitor_screens for matches to '(?-mix:^stat1\d\d\d\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label analytics_notifications_disabled for matches to '(?-mix:^db1046\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label syslog_tls_eqiad for matches to '(?-mix:^(analytics|aqs|conf|cp|db|dbproxy|druid|elastic|es|etcd|ganeti|kafka)1\d\d\d\.eqiad\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label smart_health_mpt for matches to '(?-mix:^maps-test)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label smart_health_codfw for matches to '(?-mix:^.*\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label smart_health_esams for matches to '(?-mix:^.*\.esams\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label smart_health_ulsfo for matches to '(?-mix:^.*\.ulsfo\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Scanning label puppet4_mediawiki_canary_appserver_codfw for matches to '(?-mix:^mw20(17|99)\.codfw\.wmnet$)' in 'acamar.wikimedia.org'
DEBUG: 2017-11-17 14:20:52 -0600: Loading info from eqiad for profile::bird::neighbors_list
DEBUG: 2017-11-17 14:20:52 -0600: The source is: eqiad
DEBUG: 2017-11-17 14:20:52 -0600: Cannot find datafile /Users/cpettet/git/wmf/puppet/hieradata/eqiad/profile/bird.yaml, skipping
DEBUG: 2017-11-17 14:20:52 -0600: Searching for profile::bird::neighbors_list in
DEBUG: 2017-11-17 14:20:52 -0600: Loading info from private/hosts/acamar for profile::bird::neighbors_list
DEBUG: 2017-11-17 14:20:52 -0600: The source is: hosts/acamar
DEBUG: 2017-11-17 14:20:52 -0600: Cannot find datafile /Users/cpettet/git/wmf/puppet/hieradata/hosts/acamar.yaml, skipping
DEBUG: 2017-11-17 14:20:52 -0600: Searching for profile::bird::neighbors_list in
DEBUG: 2017-11-17 14:20:52 -0600: Loading info from private/eqiad for profile::bird::neighbors_list
DEBUG: 2017-11-17 14:20:52 -0600: The source is: eqiad
DEBUG: 2017-11-17 14:20:52 -0600: Cannot find datafile /Users/cpettet/git/wmf/puppet/hieradata/eqiad/profile/bird.yaml, skipping
DEBUG: 2017-11-17 14:20:52 -0600: Searching for profile::bird::neighbors_list in
DEBUG: 2017-11-17 14:20:52 -0600: Loading info from common for profile::bird::neighbors_list
DEBUG: 2017-11-17 14:20:52 -0600: The source is: common
DEBUG: 2017-11-17 14:20:52 -0600: Cannot find datafile /Users/cpettet/git/wmf/puppet/hieradata/common/profile/bird.yaml, skipping
DEBUG: 2017-11-17 14:20:52 -0600: Searching for profile::bird::neighbors_list in
DEBUG: 2017-11-17 14:20:52 -0600: Loading info from private/common for profile::bird::neighbors_list
DEBUG: 2017-11-17 14:20:52 -0600: The source is: common
DEBUG: 2017-11-17 14:20:52 -0600: Cannot find datafile /Users/cpettet/git/wmf/puppet/hieradata/common/profile/bird.yaml, skipping
DEBUG: 2017-11-17 14:20:52 -0600: Searching for profile::bird::neighbors_list in
DEBUG: 2017-11-17 14:20:52 -0600: Using Hiera 1.x backend API to access instance of class Hiera::Backend::Role_backend. Lookup recursion will not be detected
DEBUG: 2017-11-17 14:20:52 -0600: Looking in hierarchy for role dnsrecursor
DEBUG: 2017-11-17 14:20:52 -0600: Cannot find datafile /Users/cpettet/git/wmf/puppet/hieradata/role/eqiad/dnsrecursor.yaml, skipping
DEBUG: 2017-11-17 14:20:52 -0600: Searching in file /Users/cpettet/git/wmf/puppet/hieradata/role/common/dnsrecursor.yaml for profile::bird::neighbors_list
DEBUG: 2017-11-17 14:20:52 -0600: Cannot find datafile /Users/cpettet/git/wmf/puppet/hieradata/role/eqiad/dnsrecursor.yaml, skipping
DEBUG: 2017-11-17 14:20:52 -0600: Searching in file /Users/cpettet/git/wmf/puppet/hieradata/role/common/dnsrecursor.yaml for profile::bird::neighbors_list
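
For anyone re-checking the regex scan in the output above, here is a minimal Python sketch of the same idea: test one hostname against a set of label patterns and list the labels that match. It is only an illustration, not how the Hiera backend is implemented. The patterns are a small subset copied from the log with the Ruby `(?-mix:...)` wrapper dropped, and `PATTERNS` and `matching_labels` are hypothetical names made up for this example.

```python
import re

# Label -> pattern pairs copied from the debug output above.
# Only a handful are included; the real regex.yaml has many more.
PATTERNS = {
    "labvirt": r"^labvirt10[0-9][0-9]\.eqiad\.wmnet$",
    "labstore": r"^labstore.*\.wmnet$",
    "labtest": r"^labtest",
    "lvs_eqiad_backup": r"^lvs100[4-6]\.wikimedia\.org$",
    "smart_health_codfw": r"^.*\.codfw\.wmnet$",
}


def matching_labels(fqdn, patterns=PATTERNS):
    """Return the labels whose pattern matches fqdn, in definition order."""
    return [label for label, pattern in patterns.items() if re.search(pattern, fqdn)]


if __name__ == "__main__":
    for host in ("acamar.wikimedia.org", "labtestvirt2003.codfw.wmnet"):
        print(host, matching_labels(host))
```

With this subset, `acamar.wikimedia.org` matches no label, which is consistent with the lookup above moving on to the `eqiad` and `common` sources after the scan. A host can match more than one label, so the sketch returns a list rather than stopping at the first hit.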
Fri, Nov 17, 8:21 PM