
Volans (Riccardo Coccioli)
Operations Software Engineer

User Details

User Since
Feb 10 2016, 11:25 AM (175 w, 5 d)
Availability
Available
IRC Nick
volans
LDAP User
Volans
MediaWiki User
RCoccioli (WMF)

Recent Activity

Today

Volans moved T200706: rack/setup/install centrallog1001.eqiad.wmnet from Up next to In Dev/Progress on the Wikimedia-Logstash board.
Mon, Jun 24, 3:19 PM · User-herron, Wikimedia-Logstash, User-fgiunchedi, Operations
Volans assigned T214183: Setup graphs for power usage readings in Grafana to fgiunchedi.
Mon, Jun 24, 3:05 PM · DC-Ops, observability
Volans moved T218544: ms-be1043 sdk failed from In progress to Radar on the observability board.
Mon, Jun 24, 3:04 PM · observability, Operations-Software-Development, Operations, ops-eqiad
Volans added a comment to T226394: Telia IC-307235 reported down from the eqiad side.
volans@re0.cr1-codfw> show interfaces diagnostics optics xe-5/2/1
Physical interface: xe-5/2/1
    Laser bias current                        :  40.898 mA
    Laser output power                        :  0.7040 mW / -1.52 dBm
    Module temperature                        :  31 degrees C / 88 degrees F
    Module voltage                            :  3.2810 V
    Receiver signal average optical power     :  0.3757 mW / -4.25 dBm
    Laser bias current high alarm             :  Off
    Laser bias current low alarm              :  Off
    Laser bias current high warning           :  Off
    Laser bias current low warning            :  Off
    Laser output power high alarm             :  Off
    Laser output power low alarm              :  Off
    Laser output power high warning           :  Off
    Laser output power low warning            :  Off
    Module temperature high alarm             :  Off
    Module temperature low alarm              :  Off
    Module temperature high warning           :  Off
    Module temperature low warning            :  Off
    Module voltage high alarm                 :  Off
    Module voltage low alarm                  :  Off
    Module voltage high warning               :  Off
    Module voltage low warning                :  Off
    Laser rx power high alarm                 :  Off
    Laser rx power low alarm                  :  Off
    Laser rx power high warning               :  Off
    Laser rx power low warning                :  Off
    Laser bias current high alarm threshold   :  85.000 mA
    Laser bias current low alarm threshold    :  15.000 mA
    Laser bias current high warning threshold :  80.000 mA
    Laser bias current low warning threshold  :  20.000 mA
    Laser output power high alarm threshold   :  1.5840 mW / 2.00 dBm
    Laser output power low alarm threshold    :  0.1580 mW / -8.01 dBm
    Laser output power high warning threshold :  1.2580 mW / 1.00 dBm
    Laser output power low warning threshold  :  0.1990 mW / -7.01 dBm
    Module temperature high alarm threshold   :  78 degrees C / 172 degrees F
    Module temperature low alarm threshold    :  -13 degrees C / 9 degrees F
    Module temperature high warning threshold :  73 degrees C / 163 degrees F
    Module temperature low warning threshold  :  -8 degrees C / 18 degrees F
    Module voltage high alarm threshold       :  3.700 V
    Module voltage low alarm threshold        :  2.900 V
    Module voltage high warning threshold     :  3.600 V
    Module voltage low warning threshold      :  3.000 V
    Laser rx power high alarm threshold       :  1.7783 mW / 2.50 dBm
    Laser rx power low alarm threshold        :  0.0100 mW / -20.00 dBm
    Laser rx power high warning threshold     :  1.5849 mW / 2.00 dBm
    Laser rx power low warning threshold      :  0.0158 mW / -18.01 dBm
Mon, Jun 24, 1:32 PM · Operations, netops
Volans added a comment to T226394: Telia IC-307235 reported down from the eqiad side.
volans@re0.cr1-eqiad> show interfaces diagnostics optics xe-4/2/0
Physical interface: xe-4/2/0
    Laser bias current                        :  39.156 mA
    Laser output power                        :  0.7330 mW / -1.35 dBm
    Module temperature                        :  40 degrees C / 104 degrees F
    Module voltage                            :  3.3000 V
    Receiver signal average optical power     :  0.0002 mW / -36.99 dBm
    Laser bias current high alarm             :  Off
    Laser bias current low alarm              :  Off
    Laser bias current high warning           :  Off
    Laser bias current low warning            :  Off
    Laser output power high alarm             :  Off
    Laser output power low alarm              :  Off
    Laser output power high warning           :  Off
    Laser output power low warning            :  Off
    Module temperature high alarm             :  Off
    Module temperature low alarm              :  Off
    Module temperature high warning           :  Off
    Module temperature low warning            :  Off
    Module voltage high alarm                 :  Off
    Module voltage low alarm                  :  Off
    Module voltage high warning               :  Off
    Module voltage low warning                :  Off
    Laser rx power high alarm                 :  Off
    Laser rx power low alarm                  :  On
    Laser rx power high warning               :  Off
    Laser rx power low warning                :  On
    Laser bias current high alarm threshold   :  85.000 mA
    Laser bias current low alarm threshold    :  15.000 mA
    Laser bias current high warning threshold :  80.000 mA
    Laser bias current low warning threshold  :  20.000 mA
    Laser output power high alarm threshold   :  1.5840 mW / 2.00 dBm
    Laser output power low alarm threshold    :  0.1580 mW / -8.01 dBm
    Laser output power high warning threshold :  1.2580 mW / 1.00 dBm
    Laser output power low warning threshold  :  0.1990 mW / -7.01 dBm
    Module temperature high alarm threshold   :  78 degrees C / 172 degrees F
    Module temperature low alarm threshold    :  -13 degrees C / 9 degrees F
    Module temperature high warning threshold :  73 degrees C / 163 degrees F
    Module temperature low warning threshold  :  -8 degrees C / 18 degrees F
    Module voltage high alarm threshold       :  3.700 V
    Module voltage low alarm threshold        :  2.900 V
    Module voltage high warning threshold     :  3.600 V
    Module voltage low warning threshold      :  3.000 V
    Laser rx power high alarm threshold       :  1.7783 mW / 2.50 dBm
    Laser rx power low alarm threshold        :  0.0100 mW / -20.00 dBm
    Laser rx power high warning threshold     :  1.5849 mW / 2.00 dBm
    Laser rx power low warning threshold      :  0.0158 mW / -18.01 dBm
Mon, Jun 24, 1:31 PM · Operations, netops
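
A minimal sketch of how the two readings above compare, assuming only the values printed in the outputs: the eqiad side receives about -36.99 dBm, well under the -20.00 dBm low alarm threshold, while the codfw side receives -4.25 dBm. The helper names `mw_to_dbm` and `rx_low_alarm` are hypothetical and not part of any Junos tooling.

```python
import math


def mw_to_dbm(milliwatts: float) -> float:
    """Convert optical power from mW to dBm: 10 * log10(P / 1 mW)."""
    return 10 * math.log10(milliwatts)


def rx_low_alarm(rx_dbm: float, low_alarm_dbm: float) -> bool:
    """Return True when received power is below the low alarm threshold."""
    return rx_dbm < low_alarm_dbm


# Values copied from the two outputs above (not measured by this script).
LOW_ALARM_DBM = -20.00
readings = {
    "cr1-codfw xe-5/2/1": -4.25,
    "cr1-eqiad xe-4/2/0": -36.99,
}

for interface, rx_dbm in readings.items():
    status = "LOW ALARM" if rx_low_alarm(rx_dbm, LOW_ALARM_DBM) else "ok"
    print(f"{interface}: rx {rx_dbm:.2f} dBm -> {status}")
```

With these numbers only the eqiad interface is flagged, which matches the "Laser rx power low alarm: On" line in its output.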
Volans added a comment to T226331: Upgrade Netbox to 2.6.0.

FYI this has a new required dependency of Redis, so we should check with serviceops if we could use an existing Redis installation.
It also has some changes in the API and we should check if existing scripts need to be updated.

Mon, Jun 24, 10:34 AM · netbox

Mon, Jun 17

ema awarded T221212: spicerack/cookbook: add additional arguments IRC/SAL logging a Love token.
Mon, Jun 17, 2:33 PM · Patch-For-Review, Operations-Software-Development, Operations

Thu, Jun 13

Volans updated subscribers of T210723: Address recurrent service check time out for "HP RAID" on swift backend hosts.
Thu, Jun 13, 11:06 AM · Patch-For-Review, User-fgiunchedi, Operations, observability

Wed, Jun 5

Volans added a comment to T225140: Icinga alerts that should open tasks instead of alerting.

FYI The current raid_handler.py could be adapted or (ideally) its generic parts extracted to be able to easily add other handlers for different types of checks. Both the state (WARNING, CRITICAL, etc..) and the state type (HARD, SOFT) can be passed to the handler that can decide what to do based on those.

Wed, Jun 5, 11:02 PM · observability
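
One possible shape for the generic part described in the comment above, as a hedged sketch: a small registry that routes each check type to its own handler and lets the handler decide based on the state and the state type. This is not the actual raid_handler.py; the names `register`, `dispatch` and `handle_raid`, and the returned strings, are all hypothetical.

```python
from typing import Callable, Dict

# A handler receives (host, state, state_type) and returns a description
# of the action it would take. Hypothetical interface for illustration.
Handler = Callable[[str, str, str], str]
HANDLERS: Dict[str, Handler] = {}


def register(check_type: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one type of check."""
    def decorator(func: Handler) -> Handler:
        HANDLERS[check_type] = func
        return func
    return decorator


@register("raid")
def handle_raid(host: str, state: str, state_type: str) -> str:
    # Act only on confirmed (HARD) CRITICAL states, skip SOFT retries.
    if state_type == "HARD" and state == "CRITICAL":
        return f"open task: degraded RAID on {host}"
    return "no action"


def dispatch(check_type: str, host: str, state: str, state_type: str) -> str:
    """Route an event to the handler registered for its check type."""
    handler = HANDLERS.get(check_type)
    if handler is None:
        return "no handler"
    return handler(host, state, state_type)


print(dispatch("raid", "db2035", "CRITICAL", "HARD"))
print(dispatch("raid", "db2035", "CRITICAL", "SOFT"))
```

Adding a handler for another type of check would then only require a new decorated function, leaving the dispatch logic untouched.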

Tue, Jun 4

herron awarded T221212: spicerack/cookbook: add additional arguments IRC/SAL logging a Like token.
Tue, Jun 4, 6:21 PM · Patch-For-Review, Operations-Software-Development, Operations
Volans placed T223835: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs up for grabs.

I'm handing it back to the current clinic duty (@fsero) at this point, given that it needs to be reworked; it's not just a follow-up.

Tue, Jun 4, 10:13 AM · Patch-For-Review, DNS, Traffic, Wikimedia-Apache-configuration, Operations, Matrix

Mon, Jun 3

Volans added a hashtag to Operations-Software-Development: #debmonitor.
Mon, Jun 3, 6:08 PM
Volans closed T223835: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs as Resolved.

Change is live:

L| 0 ~$ dig @ns0.wikimedia.org SRV _matrix._tcp.wikimedia.org
Mon, Jun 3, 11:22 AM · Patch-For-Review, DNS, Traffic, Wikimedia-Apache-configuration, Operations, Matrix
Volans closed T223835: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs, a subtask of T215042: Set up a hosted Matrix.org / Riot instance on modular.im, as Resolved.
Mon, Jun 3, 11:22 AM · Matrix, User-Tgr
Volans added a comment to T223835: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs.

Both records are actually up now:

$ dig +trace wikimedia.modular.im
[...SNIP...]
wikimedia.modular.im.	300	IN	A	52.56.197.133
;; Received 65 bytes from 173.245.58.183#53(laura.ns.cloudflare.com) in 9 ms
Mon, Jun 3, 11:07 AM · Patch-For-Review, DNS, Traffic, Wikimedia-Apache-configuration, Operations, Matrix

Fri, May 31

Volans added a comment to T223938: facter 3: add timeout to custom facts external calls.

I'm for option 2 as well, 3 breaks the contract that all facts are available at first puppet run and 1 doesn't really solve much in the long run, just the cron spam.

Fri, May 31, 12:32 PM · Patch-For-Review, Puppet, Operations
Volans triaged T224727: Cron spam from phab1001 delete of temporary files as High priority.
Fri, May 31, 12:25 PM · Operations
Volans created T224727: Cron spam from phab1001 delete of temporary files.
Fri, May 31, 12:25 PM · Operations

Thu, May 30

Volans updated subscribers of T224691: labmon / prometheus - query error - monitoring artifacts - Icinga UNKNOWN.
Thu, May 30, 10:30 PM · observability, Cloud-Services
Volans lowered the priority of T224661: cron-spam to root@: lsof stderr generates large emails on boron from wmf-auto-restart from High to Normal.

Bandaid applied, no output from running wmf-auto-restart on boron. It should be good for now. Leaving open for a better long-term solution.

Thu, May 30, 11:18 AM · Operations
Volans updated subscribers of T224661: cron-spam to root@: lsof stderr generates large emails on boron from wmf-auto-restart.
Thu, May 30, 11:10 AM · Operations
Volans triaged T224661: cron-spam to root@: lsof stderr generates large emails on boron from wmf-auto-restart as High priority.
Thu, May 30, 11:09 AM · Operations
Volans created T224661: cron-spam to root@: lsof stderr generates large emails on boron from wmf-auto-restart.
Thu, May 30, 11:09 AM · Operations
Volans added a comment to P5608 Update production known hosts.

Paste diff is not that smart, I just added a check for the main DYNA record to silently skip its CNAMEs without spamming stderr.

Thu, May 30, 8:51 AM
Volans edited P5608 Update production known hosts.
Thu, May 30, 8:50 AM

Wed, May 29

Volans added a project to T224559: Migrate Failoid hosts to Stretch/Buster: Traffic.
Wed, May 29, 11:32 AM · Traffic, serviceops, Operations
Volans added a comment to T224559: Migrate Failoid hosts to Stretch/Buster.

+1 on the naming and +1 on buster, they just have firewall rules, so should be pretty straightforward and easy to do.

Wed, May 29, 11:29 AM · Traffic, serviceops, Operations

Tue, May 28

Volans assigned T224517: netbox / netmon1002: netbox report related service units failed to crusnov.
Tue, May 28, 10:02 PM · observability, netbox, Operations
Volans closed T223496: Requesting access to machines [stat1004, stat1005 (now stat1007), and stat1006] and groups for iflorez as Resolved.

I've asked @elukey to sync the account to HUE as I don't have access myself.

Tue, May 28, 4:40 PM · Operations, SRE-Access-Requests
Volans updated subscribers of T223496: Requesting access to machines [stat1004, stat1005 (now stat1007), and stat1006] and groups for iflorez.

Added @Iflorez to the wmf LDAP group as agreed with @MoritzMuehlenhoff

Tue, May 28, 4:37 PM · Operations, SRE-Access-Requests
Volans closed T192830: Requesting access to production for SWAT deploy for Urbanecm as Resolved.

Glad to hear, resolving then.

Tue, May 28, 11:50 AM · User-zeljkofilipin, Release-Engineering-Team (Kanban), User-greg, User-Urbanecm, Operations, SRE-Access-Requests
Volans triaged T224456: Degraded RAID on db2035 as Normal priority.
Tue, May 28, 9:11 AM · DBA, Operations, ops-codfw
Volans closed T224406: Incorrect icinga settings for mobrovac as Resolved.

And with the above patch merged it should all be resolved. Reopen if needed.

Tue, May 28, 8:56 AM · Core Platform Team Backlog (Watching / External), Services (watching), Operations, Icinga
Volans updated subscribers of T224448: Gerrit http threads stuck behind sendemail thread.

As suggested by @dcausse let's try to capture a jstack next time it happens:

sudo -u gerrit2 jstack $(pidof java)
Tue, May 28, 8:39 AM · Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops-radar, Gerrit
Volans added a comment to T224448: Gerrit http threads stuck behind sendemail thread.

Restarted Gerrit because it was stuck and showed the same behaviour as in the above graph:
https://grafana.wikimedia.org/d/Bw2mQ3iWz/gerrit-javamelody?panelId=16&fullscreen&orgId=1&from=1559022591237&to=1559031673504

Tue, May 28, 8:33 AM · Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops-radar, Gerrit
Volans created P8564 Gerrit jstack.
Tue, May 28, 8:27 AM

Mon, May 27

Volans added a comment to T221068: decom ms-be201[345].

I've put the state of those hosts in Netbox back to active, as they are currently "active" for the spare::system role. The decommissioning state should be set once we run the decom script (the script will do this automatically very soon) and the host is removed from Puppet completely.
I've also updated the documentation to reduce confusion:
https://wikitech.wikimedia.org/w/index.php?title=Server_Lifecycle&type=revision&diff=1827408&oldid=1827206

Mon, May 27, 10:50 PM · decommission, ops-codfw, media-storage, User-fgiunchedi, Operations
Volans added a comment to T220590: Decom ms-be101[345].

I've put the state of those hosts in Netbox back to active, as they are currently "active" for the spare::system role. The decommissioning state should be set once we run the decom script (the script will do this automatically very soon) and the host is removed from Puppet completely.
I've also updated the documentation to reduce confusion:
https://wikitech.wikimedia.org/w/index.php?title=Server_Lifecycle&type=revision&diff=1827408&oldid=1827206

Mon, May 27, 10:50 PM · decommission, User-fgiunchedi, media-storage, Operations
Volans edited projects for T224442: I can't load tools.wmflabs.org, added: cloud-services-team (Kanban); removed Traffic, Operations, DNS.
Mon, May 27, 7:57 PM · Toolforge, cloud-services-team (Kanban)
Volans added a comment to T224406: Incorrect icinga settings for mobrovac.

So after a bit of debugging with @mobrovac it seems that the alarm that is not notifying the team-services contact is the restbase endpoints health one that seems to be generated by the service::node Puppet define.
That define has support for adding custom contact groups, and I don't see any defined for the team-services in hieradata/.
According to @mobrovac this was working before so I'm wondering if anything changed recently elsewhere given that this part of the code hasn't AFAICT.

Mon, May 27, 3:01 PM · Core Platform Team Backlog (Watching / External), Services (watching), Operations, Icinga
Volans added a comment to T224406: Incorrect icinga settings for mobrovac.

@mobrovac can you retry actions on the Icinga UI?

Mon, May 27, 2:05 PM · Core Platform Team Backlog (Watching / External), Services (watching), Operations, Icinga
Volans added a comment to T224406: Incorrect icinga settings for mobrovac.

That still doesn't make sense. On that alias I do receive emails, so the alias part is ok. What is weird is that not all email notifs that used to come come any more. E.g. I get notifications about Cassandra being down on a node, but not about RESTBase. In the same vein, it seems I'm not receiving emails for SCB services either...

Mon, May 27, 12:02 PM · Core Platform Team Backlog (Watching / External), Services (watching), Operations, Icinga
Volans added a comment to T224406: Incorrect icinga settings for mobrovac.

@mobrovac

  • Regarding the notification, AFAICT the RESTBase alerts notify the team-services group (services@). I don't see that alias defined in our exim configuration, so it is probably managed by OIT. Could you check with them if that is still valid and includes the right people?
  • Regarding the UI actions the user mobrovac is properly authorized in Icinga configuration, could you check that you're logged in with the correct case of the username?
Mon, May 27, 11:51 AM · Core Platform Team Backlog (Watching / External), Services (watching), Operations, Icinga
Volans claimed T224406: Incorrect icinga settings for mobrovac.
Mon, May 27, 11:41 AM · Core Platform Team Backlog (Watching / External), Services (watching), Operations, Icinga
Volans added a comment to T192830: Requesting access to production for SWAT deploy for Urbanecm.

I've added urbanecm to the LDAP group nda as per request above given that it's needed to check logstash during deployments.

Mon, May 27, 11:25 AM · User-zeljkofilipin, Release-Engineering-Team (Kanban), User-greg, User-Urbanecm, Operations, SRE-Access-Requests
Volans added a comment to T192830: Requesting access to production for SWAT deploy for Urbanecm.

As per docs added Urbanecm to the wmf-deployment group in Gerrit.

Mon, May 27, 10:28 AM · User-zeljkofilipin, Release-Engineering-Team (Kanban), User-greg, User-Urbanecm, Operations, SRE-Access-Requests
Volans moved T224313: Requesting access to icinga for tonycepo from Untriaged to Awaiting User Input on the SRE-Access-Requests board.
Mon, May 27, 10:05 AM · observability, SRE-Access-Requests, Operations
Volans added a comment to T224393: db2091 rebooted unexpectedly.

Forgot to mention, nothing in syslog or journalctl for MySQL on s2/s4 units.

Mon, May 27, 9:33 AM · Patch-For-Review, Operations, DBA
Volans added a comment to T224393: db2091 rebooted unexpectedly.

Related documentation for the most useful messages:

Mon, May 27, 9:30 AM · Patch-For-Review, Operations, DBA
Volans triaged T224343: Add more bad words to fancycaptcha/badwords as Normal priority.
Mon, May 27, 9:18 AM · Operations, Wikimedia-Site-requests
Volans triaged T221770: Upgrade cloucontrol1003/1004 to stretch/mitaka as High priority.

cloudcontrol1003 is flapping its systemd degraded alert since 2019-05-25 21:46. The unit that fails is:

● designate_floating_ip_ptr_records_updater.service               loaded failed failed    Designate Floating IP PTR records updater
Mon, May 27, 8:24 AM · Patch-For-Review, Cloud-VPS, cloud-services-team (Kanban)
Volans added a comment to T220590: Decom ms-be101[345].

@fgiunchedi FYI we got some email to root@ from ms-be1014 with the following:

Cron <root@ms-be1014> test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
Mon, May 27, 8:19 AM · decommission, User-fgiunchedi, media-storage, Operations
Volans renamed T224393: db2091 rebooted unexpectedly from db2091 mysql service stopped running to db2091 rebooted unexpectedly.
Mon, May 27, 8:11 AM · Patch-For-Review, Operations, DBA
Volans triaged T224399: exim paniclog on $HOST has non-zero size as Normal priority.
Mon, May 27, 8:07 AM · Operations
Volans created T224399: exim paniclog on $HOST has non-zero size.
Mon, May 27, 8:06 AM · Operations

May 24 2019

Volans added a comment to T223496: Requesting access to machines [stat1004, stat1005 (now stat1007), and stat1006] and groups for iflorez.

@Nuria all the pre-requisites are there, is this approved?

May 24 2019, 5:50 PM · Operations, SRE-Access-Requests
Volans closed T222910: Requesting access to deployment and analytics-privatedata-users for jfishback as Resolved.

All changes merged, confirmed that James can connect to a few hosts.
As per docs added Jfishback to the wmf-deployment group in Gerrit.

May 24 2019, 4:55 PM · User-greg, SRE-Access-Requests, Operations, Security-Team
Volans closed T222910: Requesting access to deployment and analytics-privatedata-users for jfishback, a subtask of T220517: Onboarding James Fishback to Security Team as Privacy Engineer (April 15th), as Resolved.
May 24 2019, 4:55 PM · Security-Team
Volans added a comment to T224254: User alias redirecting to another user alias.

@HMarcus change applied:

-legalquestions:	legal, liaison
+legalquestions:	legal
May 24 2019, 4:46 PM · Mail, Operations
Volans added a comment to T192830: Requesting access to production for SWAT deploy for Urbanecm.

@Urbanecm is it ok to use the email you used to sign the NDA for the related patch in Puppet?
Keep in mind that that file is public.

May 24 2019, 10:45 AM · User-zeljkofilipin, Release-Engineering-Team (Kanban), User-greg, User-Urbanecm, Operations, SRE-Access-Requests
Volans added a comment to T220860: access for foks to labweb (in one way or another) (or make changePassword.php work on mwmaint hosts).

@jrbs Any update on this?

May 24 2019, 10:17 AM · Patch-For-Review, Operations, SRE-Access-Requests
Volans moved T223698: Request access to deployment cluster for Alaa Sarhan from Awaiting User Input to Manager/NDA Approval/Confirmation on the SRE-Access-Requests board.
May 24 2019, 10:15 AM · Release-Engineering-Team-TODO, Operations, Release-Engineering-Team, SRE-Access-Requests
Volans updated subscribers of T224254: User alias redirecting to another user alias.
May 24 2019, 9:43 AM · Mail, Operations
Volans merged task T224261: Broken disk on analytics1039 into T220880: Degraded RAID on analytics1039.
May 24 2019, 9:39 AM · Operations, ops-eqiad
Volans reopened T220880: Degraded RAID on analytics1039 as "Open".

Re-opening as the disk ended up in a failed state with 2 failed disks!
The automatic task was not opened because it was already in alarm in Icinga, so it didn't re-trigger. Here is the status of megacli:

May 24 2019, 9:38 AM · ops-eqiad, Operations

May 23 2019

Volans triaged T224254: User alias redirecting to another user alias as Normal priority.

@HMarcus yes we have the current rule in the exim configuration:

legalquestions: legal, liaison
May 23 2019, 11:56 PM · Mail, Operations
Volans added a comment to T223496: Requesting access to machines [stat1004, stat1005 (now stat1007), and stat1006] and groups for iflorez.

@Dzahn I guess @georgina would be more appropriate, the expiration contact should be related to the contractor's point of contact, not the group's owner.

May 23 2019, 11:46 PM · Operations, SRE-Access-Requests
Volans triaged T224236: include the 'Server:' response header in varnishkafka as Normal priority.
May 23 2019, 9:58 PM · Analytics-Kanban, User-Elukey, Traffic, Analytics, Operations
Volans removed a project from T224222: Some citoid requests aren't timing out and are pending indefinitely: Operations.

Given the latest updates, removing Operations.

May 23 2019, 3:34 PM · Core Platform Team Backlog (Watching / External), RESTBase-API, Services (watching), Citoid
Volans removed a project from T224200: Cirrus query clicks cron job for dropping partitions older than 90 days have started failing: Operations.

Checked with @Ottomata, no need for Operations here, removing the tag.

May 23 2019, 3:15 PM · Discovery-Search (Current work), Analytics, Discovery, CirrusSearch, Analytics-Cluster
Volans added a comment to T222922: wmf7622 wont powercycle (cannot be allocated from spares).

@RobH @faidon @crusnov: I've made the changes to the Lifecycle page, please have a look:
https://wikitech.wikimedia.org/w/index.php?title=Server_Lifecycle&type=revision&diff=1827206&oldid=1826423

May 23 2019, 3:10 PM · Operations, ops-eqiad
Volans added a comment to T224033: Fix operations/puppet.git "rebase hell".

@hashar another question for you. If I have 2 CRs, chained one on top of another and I +2 both of them because I want to deploy them together, and the first one fails but the second one passes, would the second one be rebased on top of production branch and merged despite the fact that its parent was not merged?
If this is the case this would be a blocker IMHO.

May 23 2019, 10:59 AM · Continuous-Integration-Config, Operations
Volans triaged T224205: don't page all of SRE for phabricator 'phd' service not running as Normal priority.
May 23 2019, 10:38 AM · Patch-For-Review, Phabricator, observability
Volans added a comment to T224017: Slow query ApiQueryRevisions on enwiki .

I got slightly nerd-sniped into this and had a look. First I found that the optimizer chooses a full table scan even without joins, as soon as we select a field from the revision table that is not included in the index.

May 23 2019, 10:30 AM · User-Marostegui, MW-1.34-notes (1.34.0-wmf.7; 2019-05-28), Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Kanban (Waiting for Review), DBA, MediaWiki-Database
Volans created P8557 enwiki revision table full table scan explain rows.
May 23 2019, 10:20 AM

May 22 2019

Volans added a comment to T222910: Requesting access to deployment and analytics-privatedata-users for jfishback.

I've verified with @JFishback_WMF that basic access works as expected.

May 22 2019, 5:36 PM · User-greg, SRE-Access-Requests, Operations, Security-Team
Volans added a comment to T222788: Request to be added to the ldap/wmde group.

@darthmon_wmde could you verify all works as expected? Feel free to resolve this task if there isn't any problem.

May 22 2019, 2:44 PM · Patch-For-Review, WMF-Legal, LDAP-Access-Requests, Operations, WMF-NDA-Requests
Volans assigned T222910: Requesting access to deployment and analytics-privatedata-users for jfishback to greg.

In the meantime I've sent patches for the conversion to shell access and the analytics-privatedata-users inclusion.
Assigning to @greg for approval for the deployment group.

May 22 2019, 2:43 PM · User-greg, SRE-Access-Requests, Operations, Security-Team
Volans added a comment to T192830: Requesting access to production for SWAT deploy for Urbanecm.

Pending approval from sponsor (@zeljkofilipin ) and deployment group owner (@greg )

May 22 2019, 2:24 PM · User-zeljkofilipin, Release-Engineering-Team (Kanban), User-greg, User-Urbanecm, Operations, SRE-Access-Requests
People defrocked Volans.
May 22 2019, 10:44 AM
Volans defrocked Bawolff.
May 22 2019, 10:29 AM
People empowered Volans as an administrator.
May 22 2019, 10:25 AM
Volans triaged T223835: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs as Normal priority.
May 22 2019, 9:27 AM · Patch-For-Review, DNS, Traffic, Wikimedia-Apache-configuration, Operations, Matrix
Volans triaged T224097: Make spicerack / cumin cluster aware as Normal priority.
May 22 2019, 9:23 AM · Operations-Software-Development
Volans triaged T224065: cloudvirt1028 - no PS redundancy as Normal priority.
May 22 2019, 9:03 AM · cloud-services-team (Kanban), Operations, ops-eqiad

May 21 2019

Volans added a comment to T222922: wmf7622 wont powercycle (cannot be allocated from spares).

I think it's fair to add transitions from pretty much any state to the failed state.

May 21 2019, 5:22 PM · Operations, ops-eqiad
Volans moved T223737: Increase Memory Limit for Scribunto from Backlog to Radar on the Operations board.
May 21 2019, 5:08 PM · serviceops, Performance-Team (Radar), Performance, Operations, Wikimedia-Site-requests
Volans triaged T224033: Fix operations/puppet.git "rebase hell" as Normal priority.
May 21 2019, 4:54 PM · Continuous-Integration-Config, Operations
Volans updated subscribers of T224033: Fix operations/puppet.git "rebase hell".

I'm assuming that in cases in which the rebase fails because of conflicts or the CI fails after the rebase Jenkins would vote -1 and the patch would be out of the merging queue. Correct me if I'm wrong.

May 21 2019, 4:54 PM · Continuous-Integration-Config, Operations
Volans added a comment to T222074: Icinga meta-monitoring: automatically sync contact list.
May 21 2019, 2:34 PM · observability, Operations
Volans added a comment to T222074: Icinga meta-monitoring: automatically sync contact list.
May 21 2019, 2:26 PM · observability, Operations
Volans added a comment to T222074: Icinga meta-monitoring: automatically sync contact list.
May 21 2019, 2:25 PM · observability, Operations
Volans closed T220297: Icinga process too many open files as Resolved.

Limit increased to 4096 and Icinga manually restarted on both 1001 and 2001.

May 21 2019, 12:25 PM · Patch-For-Review, observability, Operations
Volans claimed T223861: New WikiJournal_CoC@lists.wikimedia.org.

@Thomas_Shafee I've created the mailing list as requested. All other options were left to their default.

May 21 2019, 9:35 AM · Operations, Wikimedia-Mailing-lists
Volans triaged T223949: lvs2002 possible broken BBU as Normal priority.
May 21 2019, 9:16 AM · ops-codfw, Operations

May 20 2019

Volans triaged T223938: facter 3: add timeout to custom facts external calls as Normal priority.
May 20 2019, 9:01 PM · Patch-For-Review, Puppet, Operations
Volans created T223938: facter 3: add timeout to custom facts external calls.
May 20 2019, 9:01 PM · Patch-For-Review, Puppet, Operations
Volans triaged T223937: run-no-puppet: rewrite using puppet-common.sh as Normal priority.
May 20 2019, 8:49 PM · Operations
Volans triaged T223924: pybal logs into logstash as Normal priority.
May 20 2019, 8:39 PM · Operations, Wikimedia-Logstash