Volans (Riccardo Coccioli)
Operations Software Engineer

User Details

User Since
Feb 10 2016, 11:25 AM (66 w, 5 d)
Availability
Available
IRC Nick
volans
LDAP User
Volans
MediaWiki User
RCoccioli (WMF)

Recent Activity

Sat, May 20

Volans moved T165842: Cumin: add a simple txt/json output from In Progress to In Code Review on the Operations-Software-Development board.
Sat, May 20, 10:37 AM · Patch-For-Review, Operations-Software-Development
Volans moved T165838: Cumin: add a simple interactive mode from In Progress to In Code Review on the Operations-Software-Development board.
Sat, May 20, 10:37 AM · Patch-For-Review, Operations-Software-Development
Volans moved T165842: Cumin: add a simple txt/json output from Backlog to In Progress on the Operations-Software-Development board.
Sat, May 20, 8:57 AM · Patch-For-Review, Operations-Software-Development
Volans created T165842: Cumin: add a simple txt/json output.
Sat, May 20, 8:56 AM · Patch-For-Review, Operations-Software-Development
Volans moved T165838: Cumin: add a simple interactive mode from Backlog to In Progress on the Operations-Software-Development board.
Sat, May 20, 8:44 AM · Patch-For-Review, Operations-Software-Development
Volans created T165838: Cumin: add a simple interactive mode.
Sat, May 20, 8:44 AM · Patch-For-Review, Operations-Software-Development

Wed, May 17

Volans moved T165583: Puppet compiler: sync facts from all workers from In Progress to In Code Review on the Operations-Software-Development board.
Wed, May 17, 11:17 AM · Patch-For-Review, Operations, Operations-Software-Development
Volans moved T165583: Puppet compiler: sync facts from all workers from Backlog to In Progress on the Operations-Software-Development board.
Wed, May 17, 10:46 AM · Patch-For-Review, Operations, Operations-Software-Development
Volans created T165583: Puppet compiler: sync facts from all workers.
Wed, May 17, 10:46 AM · Patch-For-Review, Operations, Operations-Software-Development
Volans updated subscribers of T163998: check_hpssacli should report on battery failures and cache disabled.

@faidon let me know if you want the Icinga RAID handler to open tasks for warnings as well; these include the above and the predictive drive failures for HP controllers.

Wed, May 17, 9:44 AM · Patch-For-Review, Operations, Monitoring

Tue, May 16

Volans added a comment to T156924: Allow integration of data from etcd into the MediaWiki configuration.

@aaron @tstarling @Joe: here is a minimal list of failure scenarios that I think we should test before getting this into production:

  • Etcd not listening (iptables REJECT)
  • Host not responding (iptables DROP)
  • Host unreachable (iptables REJECT with icmp-host-unreachable)
  • No DNS response for the SRV record (NXDOMAIN)
  • DNS SRV record(s) returns an invalid name (NXDOMAIN)
  • Etcd slow to respond (reaches the configured timeout)
  • High packet loss between MediaWiki and Etcd (e.g. when the master is in the other DC and there is an issue with the cross-DC connection)
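
The three iptables-based scenarios in the list above could be set up with rules along the following lines. This is only a minimal sketch: the helper names, the use of the OUTPUT chain on the MediaWiki host, and port 2379 (etcd's default client port) are assumptions made for illustration, not part of the test plan itself. The code only builds the commands; it does not run them.

```python
# Hypothetical helpers: build (but do not execute) iptables commands for the
# three network-level failure scenarios listed above.

ETCD_PORT = 2379  # assumption: etcd's default client port


def iptables_rule(etcd_host, target, reject_with=None, port=ETCD_PORT):
    """Return an iptables command, as an argv list, for traffic towards etcd."""
    cmd = [
        "iptables", "-A", "OUTPUT",  # assumption: rule applied on the client host
        "-p", "tcp",
        "-d", etcd_host,
        "--dport", str(port),
        "-j", target,
    ]
    if reject_with is not None:
        cmd += ["--reject-with", reject_with]
    return cmd


def failure_scenarios(etcd_host):
    """Map each scenario name to the iptables command that would simulate it."""
    return {
        "etcd_not_listening": iptables_rule(etcd_host, "REJECT"),
        "host_not_responding": iptables_rule(etcd_host, "DROP"),
        "host_unreachable": iptables_rule(
            etcd_host, "REJECT", reject_with="icmp-host-unreachable"
        ),
    }


if __name__ == "__main__":
    # etcd.example.org is a placeholder host name.
    for name, cmd in failure_scenarios("etcd.example.org").items():
        print(name, "->", " ".join(cmd))
```

The commands are returned as argument lists so they can be reviewed first, or passed to subprocess.run on a test host, rather than being applied blindly.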
Tue, May 16, 10:03 PM · Availability (Multiple-active-datacenters), MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), MediaWiki-Platform-Team, Patch-For-Review, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, Operations
Volans moved T163363: Switchdc: improvements from In Progress to Backlog on the Operations-Software-Development board.
Tue, May 16, 9:37 PM · Operations-Software-Development
Volans closed T161730: Cumin: do not auto-ucfirst when the query is a regex in PuppetDB backend as "Resolved".
Tue, May 16, 9:36 PM · Operations-Software-Development
Volans closed T164824: Cumin: only last ssh_option is set as "Resolved".
Tue, May 16, 9:36 PM · Operations-Software-Development
Volans closed T164827: Cumin: get live output when matching a single host as "Resolved".
Tue, May 16, 9:36 PM · Operations-Software-Development
Volans closed T162151: Cumin: PuppetDB, fail better if regex are used on resource parameters as "Resolved".
Tue, May 16, 9:35 PM · Operations-Software-Development
Volans closed T163196: Puppet facts around the primary network interface and IPv4/IPv6 address as "Resolved".
Tue, May 16, 9:32 PM · Operations

Wed, May 10

Volans merged task T164955: Degraded RAID on heze into T163087: Degraded RAID on heze.
Wed, May 10, 4:56 PM · Operations, ops-codfw
Volans merged T164955: Degraded RAID on heze into T163087: Degraded RAID on heze.
Wed, May 10, 4:56 PM · Operations, ops-codfw
Volans merged task T164841: Degraded RAID on elastic2020 into T164953: Degraded RAID on elastic2020.
Wed, May 10, 4:55 PM · Operations, ops-codfw
Volans merged T164841: Degraded RAID on elastic2020 into T164953: Degraded RAID on elastic2020.
Wed, May 10, 4:55 PM · Operations, ops-codfw

Tue, May 9

Volans updated subscribers of T164841: Degraded RAID on elastic2020.
Tue, May 9, 3:06 PM · Operations, ops-codfw
Volans moved T164838: Cumin: allow to specify a timeout per command from In Progress to In Code Review on the Operations-Software-Development board.
Tue, May 9, 2:54 PM · Patch-For-Review, Operations-Software-Development
Volans moved T164833: Cumin: allow to specify successful exit codes from In Progress to In Code Review on the Operations-Software-Development board.
Tue, May 9, 2:54 PM · Patch-For-Review, Operations-Software-Development
Volans moved T164838: Cumin: allow to specify a timeout per command from Backlog to In Progress on the Operations-Software-Development board.
Tue, May 9, 2:27 PM · Patch-For-Review, Operations-Software-Development
Volans created T164838: Cumin: allow to specify a timeout per command.
Tue, May 9, 2:27 PM · Patch-For-Review, Operations-Software-Development
Volans moved T164833: Cumin: allow to specify successful exit codes from Backlog to In Progress on the Operations-Software-Development board.
Tue, May 9, 1:20 PM · Patch-For-Review, Operations-Software-Development
Volans created T164833: Cumin: allow to specify successful exit codes.
Tue, May 9, 1:20 PM · Patch-For-Review, Operations-Software-Development
Volans moved T164827: Cumin: get live output when matching a single host from In Progress to In Code Review on the Operations-Software-Development board.
Tue, May 9, 11:27 AM · Operations-Software-Development
Volans moved T164827: Cumin: get live output when matching a single host from Backlog to In Progress on the Operations-Software-Development board.
Tue, May 9, 11:25 AM · Operations-Software-Development
Volans created T164827: Cumin: get live output when matching a single host.
Tue, May 9, 11:25 AM · Operations-Software-Development
Volans moved T164824: Cumin: only last ssh_option is set from In Progress to In Code Review on the Operations-Software-Development board.
Tue, May 9, 11:14 AM · Operations-Software-Development
Volans moved T164824: Cumin: only last ssh_option is set from Backlog to In Progress on the Operations-Software-Development board.
Tue, May 9, 11:13 AM · Operations-Software-Development
Volans created T164824: Cumin: only last ssh_option is set.
Tue, May 9, 11:12 AM · Operations-Software-Development

Mon, May 8

Volans added a comment to T163087: Degraded RAID on heze.

@Papaul thanks for letting me know. I understand the problem, given the particular nature of the heze host, although after a quick check I didn't see a way to get the physical location of the drive from megacli. If you know an easy way to get this information I can modify the script to check/include it when available.

Mon, May 8, 10:20 PM · Operations, ops-codfw
Volans closed T164396: Switchdc dnsdisc: add retries to check_record as "Resolved".
Mon, May 8, 9:58 AM · Patch-For-Review, Operations-Software-Development
Volans closed T164396: Switchdc dnsdisc: add retries to check_record, a subtask of T163363: Switchdc: improvements, as "Resolved".
Mon, May 8, 9:58 AM · Operations-Software-Development
Volans closed T164403: Switchdc t09 start maintenance: clear systemctl failed state as "Resolved".
Mon, May 8, 9:51 AM · Patch-For-Review, Operations-Software-Development
Volans closed T164403: Switchdc t09 start maintenance: clear systemctl failed state, a subtask of T163363: Switchdc: improvements, as "Resolved".
Mon, May 8, 9:51 AM · Operations-Software-Development
Volans closed T164400: Switchdc t05 traffic: be explicit on automatic validation as "Resolved".
Mon, May 8, 9:50 AM · Patch-For-Review, Operations-Software-Development
Volans closed T164400: Switchdc t05 traffic: be explicit on automatic validation, a subtask of T163363: Switchdc: improvements, as "Resolved".
Mon, May 8, 9:50 AM · Operations-Software-Development

Fri, May 5

Volans added a comment to T164587: cumin could use randomization/splay options.

@BBlack Thanks for opening this feature request, because right now it's totally implementation-dependent, and I realized that this is neither clear nor explained in the docs / readme.

Fri, May 5, 3:15 PM · Operations, Operations-Software-Development

Thu, May 4

Volans created T164444: Installer assumes eth0 is the used interface.
Thu, May 4, 12:39 AM · Operations

Wed, May 3

Volans closed T160178: MediaWiki Datacenter Switchover automation as "Resolved".

Resolving this after a successful MediaWiki switchover to codfw and switchback to eqiad using the automation software Switchdc (operations-switchdc on gerrit). The tracking task for improvements is T163363.

Wed, May 3, 5:20 PM · Availability (Multiple-active-datacenters), Patch-For-Review, DC-Switchover-Prep-Q3-2016-17, Epic, Operations
Volans closed T160178: MediaWiki Datacenter Switchover automation, a subtask of T154658: Prepare and improve the datacenter switchover procedure, as "Resolved".
Wed, May 3, 5:20 PM · Availability (Multiple-active-datacenters), DC-Switchover-Prep-Q3-2016-17, Epic, Operations
Volans closed T160178: MediaWiki Datacenter Switchover automation, a subtask of T156100: DNS: dynamically generate entries for service discovery, as "Resolved".
Wed, May 3, 5:20 PM · Availability (Multiple-active-datacenters), Patch-For-Review, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations, Wikimedia-Developer-Summit (2017)
Volans closed T160178: MediaWiki Datacenter Switchover automation, a subtask of T156924: Allow integration of data from etcd into the MediaWiki configuration, as "Resolved".
Wed, May 3, 5:20 PM · Availability (Multiple-active-datacenters), MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), MediaWiki-Platform-Team, Patch-For-Review, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, Operations
Volans moved T164403: Switchdc t09 start maintenance: clear systemctl failed state from In Progress to In Code Review on the Operations-Software-Development board.
Wed, May 3, 5:09 PM · Patch-For-Review, Operations-Software-Development
Volans moved T164403: Switchdc t09 start maintenance: clear systemctl failed state from Backlog to In Progress on the Operations-Software-Development board.
Wed, May 3, 5:07 PM · Patch-For-Review, Operations-Software-Development
Volans created T164403: Switchdc t09 start maintenance: clear systemctl failed state.
Wed, May 3, 5:04 PM · Patch-For-Review, Operations-Software-Development
Volans moved T164400: Switchdc t05 traffic: be explicit on automatic validation from In Progress to In Code Review on the Operations-Software-Development board.
Wed, May 3, 4:48 PM · Patch-For-Review, Operations-Software-Development
Volans moved T164400: Switchdc t05 traffic: be explicit on automatic validation from Backlog to In Progress on the Operations-Software-Development board.
Wed, May 3, 4:47 PM · Patch-For-Review, Operations-Software-Development
Volans created T164400: Switchdc t05 traffic: be explicit on automatic validation.
Wed, May 3, 4:46 PM · Patch-For-Review, Operations-Software-Development
Volans moved T164396: Switchdc dnsdisc: add retries to check_record from In Progress to In Code Review on the Operations-Software-Development board.
Wed, May 3, 4:43 PM · Patch-For-Review, Operations-Software-Development
Volans moved T164396: Switchdc dnsdisc: add retries to check_record from Backlog to In Progress on the Operations-Software-Development board.
Wed, May 3, 4:42 PM · Patch-For-Review, Operations-Software-Development
Volans created T164396: Switchdc dnsdisc: add retries to check_record.
Wed, May 3, 4:42 PM · Patch-For-Review, Operations-Software-Development
Volans added a comment to T164177: switchdc: Improve wgReadOnly message.

@EddieGP I agree with you. I closed it because this one was targeting this specific rollout and switchdc, and I didn't want to leave it open until the next switch.

Wed, May 3, 12:05 PM · Patch-For-Review, Operations, codfw-rollout, Operations-Software-Development
Volans closed T164177: switchdc: Improve wgReadOnly message as "Resolved".
Wed, May 3, 11:49 AM · Patch-For-Review, Operations, codfw-rollout, Operations-Software-Development
Volans closed T164177: switchdc: Improve wgReadOnly message, a subtask of T163363: Switchdc: improvements, as "Resolved".
Wed, May 3, 11:49 AM · Operations-Software-Development
Volans closed T163398: Switchdc: mediawiki, use etcd-driven config as "Resolved".

Tasks implemented and tested. They were later reverted because etcd was not activated in MediaWiki. Resolving.

Wed, May 3, 10:53 AM · Patch-For-Review, Operations-Software-Development
Volans closed T163398: Switchdc: mediawiki, use etcd-driven config, a subtask of T163363: Switchdc: improvements, as "Resolved".
Wed, May 3, 10:53 AM · Operations-Software-Development
Volans moved T164177: switchdc: Improve wgReadOnly message from In Progress to In Code Review on the Operations-Software-Development board.
Wed, May 3, 10:52 AM · Patch-For-Review, Operations, codfw-rollout, Operations-Software-Development
Volans moved T164177: switchdc: Improve wgReadOnly message from Backlog to In Progress on the Operations-Software-Development board.
Wed, May 3, 10:52 AM · Patch-For-Review, Operations, codfw-rollout, Operations-Software-Development

Tue, May 2

Volans added a comment to T156924: Allow integration of data from etcd into the MediaWiki configuration.

thanks @tstarling!

Tue, May 2, 11:45 AM · Availability (Multiple-active-datacenters), MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), MediaWiki-Platform-Team, Patch-For-Review, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, Operations
Volans moved T163398: Switchdc: mediawiki, use etcd-driven config from In Progress to In Code Review on the Operations-Software-Development board.
Tue, May 2, 11:42 AM · Patch-For-Review, Operations-Software-Development
Volans claimed T163398: Switchdc: mediawiki, use etcd-driven config.
Tue, May 2, 10:21 AM · Patch-For-Review, Operations-Software-Development
Volans moved T163398: Switchdc: mediawiki, use etcd-driven config from Backlog to In Progress on the Operations-Software-Development board.
Tue, May 2, 10:21 AM · Patch-For-Review, Operations-Software-Development

Mon, May 1

Volans added a comment to T156924: Allow integration of data from etcd into the MediaWiki configuration.

Is there an easy way I could check which version and/or value of an Etcd-driven MW-config variable is actually loaded/cached by the running application?

Mon, May 1, 10:06 AM · Availability (Multiple-active-datacenters), MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), MediaWiki-Platform-Team, Patch-For-Review, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, Operations
Volans added a comment to T164177: switchdc: Improve wgReadOnly message.

The manual change + commit + deploy of the MW configuration might actually not be needed anymore; it depends on T163398. If that change lands in production before the switchback, the related tasks in Switchdc will be updated to use conftool to change those values, hence that hardcoded part will go away anyway.

Mon, May 1, 9:52 AM · Patch-For-Review, Operations, codfw-rollout, Operations-Software-Development

Sat, Apr 29

Volans claimed T163363: Switchdc: improvements.
Sat, Apr 29, 1:54 PM · Operations-Software-Development
Volans moved T163369: Switchdc warmup: be less verbose from In Code Review to Done on the Operations-Software-Development board.
Sat, Apr 29, 1:54 PM · Patch-For-Review, Operations-Software-Development
Volans closed T163376: Switchdc dnsdisc: remove stale conftool files, a subtask of T163363: Switchdc: improvements, as "Resolved".
Sat, Apr 29, 1:54 PM · Operations-Software-Development
Volans closed T163376: Switchdc dnsdisc: remove stale conftool files as "Resolved".
Sat, Apr 29, 1:54 PM · Patch-For-Review, Operations-Software-Development
Volans closed T163373: Switchdc: varnish switch, remove the manual confirmation as "Resolved".
Sat, Apr 29, 1:54 PM · Patch-For-Review, Operations-Software-Development
Volans closed T163373: Switchdc: varnish switch, remove the manual confirmation, a subtask of T163363: Switchdc: improvements, as "Resolved".
Sat, Apr 29, 1:54 PM · Operations-Software-Development
Volans moved T163376: Switchdc dnsdisc: remove stale conftool files from In Code Review to Done on the Operations-Software-Development board.
Sat, Apr 29, 1:53 PM · Patch-For-Review, Operations-Software-Development
Volans moved T163373: Switchdc: varnish switch, remove the manual confirmation from In Code Review to Done on the Operations-Software-Development board.
Sat, Apr 29, 1:52 PM · Patch-For-Review, Operations-Software-Development

Fri, Apr 28

Volans created P5345 db1061.eqiad.wmnet.
Fri, Apr 28, 2:25 PM
Volans closed T157052: Puppet compiler: sync newest facts only as "Resolved".
Fri, Apr 28, 10:20 AM · Patch-For-Review, Operations, Operations-Software-Development
Volans added a comment to T156924: Allow integration of data from etcd into the MediaWiki configuration.

Regarding the implementation of the MW configuration, in particular CR https://gerrit.wikimedia.org/r/#/c/347537 (current patchset is #8), I think that we should first agree on the failure model, because I've seen different comments and approaches.

Fri, Apr 28, 9:59 AM · Availability (Multiple-active-datacenters), MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), MediaWiki-Platform-Team, Patch-For-Review, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, Operations

Thu, Apr 27

Volans created T164042: Racktables: clearly show when hosts are decommissioned.
Thu, Apr 27, 10:54 PM · Operations
Volans added a comment to T161158: Degraded RAID on ocg1001.

Please also ensure that remote IPMI is working, applying the fix in T150160 if needed, because right now it is not:

Thu, Apr 27, 10:26 PM · Patch-For-Review, ops-eqiad, Operations
Volans added a comment to T155691: es1019.eqiad.wmnet drac unresponsive.

@Cmjohnson: great! Thanks a lot!

Thu, Apr 27, 7:57 PM · ops-eqiad, Operations
Volans edited the description of T163196: Puppet facts around the primary network interface and IPv4/IPv6 address.
Thu, Apr 27, 6:10 PM · Operations
Volans added a comment to T162949: hosts with puppet compiler failures on every run.

I've added a few more that I saw today in https://puppet-compiler.wmflabs.org/6247/

Thu, Apr 27, 4:21 PM · puppet-compiler, Operations
Volans edited the description of T162949: hosts with puppet compiler failures on every run.
Thu, Apr 27, 4:21 PM · puppet-compiler, Operations
Marostegui awarded T132431: labsdb1001 and labsdb1003 short on available space a Love token.
Thu, Apr 27, 9:51 AM · Labs, Tool-Labs, DBA

Wed, Apr 26

Volans closed T163372: Switchdc: add task to re-enable and run puppet as "Resolved".
Wed, Apr 26, 2:34 PM · Patch-For-Review, Operations-Software-Development
Volans closed T163372: Switchdc: add task to re-enable and run puppet, a subtask of T163363: Switchdc: improvements, as "Resolved".
Wed, Apr 26, 2:34 PM · Operations-Software-Development
Volans closed T163371: Switchdc menu: add a Next option in the menu as "Resolved".
Wed, Apr 26, 2:34 PM · Patch-For-Review, Operations-Software-Development
Volans closed T163371: Switchdc menu: add a Next option in the menu, a subtask of T163363: Switchdc: improvements, as "Resolved".
Wed, Apr 26, 2:34 PM · Operations-Software-Development
Volans closed T163367: Switchdc: make SAL messages more human-friendly as "Resolved".
Wed, Apr 26, 2:33 PM · Patch-For-Review, Operations-Software-Development
Volans closed T163367: Switchdc: make SAL messages more human-friendly, a subtask of T163363: Switchdc: improvements, as "Resolved".
Wed, Apr 26, 2:33 PM · Operations-Software-Development
Volans closed T163364: Switchdc dnsdisc: add check for IP records, a subtask of T163363: Switchdc: improvements, as "Resolved".
Wed, Apr 26, 2:33 PM · Operations-Software-Development
Volans closed T163364: Switchdc dnsdisc: add check for IP records as "Resolved".
Wed, Apr 26, 2:33 PM · Patch-For-Review, Operations-Software-Development

Tue, Apr 25

Volans edited the description of T163196: Puppet facts around the primary network interface and IPv4/IPv6 address.
Tue, Apr 25, 6:06 PM · Operations
Volans added a comment to T163196: Puppet facts around the primary network interface and IPv4/IPv6 address.

From the audit I got the same results as the tables in T163196#3206314 except for the following ones, and everything looks good now for the ipaddress6_primary version:

Tue, Apr 25, 6:05 PM · Operations

Mon, Apr 24

Volans added a comment to T163196: Puppet facts around the primary network interface and IPv4/IPv6 address.

Comparison between ipaddress6 and ipaddress6_primary. All the ones with some issue are marked in bold and have a number in square brackets that refers to the list of details at the bottom. For all the others the correct one seems to be ipaddress6_primary to me; it also matches the DNS record when present:

Mon, Apr 24, 3:09 PM · Operations
Volans added a comment to T163196: Puppet facts around the primary network interface and IPv4/IPv6 address.

Comparison between ipaddress and ipaddress_primary. For all the ones that differ, the correct one seems to be ipaddress_primary to me; it also matches the DNS record for the host:

Mon, Apr 24, 3:03 PM · Operations