Volans (Riccardo Coccioli)
Operations Software Engineer

User Details

User Since
Feb 10 2016, 11:25 AM (63 w, 19 h)
Availability
Available
IRC Nick
volans
LDAP User
Volans
MediaWiki User
RCoccioli (WMF)

Recent Activity

Yesterday

Volans closed T163372: Switchdc: add task to re-enable and run puppet as "Resolved".
Wed, Apr 26, 2:34 PM · Patch-For-Review, Operations-Software-Development
Volans closed T163372: Switchdc: add task to re-enable and run puppet, a subtask of T163363: Switchdc: improvements, as "Resolved".
Wed, Apr 26, 2:34 PM · Operations-Software-Development
Volans closed T163371: Switchdc menu: add a Next option in the menu as "Resolved".
Wed, Apr 26, 2:34 PM · Patch-For-Review, Operations-Software-Development
Volans closed T163371: Switchdc menu: add a Next option in the menu, a subtask of T163363: Switchdc: improvements, as "Resolved".
Wed, Apr 26, 2:34 PM · Operations-Software-Development
Volans closed T163367: Switchdc: make SAL messages more human-friendly as "Resolved".
Wed, Apr 26, 2:33 PM · Patch-For-Review, Operations-Software-Development
Volans closed T163367: Switchdc: make SAL messages more human-friendly, a subtask of T163363: Switchdc: improvements, as "Resolved".
Wed, Apr 26, 2:33 PM · Operations-Software-Development
Volans closed T163364: Switchdc dnsdisc: add check for IP records, a subtask of T163363: Switchdc: improvements, as "Resolved".
Wed, Apr 26, 2:33 PM · Operations-Software-Development
Volans closed T163364: Switchdc dnsdisc: add check for IP records as "Resolved".
Wed, Apr 26, 2:33 PM · Patch-For-Review, Operations-Software-Development

Tue, Apr 25

Volans edited the description of T163196: Puppet facts around the primary network interface and IPv4/IPv6 address.
Tue, Apr 25, 6:06 PM · Patch-For-Review, Operations
Volans added a comment to T163196: Puppet facts around the primary network interface and IPv4/IPv6 address.

The audit gave me the same results as the tables in T163196#3206314 except for the following ones, and everything now looks good for the ipaddress6_primary version:

Tue, Apr 25, 6:05 PM · Patch-For-Review, Operations

Mon, Apr 24

Volans added a comment to T163196: Puppet facts around the primary network interface and IPv4/IPv6 address.

Comparison between ipaddress6 and ipaddress6_primary. All the ones with some issue are marked in bold and have a number in square brackets that refers to the list of details at the bottom. For all the others the correct one seems to me to be ipaddress6_primary; it also matches the DNS record when present:

Mon, Apr 24, 3:09 PM · Patch-For-Review, Operations
Volans added a comment to T163196: Puppet facts around the primary network interface and IPv4/IPv6 address.

Comparison between ipaddress and ipaddress_primary. For all the ones that differ, the correct one seems to me to be ipaddress_primary; it also matches the DNS record for the host:

Mon, Apr 24, 3:03 PM · Patch-For-Review, Operations
Volans moved T163373: Switchdc: varnish switch, remove the manual confirmation from In Progress to In Code Review on the Operations-Software-Development board.
Mon, Apr 24, 9:09 AM · Patch-For-Review, Operations-Software-Development
Volans moved T163376: Switchdc dnsdisc: remove stale conftool files from In Progress to In Code Review on the Operations-Software-Development board.
Mon, Apr 24, 9:09 AM · Patch-For-Review, Operations-Software-Development

Sun, Apr 23

Volans merged T163454: Degraded RAID on restbase1018 into T163280: Degraded RAID on restbase1018.
Sun, Apr 23, 10:28 PM · ops-eqiad, Operations
Volans moved T163376: Switchdc dnsdisc: remove stale conftool files from Backlog to In Progress on the Operations-Software-Development board.
Sun, Apr 23, 10:23 PM · Patch-For-Review, Operations-Software-Development
Volans moved T163373: Switchdc: varnish switch, remove the manual confirmation from Backlog to In Progress on the Operations-Software-Development board.
Sun, Apr 23, 10:35 AM · Patch-For-Review, Operations-Software-Development
Volans moved T163369: Switchdc warmup: be less verbose from In Progress to In Code Review on the Operations-Software-Development board.
Sun, Apr 23, 9:49 AM · Patch-For-Review, Operations-Software-Development
Volans moved T163369: Switchdc warmup: be less verbose from Backlog to In Progress on the Operations-Software-Development board.
Sun, Apr 23, 9:48 AM · Patch-For-Review, Operations-Software-Development

Sat, Apr 22

Volans renamed T163365: Switchdc RO/RW: add check to test it editing a real wiki from "Switchdc RO/RW: add check to test that it on a real wiki" to "Switchdc RO/RW: add check to test it editing a real wiki".
Sat, Apr 22, 4:12 PM · Operations-Software-Development
Volans moved T163364: Switchdc dnsdisc: add check for IP records from In Progress to In Code Review on the Operations-Software-Development board.
Sat, Apr 22, 4:10 PM · Patch-For-Review, Operations-Software-Development
Volans moved T163364: Switchdc dnsdisc: add check for IP records from Backlog to In Progress on the Operations-Software-Development board.
Sat, Apr 22, 4:10 PM · Patch-For-Review, Operations-Software-Development
Volans moved T163372: Switchdc: add task to re-enable and run puppet from In Progress to In Code Review on the Operations-Software-Development board.
Sat, Apr 22, 2:13 PM · Patch-For-Review, Operations-Software-Development
Volans moved T163372: Switchdc: add task to re-enable and run puppet from Backlog to In Progress on the Operations-Software-Development board.
Sat, Apr 22, 2:05 PM · Patch-For-Review, Operations-Software-Development
Volans moved T163363: Switchdc: improvements from Backlog to In Progress on the Operations-Software-Development board.
Sat, Apr 22, 2:04 PM · Operations-Software-Development
Volans moved T163371: Switchdc menu: add a Next option in the menu from In Progress to In Code Review on the Operations-Software-Development board.
Sat, Apr 22, 2:04 PM · Patch-For-Review, Operations-Software-Development
Volans moved T163371: Switchdc menu: add a Next option in the menu from Backlog to In Progress on the Operations-Software-Development board.
Sat, Apr 22, 2:04 PM · Patch-For-Review, Operations-Software-Development
Volans moved T163367: Switchdc: make SAL messages more human-friendly from In Progress to In Code Review on the Operations-Software-Development board.
Sat, Apr 22, 2:04 PM · Patch-For-Review, Operations-Software-Development
Volans added a comment to T163371: Switchdc menu: add a Next option in the menu.

Choosing the next one could get complex if it has to cover all cases: multiple submenu levels, menus with mixed items and submenus, etc. It's also possible that a specific task is run out of order for some reason.

Sat, Apr 22, 12:00 PM · Patch-For-Review, Operations-Software-Development
Volans updated subscribers of T163367: Switchdc: make SAL messages more human-friendly.

With the above CR new SAL messages will be:

Sat, Apr 22, 11:40 AM · Patch-For-Review, Operations-Software-Development

Fri, Apr 21

Volans edited projects for T163565: Install conftool on deployment masters, added: Operations; removed Operations-Software-Development.
Fri, Apr 21, 5:38 PM · Patch-For-Review, Operations, Scap (Scap3-MediaWiki-MVP), Deployment-Systems

Thu, Apr 20

Volans added a comment to T161158: Degraded RAID on ocg1001.

Relating it also to T155692

Thu, Apr 20, 3:46 PM · ops-eqiad, Operations
Volans added a comment to T163286: Tegmen: process spawn loop + failed icinga + failing puppet.

So after switching tegmen to active, we now have the issue on einsteinium:

Thu, Apr 20, 10:49 AM · Patch-For-Review, Monitoring, Operations
Volans claimed T163367: Switchdc: make SAL messages more human-friendly.
Thu, Apr 20, 8:23 AM · Patch-For-Review, Operations-Software-Development

Wed, Apr 19

Volans created T163398: Switchdc: mediawiki, use etcd-driven config.
Wed, Apr 19, 11:15 PM · Operations-Software-Development
Volans created T163376: Switchdc dnsdisc: remove stale conftool files.
Wed, Apr 19, 7:50 PM · Patch-For-Review, Operations-Software-Development
Volans created T163373: Switchdc: varnish switch, remove the manual confirmation.
Wed, Apr 19, 7:15 PM · Patch-For-Review, Operations-Software-Development
Volans created T163372: Switchdc: add task to re-enable and run puppet.
Wed, Apr 19, 6:35 PM · Patch-For-Review, Operations-Software-Development
Volans created T163371: Switchdc menu: add a Next option in the menu.
Wed, Apr 19, 6:33 PM · Patch-For-Review, Operations-Software-Development
Volans created T163369: Switchdc warmup: be less verbose.
Wed, Apr 19, 6:32 PM · Patch-For-Review, Operations-Software-Development
Volans triaged T163365: Switchdc RO/RW: add check to test it editing a real wiki as "Normal" priority.
Wed, Apr 19, 6:30 PM · Operations-Software-Development
Volans triaged T163363: Switchdc: improvements as "Normal" priority.
Wed, Apr 19, 6:30 PM · Operations-Software-Development
Volans created T163367: Switchdc: make SAL messages more human-friendly.
Wed, Apr 19, 6:30 PM · Patch-For-Review, Operations-Software-Development
Volans renamed T163364: Switchdc dnsdisc: add check for IP records from "dnsdisc: add check for IP records" to "Switchdc dnsdisc: add check for IP records".
Wed, Apr 19, 6:28 PM · Patch-For-Review, Operations-Software-Development
Volans renamed T163365: Switchdc RO/RW: add check to test it editing a real wiki from "RO/RW: add check to test that it on a real wiki" to "Switchdc RO/RW: add check to test that it on a real wiki".
Wed, Apr 19, 6:28 PM · Operations-Software-Development
Volans created T163365: Switchdc RO/RW: add check to test it editing a real wiki.
Wed, Apr 19, 6:21 PM · Operations-Software-Development
Volans created T163364: Switchdc dnsdisc: add check for IP records.
Wed, Apr 19, 6:17 PM · Patch-For-Review, Operations-Software-Development
Volans created T163363: Switchdc: improvements.
Wed, Apr 19, 6:15 PM · Operations-Software-Development
Volans added a comment to T163286: Tegmen: process spawn loop + failed icinga + failing puppet.

What was that configuration error?

Wed, Apr 19, 1:23 PM · Patch-For-Review, Monitoring, Operations
Volans added a comment to T163286: Tegmen: process spawn loop + failed icinga + failing puppet.

@akosiaris: I've found that the catalog for tegmen doesn't have Nagios_Host and Nagios_Service resources, and I think this is due to this hack:

Wed, Apr 19, 11:01 AM · Patch-For-Review, Monitoring, Operations
Volans closed T163312: lvs2001: intermittent packet loss from Icinga checks as "Resolved".

Increased the max ICMP out packets to 3000 to overcome the bottleneck.
Packet loss is back down to zero and the graph shows a normal trend without bottlenecks.

Wed, Apr 19, 10:39 AM · Patch-For-Review, netops, Traffic, Operations
Volans added a comment to T163312: lvs2001: intermittent packet loss from Icinga checks.

Ping from various codfw hosts confirms packet loss:

Wed, Apr 19, 9:25 AM · Patch-For-Review, netops, Traffic, Operations
Volans created T163312: lvs2001: intermittent packet loss from Icinga checks.
Wed, Apr 19, 9:03 AM · Patch-For-Review, netops, Traffic, Operations

Tue, Apr 18

Volans added a comment to T163286: Tegmen: process spawn loop + failed icinga + failing puppet.

Also, why do we do the stop/sync/start all the time, instead of just syncing the files to a safe location and having a script (make-icinga-primary or similar) that does the stop/mv/start inside a run-no-puppet, which we can run manually only when needed?

Tue, Apr 18, 11:45 PM · Patch-For-Review, Monitoring, Operations
Volans added a comment to T163286: Tegmen: process spawn loop + failed icinga + failing puppet.

Could it be that the crontab that runs every 10 minutes had a race with a puppet run and made all this mess? I don't see it wrapped in a run-no-puppet:

Tue, Apr 18, 11:28 PM · Patch-For-Review, Monitoring, Operations
Volans created T163286: Tegmen: process spawn loop + failed icinga + failing puppet.
Tue, Apr 18, 11:25 PM · Patch-For-Review, Monitoring, Operations
Volans added a comment to T163087: Degraded RAID on heze.

@Dzahn I've updated the output with the result of sudo /usr/local/lib/nagios/plugins/get-raid-status-megacli (you can find the right one by typing up to get- and pressing tab to see which one is available on the specific host)

Tue, Apr 18, 4:03 PM · Operations, ops-codfw
Volans edited the description of T163087: Degraded RAID on heze.
Tue, Apr 18, 4:01 PM · Operations, ops-codfw
Volans triaged T163209: Degraded RAID on ms-be1002 as "Normal" priority.
Tue, Apr 18, 2:52 PM · media-storage, ops-eqiad, Operations

Fri, Apr 14

Volans added a project to T148609: Review and deploy Linter extension to Wikimedia wikis: DBA.
Fri, Apr 14, 9:26 PM · MW-1.29-release (WMF-deploy-2017-04-25_(1.29.0-wmf.21)), DBA, MediaWiki-Platform-Team, User-notice, Patch-For-Review, MediaWiki-extensions-Linter, Wikimedia-Extension-setup
Volans added a comment to T148609: Review and deploy Linter extension to Wikimedia wikis.

We had to revert the last change as an emergency because it was causing issues on commonswiki (s4) and on large wikis in general.

Fri, Apr 14, 9:25 PM · MW-1.29-release (WMF-deploy-2017-04-25_(1.29.0-wmf.21)), DBA, MediaWiki-Platform-Team, User-notice, Patch-For-Review, MediaWiki-extensions-Linter, Wikimedia-Extension-setup

Thu, Apr 13

Volans added a parent task for T162879: Replace deprecated hook usage ChangesListSpecialPageFilters in WikimediaEvents: T162885: Fix MediaWiki deprecated calls in Wikimedia production, 2017-04-13.
Thu, Apr 13, 1:25 PM · MW-1.29-release (WMF-deploy-2017-04-11_(1.29.0-wmf.20)), Patch-For-Review, Wikimedia-log-errors, MediaWiki-extensions-WikimediaEvents
Volans added a subtask for T162885: Fix MediaWiki deprecated calls in Wikimedia production, 2017-04-13: T162879: Replace deprecated hook usage ChangesListSpecialPageFilters in WikimediaEvents.
Thu, Apr 13, 1:25 PM · Technical-Debt, Wikimedia-log-errors, MediaWiki-General-or-Unknown
Volans added a parent task for T162880: Update deprecated hook usage EditPageBeforeEditChecks: T162885: Fix MediaWiki deprecated calls in Wikimedia production, 2017-04-13.
Thu, Apr 13, 1:23 PM · Wikimedia-log-errors, MediaWiki-extensions-WikibaseClient, Wikidata
Volans added a subtask for T162885: Fix MediaWiki deprecated calls in Wikimedia production, 2017-04-13: T162880: Update deprecated hook usage EditPageBeforeEditChecks.
Thu, Apr 13, 1:23 PM · Technical-Debt, Wikimedia-log-errors, MediaWiki-General-or-Unknown
Volans added a subtask for T162885: Fix MediaWiki deprecated calls in Wikimedia production, 2017-04-13: T162878: Update deprecated hooks in Flagged Revs.
Thu, Apr 13, 1:20 PM · Technical-Debt, Wikimedia-log-errors, MediaWiki-General-or-Unknown
Volans added a parent task for T162878: Update deprecated hooks in Flagged Revs: T162885: Fix MediaWiki deprecated calls in Wikimedia production, 2017-04-13.
Thu, Apr 13, 1:20 PM · MW-1.29-release (WMF-deploy-2017-04-25_(1.29.0-wmf.21)), Patch-For-Review, Wikimedia-log-errors, MediaWiki-extensions-FlaggedRevs
Volans created T162885: Fix MediaWiki deprecated calls in Wikimedia production, 2017-04-13.
Thu, Apr 13, 1:20 PM · Technical-Debt, Wikimedia-log-errors, MediaWiki-General-or-Unknown
Volans reopened T157353: prometheus-vhtcpd-stats cronspamming if vhtcpd is not running yet as "Open".

Re-opening because this is happening when rebooting hosts; see the last few days' root@ mails

Thu, Apr 13, 10:57 AM · Patch-For-Review, Traffic, User-Elukey, Operations
Volans reopened T157353: prometheus-vhtcpd-stats cronspamming if vhtcpd is not running yet, a subtask of T132324: Tracking and Reducing cron-spam from root@, as "Open".
Thu, Apr 13, 10:57 AM · Patch-For-Review, User-Elukey, Operations

Wed, Apr 12

Volans renamed T162780: ocg1003 partitions are severely misconfigured from "ogc1003 partitions are severely misconfigured" to "ocg1003 partitions are severely misconfigured".
Wed, Apr 12, 9:29 AM · Operations

Sun, Apr 9

Volans added a project to T145360: Cronspam from terbium: MediaWiki-extensions-PageAssessments.

Since Feb. 19th we've been getting one email a day from terbium with an error for each wiki (a ~900-line email) containing:

Sun, Apr 9, 9:53 PM · MediaWiki-extensions-PageAssessments, Operations
Volans added a comment to T159137: certspotter: Error retrieving STH from log.

For a couple of days both einsteinium and tegmen have been spamming root@ every hour with certspotter errors; this time it seems that the DigiCert service is responding with 400 to the check requests:

Sun, Apr 9, 9:44 PM · Traffic, Operations

Sat, Apr 8

Volans added a comment to T162347: Degraded RAID on ms-be1006.

I've also ACK'ed the related puppet run alarm on Icinga

Sat, Apr 8, 1:35 PM · media-storage, ops-eqiad, Operations

Thu, Apr 6

Volans added a comment to T162345: Reduce number of false positive alerts on postgresql lag for maps.

I think it might happen when a VACUUM is running on the master. At least today, with a lot of delay on the maps-test cluster, I've noticed that a VACUUM has been running for 15h:

Thu, Apr 6, 10:05 PM · Patch-For-Review, Discovery, Operations, Interactive-Sprint, Maps, PostgreSQL
Volans added a comment to P5212 no-CT-on-304s.patch.

Just nitpicking: pop() returns the value, which of course you don't need here, given how the HeaderKeyDict is implemented.
There is a __delitem__ implemented, so you could use del self.headers['Content-Type'] instead, if I'm not mistaken.
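
A minimal sketch of the difference between the two removal styles. TitleCaseHeaders is a hypothetical stand-in written only for this illustration; it is not Swift's HeaderKeyDict and merely assumes a dict that normalizes key case. The helper names are likewise made up.

```python
class TitleCaseHeaders(dict):
    """Hypothetical stand-in for a header dict that title-cases its keys.

    This is NOT Swift's HeaderKeyDict; it only mimics case-insensitive keys.
    """

    def __setitem__(self, key, value):
        super().__setitem__(key.title(), value)

    def __delitem__(self, key):
        super().__delitem__(key.title())

    def __contains__(self, key):
        return super().__contains__(key.title())

    def pop(self, key, *default):
        return super().pop(key.title(), *default)


def strip_with_pop(headers):
    # pop() removes the key and also returns its value, discarded here.
    headers.pop('Content-Type', None)
    return headers


def strip_with_del(headers):
    # del states the intent directly. It is guarded because, in this
    # stand-in, del raises KeyError when the key is missing.
    if 'Content-Type' in headers:
        del headers['Content-Type']
    return headers


def make_headers():
    # Keys go through __setitem__, so they are stored title-cased.
    headers = TitleCaseHeaders()
    headers['content-type'] = 'text/html'
    headers['etag'] = 'abc123'
    return headers
```

Both helpers leave the same headers behind; the del form simply avoids producing a value nobody reads.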

Thu, Apr 6, 9:58 AM
Volans added a project to T162347: Degraded RAID on ms-be1006: media-storage.
Thu, Apr 6, 9:35 AM · media-storage, ops-eqiad, Operations
Volans added a comment to T162122: Swiftrepl was stuck in an infinite loop since days.

A second pass was completed successfully without any manual intervention.

Thu, Apr 6, 8:04 AM · User-fgiunchedi, Operations, media-storage

Wed, Apr 5

Volans added a comment to T156465: For switchovers: A way to check if slaves are up to date.

For the medium term I have a bunch of things in mind that should help in this direction. Feel free to ping me to talk about it.

Wed, Apr 5, 11:57 PM · DBA
Volans added a comment to T162122: Swiftrepl was stuck in an infinite loop since days.

The first run of swiftrepl has finally completed! It is now in the 2-hour sleep between runs; I'll check that the next one completes without manual intervention.

Wed, Apr 5, 12:11 PM · User-fgiunchedi, Operations, media-storage
Volans closed T159163: PuppetDB is auto-deactivating hosts as "Resolved".
Wed, Apr 5, 12:04 PM · Patch-For-Review, Puppet, Operations
Volans added a comment to T162122: Swiftrepl was stuck in an infinite loop since days.

The third one was:

wikipedia-commons-local-thumb.3b 3/3b/Hendrick_de_Keyser_-_gulden_cabinet.png/85px-Hendrick_de_Keyser_-_gulden_cabinet.png E-Tag mismatch: bc68f6efc732fda68647dcd65867cef9/cd3b1b810889387c0ff7bed187e87125, syncing
Wed, Apr 5, 9:49 AM · User-fgiunchedi, Operations, media-storage

Tue, Apr 4

Volans moved T162151: Cumin: PuppetDB, fail better if regex are used on resource parameters from In Progress to In Code Review on the Operations-Software-Development board.
Tue, Apr 4, 2:52 PM · Patch-For-Review, Operations-Software-Development
Volans moved T162151: Cumin: PuppetDB, fail better if regex are used on resource parameters from Backlog to In Progress on the Operations-Software-Development board.
Tue, Apr 4, 2:45 PM · Patch-For-Review, Operations-Software-Development
Volans created T162151: Cumin: PuppetDB, fail better if regex are used on resource parameters.
Tue, Apr 4, 2:33 PM · Patch-For-Review, Operations-Software-Development
Volans created T162123: Running swiftrepl is not puppetized.
Tue, Apr 4, 9:29 AM · User-fgiunchedi, Operations, media-storage
Volans created T162122: Swiftrepl was stuck in an infinite loop since days.
Tue, Apr 4, 9:27 AM · User-fgiunchedi, Operations, media-storage
mmodell awarded T156100: DNS: dynamically generate entries for service discovery a Love token.
Tue, Apr 4, 2:27 AM · Patch-For-Review, Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations, Wikimedia-Developer-Summit (2017)

Mon, Apr 3

Volans edited P5186 content-type.
Mon, Apr 3, 3:02 PM
Volans edited P5186 content-type.
Mon, Apr 3, 1:49 PM
Volans created P5186 content-type.
Mon, Apr 3, 1:42 PM
Volans added a comment to T159163: PuppetDB is auto-deactivating hosts.

Oops, I read the previous message as if it required a restart of the puppetmasters, not puppetdb. Sorry for the misunderstanding.

Mon, Apr 3, 8:43 AM · Patch-For-Review, Puppet, Operations
Volans added a comment to T159163: PuppetDB is auto-deactivating hosts.

@Joe @akosiaris, it actually looks like this is a NOOP on the puppetmasters and a change only on the puppetdb hosts:

Mon, Apr 3, 8:40 AM · Patch-For-Review, Puppet, Operations

Sun, Apr 2

Volans lowered the priority of T86086: Make DI forms visually match what's on donate wiki from "Normal" to "Low".
Sun, Apr 2, 4:34 PM · § Fundraising Sprint Abba, Fundraising-Backlog
Volans added a member for Phabricator: Luke081515.
Sun, Apr 2, 4:28 PM
Volans added a member for Phabricator: mmodell.
Sun, Apr 2, 4:26 PM
Volans added a member for Phabricator: chasemp.
Sun, Apr 2, 4:26 PM
Volans removed a member for Phabricator: Cholof13.
Sun, Apr 2, 4:26 PM