Volans (Riccardo Coccioli)
Operations Software Engineer

User Details

User Since
Feb 10 2016, 11:25 AM (101 w, 6 d)
Availability
Available
IRC Nick
volans
LDAP User
Volans
MediaWiki User
RCoccioli (WMF)

Recent Activity

Yesterday

Volans added a comment to T185505: Netbox: add Icinga check for the website.

Those are a good start, but if I'm not mistaken, all of them are local to the host, right?
I think we might need one for netbox.wikimedia.org too, run from the Icinga host itself, to check that the service is reachable and, potentially, that it has some data in it.
The last part might be a bit more difficult given that it is behind a login, unless Netbox exposes a "check" URL that runs some internal checks and exposes the result.

Mon, Jan 22, 10:16 PM · monitoring, Operations
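The external reachability check suggested in the T185505 comment could look roughly like the sketch below. It is only an illustration, not the check that was deployed: the function names (`fetch_status`, `check_website`) are hypothetical, and it only tests that the URL answers over HTTP, not that Netbox has data in it. The exit codes follow the usual Nagios/Icinga plugin convention (0 OK, 2 CRITICAL).

```python
import urllib.error
import urllib.request

# Conventional Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch_status(url, timeout):
    """Return the final HTTP status code for url (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        return error.code


def check_website(url, timeout=10, fetch=fetch_status):
    """Map the reachability of url to a (exit_code, message) pair.

    fetch is injectable so the logic can be exercised without network access.
    """
    try:
        status = fetch(url, timeout)
    except (urllib.error.URLError, OSError) as error:
        return CRITICAL, "CRITICAL: {} unreachable: {}".format(url, error)
    if 200 <= status < 400:
        return OK, "OK: {} returned HTTP {}".format(url, status)
    return CRITICAL, "CRITICAL: {} returned HTTP {}".format(url, status)
```

A plugin wrapper would call `check_website("https://netbox.wikimedia.org/")`, print the message, and exit with the returned code. Checking "that it has some data in it" would need an authenticated or dedicated health endpoint, which this sketch does not assume exists.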
Volans updated the task description for T184561: Modernize Puppet Configuration Management (2017-18 Q3 Goal).
Mon, Jan 22, 4:52 PM · Goal, Puppet, Operations
Volans closed T182575: Cumin: PuppetDB backend, add support for API v4, a subtask of T184561: Modernize Puppet Configuration Management (2017-18 Q3 Goal), as Resolved.
Mon, Jan 22, 4:51 PM · Goal, Puppet, Operations
Volans closed T182575: Cumin: PuppetDB backend, add support for API v4 as Resolved.

Cumin 2.0.0 with support for Puppet API v4 was released. Debdeploy was updated accordingly.
Both debdeploy and cumin were released into production.

Mon, Jan 22, 4:51 PM · Puppet, Operations-Software-Development
Volans moved T185504: Netbox: add Icinga check for PosgreSQL from Backlog to Up next on the monitoring board.
Mon, Jan 22, 4:25 PM · monitoring, Operations
Volans moved T185505: Netbox: add Icinga check for the website from Backlog to Up next on the monitoring board.
Mon, Jan 22, 4:25 PM · monitoring, Operations
Volans removed a project from T184634: Netbox: postgres cannot be restarted w/ current config: monitoring.
Mon, Jan 22, 4:24 PM · Patch-For-Review, Operations
Volans triaged T185505: Netbox: add Icinga check for the website as Normal priority.
Mon, Jan 22, 4:24 PM · monitoring, Operations
Volans triaged T185504: Netbox: add Icinga check for PosgreSQL as Normal priority.
Mon, Jan 22, 4:22 PM · monitoring, Operations
Volans committed rCUMIN5b520ad954f1: Backends: add known hosts files backend (authored by Volans).
Backends: add known hosts files backend
Mon, Jan 22, 2:31 PM
Volans committed rCUMIN36bbf0ad9f86: Migration to Python 3 (authored by Volans).
Migration to Python 3
Mon, Jan 22, 10:33 AM

Fri, Jan 19

Volans added a comment to T184634: Netbox: postgres cannot be restarted w/ current config.

Nice! So I guess that our puppetization is not correct: it should restart Postgres after the first configuration change to ensure that the new data directory is used from the start.

Fri, Jan 19, 5:11 PM · Patch-For-Review, Operations
Volans committed rCUMIN9187921e93a3: Upstream release v2.0.0 (authored by Volans).
Upstream release v2.0.0
Fri, Jan 19, 4:51 PM
Volans committed rCUMIN283bab462066: Merge tag 'tags/v2.0.0' into debian (authored by Volans).
Merge tag 'tags/v2.0.0' into debian
Fri, Jan 19, 4:51 PM
Volans committed rCUMIN0c369cf61bdc: CHANGELOG: add changelogs for release v2.0.0 (authored by Volans).
CHANGELOG: add changelogs for release v2.0.0
Fri, Jan 19, 4:26 PM

Thu, Jan 18

Volans added a comment to T178815: decom cp40(09|1[078]).

I've run clean + deactivate for cp4018 as part of cleanup of stale puppet certs.

Thu, Jan 18, 5:20 PM · Traffic, Operations, ops-ulsfo
Volans committed rCUMIN3d4aa180ba15: Copyright notice: add 2018 (authored by Volans).
Copyright notice: add 2018
Thu, Jan 18, 3:29 PM
Volans added a commit to T166397: Cumin fails on huge nodelists emitted by its own outputs: rCUMIN687bddf39386: PuppetDB backend: add support for API v4.
Thu, Jan 18, 3:29 PM · Operations-Software-Development
Volans added a task to rCUMIN687bddf39386: PuppetDB backend: add support for API v4: T166397: Cumin fails on huge nodelists emitted by its own outputs.
Thu, Jan 18, 3:29 PM
Volans added a comment to T185195: Sporadic logrotate issue for stretch mediawiki appservers.

FYI https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=881725

Thu, Jan 18, 11:02 AM · Operations, User-Elukey

Wed, Jan 17

Volans updated the task description for T182597: Use EtcdConfig in production to allow automation of a datacenter switch.
Wed, Jan 17, 10:00 AM · discovery-system, MediaWiki-Configuration, Operations
Volans updated the task description for T185078: Test EtcdConfig in different failure scenarios.
Wed, Jan 17, 9:59 AM · discovery-system, MediaWiki-Configuration, Operations

Tue, Jan 16

Volans updated the task description for T184634: Netbox: postgres cannot be restarted w/ current config.
Tue, Jan 16, 5:59 PM · Patch-For-Review, Operations
Volans closed T169548: Prepare for Puppet 4 as Resolved.
Tue, Jan 16, 3:34 PM · User-Joe, Puppet, Operations
Volans closed T169548: Prepare for Puppet 4, a subtask of T177254: Upgrade to puppet 4 (4.8 or newer), as Resolved.
Tue, Jan 16, 3:34 PM · cloud-services-team (FY2017-18), Puppet, User-Joe, Operations

Fri, Jan 12

Volans added a comment to T184784: Is the 'puppet3-diffs' VPS project still in use?.

@Andrew yes those are the Puppet compiler instances that Jenkins uses. We can agree that the name of the project was not chosen to be very future-proof ;) but the hosts are very much in use.

Fri, Jan 12, 10:05 AM · cloud-services-team

Thu, Jan 11

Volans closed T184103: ircecho doesn't reconnect on failure as Resolved.

This was a misunderstanding on my side, @Dzahn actually stopped it manually.

Thu, Jan 11, 11:08 PM · Patch-For-Review, IRCecho, monitoring, Operations
Volans reopened T184103: ircecho doesn't reconnect on failure as "Open".
Thu, Jan 11, 11:04 PM · Patch-For-Review, IRCecho, monitoring, Operations
Volans committed rCUMIN17773689f25b: Migration to Python 3 (authored by Volans).
Migration to Python 3
Thu, Jan 11, 9:38 PM
Volans committed rCUMIN687bddf39386: PuppetDB backend: add support for API v4 (authored by Volans).
PuppetDB backend: add support for API v4
Thu, Jan 11, 9:38 PM
Volans updated the task description for T184714: Puppet fail to properly refresh Icinga.
Thu, Jan 11, 12:52 PM · monitoring, Operations
Volans created T184714: Puppet fail to properly refresh Icinga.
Thu, Jan 11, 12:52 PM · monitoring, Operations
Volans closed T170353: Icinga: timeseries checks should have the link to a graph with the data as Resolved.

TL;DR: Everything is back to einsteinium now, and everything is working. Resolving.

Thu, Jan 11, 12:50 PM · Operations, monitoring
Volans added a comment to T170144: Evaluate NetBox as a Racktables replacement & IPAM.

While trying to fix the issues after the reboot for the kernel upgrade, I've opened T184634.
But now it seems that the Postgres DB is empty (no tables in the netbox DB). I'm not sure whether it was emptied as part of some of the tests above, or whether the reboot plus broken Puppet might have done this.

Thu, Jan 11, 9:43 AM · Patch-For-Review, netops, Operations
Volans updated the task description for T184634: Netbox: postgres cannot be restarted w/ current config.
Thu, Jan 11, 9:27 AM · Patch-For-Review, Operations
Volans updated the task description for T184634: Netbox: postgres cannot be restarted w/ current config.
Thu, Jan 11, 9:14 AM · Patch-For-Review, Operations

Wed, Jan 10

Volans created T184634: Netbox: postgres cannot be restarted w/ current config.
Wed, Jan 10, 6:28 PM · Patch-For-Review, Operations
Volans edited P6567 Icinga: einstenium vs tegmen diffs.
Wed, Jan 10, 3:29 PM
Volans created P6567 Icinga: einstenium vs tegmen diffs.
Wed, Jan 10, 3:28 PM
Volans added a comment to T170353: Icinga: timeseries checks should have the link to a graph with the data.

Confirmed that on tegmen it works fine after failing over the active Icinga server to it.
The links are properly rendered and the ampersands are not dropped, as opposed to what happens on einsteinium.

Wed, Jan 10, 11:26 AM · Operations, monitoring

Tue, Jan 9

Volans updated subscribers of T184561: Modernize Puppet Configuration Management (2017-18 Q3 Goal).
Tue, Jan 9, 11:06 PM · Goal, Puppet, Operations
Volans updated the task description for T184561: Modernize Puppet Configuration Management (2017-18 Q3 Goal).
Tue, Jan 9, 10:48 PM · Goal, Puppet, Operations
Volans added a project to T184561: Modernize Puppet Configuration Management (2017-18 Q3 Goal): Goal.
Tue, Jan 9, 10:42 PM · Goal, Puppet, Operations
Volans merged T184530: Degraded RAID on lvs3001 into T168619: Degraded RAID on lvs3001.
Tue, Jan 9, 4:29 PM · ops-esams, Operations
Volans merged task T184530: Degraded RAID on lvs3001 into T168619: Degraded RAID on lvs3001.
Tue, Jan 9, 4:29 PM · ops-esams, Operations
Volans merged T184533: Degraded RAID on lvs3001 into T168619: Degraded RAID on lvs3001.
Tue, Jan 9, 4:29 PM · ops-esams, Operations
Volans merged task T184533: Degraded RAID on lvs3001 into T168619: Degraded RAID on lvs3001.
Tue, Jan 9, 4:29 PM · ops-esams, Operations
Volans committed rCUMIN78077bdf59f4: Migration to Python 3 (authored by Volans).
Migration to Python 3
Tue, Jan 9, 3:17 PM
Volans merged T184528: Degraded RAID on lvs3001 into T168619: Degraded RAID on lvs3001.
Tue, Jan 9, 3:15 PM · ops-esams, Operations
Volans merged task T184528: Degraded RAID on lvs3001 into T168619: Degraded RAID on lvs3001.
Tue, Jan 9, 3:15 PM · ops-esams, Operations
Volans updated subscribers of T182575: Cumin: PuppetDB backend, add support for API v4.
Tue, Jan 9, 3:05 PM · Puppet, Operations-Software-Development

Mon, Jan 8

Volans triaged T184435: Puppet tox: properly lint both Py2 and Py3 files as Normal priority.
Mon, Jan 8, 2:19 PM · Continuous-Integration-Config, Operations
Volans created T184435: Puppet tox: properly lint both Py2 and Py3 files.
Mon, Jan 8, 2:19 PM · Continuous-Integration-Config, Operations
Volans added a project to T184390: Degraded RAID on ms-be2037: media-storage.
Mon, Jan 8, 9:23 AM · media-storage, Operations, ops-codfw

Thu, Jan 4

Volans committed rCUMIN73d1e72473ff: Migration to Python 3 (authored by Volans).
Migration to Python 3
Thu, Jan 4, 3:22 PM
Volans committed rCUMINdc2fb297a7e6: Migration to Python 3 (authored by Volans).
Migration to Python 3
Thu, Jan 4, 3:17 PM
Volans committed rCUMIN72cfb69d78b4: PuppetDB backend: add support for API v4 (authored by Volans).
PuppetDB backend: add support for API v4
Thu, Jan 4, 12:20 PM

Wed, Jan 3

Volans updated subscribers of T184103: ircecho doesn't reconnect on failure.

Thanks to @cwdent for notifying us.

Wed, Jan 3, 7:07 PM · Patch-For-Review, IRCecho, monitoring, Operations
Volans triaged T184103: ircecho doesn't reconnect on failure as High priority.
Wed, Jan 3, 7:06 PM · Patch-For-Review, IRCecho, monitoring, Operations
Volans created T184103: ircecho doesn't reconnect on failure.
Wed, Jan 3, 7:06 PM · Patch-For-Review, IRCecho, monitoring, Operations

Tue, Jan 2

Volans committed rCUMIN7c23486236f3: PuppetDB backend: add support for API v4 (authored by Volans).
PuppetDB backend: add support for API v4
Tue, Jan 2, 3:12 PM
Volans committed rCUMINade3a3e9c811: ClusterShell backend: fix execute() return code (authored by Volans).
ClusterShell backend: fix execute() return code
Tue, Jan 2, 3:12 PM
Volans committed rCUMIN7d056a851c74: PuppetDB backend: add support for API v4 (authored by Volans).
PuppetDB backend: add support for API v4
Tue, Jan 2, 3:12 PM
Volans committed rCUMINa5b7fdb656dc: ClusterShell backend: fix execute() return code (authored by Volans).
ClusterShell backend: fix execute() return code
Tue, Jan 2, 3:12 PM

Tue, Dec 26

Volans updated subscribers of T174916: electron/pdfrender hangs.

pdfrender on all eqiad hosts required restarts tonight (UTC), see SAL. Thanks @madhuvishy for taking care of it.

Tue, Dec 26, 10:37 AM · Readers-Web-Backlog (Tracking), Electron-PDFs, Operations, Services (blocked)

Dec 22 2017

Volans committed rCUMIN71f0450eeeaf: PuppetDB backend: add support for API v4 (authored by Volans).
PuppetDB backend: add support for API v4
Dec 22 2017, 4:33 PM
Volans committed rCUMIN43220d3053a5: ClusterShell backend: fix execute() return code (authored by Volans).
ClusterShell backend: fix execute() return code
Dec 22 2017, 4:33 PM
Volans moved T182575: Cumin: PuppetDB backend, add support for API v4 from In Progress to In Code Review on the Operations-Software-Development board.
Dec 22 2017, 3:41 PM · Puppet, Operations-Software-Development
Volans committed rCUMINb8543da9ae9d: PuppetDB backend: add support for API v4 (authored by Volans).
PuppetDB backend: add support for API v4
Dec 22 2017, 3:26 PM
Volans triaged T182575: Cumin: PuppetDB backend, add support for API v4 as Normal priority.
Dec 22 2017, 3:13 PM · Puppet, Operations-Software-Development
Volans triaged T183071: Import kibana package from jessie into stretch as Normal priority.
Dec 22 2017, 9:20 AM · Patch-For-Review, MediaWiki-Vagrant, Operations
Volans triaged T183209: decom uranium as Normal priority.
Dec 22 2017, 9:18 AM · Patch-For-Review, hardware-requests, ops-eqiad, monitoring, Technical-Debt, Operations

Dec 21 2017

Volans added a comment to T170353: Icinga: timeseries checks should have the link to a graph with the data.

To summarize the current status, everything is deployed and works as expected, except for one small detail: the ampersands are removed from the dashboard URLs, making them mostly useless :/

Dec 21 2017, 6:59 PM · Operations, monitoring
Volans closed T183475: ircecho: UnicodeDecodeError on reconnect as Resolved.

A workaround that makes it fail completely and lets systemd restart it has been deployed. Resolving it for now.

Dec 21 2017, 4:59 PM · monitoring, Operations
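The "fail completely and let systemd restart it" approach described for T183475 can be sketched as follows. This is not the ircecho code: `run_or_die` and `connect_and_loop` are hypothetical names, and the sketch assumes the service unit is configured with a systemd restart policy such as `Restart=on-failure`.

```python
import sys


def run_or_die(connect_and_loop):
    """Run the bot's main loop once and return a process exit code.

    Instead of trying to recover in-process (the reconnect path is what
    raised UnicodeDecodeError), any unexpected error ends the process with
    a non-zero status so that the service manager starts a fresh one.
    """
    try:
        connect_and_loop()
    except Exception as error:  # deliberately broad: fail fast on anything
        print("fatal: {!r}; exiting for restart".format(error), file=sys.stderr)
        return 1
    return 0
```

The entry point would then be `sys.exit(run_or_die(main_loop))`, with `main_loop` standing in for whatever function connects and relays messages. The trade-off is a brief gap while the process restarts, in exchange for not having to make the reconnect path itself robust.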
Volans created T183475: ircecho: UnicodeDecodeError on reconnect.
Dec 21 2017, 4:18 PM · monitoring, Operations

Dec 19 2017

Volans added a comment to T181121: Hardware errors on ganeti1005- ganeti1008.

@akosiaris if you're trying to reimage those as Jessie, we still have the netinst issue open, so you need to set numa=off to unblock it, see T182702.

Dec 19 2017, 7:40 PM · ops-eqiad, Operations
Volans placed T181952: Requesting access to EventLogging data for Vinitha up for grabs.
Dec 19 2017, 6:03 PM · Patch-For-Review, AICaptcha, WMF-NDA-Requests, Operations, Ops-Access-Requests
Volans closed T182908: Requesting access to analytics-privatedata-users group for Jonas Kress as Resolved.

All done, resolving.

Dec 19 2017, 4:57 PM · Patch-For-Review, User-Addshore, Operations, Ops-Access-Requests
Volans updated the task description for T182908: Requesting access to analytics-privatedata-users group for Jonas Kress.
Dec 19 2017, 4:56 PM · Patch-For-Review, User-Addshore, Operations, Ops-Access-Requests
Volans triaged T183236: After reimage Puppet order: sudo command failed as Normal priority.
Dec 19 2017, 12:25 PM · Operations
Volans created T183236: After reimage Puppet order: sudo command failed.
Dec 19 2017, 12:25 PM · Operations
Volans created T183234: Gerrit: autocomplete to add reviewers slow.
Dec 19 2017, 11:50 AM · Gerrit
Volans triaged T182597: Use EtcdConfig in production to allow automation of a datacenter switch as Normal priority.
Dec 19 2017, 11:34 AM · discovery-system, MediaWiki-Configuration, Operations
Volans added a comment to T183176: cp4032 memory error.

@RobH FYI I've ack'ed the Icinga alert of the host down and set it to downtime until Fri UTC morning.

Dec 19 2017, 9:06 AM · Operations, Traffic, ops-ulsfo

Dec 18 2017

Volans added a subtask for T132324: Tracking and Reducing cron-spam from root@ : Unknown Object (Task).
Dec 18 2017, 2:26 PM · Patch-For-Review, Operations
Volans added a comment to T181121: Hardware errors on ganeti1005- ganeti1008.

@akosiaris reimages should be unblocked, see T182702#3844595

Dec 18 2017, 1:47 PM · ops-eqiad, Operations
Volans added a comment to T182702: Debian Jessie reimage/install ends up in kernel panic with 8.10 netboot image .

The reimage scripts should be back on track and work as expected. It was tested today with a couple of reimages. I cannot exclude that we'll find some other corner cases with less-used OS versions and with Puppet 4 clients. But from my side this could be resolved.

Dec 18 2017, 1:46 PM · Patch-For-Review, Operations
Volans removed a project from T85451: scale graphite deployment (tracking): Blocked-on-Operations.
Dec 18 2017, 10:47 AM · Services (watching), Tracking, Patch-For-Review, WMDE-Analytics-Engineering, Operations, Graphite
Volans removed a project from T94457: Install nodejs, nginx and other dependencies on francium: Blocked-on-Operations.
Dec 18 2017, 10:47 AM · Operations, Patch-For-Review
Volans triaged T177397: Create scaffolding of services templates for deployment in production/staging as Normal priority.
Dec 18 2017, 10:36 AM · Patch-For-Review, Prod-Kubernetes, User-Joe, Operations, Kubernetes
Volans triaged T181971: Disable hiera autolookups as Normal priority.
Dec 18 2017, 10:35 AM · Patch-For-Review, User-Joe, Puppet, Operations
Volans triaged T179395: Cluster puppet variable and ganglia decommission as Normal priority.
Dec 18 2017, 10:33 AM · Patch-For-Review, monitoring, Operations
Volans triaged T181952: Requesting access to EventLogging data for Vinitha as Normal priority.
Dec 18 2017, 8:35 AM · Patch-For-Review, AICaptcha, WMF-NDA-Requests, Operations, Ops-Access-Requests
Volans triaged T182860: Allow contint-admins to interact with docker on CI hosts as Normal priority.
Dec 18 2017, 8:33 AM · Patch-For-Review, Ops-Access-Requests, Operations, Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban)
Volans added a comment to T181121: Hardware errors on ganeti1005- ganeti1008.

Powercycled ganeti1005, unable to ssh, console unresponsive.

Dec 18 2017, 8:10 AM · ops-eqiad, Operations

Dec 13 2017

Volans added a comment to T156027: Configuration for Asia Cache DC hosts.

I just noticed that in late_command.sh we have a special case for cp[1234]* that I guess will need to be updated to include eqsin too.
Mentioning it here because it's not a common place to look and might be missed.

Dec 13 2017, 10:40 AM · Patch-For-Review, Operations, Traffic
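To illustrate why the `cp[1234]*` special case mentioned in the T156027 comment would miss hosts in a new site, the glob can be approximated with Python's `fnmatch`. The hostname `cp5001` and the widened pattern `cp[12345]*` are assumptions made for this example only; the source does not say how eqsin hosts are numbered or how late_command.sh was actually changed.

```python
from fnmatch import fnmatchcase


def matches_any(hostname, patterns):
    """Return True if hostname matches at least one shell-style glob."""
    return any(fnmatchcase(hostname, pattern) for pattern in patterns)


# Pattern quoted in the comment, and a hypothetical widened variant.
CURRENT_PATTERNS = ["cp[1234]*"]
WIDENED_PATTERNS = ["cp[12345]*"]

# cp4018 appears elsewhere in this feed; cp5001 is a made-up example host.
for host in ("cp4018", "cp5001"):
    print(host, matches_any(host, CURRENT_PATTERNS), matches_any(host, WIDENED_PATTERNS))
```

A host whose name starts with a digit outside the bracket set silently falls through to the default branch, which is why the special case is easy to overlook when a site is added.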

Dec 11 2017

Volans added a comment to T178177: Investigate aberrant Cassandra columnfamily read latency of restbase101{0,2,4}.

I'm sorry the test didn't help.
Digging a bit more, it seems that the controller that we have (Smart Array P440ar) supports HBA (Host Bus Adapter) mode, which, according to the HP manual [1]:

Dec 11 2017, 11:51 PM · User-Eevans, Services (doing), Cassandra
Volans moved T182575: Cumin: PuppetDB backend, add support for API v4 from Backlog to In Progress on the Operations-Software-Development board.
Dec 11 2017, 2:19 PM · Puppet, Operations-Software-Development
Volans created T182575: Cumin: PuppetDB backend, add support for API v4.
Dec 11 2017, 12:37 PM · Puppet, Operations-Software-Development