ema (Emanuele Rocca)
WMF Operations Engineer

User Details

User Since
Sep 29 2015, 8:49 PM (119 w, 6 d)
Availability
Available
IRC Nick
ema
LDAP User
Ema
MediaWiki User
Unknown

Recent Activity

Yesterday

ema created P6585 (An Untitled Masterwork).
Mon, Jan 15, 11:10 AM
ema moved T184448: Upgrade cache_text to Varnish 5 from Triage to Caching on the Traffic board.
Mon, Jan 15, 10:16 AM · Performance-Team (Radar), Traffic, Operations
ema moved T184715: pybal's "can-depool" logic only takes downServers into account from Triage to LoadBalancer on the Traffic board.
Mon, Jan 15, 10:16 AM · Patch-For-Review, Pybal, Traffic, Operations
ema moved T184721: Alert instrumentation returning 500 errors from Triage to LoadBalancer on the Traffic board.
Mon, Jan 15, 10:16 AM · Patch-For-Review, Pybal, Traffic, Operations

Fri, Jan 12

ema added a comment to T184715: pybal's "can-depool" logic only takes downServers into account.

pooled/not pooled: the status of the server in the ipvs pool

Fri, Jan 12, 11:18 AM · Patch-For-Review, Pybal, Traffic, Operations

Thu, Jan 11

ema moved T184715: pybal's "can-depool" logic only takes downServers into account from Backlog to In Progress on the Pybal board.
Thu, Jan 11, 3:11 PM · Patch-For-Review, Pybal, Traffic, Operations
ema moved T184721: Alert instrumentation returning 500 errors from Backlog to In Progress on the Pybal board.
Thu, Jan 11, 3:11 PM · Patch-For-Review, Pybal, Traffic, Operations
ema edited projects for T184715: pybal's "can-depool" logic only takes downServers into account, added: Pybal; removed Puppet.
Thu, Jan 11, 2:03 PM · Patch-For-Review, Pybal, Traffic, Operations
ema triaged T184721: Alert instrumentation returning 500 errors as High priority.
Thu, Jan 11, 1:58 PM · Patch-For-Review, Pybal, Traffic, Operations
ema created T184721: Alert instrumentation returning 500 errors.
Thu, Jan 11, 1:58 PM · Patch-For-Review, Pybal, Traffic, Operations
ema triaged T184715: pybal's "can-depool" logic only takes downServers into account as High priority.
Thu, Jan 11, 1:19 PM · Patch-For-Review, Pybal, Traffic, Operations
ema created T184715: pybal's "can-depool" logic only takes downServers into account.
Thu, Jan 11, 1:18 PM · Patch-For-Review, Pybal, Traffic, Operations

Wed, Jan 10

ema created P6568 (An Untitled Masterwork).
Wed, Jan 10, 4:42 PM

Tue, Jan 9

ema closed T168619: Degraded RAID on lvs3001 as Resolved.

Disk replaced today, RAID rebuilt.

Tue, Jan 9, 5:20 PM · ops-esams, Operations
ema closed T168619: Degraded RAID on lvs3001, a subtask of T166965: Degraded RAID on lvs3001, as Resolved.
Tue, Jan 9, 5:20 PM · Traffic, ops-esams, Operations
ema closed T166965: Degraded RAID on lvs3001 as Resolved.

Disk replaced today, RAID rebuilt.

Tue, Jan 9, 5:20 PM · Traffic, ops-esams, Operations
ema created P6563 lvs1007-70-persistent-net.rules.
Tue, Jan 9, 10:58 AM

Mon, Jan 8

ema triaged T184448: Upgrade cache_text to Varnish 5 as Normal priority.
Mon, Jan 8, 4:18 PM · Performance-Team (Radar), Traffic, Operations
ema moved T184293: rack/setup/install lvs101[3-6] from Triage to Hardware on the Traffic board.
Mon, Jan 8, 9:20 AM · Patch-For-Review, ops-eqiad, Operations, Traffic
ema moved T184255: Lower varnish caching length on doc.wikimedia.org from Triage to Watching on the Traffic board.
Mon, Jan 8, 9:20 AM · Patch-For-Review, Traffic, Operations, Continuous-Integration-Infrastructure
ema triaged T184255: Lower varnish caching length on doc.wikimedia.org as Normal priority.
Mon, Jan 8, 9:19 AM · Patch-For-Review, Traffic, Operations, Continuous-Integration-Infrastructure
ema added a comment to T184255: Lower varnish caching length on doc.wikimedia.org.

Yes, Apache should send the Cache-Control header for that purpose. E.g.:

Mon, Jan 8, 9:19 AM · Patch-For-Review, Traffic, Operations, Continuous-Integration-Infrastructure

Fri, Jan 5

ema assigned T184196: cp1066's DRAC not responding to SSH to Cmjohnson.
Fri, Jan 5, 2:32 PM · Operations, ops-eqiad, DC-Ops

Thu, Jan 4

ema updated subscribers of T184196: cp1066's DRAC not responding to SSH.
Thu, Jan 4, 4:33 PM · Operations, ops-eqiad, DC-Ops
ema updated the task description for T184196: cp1066's DRAC not responding to SSH.
Thu, Jan 4, 4:30 PM · Operations, ops-eqiad, DC-Ops
ema triaged T184196: cp1066's DRAC not responding to SSH as Normal priority.
Thu, Jan 4, 4:29 PM · Operations, ops-eqiad, DC-Ops
ema created T184196: cp1066's DRAC not responding to SSH.
Thu, Jan 4, 4:29 PM · Operations, ops-eqiad, DC-Ops
ema created P6530 cp1066 'mc info'.
Thu, Jan 4, 4:16 PM

Wed, Jan 3

ema moved T183902: Swift invalid range requests causing 501s from Triage to Watching on the Traffic board.
Wed, Jan 3, 8:50 AM · Traffic, media-storage, Operations
ema moved T183926: Limit http methods reported by varnishmtail from Triage to Caching on the Traffic board.
Wed, Jan 3, 8:49 AM · Patch-For-Review, Traffic, User-fgiunchedi, Goal, Operations

Tue, Jan 2

ema created P6512 (An Untitled Masterwork).
Tue, Jan 2, 3:32 PM

Tue, Dec 19

ema updated the task description for T177199: Add Prometheus client support for varnish/statsd metrics daemons.
Tue, Dec 19, 3:55 PM · Patch-For-Review, Traffic, User-fgiunchedi, Goal, Operations
ema added a comment to T183176: cp4032 memory error.

@RobH FYI I've ack'ed the Icinga alert of the host down and set it to downtime until Fri UTC morning.

Tue, Dec 19, 1:42 PM · Operations, Traffic, ops-ulsfo

Mon, Dec 18

ema created P6482 (An Untitled Masterwork).
Mon, Dec 18, 3:42 PM
ema triaged T182656: Integrate jessie 8.10 point release as Normal priority.
Mon, Dec 18, 2:40 PM · Operations
ema triaged T183146: Monitor resource usage on a per-cgroup basis as Normal priority.
Mon, Dec 18, 2:30 PM · Operations, monitoring
ema created T183146: Monitor resource usage on a per-cgroup basis.
Mon, Dec 18, 2:30 PM · Operations, monitoring
ema created P6480 (An Untitled Masterwork).
Mon, Dec 18, 2:12 PM
ema created P6479 (An Untitled Masterwork).
Mon, Dec 18, 2:10 PM

Dec 15 2017

ema updated the task description for T177199: Add Prometheus client support for varnish/statsd metrics daemons.
Dec 15 2017, 11:27 AM · Patch-For-Review, Traffic, User-fgiunchedi, Goal, Operations

Dec 4 2017

ema edited P6424 (An Untitled Masterwork).
Dec 4 2017, 4:07 PM
ema created P6424 (An Untitled Masterwork).
Dec 4 2017, 3:21 PM

Nov 30 2017

ema added a comment to T180998: Switch on http/2 in phabricator apache.

Note that varnish does not speak TLS, so no h2. I'm not sure how stable varnish's h2c support (HTTP/2 without TLS) is.

Nov 30 2017, 3:51 PM · Traffic, Operations, Phabricator
ema triaged T180998: Switch on http/2 in phabricator apache as Low priority.
Nov 30 2017, 3:48 PM · Traffic, Operations, Phabricator
ema triaged T180657: Purchase domains mediawi.ki and media.wiki to use as a url shortener as Normal priority.
Nov 30 2017, 3:45 PM · Operations, Traffic, Domains
ema moved T181315: load.php response taking 160s (of which only 0.031s in Apache) from Triage to Caching on the Traffic board.
Nov 30 2017, 3:44 PM · Patch-For-Review, Traffic, Performance-Team, Operations

Nov 28 2017

ema added a comment to T177199: Add Prometheus client support for varnish/statsd metrics daemons.

The approach we have in mind can roughly be summed up with varnishncsa | mtail. We can group the six scripts to be ported into frontend and backend ones:

Nov 28 2017, 9:42 PM · Patch-For-Review, Traffic, User-fgiunchedi, Goal, Operations

Nov 27 2017

ema added a comment to T181315: load.php response taking 160s (of which only 0.031s in Apache).

We've recently started logging requests taking longer than 60 seconds (from varnish's point of view) and sending the logs to logstash. Here are the logs relevant to load.php.

Nov 27 2017, 2:38 PM · Patch-For-Review, Traffic, Performance-Team, Operations
ema moved T180655: Phabricator and Gerrit: Improve the way that maintenance downtime is communicated to users. from Triage to Caching on the Traffic board.
Nov 27 2017, 8:50 AM · Traffic, periodic-update, Gerrit, Operations, Phabricator
ema moved T181368: Log source port for anonymous users and expose it for sysops/checkusers from Triage to Caching on the Traffic board.
Nov 27 2017, 8:47 AM · Operations, Traffic, CheckUser
ema added a comment to T181368: Log source port for anonymous users and expose it for sysops/checkusers.

<Krenair> I would expect $_SERVER['REMOTE_PORT'] to be useless inside WMF infrastructure

Nov 27 2017, 8:46 AM · Operations, Traffic, CheckUser
ema triaged T181315: load.php response taking 160s (of which only 0.031s in Apache) as Normal priority.
Nov 27 2017, 8:34 AM · Patch-For-Review, Traffic, Performance-Team, Operations
ema triaged T181368: Log source port for anonymous users and expose it for sysops/checkusers as Normal priority.
Nov 27 2017, 8:33 AM · Operations, Traffic, CheckUser

Nov 16 2017

ema moved T180712: VCL: handling of uncacheable responses in wikimedia-common from Triage to Caching on the Traffic board.
Nov 16 2017, 5:27 PM · Operations, Traffic
ema triaged T180712: VCL: handling of uncacheable responses in wikimedia-common as Normal priority.
Nov 16 2017, 5:27 PM · Operations, Traffic
ema created T180712: VCL: handling of uncacheable responses in wikimedia-common.
Nov 16 2017, 5:26 PM · Operations, Traffic

Nov 15 2017

ema renamed T180568: Aberrant load on instances involved in recent bootstrap from Abberant load on instances involved in recent bootstrap to Aberrant load on instances involved in recent bootstrap.
Nov 15 2017, 3:53 PM · Services (doing), User-Eevans, Cassandra, Operations

Nov 14 2017

ema edited P6104 cache-misc-labs-hiera.yaml.
Nov 14 2017, 3:28 PM
ema triaged T180257: Puppet / LVS: confusion in service vs IP name as Normal priority.
Nov 14 2017, 11:13 AM · Operations, Traffic
ema moved T180257: Puppet / LVS: confusion in service vs IP name from Triage to LoadBalancer on the Traffic board.
Nov 14 2017, 9:41 AM · Operations, Traffic
ema moved T180269: Wikimedia's recent upgrade to nginx v. 1.13.6 breaks older Android HTTP libraries from Triage to TLS on the Traffic board.
Nov 14 2017, 9:41 AM · Traffic, Wikimedia-General-or-Unknown, HTTPS, Operations
ema moved T180407: Change "CP" cookie from subdomain to project level from Triage to Caching on the Traffic board.
Nov 14 2017, 9:41 AM · Operations, Traffic
ema moved T180424: cp3048 crashed from Triage to Hardware on the Traffic board.
Nov 14 2017, 9:41 AM · Operations, ops-esams, Traffic
ema moved T180433: Upgrade cache_upload to Varnish 5 from Triage to Caching on the Traffic board.
Nov 14 2017, 9:41 AM · Performance-Team (Radar), Traffic, Operations
ema moved T180434: Uncacheable content handling: hfp vs hfm from Triage to Caching on the Traffic board.
Nov 14 2017, 9:41 AM · Patch-For-Review, Operations, Traffic
ema triaged T180434: Uncacheable content handling: hfp vs hfm as Normal priority.
Nov 14 2017, 9:41 AM · Patch-For-Review, Operations, Traffic
ema created T180434: Uncacheable content handling: hfp vs hfm.
Nov 14 2017, 9:41 AM · Patch-For-Review, Operations, Traffic
ema created T180433: Upgrade cache_upload to Varnish 5.
Nov 14 2017, 9:32 AM · Performance-Team (Radar), Traffic, Operations

Nov 13 2017

ema updated subscribers of T180329: Add CI to all operations/software/varnish/* repositories and archive obsolete ones.

I've updated the task description with comments about all repos. They're all debian packages with the exception of varnishkafka/testing.

Nov 13 2017, 1:13 PM · Operations, Traffic, Continuous-Integration-Config
ema updated the task description for T180329: Add CI to all operations/software/varnish/* repositories and archive obsolete ones.
Nov 13 2017, 1:08 PM · Operations, Traffic, Continuous-Integration-Config
ema triaged T180329: Add CI to all operations/software/varnish/* repositories and archive obsolete ones as Normal priority.
Nov 13 2017, 12:59 PM · Operations, Traffic, Continuous-Integration-Config
ema triaged T180179: Evaluate the possibility to add Juniper images to Openstack as Normal priority.
Nov 13 2017, 12:58 PM · cloud-services-team (Kanban), Cloud-VPS, netops, Traffic, Operations
ema moved T180329: Add CI to all operations/software/varnish/* repositories and archive obsolete ones from Triage to Caching on the Traffic board.
Nov 13 2017, 12:54 PM · Operations, Traffic, Continuous-Integration-Config

Nov 10 2017

ema moved T172459: eqiad row D switch upgrade from General to Network on the Traffic board.
Nov 10 2017, 4:38 PM · Patch-For-Review, Operations, netops, Traffic
ema moved T180179: Evaluate the possibility to add Juniper images to Openstack from General to Network on the Traffic board.
Nov 10 2017, 4:37 PM · cloud-services-team (Kanban), Cloud-VPS, netops, Traffic, Operations
ema moved T180179: Evaluate the possibility to add Juniper images to Openstack from Triage to General on the Traffic board.
Nov 10 2017, 4:37 PM · cloud-services-team (Kanban), Cloud-VPS, netops, Traffic, Operations
ema moved T180178: Request increased quota for traffic Cloud VPS project from Triage to General on the Traffic board.
Nov 10 2017, 4:36 PM · netops, Traffic, Cloud-VPS (Quota-requests), Operations
ema triaged T180178: Request increased quota for traffic Cloud VPS project as Normal priority.
Nov 10 2017, 4:36 PM · netops, Traffic, Cloud-VPS (Quota-requests), Operations
ema moved T180256: authdns prometheus metrics are not available anymore from Triage to DNS Infra on the Traffic board.
Nov 10 2017, 4:35 PM · Patch-For-Review, monitoring, Prometheus-metrics-monitoring, Operations, Traffic
ema triaged T180256: authdns prometheus metrics are not available anymore as Normal priority.
Nov 10 2017, 4:34 PM · Patch-For-Review, monitoring, Prometheus-metrics-monitoring, Operations, Traffic
ema created T180256: authdns prometheus metrics are not available anymore.
Nov 10 2017, 4:34 PM · Patch-For-Review, monitoring, Prometheus-metrics-monitoring, Operations, Traffic

Nov 9 2017

ema triaged T158604: Investigate usefulness of SameSite cookies for logged-in accounts as Normal priority.
Nov 9 2017, 7:40 AM · Operations, Traffic, Security-Core, MediaWiki-Authentication-and-authorization
ema added a comment to T178567: Server error (500) while trying to download files from Commons from PAWS.

Anything else left to do here? Is the problem solved for you @Chicocvenancio?

Nov 9 2017, 7:38 AM · Patch-For-Review, media-storage, Traffic, Operations, Pywikibot-Commons, PAWS
ema triaged T178567: Server error (500) while trying to download files from Commons from PAWS as Normal priority.
Nov 9 2017, 7:35 AM · Patch-For-Review, media-storage, Traffic, Operations, Pywikibot-Commons, PAWS
ema triaged T179026: LVS IPv6 IPs should all be recorded in DNS as Normal priority.
Nov 9 2017, 7:29 AM · Operations, Traffic
ema triaged T176875: Allow access to wdqs.svc.eqiad.wmnet on port 8888 as Normal priority.
Nov 9 2017, 7:29 AM · Traffic, Wikidata-Query-Service, Operations, WMDE-Analytics-Engineering, User-Addshore, Wikidata, Discovery
ema removed a project from T178778: Parsoid, VisualEditor not working with SSL / HTTPS: Traffic.
Nov 9 2017, 7:28 AM · Operations, HTTPS, Parsoid, VisualEditor
ema moved T179953: cp3043 disk failure from Caching to Hardware on the Traffic board.
Nov 9 2017, 7:22 AM · Traffic, Operations, ops-esams
ema moved T179050: setup bast4002/WMF7218 from Triage to Watching on the Traffic board.
Nov 9 2017, 7:20 AM · Traffic, Operations, ops-ulsfo
ema moved T179204: setup/deploy dns400[12]/wmf721[56] from Triage to Watching on the Traffic board.
Nov 9 2017, 7:20 AM · Traffic, Operations, ops-ulsfo
ema moved T177742: Investigate Chrony as a replacement for ISC ntpd from Triage to General on the Traffic board.
Nov 9 2017, 7:19 AM · Traffic, Operations
ema moved T180069: Pybal should be able to advertise to multiple routers from Triage to LoadBalancer on the Traffic board.
Nov 9 2017, 7:14 AM · Patch-For-Review, Pybal, Operations, Traffic
ema changed the profile image for blog The Traffic Blog.
Nov 9 2017, 7:12 AM · Traffic

Nov 8 2017

ema triaged T180041: Please create a phame blog for the Traffic team as Normal priority.
Nov 8 2017, 3:19 PM · RelEng-Archive-FY201718-Q2, User-greg, Operations, Traffic, Phabricator
ema created T180041: Please create a phame blog for the Traffic team.
Nov 8 2017, 3:19 PM · RelEng-Archive-FY201718-Q2, User-greg, Operations, Traffic, Phabricator
ema moved T179953: cp3043 disk failure from Triage to Caching on the Traffic board.
Nov 8 2017, 10:35 AM · Traffic, Operations, ops-esams

Nov 7 2017

ema moved T179197: Investigate what caused the unattended varnish upgrade in Beta Cluster from Triage to Caching on the Traffic board.
Nov 7 2017, 8:07 AM · Release-Engineering-Team (Someday), Traffic, Operations, Beta-Cluster-Infrastructure
ema moved T179156: 503 spikes and resulting API slowness starting 18:45 October 26 from Triage to Caching on the Traffic board.
Nov 7 2017, 8:07 AM · Release-Engineering-Team (Watching / External), Patch-For-Review, Traffic, Wikimedia-Incident, Operations, ORES, Wikidata, Scoring-platform-team
ema closed T63782: Add varnish logs to logstash as Resolved.

Done.

Nov 7 2017, 7:00 AM · Patch-For-Review, Traffic, Operations, Wikimedia-Logstash
ema closed T63782: Add varnish logs to logstash, a subtask of T63779: Add system logs to logstash (tracking), as Resolved.
Nov 7 2017, 7:00 AM · Tracking, Wikimedia-Logstash