ema (Emanuele Rocca)
Senior Site Reliability Engineer, Traffic Team

User Details

User Since
Sep 29 2015, 8:49 PM (155 w, 3 d)
Availability
Available
IRC Nick
ema
LDAP User
Ema
MediaWiki User
Unknown

Recent Activity

Tue, Sep 18

ema closed T204600: Pass on name of the node serving ORES requests as response header to the user as Resolved.
$ curl -v https://ores.wikimedia.org/v3/scores/wikidatawiki/421063984/damaging 2>&1 | grep '< server'
< server: ores2006.codfw.wmnet
Tue, Sep 18, 11:25 AM · Patch-For-Review, Operations, Traffic, Scoring-platform-team, ORES
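The check above greps `curl -v` output for a lowercase `< server` line. As a minimal sketch of a reusable variant, the function below reads `curl -v` output on stdin, matches a response header name case-insensitively, and prints only its value. The name `response_header` is a hypothetical helper introduced here for illustration; it is not part of curl or of the task.

```shell
# Hypothetical helper (assumption, not from the task): print the value of a
# response header found in `curl -v` output read from stdin.
# Usage: response_header <header-name>
response_header() {
    awk -v name="$1" '
        # curl -v prefixes response header lines with "< "
        BEGIN { prefix = "< " tolower(name) ":" }
        tolower(substr($0, 1, length(prefix))) == prefix {
            value = substr($0, length(prefix) + 1)
            # Trim leading blanks and the trailing CR that curl leaves in place
            gsub(/^[ \t]+|[ \t\r]+$/, "", value)
            print value
        }'
}
```

It could be used as `curl -sv -o /dev/null https://ores.wikimedia.org/v3/scores/wikidatawiki/421063984/damaging 2>&1 | response_header server`.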
ema moved T204600: Pass on name of the node serving ORES requests as response header to the user from Caching to Watching on the Traffic board.
Tue, Sep 18, 10:13 AM · Patch-For-Review, Operations, Traffic, Scoring-platform-team, ORES
ema triaged T204600: Pass on name of the node serving ORES requests as response header to the user as Normal priority.
Tue, Sep 18, 9:34 AM · Patch-For-Review, Operations, Traffic, Scoring-platform-team, ORES
ema moved T204600: Pass on name of the node serving ORES requests as response header to the user from Triage to Caching on the Traffic board.
Tue, Sep 18, 9:34 AM · Patch-For-Review, Operations, Traffic, Scoring-platform-team, ORES

Mon, Sep 17

ema closed T164609: Merge cache_misc into cache_text functionally as Resolved.
Mon, Sep 17, 11:58 AM · Patch-For-Review, Operations, Traffic
ema moved T204355: Allow traffic team to manage the traffic blog on phame from Triage to General on the Traffic board.
Mon, Sep 17, 10:12 AM · Operations, Phabricator, Traffic
ema added a comment to T204355: Allow traffic team to manage the traffic blog on phame.
Mon, Sep 17, 10:11 AM · Operations, Phabricator, Traffic
ema moved T204365: Stop oversampling Asian countries from Triage to Watching on the Traffic board.
Mon, Sep 17, 9:59 AM · Patch-For-Review, Traffic, Operations, Performance-Team

Fri, Sep 14

ema updated the task description for T164609: Merge cache_misc into cache_text functionally.
Fri, Sep 14, 3:50 PM · Patch-For-Review, Operations, Traffic
ema triaged T204355: Allow traffic team to manage the traffic blog on phame as Normal priority.
Fri, Sep 14, 2:52 PM · Operations, Phabricator, Traffic
ema created T204355: Allow traffic team to manage the traffic blog on phame.
Fri, Sep 14, 2:52 PM · Operations, Phabricator, Traffic
ema moved T204232: Package and deploy ATS v8.x from Triage to Caching on the Traffic board.
Fri, Sep 14, 9:12 AM · Traffic, Operations
ema updated the task description for T164609: Merge cache_misc into cache_text functionally.
Fri, Sep 14, 8:59 AM · Patch-For-Review, Operations, Traffic
ema added a member for Traffic: jijiki.
Fri, Sep 14, 8:31 AM
ema updated subscribers of The Traffic Blog.
Fri, Sep 14, 8:28 AM · Traffic

Thu, Sep 13

ema updated the task description for T202966: Make cp1099 the new pinkunicorn.
Thu, Sep 13, 3:52 PM · Patch-For-Review, Operations, Traffic
ema created P7543 trafficserver-8.0.0-rc1 FTBFS.
Thu, Sep 13, 2:27 PM
ema updated the task description for T204232: Package and deploy ATS v8.x.
Thu, Sep 13, 12:41 PM · Traffic, Operations
ema triaged T204232: Package and deploy ATS v8.x as Normal priority.
Thu, Sep 13, 12:41 PM · Traffic, Operations
ema created T204232: Package and deploy ATS v8.x.
Thu, Sep 13, 12:40 PM · Traffic, Operations
ema moved T204225: ATS: log inspection at runtime from Triage to Caching on the Traffic board.
Thu, Sep 13, 12:15 PM · Operations, Traffic
ema triaged T204225: ATS: log inspection at runtime as Normal priority.
Thu, Sep 13, 12:15 PM · Operations, Traffic
ema created T204225: ATS: log inspection at runtime.
Thu, Sep 13, 12:15 PM · Operations, Traffic
ema moved T202479: Investigate source of 404 Not Found responses from load.php from Triage to Caching on the Traffic board.
Thu, Sep 13, 12:02 PM · Operations, Traffic, Performance-Team, MediaWiki-ResourceLoader
ema added a comment to T202479: Investigate source of 404 Not Found responses from load.php.
  1. Hostnames we route to text-lb that Varnish doesn't recognise (receives varnish-generated errorpage with 404, "Domain not served here")
Thu, Sep 13, 12:00 PM · Operations, Traffic, Performance-Team, MediaWiki-ResourceLoader
ema moved T204209: Define and deploy Icinga checks for ATS backends from Triage to Caching on the Traffic board.
Thu, Sep 13, 11:33 AM · Traffic, Operations
ema moved T204208: puppetize http purging for ATS backends from Triage to Caching on the Traffic board.
Thu, Sep 13, 11:33 AM · Operations, Traffic
ema triaged T204209: Define and deploy Icinga checks for ATS backends as Normal priority.
Thu, Sep 13, 11:33 AM · Traffic, Operations
ema created T204209: Define and deploy Icinga checks for ATS backends.
Thu, Sep 13, 11:33 AM · Traffic, Operations
ema triaged T204208: puppetize http purging for ATS backends as Normal priority.
Thu, Sep 13, 11:22 AM · Operations, Traffic
ema created T204208: puppetize http purging for ATS backends.
Thu, Sep 13, 11:22 AM · Operations, Traffic

Wed, Sep 12

ema moved T204056: Move wikimedia.ee under WM-EE from Triage to DNS Names on the Traffic board.
Wed, Sep 12, 9:49 AM · WMF-Legal, Patch-For-Review, Operations, Domains, Traffic
ema awarded T204110: Add favicon to icinga and tendril a Love token.
Wed, Sep 12, 9:46 AM · Operations
ema added a comment to T200822: Remove webrequest misc analytics related jobs and code after cache misc -> text merge is complete.

@ema just to be sure, can you confirm that cache misc is gone and that we can get rid of all our data processing for it?

Wed, Sep 12, 9:08 AM · Analytics-Kanban, Patch-For-Review, Analytics

Tue, Sep 11

ema closed T96853: Evaluate Apache Traffic Server as Resolved.

This can be closed now that we have deployed two test clusters running ATS and routing traffic to all our applications, gained basic operational experience with it, and verified that resource usage stays reasonable even with PURGE traffic peaks of 12K requests per second.

Tue, Sep 11, 8:53 AM · Operations, Traffic
ema closed T96853: Evaluate Apache Traffic Server, a subtask of T199720: Deploy initial ATS test clusters in core DCs, as Resolved.
Tue, Sep 11, 8:53 AM · Patch-For-Review, Operations, Traffic
ema moved T204013: Horizon Designate dashboard not allowing creation of NS records from Triage to Watching on the Traffic board.
Tue, Sep 11, 8:32 AM · Operations, Traffic, Upstream, Horizon
ema triaged T204013: Horizon Designate dashboard not allowing creation of NS records as Normal priority.
Tue, Sep 11, 8:32 AM · Operations, Traffic, Upstream, Horizon

Mon, Sep 10

ema closed T199720: Deploy initial ATS test clusters in core DCs as Resolved.

Request routing to all current applications added. Closing!

Mon, Sep 10, 3:56 PM · Patch-For-Review, Operations, Traffic
ema updated the task description for T199720: Deploy initial ATS test clusters in core DCs.
Mon, Sep 10, 3:55 PM · Patch-For-Review, Operations, Traffic

Wed, Sep 5

ema created P7515 trafficserver-wmf.service.
Wed, Sep 5, 2:59 PM
ema created P7514 trafficserver-default.service.
Wed, Sep 5, 2:59 PM
ema moved T156462: Framework to transfer files over the LAN from Triage to Watching on the Traffic board.
Wed, Sep 5, 8:09 AM · Patch-For-Review, Operations, Traffic, DBA
ema moved T191183: Enable avatars in gerrit from Triage to Watching on the Traffic board.
Wed, Sep 5, 8:08 AM · Operations, Traffic, Patch-For-Review, Gerrit
ema moved T202564: https://sv.wikipedia.beta.wmflabs.org/ has invalid certificate from Triage to TLS on the Traffic board.
Wed, Sep 5, 8:06 AM · Operations, Traffic, HTTPS, Beta-Cluster-Infrastructure
ema moved T170606: Add Accept header to webrequest logs from Triage to Watching on the Traffic board.
Wed, Sep 5, 8:05 AM · Operations, Traffic, Services (blocked), Analytics
ema moved T203396: certcentral: challenge checking on *all* pooled backend hosts from Triage to TLS on the Traffic board.
Wed, Sep 5, 8:05 AM · Traffic, Operations
ema moved T203423: certcentral: Provide script for certificate revocation from Triage to TLS on the Traffic board.
Wed, Sep 5, 8:05 AM · Traffic, Operations

Mon, Sep 3

ema moved T203194: cp1080 - kernel / bnxt_en failures from Triage to Caching on the Traffic board.
Mon, Sep 3, 11:40 AM · ops-eqiad, Traffic, Operations
ema moved T203191: prometheus-varnish-exporter@frontend.service: Unit entered failed state - invalid character 'C' from Triage to Caching on the Traffic board.
Mon, Sep 3, 11:36 AM · Traffic, Operations
ema triaged T203191: prometheus-varnish-exporter@frontend.service: Unit entered failed state - invalid character 'C' as Normal priority.
Mon, Sep 3, 11:36 AM · Traffic, Operations

Tue, Aug 28

ema closed T200445: Upgrade cache servers to stretch as Resolved.

The only cache host running jessie is cp1008, which will be replaced soon by cp1099: T202966.

Tue, Aug 28, 10:11 AM · Patch-For-Review, Operations, Traffic
ema moved T202966: Make cp1099 the new pinkunicorn from Triage to Caching on the Traffic board.
Tue, Aug 28, 7:36 AM · Patch-For-Review, Operations, Traffic
ema triaged T202966: Make cp1099 the new pinkunicorn as Normal priority.
Tue, Aug 28, 7:36 AM · Patch-For-Review, Operations, Traffic
ema created T202966: Make cp1099 the new pinkunicorn.
Tue, Aug 28, 7:36 AM · Patch-For-Review, Operations, Traffic
ema moved T202682: Improve Accept header normalization in VCL for REST API from Triage to Caching on the Traffic board.
Tue, Aug 28, 7:19 AM · Services (done), RESTBase-API, Parsing-Team, Traffic, RESTBase, Operations
ema created P7488 26-restbase-accept-ignore-semver.vtc.
Tue, Aug 28, 7:12 AM

Fri, Aug 24

ema triaged T199252: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest as Normal priority.
Fri, Aug 24, 11:00 AM · Patch-For-Review, SEO, Performance-Team, Operations, Traffic, Wikimedia-General-or-Unknown
ema triaged T202564: https://sv.wikipedia.beta.wmflabs.org/ has invalid certificate as Normal priority.
Fri, Aug 24, 11:00 AM · Operations, Traffic, HTTPS, Beta-Cluster-Infrastructure
ema triaged T117618: Add restrictive CSP to upload.wikimedia.org as Normal priority.
Fri, Aug 24, 11:00 AM · Wikimedia-General-or-Unknown, Traffic, Operations, Security-Team

Aug 23 2018

ema moved T202627: cp3036 PS Redundancy Lost from Triage to Hardware on the Traffic board.
Aug 23 2018, 2:15 PM · Traffic, ops-esams, Operations
ema triaged T202627: cp3036 PS Redundancy Lost as Normal priority.
Aug 23 2018, 2:15 PM · Traffic, ops-esams, Operations
ema created T202627: cp3036 PS Redundancy Lost.
Aug 23 2018, 2:15 PM · Traffic, ops-esams, Operations
ema updated the task description for T202381: Traffic Server - Prometheus integration.
Aug 23 2018, 9:28 AM · Patch-For-Review, Operations, Traffic

Aug 22 2018

ema created P7473 (An Untitled Masterwork).
Aug 22 2018, 1:42 PM

Aug 21 2018

ema triaged T202397: ms-be2020 crashed as Normal priority.
Aug 21 2018, 12:45 PM · media-storage, Operations
ema created T202397: ms-be2020 crashed.
Aug 21 2018, 12:44 PM · media-storage, Operations
ema moved T202381: Traffic Server - Prometheus integration from Triage to Caching on the Traffic board.
Aug 21 2018, 11:58 AM · Patch-For-Review, Operations, Traffic
ema updated the task description for T202381: Traffic Server - Prometheus integration.
Aug 21 2018, 11:57 AM · Patch-For-Review, Operations, Traffic
ema triaged T202381: Traffic Server - Prometheus integration as Normal priority.
Aug 21 2018, 11:55 AM · Patch-For-Review, Operations, Traffic
ema created T202381: Traffic Server - Prometheus integration.
Aug 21 2018, 11:55 AM · Patch-For-Review, Operations, Traffic

Aug 20 2018

ema created P7467 traffic_server -C check.
Aug 20 2018, 3:23 PM

Aug 16 2018

ema added a project to T202046: cp3032 PS Redundancy Lost: ops-esams.
Aug 16 2018, 9:20 AM · ops-esams, Operations, Traffic
ema moved T202046: cp3032 PS Redundancy Lost from Triage to Hardware on the Traffic board.
Aug 16 2018, 9:19 AM · ops-esams, Operations, Traffic
ema triaged T202046: cp3032 PS Redundancy Lost as Normal priority.
Aug 16 2018, 9:19 AM · ops-esams, Operations, Traffic
ema created T202046: cp3032 PS Redundancy Lost.
Aug 16 2018, 9:19 AM · ops-esams, Operations, Traffic
ema closed T201986: cassandra-a instance on aqs1007 is not starting as Resolved.

@Joe removed the log and restarted cassandra-a. The service now seems to be working fine.

Aug 16 2018, 6:52 AM · Cassandra, Operations

Aug 15 2018

ema added a comment to T201986: cassandra-a instance on aqs1007 is not starting.

It looks like the host has only been up for ~6 hours, and cassandra-a never actually managed to start.

root@aqs1007:~# uptime ; date ; journalctl -u cassandra-a.service | head
 08:20:20 up  6:39,  1 user,  load average: 1.22, 1.69, 2.13
Wed Aug 15 08:20:20 UTC 2018
-- Logs begin at Wed 2018-08-15 01:40:36 UTC, end at Wed 2018-08-15 08:20:15 UTC. --
Aug 15 01:41:20 aqs1007 systemd[1]: Started distributed storage system for structured data.
Aug 15 01:43:33 aqs1007 systemd[1]: cassandra-a.service: Main process exited, code=exited, status=100/n/a
Aug 15 01:43:33 aqs1007 systemd[1]: cassandra-a.service: Unit entered failed state.
Aug 15 01:43:33 aqs1007 systemd[1]: cassandra-a.service: Failed with result 'exit-code'.
Aug 15 02:07:18 aqs1007 systemd[1]: Started distributed storage system for structured data.
Aug 15 02:09:23 aqs1007 systemd[1]: cassandra-a.service: Main process exited, code=exited, status=100/n/a
Aug 15 02:09:23 aqs1007 systemd[1]: cassandra-a.service: Unit entered failed state.
Aug 15 02:09:23 aqs1007 systemd[1]: cassandra-a.service: Failed with result 'exit-code'.
Aug 15 02:36:37 aqs1007 systemd[1]: Started distributed storage system for structured data.
Aug 15 2018, 8:21 AM · Cassandra, Operations
ema triaged T201986: cassandra-a instance on aqs1007 is not starting as Normal priority.
Aug 15 2018, 8:16 AM · Cassandra, Operations
ema added a comment to T201952: operations-puppet:0.3.4 doesn't seem to be properly published.

One of the build failures caused by this is https://integration.wikimedia.org/ci/job/operations-puppet-tests-docker/26286/console

Aug 15 2018, 6:56 AM · Operations, docker-pkg
ema triaged T201952: operations-puppet:0.3.4 doesn't seem to be properly published as Normal priority.
Aug 15 2018, 6:52 AM · Operations, docker-pkg
ema moved T199252: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest from Triage to Caching on the Traffic board.
Aug 15 2018, 6:36 AM · Patch-For-Review, SEO, Performance-Team, Operations, Traffic, Wikimedia-General-or-Unknown
ema moved T201409: Harmonise the identification of requests across our stack from Triage to Caching on the Traffic board.
Aug 15 2018, 6:36 AM · Performance-Team (Radar), Patch-For-Review, Operations, Services (designing), TechCom-RFC, User-mobrovac, Traffic
ema moved T201666: cp3040: kernel crash in ipsec code shortly after reboot from Triage to Caching on the Traffic board.
Aug 15 2018, 6:35 AM · Operations, Traffic

Aug 14 2018

ema added a comment to T199720: Deploy initial ATS test clusters in core DCs.

Yay, dependencies.

Aug 14 2018, 5:01 PM · Patch-For-Review, Operations, Traffic
ema added a comment to T199720: Deploy initial ATS test clusters in core DCs.

Thanks @Reedy! The luarocks part fails with:

Aug 14 2018, 4:57 PM · Patch-For-Review, Operations, Traffic

Aug 13 2018

ema updated subscribers of T164609: Merge cache_misc into cache_text functionally.

Sometimes we get 503 peaks from a cache_misc application like phabricator or gerrit; knowing the origin of the 5xxs in broad categories ("public traffic for the sites" vs "miscellanea") was very useful IMHO; do we have a way to preserve such information?

Aug 13 2018, 8:45 AM · Patch-For-Review, Operations, Traffic
ema closed T193865: Enable numa_networking on all caches as Resolved.
Aug 13 2018, 8:15 AM · Patch-For-Review, Operations, Traffic

Aug 12 2018

ema created P7451 maps-tiles-ban.sh.
Aug 12 2018, 10:37 AM
ema closed T201737: docker-registry is returning HTTP 403 Forbidden for all requests as Resolved.

@Legoktm confirmed that the issue is now solved, closing.

Aug 12 2018, 9:16 AM · Patch-For-Review, Continuous-Integration-Infrastructure, Operations
ema added a comment to T201737: docker-registry is returning HTTP 403 Forbidden for all requests.

This is due to the move of cache_misc sites to cache_text T164609.

Aug 12 2018, 8:55 AM · Patch-For-Review, Continuous-Integration-Infrastructure, Operations
ema triaged T201769: Significant increase in Time To First Byte on 2018-08-08, between 16:00 and 20:00 UTC as Normal priority.

The increase is very visible in the tests performed from us-east, not so much from Dulles. I'm not aware of the specifics about how this test is performed, but maybe check the differences between the two testing setups?

Aug 12 2018, 8:44 AM · Operations, Traffic, Performance-Team

Aug 10 2018

ema added a comment to T196336: Icinga passive checks go awal and downtime stops working.

Mentioned in SAL (#wikimedia-operations) [2018-08-10T12:18:19Z] <gehel> restarting icinga on einsteinium - T196336

Aug 10 2018, 12:30 PM · Icinga, monitoring
ema triaged T201666: cp3040: kernel crash in ipsec code shortly after reboot as Normal priority.
Aug 10 2018, 8:19 AM · Operations, Traffic
ema created T201666: cp3040: kernel crash in ipsec code shortly after reboot.
Aug 10 2018, 8:19 AM · Operations, Traffic
ema updated the task description for T164609: Merge cache_misc into cache_text functionally.
Aug 10 2018, 7:10 AM · Patch-For-Review, Operations, Traffic
ema moved T102099: Fix IPv6 autoconf issues once and for all, across the fleet. from Triage to Network on the Traffic board.
Aug 10 2018, 6:37 AM · Traffic, netops, Operations, IPv6
ema moved T201630: False alarms on varnish-http-requests 70% GET drop in 30 min alert from Triage to Caching on the Traffic board.
Aug 10 2018, 6:25 AM · Patch-For-Review, Traffic, monitoring, Operations

Aug 9 2018

ema updated the task description for T164609: Merge cache_misc into cache_text functionally.
Aug 9 2018, 4:55 PM · Patch-For-Review, Operations, Traffic
ema added a project to T201630: False alarms on varnish-http-requests 70% GET drop in 30 min alert: Traffic.
Aug 9 2018, 4:54 PM · Patch-For-Review, Traffic, monitoring, Operations