ema (Emanuele Rocca)
Senior Site Reliability Engineer, Traffic Team

User Details

User Since
Sep 29 2015, 8:49 PM (206 w, 4 d)
Availability
Away until Sep 30.
IRC Nick
ema
LDAP User
Ema
MediaWiki User
Unknown

Recent Activity

Wed, Sep 11

ema updated the task description for T210411: Applayer services without TLS.
Wed, Sep 11, 10:14 AM · Patch-For-Review, serviceops, Operations, Traffic
ema added a comment to T189333: Changing Kibana filters is ridiculously slow.

I re-ran my analysis today, and oddly enough the total number of fields is not only similar but equal to the number of fields there were three months ago. Currently at 7,665 table columns.
@ema @fgiunchedi Has the above change been deployed?

Wed, Sep 11, 9:14 AM · User-fgiunchedi, observability, Operations, Traffic, User-Addshore, Wikimedia-Logstash
ema renamed T232574: Alert in case of significant discrepancies between the number of nginx and varnish responses from "varnish request rates showed a spike up while nginx request rates didn't" to "Alert in case of significant discrepancies between the number of nginx and varnish responses".
Wed, Sep 11, 9:07 AM · Operations, observability, Traffic

Tue, Sep 10

ema added a project to T232453: Cookies and misc services caching: Analytics.
Tue, Sep 10, 10:35 AM · Analytics, Operations, Traffic
ema closed T230772: Piwik JS isn't cached as Resolved.

This should probably be its own task, though, it's not specific to piwik.js

Tue, Sep 10, 10:35 AM · Analytics-Kanban, Performance-Team (Radar), Traffic, Operations, Analytics-Wikistats, Analytics
ema closed T230772: Piwik JS isn't cached, a subtask of T230708: Performance review of new foundation website design, as Resolved.
Tue, Sep 10, 10:35 AM · Performance-Team
ema triaged T232453: Cookies and misc services caching as Normal priority.
Tue, Sep 10, 10:34 AM · Analytics, Operations, Traffic
ema created T232453: Cookies and misc services caching.
Tue, Sep 10, 10:34 AM · Analytics, Operations, Traffic

Mon, Sep 9

ema committed rLPRI9379d6ab2f6e: secret: dummy key for etherpad (authored by ema).
secret: dummy key for etherpad
Mon, Sep 9, 2:52 PM
ema updated the task description for T210411: Applayer services without TLS.
Mon, Sep 9, 1:03 PM · Patch-For-Review, serviceops, Operations, Traffic
ema updated the task description for T210411: Applayer services without TLS.
Mon, Sep 9, 12:59 PM · Patch-For-Review, serviceops, Operations, Traffic
ema updated the task description for T210411: Applayer services without TLS.
Mon, Sep 9, 12:55 PM · Patch-For-Review, serviceops, Operations, Traffic
ema updated subscribers of T232319: PyBal ProxyFetch checks using HTTP/1.0 with https and HTTP/1.1 with plain http.
Mon, Sep 9, 10:32 AM · Traffic, Operations
ema moved T232319: PyBal ProxyFetch checks using HTTP/1.0 with https and HTTP/1.1 with plain http from Triage to LoadBalancer on the Traffic board.
Mon, Sep 9, 10:06 AM · Traffic, Operations
ema triaged T232319: PyBal ProxyFetch checks using HTTP/1.0 with https and HTTP/1.1 with plain http as Normal priority.
Mon, Sep 9, 10:06 AM · Traffic, Operations
ema created T232319: PyBal ProxyFetch checks using HTTP/1.0 with https and HTTP/1.1 with plain http.
Mon, Sep 9, 10:05 AM · Traffic, Operations
ema closed T228629: ATS Backends: Test live cache_text traffic, a subtask of T227432: Replace Varnish backends with ATS on cache text nodes, as Resolved.
Mon, Sep 9, 7:47 AM · Patch-For-Review, Traffic, Operations
ema closed T228629: ATS Backends: Test live cache_text traffic as Resolved.

cp1075 has been serving live production traffic for several days now, so we can consider the test successful.

Mon, Sep 9, 7:47 AM · Patch-For-Review, Goal, Traffic, Operations

Thu, Sep 5

ema triaged T230687: Decide/document criteria needed to serve acme-chief LE issued unified certificate to end users as Normal priority.
Thu, Sep 5, 3:20 PM · Operations, Traffic, Acme-chief
ema triaged T231286: Track TLS related ATS metrics in prometheus as Normal priority.
Thu, Sep 5, 3:20 PM · Traffic, Operations

Wed, Sep 4

ema added a comment to T230772: Piwik JS isn't cached.

I'm guessing it might be coming from the cookies? Which the Chrome developer tools weren't showing. We've had this issue before on the performance site, where being logged into CentralAuth prevented Varnish caching from kicking in, due to a VCL rule somewhere. Meaning that we (staff members) tend to see a different caching situation than most visitors.
I think it's the same here; the Varnish caching for this particular asset should always work, regardless of cookies received. Especially since we're talking about logged-in status on other domains.

Wed, Sep 4, 1:53 PM · Analytics-Kanban, Performance-Team (Radar), Traffic, Operations, Analytics-Wikistats, Analytics
ema added a comment to T230772: Piwik JS isn't cached.

I can see the cache-control: max-age=604800, I think @ema needs to change something on his end so Varnish/ATS settings apply?

Wed, Sep 4, 9:13 AM · Analytics-Kanban, Performance-Team (Radar), Traffic, Operations, Analytics-Wikistats, Analytics
ema moved T220022: Some HTTP requests for MW failing due to "ERR_SPDY_PROTOCOL_ERROR 200" from Triage to Watching on the Traffic board.
Wed, Sep 4, 7:46 AM · Performance-Team (Radar), Traffic, Operations
ema moved T228433: server-cache did neither update on uploading nor with ?action=purge from Triage to Caching on the Traffic board.
Wed, Sep 4, 7:46 AM · Traffic, Operations, MediaWiki-File-management, Commons, Multimedia
ema moved T220085: Getting registry metadata from a public client fails on our registry from Triage to Caching on the Traffic board.
Wed, Sep 4, 7:40 AM · Traffic, docker-pkg, Operations, serviceops
ema added a comment to T220085: Getting registry metadata from a public client fails on our registry.

It seems that CL is returned properly now:

Wed, Sep 4, 7:40 AM · Traffic, docker-pkg, Operations, serviceops

Tue, Sep 3

ema added a comment to T231422: Cannot download STL files due to network error.

@Gilles: the issue should now be fixed, can you confirm?

Tue, Sep 3, 3:12 PM · Operations, Traffic
ema merged task T231753: Downloading the original SVG of a file on Commons serves a truncated stream into T231422: Cannot download STL files due to network error.
Tue, Sep 3, 1:47 PM · Traffic, Operations, Commons
ema merged T231753: Downloading the original SVG of a file on Commons serves a truncated stream into T231422: Cannot download STL files due to network error.
Tue, Sep 3, 1:47 PM · Operations, Traffic
ema triaged T231753: Downloading the original SVG of a file on Commons serves a truncated stream as Normal priority.
Tue, Sep 3, 1:47 PM · Traffic, Operations, Commons
ema added a comment to T231422: Cannot download STL files due to network error.

The issue happens due to varnish-frontend giving up the fetch from ATS because of lack of free space:

Tue, Sep 3, 10:50 AM · Operations, Traffic
ema added a comment to T231620: https://en.wikipedia.org/wiki/Heteromyidae shows the mobile version on desktop.

This should now be fixed. Please let me know if that's not the case!

Tue, Sep 3, 9:02 AM · Operations, Traffic, MobileFrontend, Mobile
ema added a comment to T231504: Unexpectedly received mobile version of an article while logged out.

This should now be fixed. Please let me know if that's not the case!

Tue, Sep 3, 9:02 AM · Operations, Traffic
ema triaged T231620: https://en.wikipedia.org/wiki/Heteromyidae shows the mobile version on desktop as Normal priority.
Tue, Sep 3, 9:01 AM · Operations, Traffic, MobileFrontend, Mobile
ema added a comment to T230772: Piwik JS isn't cached.

@ema So I understand: caching pass needs to be removed

Tue, Sep 3, 8:28 AM · Analytics-Kanban, Performance-Team (Radar), Traffic, Operations, Analytics-Wikistats, Analytics

Fri, Aug 30

ema moved T231422: Cannot download STL files due to network error from Triage to Caching on the Traffic board.
Fri, Aug 30, 8:00 AM · Operations, Traffic
ema claimed T231422: Cannot download STL files due to network error.
Fri, Aug 30, 8:00 AM · Operations, Traffic

Thu, Aug 29

ema claimed T231504: Unexpectedly received mobile version of an article while logged out.
Thu, Aug 29, 1:22 PM · Operations, Traffic
ema added a comment to T231504: Unexpectedly received mobile version of an article while logged out.

Thanks for filing this bug and for providing all request/response headers, very useful!

Thu, Aug 29, 11:19 AM · Operations, Traffic
ema triaged T231504: Unexpectedly received mobile version of an article while logged out as High priority.
Thu, Aug 29, 11:08 AM · Operations, Traffic
ema triaged T231525: cp1085 - IPMI not working as Normal priority.
Thu, Aug 29, 10:50 AM · ops-eqiad, Traffic, Operations
ema triaged T231533: Improve ATS prometheus metrics as Normal priority.
Thu, Aug 29, 10:50 AM · Operations, Traffic
ema closed T231388: Error pulling image from docker registry as Resolved.

We have managed to generate a proper certificate for the docker-registry origin servers, and cp1075 is now back to using TLS to connect to them.

$ curl -v --resolve docker-registry.wikimedia.org:443:208.80.154.224 https://docker-registry.wikimedia.org/v2/wikimedia/mediawiki-services-kask/manifests/v1.0.3
[...]
> GET /v2/wikimedia/mediawiki-services-kask/manifests/v1.0.3 HTTP/2
> Host: docker-registry.wikimedia.org
[...]
< HTTP/2 200 
< date: Thu, 29 Aug 2019 10:45:21 GMT
< content-type: application/vnd.docker.distribution.manifest.v1+prettyjws
[...]
< x-cache: cp1075 miss, cp1077 miss
[...]
{
   "schemaVersion": 1,
   "name": "wikimedia/mediawiki-services-kask",
   "tag": "v1.0.3",
   "architecture": "amd64",
Thu, Aug 29, 10:48 AM · Traffic, serviceops, Operations
ema moved T231423: cergen fails signing CSR from Triage to TLS on the Traffic board.
Thu, Aug 29, 10:28 AM · Patch-For-Review, Operations, Traffic
ema added a comment to T231423: cergen fails signing CSR.

@Ottomata: I understand that this is now fixed, can you confirm and close the task if so?

Thu, Aug 29, 10:28 AM · Patch-For-Review, Operations, Traffic
ema moved T231504: Unexpectedly received mobile version of an article while logged out from Triage to Caching on the Traffic board.
Thu, Aug 29, 10:28 AM · Operations, Traffic
ema moved T231525: cp1085 - IPMI not working from Triage to Hardware on the Traffic board.
Thu, Aug 29, 10:26 AM · ops-eqiad, Traffic, Operations
ema moved T231533: Improve ATS prometheus metrics from Triage to Caching on the Traffic board.
Thu, Aug 29, 10:26 AM · Operations, Traffic
ema closed T231063: Allow blocking requests from specific networks on the edge as Resolved.

This is now done.

Thu, Aug 29, 10:26 AM · Operations, Traffic

Wed, Aug 28

ema added a comment to T231388: Error pulling image from docker registry.

A proper fix for this issue is blocked on cergen bug T231423.

Wed, Aug 28, 9:37 AM · Traffic, serviceops, Operations
ema triaged T231423: cergen fails signing CSR as High priority.
Wed, Aug 28, 9:33 AM · Patch-For-Review, Operations, Traffic
ema created T231423: cergen fails signing CSR.
Wed, Aug 28, 9:33 AM · Patch-For-Review, Operations, Traffic

Tue, Aug 27

ema moved T231331: varnishkafka statsv and webrequest crashed on cp1081 from Triage to Caching on the Traffic board.
Tue, Aug 27, 2:48 PM · Traffic, Analytics, Operations
ema triaged T231331: varnishkafka statsv and webrequest crashed on cp1081 as Normal priority.
Tue, Aug 27, 2:48 PM · Traffic, Analytics, Operations
ema created T231331: varnishkafka statsv and webrequest crashed on cp1081.
Tue, Aug 27, 2:47 PM · Traffic, Analytics, Operations
ema moved T230687: Decide/document criteria needed to serve acme-chief LE issued unified certificate to end users from Triage to TLS on the Traffic board.
Tue, Aug 27, 10:09 AM · Operations, Traffic, Acme-chief
ema moved T231063: Allow blocking requests from specific networks on the edge from Triage to Caching on the Traffic board.
Tue, Aug 27, 10:08 AM · Operations, Traffic
ema moved T231086: Picture from Commons not found from Singapore from Triage to Watching on the Traffic board.
Tue, Aug 27, 10:08 AM · User-fgiunchedi, Structured-Data-Backlog, Structured Data Engineering, Multimedia, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Patch-For-Review, Commons, MediaWiki-File-management, media-storage, Traffic, Operations
ema moved T231108: upload LB: retry swift 404s cross-cluster from Triage to Caching on the Traffic board.
Tue, Aug 27, 10:07 AM · Commons, MediaWiki-File-management, media-storage, Traffic, Operations
ema triaged T231108: upload LB: retry swift 404s cross-cluster as Normal priority.
Tue, Aug 27, 10:06 AM · Commons, MediaWiki-File-management, media-storage, Traffic, Operations
ema moved T231286: Track TLS related ATS metrics in prometheus from Triage to TLS on the Traffic board.
Tue, Aug 27, 10:06 AM · Traffic, Operations
ema moved T231287: Investigate HTTP/2 limits on trafficserver from Triage to TLS on the Traffic board.
Tue, Aug 27, 10:06 AM · Patch-For-Review, Operations, Traffic

Mon, Aug 26

ema added a comment to T231063: Allow blocking requests from specific networks on the edge.

Feature implemented and documented on https://wikitech.wikimedia.org/wiki/Varnish#HOWTO

Mon, Aug 26, 8:21 AM · Operations, Traffic

Fri, Aug 23

ema triaged T231079: check_prometheus_metric: show errors in icinga, make "warning" optional as Low priority.
Fri, Aug 23, 1:10 PM · observability
ema created T231079: check_prometheus_metric: show errors in icinga, make "warning" optional.
Fri, Aug 23, 1:10 PM · observability
ema committed rLPRIfa6d1b76e25f: Define secret varnish/blocked-nets.inc.vcl (authored by ema).
Define secret varnish/blocked-nets.inc.vcl
Fri, Aug 23, 8:44 AM
ema added a comment to T231063: Allow blocking requests from specific networks on the edge.

I think it's good to have a first, simple implementation, like the one above, but I think going further we would need a "block" object in puppet (or elsewhere, more on that below) that includes:

  • IP or an IP range
  • User agent (or regexp)
  • An arbitrary list of headers
  • A url (or regexp)

And the block should be applied to anything that matches all the non-null values above. The object should also probably include information on what response to return.
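
The matching rule described above ("applied to anything that matches all the non-null values") could be sketched roughly as follows. This is only an illustration in Python, not the puppet or VCL implementation: the `BlockRule` class, its field names, and the default 403 status are all hypothetical.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BlockRule:
    """Hypothetical 'block' object. A field left as None is not checked."""
    network: Optional[str] = None             # single IP or CIDR range
    user_agent: Optional[str] = None          # regexp searched in User-Agent
    headers: Optional[Dict[str, str]] = None  # header name -> exact value
    url: Optional[str] = None                 # regexp searched in the URL
    status: int = 403                         # assumed response to return

    def matches(self, client_ip: str, url: str, headers: Dict[str, str]) -> bool:
        """Return True only if every non-None criterion matches the request."""
        hdrs = {k.lower(): v for k, v in headers.items()}
        if self.network is not None:
            net = ipaddress.ip_network(self.network, strict=False)
            if ipaddress.ip_address(client_ip) not in net:
                return False
        if self.user_agent is not None:
            if not re.search(self.user_agent, hdrs.get("user-agent", "")):
                return False
        if self.headers is not None:
            for name, value in self.headers.items():
                if hdrs.get(name.lower()) != value:
                    return False
        if self.url is not None:
            if not re.search(self.url, url):
                return False
        return True


# Example: block a (documentation-range) network only for one user agent.
rule = BlockRule(network="192.0.2.0/24", user_agent=r"BadBot")
print(rule.matches("192.0.2.10", "/wiki/Foo", {"User-Agent": "BadBot/1.0"}))
```

Note that in this sketch a rule with every field left as None matches all requests, so a real implementation would probably want to reject such a rule.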

Fri, Aug 23, 8:24 AM · Operations, Traffic
ema triaged T231063: Allow blocking requests from specific networks on the edge as Normal priority.
Fri, Aug 23, 8:07 AM · Operations, Traffic
ema created T231063: Allow blocking requests from specific networks on the edge.
Fri, Aug 23, 8:07 AM · Operations, Traffic
ema created P8967 (An Untitled Masterwork).
Fri, Aug 23, 6:56 AM
ema created P8966 (An Untitled Masterwork).
Fri, Aug 23, 5:06 AM

Thu, Aug 22

ema created P8959 (An Untitled Masterwork).
Thu, Aug 22, 3:17 PM
ema moved T230772: Piwik JS isn't cached from Triage to Watching on the Traffic board.
Thu, Aug 22, 1:52 PM · Analytics-Kanban, Performance-Team (Radar), Traffic, Operations, Analytics-Wikistats, Analytics
ema triaged T230772: Piwik JS isn't cached as Normal priority.
Thu, Aug 22, 1:52 PM · Analytics-Kanban, Performance-Team (Radar), Traffic, Operations, Analytics-Wikistats, Analytics
ema added a comment to T230772: Piwik JS isn't cached.

@ema hi :) Are response headers like Cache-Control used by Varnish in case caching: 'pass' is configured?

Thu, Aug 22, 1:51 PM · Analytics-Kanban, Performance-Team (Radar), Traffic, Operations, Analytics-Wikistats, Analytics

Fri, Aug 16

ema created P8922 (An Untitled Masterwork).
Fri, Aug 16, 11:14 AM

Aug 15 2019

ema added a comment to T188831: Some thumbnail images delivered with wrong application/x-www-form-urlencoded mime-type.
Aug 15 2019, 1:15 PM · Traffic, Operations, Multimedia, Thumbor, Commons, MediaWiki-File-management, media-storage
ema committed rLPRIdd4ec657b0a1: secret: dummy key for grafana (authored by ema).
secret: dummy key for grafana
Aug 15 2019, 1:07 PM
ema moved T188831: Some thumbnail images delivered with wrong application/x-www-form-urlencoded mime-type from Triage to Caching on the Traffic board.
Aug 15 2019, 10:48 AM · Traffic, Operations, Multimedia, Thumbor, Commons, MediaWiki-File-management, media-storage
ema added a project to T188831: Some thumbnail images delivered with wrong application/x-www-form-urlencoded mime-type: Traffic.
Aug 15 2019, 10:47 AM · Traffic, Operations, Multimedia, Thumbor, Commons, MediaWiki-File-management, media-storage
ema added a comment to T188831: Some thumbnail images delivered with wrong application/x-www-form-urlencoded mime-type.

@Ciencia_Al_Poder, @Wang_Qiliang: I have added a workaround at the CDN level which replaces the wrong Content-Type based on file extension. Can you please check if the issue is still reproducible on your side?
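
The idea behind the workaround can be illustrated with a small Python sketch. The real change lives in the CDN configuration and is not shown here; the function name, the extension table, and the set of "wrong" types below are assumptions made for the example.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed mapping from file extension to the expected Content-Type.
EXTENSION_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

# Content-Type values treated as wrong for image responses (assumption).
WRONG_TYPES = {"application/x-www-form-urlencoded"}


def fix_content_type(url: str, content_type: str) -> str:
    """Replace a known-wrong Content-Type based on the URL's file extension."""
    base_type = content_type.split(";", 1)[0].strip().lower()
    if base_type not in WRONG_TYPES:
        return content_type  # leave plausible types untouched
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    # Fall back to the original value if the extension is unknown.
    return EXTENSION_TYPES.get(ext, content_type)


url = ("https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/"
       "Logo_European_Central_Bank.svg/150px-Logo_European_Central_Bank.svg.png")
print(fix_content_type(url, "application/x-www-form-urlencoded"))
```

Only the final extension is considered, so a thumbnail named `...svg.png` is treated as a PNG.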

Aug 15 2019, 10:46 AM · Traffic, Operations, Multimedia, Thumbor, Commons, MediaWiki-File-management, media-storage
ema added a comment to T162035: Some PNG thumbnails and JPEG originals delivered as [text/html] content-type and hence not rendered in browser.

See T188831 for the application/x-www-form-urlencoded variation of this.

Aug 15 2019, 9:02 AM · Patch-For-Review, Traffic, Operations, media-storage

Aug 14 2019

ema committed rLPRI294bb5509916: secret: dummy key for webserver-misc-apps (authored by ema).
secret: dummy key for webserver-misc-apps
Aug 14 2019, 2:06 PM
ema updated the task description for T210411: Applayer services without TLS.
Aug 14 2019, 1:18 PM · Patch-For-Review, serviceops, Operations, Traffic
ema updated the task description for T210411: Applayer services without TLS.
Aug 14 2019, 1:13 PM · Patch-For-Review, serviceops, Operations, Traffic
ema updated the task description for T210411: Applayer services without TLS.
Aug 14 2019, 1:12 PM · Patch-For-Review, serviceops, Operations, Traffic
ema moved T229875: [Bug] iPadOS 13 shows the desktop version of Safari with a broken layout from Triage to Watching on the Traffic board.
Aug 14 2019, 12:52 PM · Readers-Web-Backlog (Needs Product Owner Decisions), Operations, Traffic
ema moved T230051: wikidata.org handles GET MWAPI requests, but silently fails on POST from Triage to Caching on the Traffic board.
Aug 14 2019, 12:45 PM · Traffic, Core Platform Team Workboards (Clinic Duty Team), Operations, Wikidata-Campsite, Wikidata, MediaWiki-API
ema moved T230075: Setting up static maintenance page on Foundation servers for Foundation website from Triage to General on the Traffic board.
Aug 14 2019, 12:41 PM · Traffic, wikimediafoundation.org, Operations, Security
ema moved T230382: Remove aliases `minnan` and `zh-cfr` for the Min Nan Wikipedia from Triage to DNS Names on the Traffic board.
Aug 14 2019, 12:39 PM · Patch-For-Review, Traffic, Operations, Wikimedia-Apache-configuration, DNS
ema moved T230448: Aug 28th: turn off 1/3 esams-knams lasers in advance of Relined PA-988002 maintenance from Triage to Network on the Traffic board.
Aug 14 2019, 12:39 PM · netops, Traffic, Operations
ema moved T230470: Could not reach wikipedia from domain wikipedia.fi from Triage to DNS Names on the Traffic board.
Aug 14 2019, 12:39 PM · Traffic, Operations, DNS, Domains
ema created P8907 (An Untitled Masterwork).
Aug 14 2019, 7:58 AM

Aug 10 2019

ema raised the priority of T188831: Some thumbnail images delivered with wrong application/x-www-form-urlencoded mime-type from Low to High.

Priority set to High as images are not displayed correctly due to this. I see the bug happening right now on https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/Logo_European_Central_Bank.svg/150px-Logo_European_Central_Bank.svg.png

Aug 10 2019, 9:31 AM · Traffic, Operations, Multimedia, Thumbor, Commons, MediaWiki-File-management, media-storage

Aug 9 2019

ema committed rLPRIcdc4c9ab972d: secret: dummy key for phabricator (authored by ema).
secret: dummy key for phabricator
Aug 9 2019, 7:49 AM
ema added a comment to T210411: Applayer services without TLS.

We have added TLS termination to bromine/vega with profile::tlsproxy::envoy. In the upcoming days I'll use the profile to add termination to all remaining services. Thanks @Joe!!

Aug 9 2019, 7:08 AM · Patch-For-Review, serviceops, Operations, Traffic
ema updated the task description for T210411: Applayer services without TLS.
Aug 9 2019, 7:05 AM · Patch-For-Review, serviceops, Operations, Traffic

Aug 8 2019

ema created P8887 traffic-cache-atstext.yaml.
Aug 8 2019, 10:00 AM
ema created P8886 (An Untitled Masterwork).
Aug 8 2019, 9:09 AM
ema moved T230053: Add a repo reference to Design Strategy web address from Triage to Watching on the Traffic board.
Aug 8 2019, 9:03 AM · Traffic, Domains, Product-Design-Strategy, Operations