ema (Emanuele Rocca)
Senior Site Reliability Engineer, Traffic Team

User Details

User Since: Sep 29 2015, 8:49 PM (172 w, 2 d)
Availability: Available
IRC Nick: ema
LDAP User: Ema
MediaWiki User: Unknown

Recent Activity

Thu, Jan 10

ema triaged T213417: lvs2002: raid battery failure as Normal priority.
Thu, Jan 10, 12:33 PM · Operations, ops-codfw, Traffic
ema created T213417: lvs2002: raid battery failure.
Thu, Jan 10, 12:32 PM · Operations, ops-codfw, Traffic

Wed, Jan 9

ema updated the task description for T206339: Separate Traffic layer caches for PHP7/HHVM.
Wed, Jan 9, 11:30 AM · Patch-For-Review, Traffic, Operations
ema triaged T213263: Partial cache_upload traffic switchover to ATS and switchback to Varnish as Normal priority.
Wed, Jan 9, 11:10 AM · Patch-For-Review, Operations, Traffic
ema created T213263: Partial cache_upload traffic switchover to ATS and switchback to Varnish.
Wed, Jan 9, 11:10 AM · Patch-For-Review, Operations, Traffic
ema closed T209590: HTTP/2 requests fail with too-long URLs as Resolved.

The patch by @Vgutierrez fixed this bug. Closing.

Wed, Jan 9, 11:02 AM · Patch-For-Review, Traffic, Operations
ema moved T212197: Deliver mobile-based version for automatic translations from Triage to Caching on the Traffic board.
Wed, Jan 9, 10:52 AM · Patch-For-Review, Operations, Traffic, ExternalGuidance
ema closed T207048: ATS production-ready as a backend cache layer as Resolved.

We've added TLS support for maps and fixed the SAN list on swift to ensure proper TLS connections with upload origin servers. This is thus done.

Wed, Jan 9, 9:50 AM · Patch-For-Review, Operations, Traffic

Fri, Jan 4

ema added a comment to T209590: HTTP/2 requests fail with too-long URLs.

Note it's ok if the server implements some limits. The problem here is that the client isn't getting the expected 4xx response (e.g. 414 or 431) when the limit is hit; all it sees is a dropped connection.
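
As a rough illustration of the expected behaviour, here is a minimal sketch of a limit check that answers with a 4xx status instead of dropping the connection. The limit values and the `limit_status` helper are hypothetical and are not taken from the actual server configuration.

```python
from http import HTTPStatus
from typing import Mapping, Optional

# Hypothetical limits, chosen only for illustration.
MAX_URI_LENGTH = 4096
MAX_HEADER_BYTES = 8192


def limit_status(uri: str, headers: Mapping[str, str]) -> Optional[HTTPStatus]:
    """Return the 4xx status to send when a limit is hit, or None if within limits."""
    if len(uri) > MAX_URI_LENGTH:
        return HTTPStatus.REQUEST_URI_TOO_LONG  # 414
    # Approximate header size as the sum of name and value lengths.
    header_bytes = sum(len(name) + len(value) for name, value in headers.items())
    if header_bytes > MAX_HEADER_BYTES:
        return HTTPStatus.REQUEST_HEADER_FIELDS_TOO_LARGE  # 431
    return None
```

With these assumed limits, a 4698-character path like the one in P7916 would get a 414 rather than a dropped connection.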

Fri, Jan 4, 9:01 PM · Patch-For-Review, Traffic, Operations
ema moved T209590: HTTP/2 requests fail with too-long URLs from Triage to TLS on the Traffic board.
Fri, Jan 4, 2:37 PM · Patch-For-Review, Traffic, Operations
ema updated subscribers of T209590: HTTP/2 requests fail with too-long URLs.

P7916, which worked for @Huji, does not work for me. The path in P7916 (/wiki/Main_Page?_=xxxx...) is 4698 characters long.

Fri, Jan 4, 2:34 PM · Patch-For-Review, Traffic, Operations
ema moved T212312: prometheus-based graph significantly slower than statsd equivalent from Triage to Watching on the Traffic board.
Fri, Jan 4, 1:30 PM · monitoring, Traffic, Operations
ema moved T212914: Redirecting incoming queries to non-existent subpages from Triage to DNS Names on the Traffic board.
Fri, Jan 4, 1:30 PM · Operations, Traffic, Domains
ema triaged T212914: Redirecting incoming queries to non-existent subpages as Normal priority.
Fri, Jan 4, 1:30 PM · Operations, Traffic, Domains

Wed, Jan 2

ema awarded T209136: python3-etcd needs python3-dnspython a Party Time token.
Wed, Jan 2, 12:03 PM · Patch-For-Review, Operations, Operations-Software-Development
ema closed T212215: Update Subject Alternative Name field in TLS certificates for swift as Resolved.

New certificates deployed both in codfw and in eqiad.

Wed, Jan 2, 12:02 PM · Patch-For-Review, media-storage, Operations, Traffic
ema closed T212215: Update Subject Alternative Name field in TLS certificates for swift, a subtask of T207048: ATS production-ready as a backend cache layer, as Resolved.
Wed, Jan 2, 12:02 PM · Patch-For-Review, Operations, Traffic

Fri, Dec 21

ema added a comment to T212215: Update Subject Alternative Name field in TLS certificates for swift.

Tested the new cert on ms-fe2006, looks good:

Fri, Dec 21, 9:36 AM · Patch-For-Review, media-storage, Operations, Traffic

Dec 19 2018

ema triaged T212312: prometheus-based graph significantly slower than statsd equivalent as Normal priority.
Dec 19 2018, 4:18 PM · monitoring, Traffic, Operations
ema created T212312: prometheus-based graph significantly slower than statsd equivalent.
Dec 19 2018, 4:18 PM · monitoring, Traffic, Operations
ema moved T212310: varnishreqstats sends truncated statsd traffic from Triage to Caching on the Traffic board.
Dec 19 2018, 4:12 PM · Operations, Traffic
ema moved T212215: Update Subject Alternative Name field in TLS certificates for swift from Triage to TLS on the Traffic board.
Dec 19 2018, 4:12 PM · Patch-For-Review, media-storage, Operations, Traffic

Dec 18 2018

ema triaged T212219: wmf-auto-restart fails on certain legacy services as Normal priority.
Dec 18 2018, 3:06 PM · Patch-For-Review, Operations
ema created T212219: wmf-auto-restart fails on certain legacy services.
Dec 18 2018, 3:06 PM · Patch-For-Review, Operations
ema triaged T212215: Update Subject Alternative Name field in TLS certificates for swift as Normal priority.
Dec 18 2018, 2:43 PM · Patch-For-Review, media-storage, Operations, Traffic

Dec 17 2018

ema updated the task description for T210411: Applayer services without TLS.
Dec 17 2018, 1:15 PM · serviceops, Operations, Traffic
ema added a comment to T210411: Applayer services without TLS.

I do agree with @Joe: without a proper PKI this is going to be painful. For now, however, I've added TLS support to kartotherian (T211970), as that's part of cache_upload, which is the cluster we're gonna tackle first for the conversion to ATS.

Dec 17 2018, 1:15 PM · serviceops, Operations, Traffic
ema closed T211970: kartotherian TLS support as Resolved.
Dec 17 2018, 1:11 PM · Patch-For-Review, Maps (Kartotherian), Operations, Traffic
ema closed T211970: kartotherian TLS support, a subtask of T210411: Applayer services without TLS, as Resolved.
Dec 17 2018, 1:11 PM · serviceops, Operations, Traffic
ema moved T211970: kartotherian TLS support from Triage to TLS on the Traffic board.
Dec 17 2018, 11:58 AM · Patch-For-Review, Maps (Kartotherian), Operations, Traffic
ema closed T204355: Allow traffic team to manage the traffic blog on phame as Resolved.

Yes @mmodell, now I can. Thank you!

Dec 17 2018, 9:59 AM · Operations, Phabricator, Traffic

Dec 14 2018

ema added a project to T211970: kartotherian TLS support: Maps (Kartotherian).
Dec 14 2018, 12:18 PM · Patch-For-Review, Maps (Kartotherian), Operations, Traffic
ema triaged T211970: kartotherian TLS support as Normal priority.
Dec 14 2018, 12:17 PM · Patch-For-Review, Maps (Kartotherian), Operations, Traffic

Dec 12 2018

ema moved T211661: Automatically clean up unused thumbnails in Swift from Triage to Watching on the Traffic board.
Dec 12 2018, 2:22 PM · Traffic, media-storage, Operations, Performance-Team
ema moved T182028: DNS repo: add CI checks for obvious configuration errors from Triage to DNS Infra on the Traffic board.
Dec 12 2018, 2:21 PM · Traffic, DNS, Patch-For-Review, Operations-Software-Development, Operations
ema moved T211697: clean up deprecated TLS certificates from the puppet repo from Triage to TLS on the Traffic board.
Dec 12 2018, 2:21 PM · Traffic, Patch-For-Review, Operations
ema triaged T211750: Introduce Python code formatters usage as Normal priority.
Dec 12 2018, 10:48 AM · Operations, Operations-Software-Development

Dec 11 2018

ema added a comment to T211416: Put restbase201[3-8] into conftool and LVS.

And finally confctl throws an exception:

root@restbase2014:~# confctl --quiet depool --service restbase
CRITICAL:conftool:Could not load driver etcd: No module named 'dns'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/conftool/backend.py", line 15, in __init__
    exec(compile(open(driver_file).read(), driver_file, 'exec'), ctx)
  File "/usr/lib/python3/dist-packages/conftool/drivers/etcd.py", line 4, in <module>
    import etcd
  File "/usr/lib/python3/dist-packages/etcd/__init__.py", line 2, in <module>
    from .client import Client
  File "/usr/lib/python3/dist-packages/etcd/client.py", line 21, in <module>
    import dns.resolver
ImportError: No module named 'dns'
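
The traceback shows the etcd client failing on `import dns.resolver`. A minimal sketch for checking whether the top-level modules named in the traceback can be found on a host, without importing them, might look like the following. The `missing_modules` helper is hypothetical and is not part of conftool.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on this host."""
    # find_spec() returns None for a missing top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "etcd" and "dns" are the two imports that appear in the traceback.
    for name in missing_modules(["etcd", "dns"]):
        print(f"missing module: {name}")
```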
Dec 11 2018, 3:49 PM · Core Platform Team Kanban (Done with CPT), Services (done), User-fgiunchedi, Operations
ema added a project to T211668: mw1272 crashed: Bad page map in process hhvm: ops-eqiad.

The problem could be due to bad RAM. @Cmjohnson could you check?

Dec 11 2018, 9:45 AM · serviceops, ops-eqiad, Operations, HHVM
ema triaged T211668: mw1272 crashed: Bad page map in process hhvm as Normal priority.
Dec 11 2018, 9:42 AM · serviceops, ops-eqiad, Operations, HHVM
ema created T211668: mw1272 crashed: Bad page map in process hhvm.
Dec 11 2018, 9:42 AM · serviceops, ops-eqiad, Operations, HHVM

Dec 6 2018

ema moved T211254: Free up 185.15.59.0/24 from Triage to Network on the Traffic board.
Dec 6 2018, 12:55 PM · Patch-For-Review, Traffic, Operations, netops
ema closed T209021: ATS backend-side request-mangling as Resolved.

All the functionalities currently provided by our varnish backends in terms of request/response mangling have been implemented, with two exceptions:

Dec 6 2018, 12:40 PM · Patch-For-Review, Operations, Traffic
ema closed T209021: ATS backend-side request-mangling, a subtask of T207048: ATS production-ready as a backend cache layer, as Resolved.
Dec 6 2018, 12:40 PM · Patch-For-Review, Operations, Traffic
ema updated the task description for T209021: ATS backend-side request-mangling.
Dec 6 2018, 12:21 PM · Patch-For-Review, Operations, Traffic

Dec 5 2018

ema moved T209785: INMARSAT geolocates to the UK, leading to requests going to esams from Triage to General on the Traffic board.
Dec 5 2018, 10:22 AM · Operations, Traffic
ema moved T210134: wikidata.org lacks SPF record from Triage to DNS Names on the Traffic board.
Dec 5 2018, 10:21 AM · Mail, Patch-For-Review, Wikidata, User-revi, Traffic, Operations, DNS
ema moved T210167: Disable WMF-Last-Access cookies for wmfusercontent.org from Triage to Caching on the Traffic board.
Dec 5 2018, 10:21 AM · Privacy, Traffic, Operations
ema updated subscribers of T210167: Disable WMF-Last-Access cookies for wmfusercontent.org.

@Nuria thoughts from Analytics on this?

Dec 5 2018, 10:21 AM · Privacy, Traffic, Operations
ema moved T207718: Errors trying to fetch RDF from Wikidata from Triage to Watching on the Traffic board.
Dec 5 2018, 10:18 AM · Traffic, Operations, Performance-Team, Wikidata-Query-Service, Wikidata
ema moved T210484: Only serve debug HTTP headers when x-wikimedia-debug is present from Triage to Caching on the Traffic board.
Dec 5 2018, 10:17 AM · Patch-For-Review, Operations, Analytics, Traffic, Performance-Team
ema moved T211079: IPv6 ~20ms higher ping than IPv4 to gerrit from Triage to Network on the Traffic board.
Dec 5 2018, 10:16 AM · Operations, Traffic, netops
ema moved T211131: DNS recursors TCP retransmits from Triage to DNS Infra on the Traffic board.
Dec 5 2018, 10:16 AM · Pybal, Operations, Traffic

Dec 4 2018

ema added a project to T207718: Errors trying to fetch RDF from Wikidata: Traffic.
Dec 4 2018, 10:06 AM · Traffic, Operations, Performance-Team, Wikidata-Query-Service, Wikidata
ema added a comment to T207718: Errors trying to fetch RDF from Wikidata.
  • If it is possible for nginx to be restarted (interrupting existing persistent connections) due to config updates or the like,
Dec 4 2018, 10:00 AM · Traffic, Operations, Performance-Team, Wikidata-Query-Service, Wikidata

Dec 3 2018

ema updated the task description for T207048: ATS production-ready as a backend cache layer.
Dec 3 2018, 2:57 PM · Patch-For-Review, Operations, Traffic
ema updated the task description for T209021: ATS backend-side request-mangling.
Dec 3 2018, 2:24 PM · Patch-For-Review, Operations, Traffic
ema updated the task description for T209021: ATS backend-side request-mangling.
Dec 3 2018, 9:32 AM · Patch-For-Review, Operations, Traffic
ema created P7876 timeout-no-timestamp-resp.vtc.
Dec 3 2018, 8:25 AM

Nov 29 2018

ema created P7868 (An Untitled Masterwork).
Nov 29 2018, 3:51 PM

Nov 28 2018

ema edited P7858 objhits.cc.
Nov 28 2018, 2:27 PM
ema created P7858 objhits.cc.
Nov 28 2018, 1:51 PM
ema created P7857 (An Untitled Masterwork).
Nov 28 2018, 11:23 AM

Nov 27 2018

ema moved T210411: Applayer services without TLS from Triage to Caching on the Traffic board.
Nov 27 2018, 9:41 AM · serviceops, Operations, Traffic

Nov 26 2018

ema closed T210295: ATS path normalization as Resolved.

Deployed and working fine. Closing.

Nov 26 2018, 4:04 PM · RESTBase, ChangeProp, Core Platform Team Backlog (Watching / External), Services (watching), Patch-For-Review, Operations, Traffic
ema closed T210295: ATS path normalization, a subtask of T209021: ATS backend-side request-mangling, as Resolved.
Nov 26 2018, 4:04 PM · Patch-For-Review, Operations, Traffic
ema updated the task description for T207048: ATS production-ready as a backend cache layer.
Nov 26 2018, 4:03 PM · Patch-For-Review, Operations, Traffic
ema triaged T210411: Applayer services without TLS as Normal priority.
Nov 26 2018, 4:00 PM · serviceops, Operations, Traffic
ema edited P7842 ats-origin-no-tls.py.
Nov 26 2018, 3:57 PM
ema created P7842 ats-origin-no-tls.py.
Nov 26 2018, 3:20 PM
ema closed T204225: ATS: log inspection at runtime as Resolved.
Nov 26 2018, 2:55 PM · Patch-For-Review, Operations, Traffic
ema closed T204225: ATS: log inspection at runtime, a subtask of T207048: ATS production-ready as a backend cache layer, as Resolved.
Nov 26 2018, 2:55 PM · Patch-For-Review, Operations, Traffic
ema added a comment to T210295: ATS path normalization.

Looking at the logs on cp1071 and other cp-ats hosts, it seems that the patch above is working as expected:

Nov 26 2018, 7:49 AM · RESTBase, ChangeProp, Core Platform Team Backlog (Watching / External), Services (watching), Patch-For-Review, Operations, Traffic

Nov 23 2018

ema updated the task description for T209021: ATS backend-side request-mangling.
Nov 23 2018, 4:18 PM · Patch-For-Review, Operations, Traffic
ema updated subscribers of T210295: ATS path normalization.

I now see that @BBlack prepped https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/407643/ to bring reality closer to theory by adding the missing characters to {mediawiki,restbase}_encode -- with a caveat for RB, namely the single quote being added to restbase_decode instead of restbase_encode.

Nov 23 2018, 4:16 PM · RESTBase, ChangeProp, Core Platform Team Backlog (Watching / External), Services (watching), Patch-For-Review, Operations, Traffic
ema updated the task description for T210295: ATS path normalization.
Nov 23 2018, 3:19 PM · RESTBase, ChangeProp, Core Platform Team Backlog (Watching / External), Services (watching), Patch-For-Review, Operations, Traffic
ema triaged T210295: ATS path normalization as Normal priority.
Nov 23 2018, 3:14 PM · RESTBase, ChangeProp, Core Platform Team Backlog (Watching / External), Services (watching), Patch-For-Review, Operations, Traffic

Nov 21 2018

ema added a comment to T204225: ATS: log inspection at runtime.
  1. trafficserver closes its open logpipes upon logging.yaml config reload
Nov 21 2018, 10:22 AM · Patch-For-Review, Operations, Traffic

Nov 20 2018

ema added a comment to T204225: ATS: log inspection at runtime.

Yesterday all ATS hosts ran out of disk space. That's due to trafficserver logging several messages like the following:

Nov 20 2018, 9:23 AM · Patch-For-Review, Operations, Traffic

Nov 19 2018

ema reopened T204225: ATS: log inspection at runtime as "Open".

There's a problem with fifo-log-demux reading from the pipe, reopening!

Nov 19 2018, 5:23 PM · Patch-For-Review, Operations, Traffic
ema reopened T204225: ATS: log inspection at runtime, a subtask of T207048: ATS production-ready as a backend cache layer, as Open.
Nov 19 2018, 5:23 PM · Patch-For-Review, Operations, Traffic
ema updated the task description for T207048: ATS production-ready as a backend cache layer.
Nov 19 2018, 4:48 PM · Patch-For-Review, Operations, Traffic
ema closed T204209: Define and deploy Icinga checks for ATS backends as Resolved.
Nov 19 2018, 4:47 PM · Patch-For-Review, Traffic, Operations
ema closed T204209: Define and deploy Icinga checks for ATS backends, a subtask of T207048: ATS production-ready as a backend cache layer, as Resolved.
Nov 19 2018, 4:47 PM · Patch-For-Review, Operations, Traffic
ema moved T209805: Wikipedia sends WebP thumbnails when Opera claims to support it but lies from Triage to Caching on the Traffic board.
Nov 19 2018, 2:20 PM · Performance-Team, Operations, Traffic, Multimedia
ema triaged T209805: Wikipedia sends WebP thumbnails when Opera claims to support it but lies as Normal priority.
Nov 19 2018, 2:20 PM · Performance-Team, Operations, Traffic, Multimedia
ema added a project to T209805: Wikipedia sends WebP thumbnails when Opera claims to support it but lies: Traffic.
Nov 19 2018, 1:41 PM · Performance-Team, Operations, Traffic, Multimedia
ema triaged T209707: tagged_interface sometimes exceeds IFNAMSIZ as Normal priority.
Nov 19 2018, 12:44 PM · Patch-For-Review, Traffic, Operations
ema moved T209703: trafficserver debian-glue builds failing on integration-slave-jessie-1001: No space left on device from Triage to Watching on the Traffic board.
Nov 19 2018, 12:44 PM · Traffic, Operations, Continuous-Integration-Infrastructure
ema moved T209707: tagged_interface sometimes exceeds IFNAMSIZ from Triage to LoadBalancer on the Traffic board.
Nov 19 2018, 12:44 PM · Patch-For-Review, Traffic, Operations

Nov 16 2018

ema changed the status of T204209: Define and deploy Icinga checks for ATS backends from Stalled to Open.

We fixed the verify_config issue in ATS 8.0.0-1wm2, so this is not stalled anymore.

Nov 16 2018, 4:32 PM · Patch-For-Review, Traffic, Operations
ema changed the status of T204209: Define and deploy Icinga checks for ATS backends, a subtask of T207048: ATS production-ready as a backend cache layer, from Stalled to Open.
Nov 16 2018, 4:32 PM · Patch-For-Review, Operations, Traffic
ema triaged T209703: trafficserver debian-glue builds failing on integration-slave-jessie-1001: No space left on device as Normal priority.
Nov 16 2018, 2:21 PM · Traffic, Operations, Continuous-Integration-Infrastructure
ema created T209703: trafficserver debian-glue builds failing on integration-slave-jessie-1001: No space left on device.
Nov 16 2018, 2:21 PM · Traffic, Operations, Continuous-Integration-Infrastructure

Nov 15 2018

ema updated the task description for T207048: ATS production-ready as a backend cache layer.
Nov 15 2018, 10:08 AM · Patch-For-Review, Operations, Traffic
ema closed T204225: ATS: log inspection at runtime as Resolved.
Nov 15 2018, 10:08 AM · Patch-For-Review, Operations, Traffic
ema closed T204225: ATS: log inspection at runtime, a subtask of T207048: ATS production-ready as a backend cache layer, as Resolved.
Nov 15 2018, 10:08 AM · Patch-For-Review, Operations, Traffic
ema moved T208282: Increase EventLogging limit from 2K to 5K from Triage to General on the Traffic board.
Nov 15 2018, 10:07 AM · Performance-Team (Radar), Traffic, Analytics-EventLogging, Operations, Analytics
ema moved T209515: Renew Digicert Unified in 2019 from Triage to TLS on the Traffic board.
Nov 15 2018, 10:06 AM · Patch-For-Review, Operations, Traffic
ema moved T209337: lvs2006 crashed into (what it seems) an unrecoverable state from Triage to LoadBalancer on the Traffic board.
Nov 15 2018, 10:06 AM · Patch-For-Review, ops-codfw, Operations, Traffic