Vgutierrez (Valentín Gutiérrez)
Traffic Security Engineer

User Details

User Since: Feb 12 2018, 9:51 AM (141 w, 1 d)
Availability: Available
IRC Nick: vgutierrez
LDAP User: Vgutierrez
MediaWiki User: Unknown

Recent Activity

Mon, Oct 19

Ladsgroup awarded T133548: Create a secure redirect service for large count of non-canonical / junk domains a Like token.
Mon, Oct 19, 8:38 AM · Goal, Patch-For-Review, HTTPS, Traffic, Operations

Fri, Oct 16

Vgutierrez changed the status of T184715: pybal's "can-depool" logic only takes downServers into account from Open to Stalled.

this hasn't been backported to the 1.15 branch so it's never been deployed in production, I'd keep the task open

Fri, Oct 16, 10:42 AM · Pybal, Traffic, Operations
Vgutierrez closed T265584: Wipe digicert-2019a from the caching cluster as Resolved.
Fri, Oct 16, 7:32 AM · Operations, Traffic

Thu, Oct 15

Vgutierrez triaged T265584: Wipe digicert-2019a from the caching cluster as Medium priority.
Thu, Oct 15, 10:31 AM · Operations, Traffic
Vgutierrez created T265584: Wipe digicert-2019a from the caching cluster.
Thu, Oct 15, 10:30 AM · Operations, Traffic

Wed, Oct 7

Vgutierrez added a comment to T264074: varnishkafka 1.1.0 CPU usage increase.

@elukey double checking https://gerrit.wikimedia.org/r/plugins/gitiles/operations/debs/varnish4/+/refs/heads/debian-wmf/lib/libvarnishapi/vut.c#424 it looks like ed1696efc92cb6a9aa96d2b8e586be8dbbb1736b made it to varnish 6.0.6 as well

Wed, Oct 7, 9:32 AM · Patch-For-Review, Analytics-Clusters, Operations, Traffic
Vgutierrez added a comment to T264074: varnishkafka 1.1.0 CPU usage increase.

yeah.. I'll handle the backport :)

Wed, Oct 7, 9:22 AM · Patch-For-Review, Analytics-Clusters, Operations, Traffic

Tue, Oct 6

Vgutierrez added a comment to T262946: Bump Firefox version in basic support to 3.6 or newer.

@Izno You're right. But even if that's the case right now, there will be gaps in the future again.

While we await final feedback from Traffic and @ema, a case in point is the required TLS 1.2 support (it has now been verified a second time that Wikipedia is not accessible from Safari 6.2 on OS X Mountain Lion). One option would be to broaden this task to bump the Grade A browsers to the list already shared by @Esanders in T262946#6512792, if we agree to intertwine MediaWiki support with Wikipedia traffic/ops support.

Right now, keeping basic support at Firefox 3 rather than the proposed 3.6 holds back current needs such as relying on rem. I'm open to either approach: limiting the task to the current description or, which I slightly prefer, broadening it so that our development limitations are more future-facing.

Tue, Oct 6, 1:45 PM · TechCom-RFC, Browser-Support-Firefox, Front-end-Standards-Group, MediaWiki-General

Tue, Sep 29

Vgutierrez updated the task description for T258405: Deprecate TLSv1.2 weak ciphersuites.
Tue, Sep 29, 1:32 PM · User-notice, Patch-For-Review, Operations, Traffic

Sep 17 2020

Vgutierrez lowered the priority of T263006: Let's Encrypt transitioning to ISRG's Root from High to Medium.

acme-chief has been updated to version 0.29 in our production environment. The unified cert should be renewed tomorrow; we will check the offered chains then.

Sep 17 2020, 11:11 AM · Patch-For-Review, Traffic, Acme-chief, Operations
Vgutierrez committed rOSACf714f4750f31: debian: Add release 0.29 to changelog (authored by Vgutierrez).
debian: Add release 0.29 to changelog
Sep 17 2020, 11:02 AM

Sep 16 2020

Vgutierrez added a comment to T263006: Let's Encrypt transitioning to ISRG's Root.

So I've prepared a 0.29 release shipping https://gerrit.wikimedia.org/r/q/topic:%22T263006%22+(status:open%20OR%20status:merged)

Sep 16 2020, 4:04 PM · Patch-For-Review, Traffic, Acme-chief, Operations
Vgutierrez created T263006: Let's Encrypt transitioning to ISRG's Root.
Sep 16 2020, 8:38 AM · Patch-For-Review, Traffic, Acme-chief, Operations

Sep 10 2020

Vgutierrez added a comment to T261632: Package varnish 6.0.x.

@ema I've added to the task description the CRs required to get the packages of all the vmods and varnishkafka, I've seen that we have varnish-modules compiled on deneb but I haven't found a CR.

Sep 10 2020, 3:57 PM · Analytics-Radar, Patch-For-Review, Traffic, Operations
Vgutierrez updated the task description for T261632: Package varnish 6.0.x.
Sep 10 2020, 2:40 PM · Analytics-Radar, Patch-For-Review, Traffic, Operations

Sep 8 2020

Vgutierrez updated the task description for T261632: Package varnish 6.0.x.
Sep 8 2020, 1:58 PM · Analytics-Radar, Patch-For-Review, Traffic, Operations
Vgutierrez triaged T262251: acme-chief shouldn't try to perform OCSP stapling of expired certs as Medium priority.
Sep 8 2020, 12:08 PM · Traffic, Acme-chief, cloud-services-team (Kanban), Operations, Cloud-VPS
Vgutierrez created T262251: acme-chief shouldn't try to perform OCSP stapling of expired certs.
Sep 8 2020, 10:02 AM · Traffic, Acme-chief, cloud-services-team (Kanban), Operations, Cloud-VPS

Sep 7 2020

Vgutierrez updated the task description for T261632: Package varnish 6.0.x.
Sep 7 2020, 3:04 PM · Analytics-Radar, Patch-For-Review, Traffic, Operations

Sep 1 2020

Vgutierrez updated the task description for T261632: Package varnish 6.0.x.
Sep 1 2020, 1:15 PM · Analytics-Radar, Patch-For-Review, Traffic, Operations

Aug 28 2020

Vgutierrez added a comment to T261528: SSL cert renewal warnings for cloudelastic100[5-6].wikimedia.org.

So.. as I can see on acmechief1001, the cert has been renewed as expected:

root@acmechief1001:~# openssl x509 -dates -noout -in /var/lib/acme-chief/certs/cloudelastic/live/ec-prime256v1.crt
notBefore=Aug  3 19:00:35 2020 GMT
notAfter=Nov  1 19:00:35 2020 GMT
root@acmechief1001:~# openssl x509 -dates -noout -in /var/lib/acme-chief/certs/cloudelastic/live/rsa-2048.crt
notBefore=Aug  3 19:00:43 2020 GMT
notAfter=Nov  1 19:00:43 2020 GMT
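
The validity window printed above can also be checked programmatically. The following is a minimal sketch, not part of acme-chief or of any monitoring check: the helper names `parse_openssl_dates` and `days_left` are hypothetical, and the parser assumes the `notBefore=`/`notAfter=` format shown in the transcript and an English (C) locale for month names.

```python
from datetime import datetime, timezone


def parse_openssl_dates(output: str) -> dict:
    """Parse the notBefore=/notAfter= lines of `openssl x509 -dates -noout`."""
    dates = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in ("notBefore", "notAfter"):
            # openssl pads single-digit days with an extra space ("Aug  3").
            normalized = " ".join(value.split())
            parsed = datetime.strptime(normalized, "%b %d %H:%M:%S %Y GMT")
            dates[key] = parsed.replace(tzinfo=timezone.utc)
    return dates


def days_left(dates: dict, now: datetime) -> float:
    """Days remaining until notAfter, relative to a timezone-aware `now`."""
    return (dates["notAfter"] - now).total_seconds() / 86400


# Sample taken from the ec-prime256v1 output above.
sample = """notBefore=Aug  3 19:00:35 2020 GMT
notAfter=Nov  1 19:00:35 2020 GMT"""
dates = parse_openssl_dates(sample)
print(dates["notAfter"] - dates["notBefore"])
```

Passing an explicit `now` keeps the calculation reproducible, which is convenient when comparing against alert thresholds.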
Aug 28 2020, 9:25 PM · Discovery-Search

Aug 24 2020

Vgutierrez updated the task description for T258405: Deprecate TLSv1.2 weak ciphersuites.
Aug 24 2020, 3:39 PM · User-notice, Patch-For-Review, Operations, Traffic

Aug 20 2020

Vgutierrez moved T260889: confd's watch functionality appears to be partially broken when interacting with etcd 3.x from Triage to LoadBalancer on the Traffic board.
Aug 20 2020, 9:58 AM · Traffic, conftool, serviceops, Operations

Aug 18 2020

Vgutierrez added a comment to T260702: Analyze custom varnish 5.1 patches considering the migration to varnish 6.
patch | backport/custom | available on varnish 6.0 | available on varnish 6.4 | can be removed?
0002-exp-thread-realtime.patch | custom | no | no | TBD (varnish-be specific)
0003-vsm-perms.patch | custom | no | no | no
0004-storage-file-off-t.patch | custom | no | no | TBD (varnish-be specific)
0005-stats-shortlived.patch | custom | no | no | no
0006-transaction-timeout.patch | custom | no | no | yes (adds a config parameter that is currently unused)
0007-varnishncsa-record-prefix.patch | backport | yes | yes | yes
0008-vsv00002-5.1.patch | backport | yes | yes | yes
0011-fix-discarding-labels | backport | yes | yes | yes
0012-oh-leak.patch | backport | yes | yes | yes
0013-issue-1799.patch | backport | yes | yes | yes
0014-n_lru_limited-counter.patch | backport | yes | yes | yes
0015-cache_hit_grace-counter.patch | backport | yes | yes | yes
0016-expired-objects-ignore-req.ttl.patch | backport | yes | yes | yes
0017-new-ttl-in-vcl-calculation.patch | backport | yes | yes | yes
0018-post-and-multiple-vcl.patch | backport | yes | yes | yes
0019-vary-stevedore-mem-leak.patch | backport | yes | yes | yes
0020-assert-error-http1_minimal_response.patch | backport | yes | yes | yes
0021-dont-test-gunzip-partial.patch | backport | yes | yes | yes
0022-deref-objcore-synth-err.patch | backport | yes | yes | yes
0023-pass-delivery-is-no-err.patch | backport | yes | yes | yes
0024-vbt-get-force-fresh.patch | backport | yes | yes | yes
0025-extrachance-one-retry.patch | backport | yes | yes | yes
0026-transient-full-cache_req_body-panic.patch | backport | yes | yes | yes
0027-assert-error-vca_make_session.patch | backport | yes | yes | yes
0028-panic-return-cond-fetch.patch | backport | yes | yes | yes
0029-ban-lurker-bo-backoff.patch | backport | yes | yes | yes
0030-startup-show-version.patch | backport | yes | yes | yes
0031-vbt-close-stolen.patch | backport | yes | yes | yes
0032-vbe_dir_finish-no-VBT_Wait.patch | backport | yes | yes | yes
0033-recycled-honor-first_byte_timeout.patch | backport | yes | yes | yes
0034-r02135.vtc-fixes.patch | backport | yes | yes | yes
0035-vbf_stp_condfetch_crash.patch | backport | yes | no | yes iff target version is 6.0
0036-VSV00004.patch | backport | yes | yes | yes
0037-force-discard.patch | custom | no | no | yes (failed experiment)
0038-vcl_active-lock.patch | backport | yes | yes | yes
Aug 18 2020, 4:20 PM · Patch-For-Review, Operations, Traffic
Vgutierrez triaged T260702: Analyze custom varnish 5.1 patches considering the migration to varnish 6 as Medium priority.
Aug 18 2020, 3:38 PM · Patch-For-Review, Operations, Traffic
Vgutierrez created T260702: Analyze custom varnish 5.1 patches considering the migration to varnish 6.
Aug 18 2020, 3:38 PM · Patch-For-Review, Operations, Traffic

Aug 17 2020

Vgutierrez closed T260279: Add DVrandecic to group nda as Declined.

as I mentioned in my previous comment, being part of the wmf LDAP group is enough. @DVrandecic could you point us to the onboarding documentation that you've been following so we can get it updated? Thanks

Aug 17 2020, 7:46 AM · Operations, LDAP-Access-Requests, WMF-NDA-Requests

Aug 14 2020

Vgutierrez added a comment to T260279: Add DVrandecic to group nda.

From https://wikitech.wikimedia.org/wiki/LDAP/Groups:

wmf - for WMF staff/contractors (documented below)
ops - for operations people (see ops group in puppet manifests/site.pp) (documented below)
nda - for others who have signed NDAs for access to confidential data (documented below)

and as https://ldap.toolforge.org/group/wmf indicates, @DVrandecic is already a member of the wmf ldap group. I'd say we can close this task

Aug 14 2020, 10:20 AM · Operations, LDAP-Access-Requests, WMF-NDA-Requests
Vgutierrez triaged T260279: Add DVrandecic to group nda as Medium priority.
Aug 14 2020, 8:51 AM · Operations, LDAP-Access-Requests, WMF-NDA-Requests

Aug 12 2020

Vgutierrez triaged T259979: Redirect wikimedia.org/research to research.wikimedia.org instead of some external closed survey as Low priority.
Aug 12 2020, 2:17 PM · Research, Wikimedia-Apache-configuration, Patch-For-Review, Operations
Vgutierrez triaged T260240: UNIX group 'bird' missing on bird package installation as Medium priority.
Aug 12 2020, 1:50 PM · observability, Cloud-VPS, Operations

Aug 11 2020

Vgutierrez added a comment to T238593: Phabricator downtime due to aphlict and websockets (aphlict current disabled).

hmm that's interesting, please note that this is not the first time we use websockets. etherpad.wm.o is already using websockets successfully (even when HTTP/2 is used to perform the upgrade request)

Aug 11 2020, 1:06 PM · Release-Engineering-Team-TODO (2020-07-01 to 2020-09-30 (Q1)), Patch-For-Review, Phabricator, serviceops, Operations, Traffic

Aug 10 2020

Vgutierrez closed T259388: Requesting access to production shell for Denny Vrandecic as Resolved.

@DVrandecic will also need a kerberos password

Aug 10 2020, 4:55 PM · Analytics-Radar, SRE-Access-Requests, Operations
Vgutierrez claimed T259388: Requesting access to production shell for Denny Vrandecic.
Aug 10 2020, 4:27 PM · Analytics-Radar, SRE-Access-Requests, Operations
Vgutierrez added a comment to T259388: Requesting access to production shell for Denny Vrandecic.

@akosiaris is on vacation, I'll handle this ASAP

Aug 10 2020, 4:26 PM · Analytics-Radar, SRE-Access-Requests, Operations
Vgutierrez awarded Blog Post: RPKI Origin Validation a Love token.
Aug 10 2020, 1:39 PM · netops

Aug 4 2020

Vgutierrez added a comment to T257968: Certificate for *.beta.wmflabs.org has expired.

I think we have two bugs here:

  1. The API service must be restarted as well after an acme-chief upgrade.
  2. The API service shouldn't list non-allowed files. I suspect that dropping a file in the cert directory would break puppet on the acme-chief clients right now.

I'm currently on vacation, I'll handle those issues next week

Aug 4 2020, 10:54 AM · Beta-Cluster-Infrastructure
Vgutierrez closed T259338: do not generate metadata for parts that aren't allowed as Resolved.
Aug 4 2020, 10:52 AM · Patch-For-Review, Traffic, Operations, Acme-chief
Vgutierrez closed T259338: do not generate metadata for parts that aren't allowed, a subtask of T257968: Certificate for *.beta.wmflabs.org has expired, as Resolved.
Aug 4 2020, 10:52 AM · Beta-Cluster-Infrastructure

Jul 31 2020

Vgutierrez triaged T259338: do not generate metadata for parts that aren't allowed as Medium priority.
Jul 31 2020, 11:03 AM · Patch-For-Review, Traffic, Operations, Acme-chief
Vgutierrez created T259338: do not generate metadata for parts that aren't allowed.
Jul 31 2020, 11:03 AM · Patch-For-Review, Traffic, Operations, Acme-chief

Jul 30 2020

Vgutierrez added a comment to T255249: acme-chief: support for generating a concatenated cert/key file.

@bd808 acme-chief 0.27 shipping your changes has been deployed in production. Please note that your change will be effective the next time acme-chief reissues your cert.

Jul 30 2020, 1:54 PM · Patch-For-Review, Acme-chief

Jul 28 2020

Vgutierrez closed T238724: ATS logs aren't being rotated as Resolved.
Jul 28 2020, 10:38 AM · Operations, Traffic
Vgutierrez closed T242620: ats-tls is having issues when varnish-fe goes away as Resolved.
Jul 28 2020, 10:37 AM · Patch-For-Review, Operations, Traffic
Vgutierrez closed T256632: cp3053 nvme0 issues as Resolved.

Thanks for pinging me @wiki_willy, we can close this task, everything seems good in cp3053 so far. I'll reopen the task if needed

Jul 28 2020, 8:54 AM · DC-Ops, ops-esams, Traffic, Operations

Jul 21 2020

Ladsgroup awarded T238625: Remove nginx puppetization for cache text/text_ats a Yellow Medal token.
Jul 21 2020, 1:44 PM · Patch-For-Review, Traffic, Operations

Jul 20 2020

Vgutierrez triaged T258405: Deprecate TLSv1.2 weak ciphersuites as Medium priority.
Jul 20 2020, 2:13 PM · User-notice, Patch-For-Review, Operations, Traffic
Vgutierrez moved T258405: Deprecate TLSv1.2 weak ciphersuites from Triage to TLS on the Traffic board.
Jul 20 2020, 2:05 PM · User-notice, Patch-For-Review, Operations, Traffic
Vgutierrez created T258405: Deprecate TLSv1.2 weak ciphersuites.
Jul 20 2020, 2:05 PM · User-notice, Patch-For-Review, Operations, Traffic
Vgutierrez closed T238038: Start warning and deprecation process for all legacy TLS as Resolved.
Jul 20 2020, 1:19 PM · Operations, Traffic
Vgutierrez awarded T257573: Remove multicast a Love token.
Jul 20 2020, 12:26 PM · Patch-For-Review, netops, Operations, Traffic

Jul 15 2020

Vgutierrez added a comment to T257968: Certificate for *.beta.wmflabs.org has expired.

I think we have two bugs here:

  1. The API service must be restarted as well after an acme-chief upgrade.
  2. The API service shouldn't list non-allowed files. I suspect that dropping a file in the cert directory would break puppet on the acme-chief clients right now.
Jul 15 2020, 5:38 AM · Beta-Cluster-Infrastructure

Jul 9 2020

Vgutierrez added a comment to T257537: Configure HTTPS for pywikibot.org.

our acme-chief production environment uses dns-01 challenges to validate domain ownership against Let's Encrypt. In order to be able to issue a certificate for pywikibot.org we need to control the NS records for pywikibot.org (basically set them to ns[012].wikimedia.org)
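
To illustrate why control of the DNS zone is needed: with a dns-01 challenge (RFC 8555, section 8.4) the CA looks up a TXT record under `_acme-challenge.<domain>` whose value is derived from the challenge token and the ACME account key thumbprint. The snippet below is a minimal sketch of that derivation only; it is not acme-chief's code, the function names are hypothetical, and the token and thumbprint values are placeholders.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """URL-safe base64 without trailing '=' padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(domain: str, token: str, account_thumbprint: str) -> tuple:
    """Return (record name, TXT value) for a dns-01 challenge."""
    # Key authorization = token || '.' || base64url(JWK thumbprint of account key).
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # A wildcard name is validated at the base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    return "_acme-challenge." + base, b64url(digest)


# Placeholder inputs: a real token comes from the CA, and a real thumbprint
# is computed from the ACME account key.
name, value = dns01_record("pywikibot.org", "example-token", "example-thumbprint")
print(name, "TXT", value)
```

Publishing that record requires write access to the zone served by the domain's authoritative name servers, which is why the NS records have to point at servers we control.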

Jul 9 2020, 8:16 AM · HTTPS, Operations, Pywikibot, Traffic

Jul 1 2020

Vgutierrez closed T256655: Current codfw caches have wrong NVME format as Resolved.
Jul 1 2020, 9:44 AM · Operations, Traffic

Jun 30 2020

Vgutierrez changed the status of T256632: cp3053 nvme0 issues from Open to Stalled.

repooled after powercycling & issuing the following commands:

/usr/sbin/nvme format /dev/nvme0n1 -l 2
 echo ';' | /usr/sbin/sfdisk /dev/nvme0n1
Jun 30 2020, 8:29 AM · DC-Ops, ops-esams, Traffic, Operations

Jun 29 2020

Vgutierrez added a comment to T256655: Current codfw caches have wrong NVME format.

we have scheduled a system reboot of these boxes.. I'll sync that with the "re-format" of the NVMe devices.

Jun 29 2020, 4:45 PM · Operations, Traffic
Vgutierrez triaged T256632: cp3053 nvme0 issues as Medium priority.
Jun 29 2020, 1:47 PM · DC-Ops, ops-esams, Traffic, Operations
Vgutierrez created T256632: cp3053 nvme0 issues.
Jun 29 2020, 1:47 PM · DC-Ops, ops-esams, Traffic, Operations

Jun 18 2020

Vgutierrez added a comment to T251732: wikiworkshop.org has Facebook button, external statcounter, https to http redirect.

that's interesting.. why don't we have HSTS headers for wikiworkshop.org?

Jun 18 2020, 3:31 PM · Security-Team, Privacy, Research, Privacy Engineering, Traffic, Operations
Vgutierrez created P11584 (An Untitled Masterwork).
Jun 18 2020, 11:28 AM
Vgutierrez created P11579 (An Untitled Masterwork).
Jun 18 2020, 8:47 AM

Jun 15 2020

Vgutierrez closed T254714: ats-backend throttles connections under heavy load as Resolved.
Jun 15 2020, 4:12 PM · Operations, Traffic
aborrero awarded T255249: acme-chief: support for generating a concatenated cert/key file a Love token.
Jun 15 2020, 3:56 PM · Patch-For-Review, Acme-chief
Vgutierrez closed T255249: acme-chief: support for generating a concatenated cert/key file as Resolved.

This seems to be working (from my tests on acmechief-test1001):

root@acmechief-test1001:/var/lib/acme-chief/certs/mirrors/new# grep "BEGIN EC PRIVATE KEY" ec-prime256v1.crt.key
-----BEGIN EC PRIVATE KEY-----
root@acmechief-test1001:/var/lib/acme-chief/certs/mirrors/new# grep "BEGIN CERT" ec-prime256v1.crt.key
-----BEGIN CERTIFICATE-----
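
The two grep checks above can be combined into one. This is a minimal sketch with hypothetical helper names (`pem_labels`, `has_key_and_cert`); it only inspects PEM `BEGIN` markers and makes no assumption about the order in which acme-chief writes the key and the certificate, nor does it validate their contents.

```python
import re

# Matches PEM armor lines such as "-----BEGIN EC PRIVATE KEY-----".
PEM_BEGIN = re.compile(r"^-----BEGIN ([A-Z0-9 ]+)-----$", re.MULTILINE)


def pem_labels(text: str) -> list:
    """Return the labels of all PEM blocks found in the text, in file order."""
    return PEM_BEGIN.findall(text)


def has_key_and_cert(text: str) -> bool:
    """True if the text holds at least one private key and one certificate block."""
    labels = pem_labels(text)
    has_key = any(label.endswith("PRIVATE KEY") for label in labels)
    return has_key and "CERTIFICATE" in labels


# Stand-in content with elided bodies; a real check would read the
# ec-prime256v1.crt.key file instead.
sample = (
    "-----BEGIN EC PRIVATE KEY-----\n...\n-----END EC PRIVATE KEY-----\n"
    "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"
)
print(pem_labels(sample), has_key_and_cert(sample))
```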
Jun 15 2020, 12:57 PM · Patch-For-Review, Acme-chief
Vgutierrez updated subscribers of T255368: noc.wikimedia.org consistently 503s in eqsin and sometimes 503s in esams.
Jun 15 2020, 9:03 AM · Traffic, Operations
Vgutierrez added a comment to T255368: noc.wikimedia.org consistently 503s in eqsin and sometimes 503s in esams.

Filtering by BeReqHeader, we can see that varnish-fe apparently gets a 200 from ats-be and returns a 503 because the "body cannot be fetched":

Jun 15 2020, 9:02 AM · Traffic, Operations
Vgutierrez added a comment to T255368: noc.wikimedia.org consistently 503s in eqsin and sometimes 503s in esams.

varnish-fe also shows a 503:

vgutierrez@cp5010:~$ sudo -i varnishlog -n frontend -q "ReqHeader:Host eq noc.wikimedia.org"
*   << Request  >> 1066411391
-   Begin          req 1058354185 rxreq
-   Timestamp      Start: 1592211439.329728 0.000000 0.000000
-   Timestamp      Req: 1592211439.329728 0.000000 0.000000
-   ReqStart       10.132.0.110 20543
-   ReqMethod      GET
-   ReqURL         /
-   ReqProtocol    HTTP/1.1
-   ReqHeader      User-Agent: curl/7.64.1
-   ReqHeader      Accept: */*
-   ReqHeader      Host: noc.wikimedia.org
-   ReqHeader      X-Forwarded-For: 84.78.247.11
-   ReqHeader      X-Analytics-TLS: vers=TLSv1.2;keyx=X25519;auth=ECDSA;ciph=AES256-GCM-SHA384;prot=h2;sess=new
-   ReqHeader      X-Client-IP: 84.78.247.11
-   ReqHeader      X-Connection-Properties: H2=1; SSR=0; SSL=TLSv1.2; C=ECDHE-ECDSA-AES256-GCM-SHA384; EC=X25519;
-   ReqHeader      X-Forwarded-Proto: https
-   ReqUnset       X-Forwarded-For: 84.78.247.11
-   ReqHeader      X-Forwarded-For: 84.78.247.11, 10.132.0.110
-   VCL_call       RECV
-   ReqUnset       X-Analytics-TLS: vers=TLSv1.2;keyx=X25519;auth=ECDSA;ciph=AES256-GCM-SHA384;prot=h2;sess=new
-   VCL_acl        MATCH wikimedia_trust "10.0.0.0"/8
-   VCL_acl        NO_MATCH local_host
-   VCL_acl        MATCH local_tls_terminator "10.132.0.110"
-   ReqUnset       X-Forwarded-For: 84.78.247.11, 10.132.0.110
-   ReqHeader      X-Forwarded-For: 84.78.247.11, 10.132.0.110
-   ReqUnset       X-Forwarded-For: 84.78.247.11, 10.132.0.110
-   ReqHeader      X-Forwarded-For: 84.78.247.11, 10.132.0.110
-   ReqUnset       X-Forwarded-For: 84.78.247.11, 10.132.0.110
-   ReqHeader      X-Forwarded-For: 84.78.247.11, 10.132.0.110
-   ReqHeader      via-nginx: 1
-   ReqHeader      X-Trusted-Proxy:
-   ReqUnset       X-Trusted-Proxy:
-   ReqHeader      x-tls-prot: 1
-   ReqUnset       x-tls-prot: 1
-   ReqHeader      x-tls-prot: h2
-   ReqHeader      x-tls-sess: 0
-   ReqUnset       x-tls-sess: 0
-   ReqHeader      x-tls-sess: new
-   ReqHeader      x-tls-vers: TLSv1.2
-   ReqHeader      x-tls-keyx: X25519
-   ReqHeader      x-tls-auth: ECDHE-ECDSA-AES256-GCM-SHA384
-   ReqHeader      x-tls-ciph: AES256-GCM-SHA384
-   ReqUnset       x-tls-auth: ECDHE-ECDSA-AES256-GCM-SHA384
-   ReqHeader      x-tls-auth: ECDSA
-   ReqUnset       x-tls-ciph: AES256-GCM-SHA384
-   ReqHeader      x-tls-ciph: AES256-GCM-SHA384
-   VCL_Log        tls: vers=TLSv1.2;keyx=X25519;auth=ECDSA;ciph=AES256-GCM-SHA384;prot=h2;sess=new
-   ReqUnset       x-tls-prot: h2
-   ReqUnset       x-tls-vers: TLSv1.2
-   ReqUnset       x-tls-sess: new
-   ReqUnset       x-tls-keyx: X25519
-   ReqUnset       x-tls-auth: ECDSA
-   ReqUnset       x-tls-ciph: AES256-GCM-SHA384
-   VCL_acl        NO_MATCH blocked_nets
-   VCL_acl        NO_MATCH bot_blocked_nets
-   ReqUnset       Host: noc.wikimedia.org
-   ReqHeader      Host: noc.wikimedia.org
-   ReqUnset       Host: noc.wikimedia.org
-   ReqHeader      Host: noc.wikimedia.org
-   Debug          "Now using wikimedia_misc VCL"
-   VCL_return     vcl
-   VCL_call       RECV
-   ReqUnset       X-Analytics-TLS: vers=TLSv1.2;keyx=X25519;auth=ECDSA;ciph=AES256-GCM-SHA384;prot=h2;sess=new
-   VCL_acl        MATCH wikimedia_trust "10.0.0.0"/8
-   VCL_acl        NO_MATCH local_host
-   VCL_acl        MATCH local_tls_terminator "10.132.0.110"
-   ReqUnset       X-Forwarded-For: 84.78.247.11
-   ReqHeader      X-Forwarded-For: 84.78.247.11
-   ReqUnset       X-Forwarded-For: 84.78.247.11
-   ReqHeader      X-Forwarded-For: 84.78.247.11
-   ReqUnset       X-Forwarded-For: 84.78.247.11
-   ReqHeader      X-Forwarded-For: 84.78.247.11
-   ReqHeader      via-nginx: 1
-   ReqHeader      X-Trusted-Proxy:
-   ReqUnset       X-Trusted-Proxy:
-   ReqHeader      x-tls-prot: 1
-   ReqUnset       x-tls-prot: 1
-   ReqHeader      x-tls-prot: h2
-   ReqHeader      x-tls-sess: 0
-   ReqUnset       x-tls-sess: 0
-   ReqHeader      x-tls-sess: new
-   ReqHeader      x-tls-vers: TLSv1.2
-   ReqHeader      x-tls-keyx: X25519
-   ReqHeader      x-tls-auth: ECDHE-ECDSA-AES256-GCM-SHA384
-   ReqHeader      x-tls-ciph: AES256-GCM-SHA384
-   ReqUnset       x-tls-auth: ECDHE-ECDSA-AES256-GCM-SHA384
-   ReqHeader      x-tls-auth: ECDSA
-   ReqUnset       x-tls-ciph: AES256-GCM-SHA384
-   ReqHeader      x-tls-ciph: AES256-GCM-SHA384
-   VCL_Log        tls: vers=TLSv1.2;keyx=X25519;auth=ECDSA;ciph=AES256-GCM-SHA384;prot=h2;sess=new
-   ReqUnset       x-tls-prot: h2
-   ReqUnset       x-tls-vers: TLSv1.2
-   ReqUnset       x-tls-sess: new
-   ReqUnset       x-tls-keyx: X25519
-   ReqUnset       x-tls-auth: ECDSA
-   ReqUnset       x-tls-ciph: AES256-GCM-SHA384
-   VCL_acl        NO_MATCH blocked_nets
-   VCL_acl        NO_MATCH bot_blocked_nets
-   ReqUnset       Host: noc.wikimedia.org
-   ReqHeader      Host: noc.wikimedia.org
-   ReqUnset       Host: noc.wikimedia.org
-   ReqHeader      Host: noc.wikimedia.org
-   ReqHeader      X-WMF-NOCOOKIES: 1
-   VCL_return     pass
-   VCL_call       HASH
-   VCL_return     lookup
-   VCL_call       PASS
-   ReqHeader      X-CDIS: pass
-   VCL_return     fetch
-   Link           bereq 1066411392 pass
-   Timestamp      Fetch: 1592211440.829750 1.500022 1.500022
-   RespProtocol   HTTP/1.1
-   RespStatus     503
-   RespReason     Backend fetch failed
-   RespHeader     Date: Mon, 15 Jun 2020 08:57:20 GMT
-   RespHeader     Server: Varnish
-   RespHeader     X-CDIS: int
-   RespHeader     Content-Type: text/html; charset=utf-8
-   RespHeader     X-Varnish: 1066411391
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 varnish (Varnish/5.1)
-   VCL_call       DELIVER
-   ReqUnset       X-CDIS: pass
-   ReqHeader      X-CDIS: int
-   RespUnset      X-CDIS: int
-   RespHeader     X-Cache-Int: cp5010 int
-   RespHeader     X-Cache: cp5010 int
-   RespHeader     X-Cache-Status: int
-   RespUnset      X-Cache-Int: cp5010 int
-   RespUnset      Via: 1.1 varnish (Varnish/5.1)
-   RespUnset      X-Cache-Status: int
-   RespHeader     X-Cache-Status: int-front
-   RespHeader     Server-Timing: cache;desc="int-front"
-   RespHeader     Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
-   RespHeader     X-Analytics:
-   ReqHeader      X-NowDay: 15-Jun-2020
-   RespHeader     Set-Cookie: WMF-Last-Access=15-Jun-2020;Path=/;HttpOnly;secure;Expires=Fri, 17 Jul 2020 00:00:00 GMT
-   RespUnset      X-Analytics:
-   RespHeader     X-Analytics: ;https=1
-   RespUnset      X-Analytics: ;https=1
-   RespHeader     X-Analytics: ;https=1;nocookies=1
-   RespUnset      X-Analytics: ;https=1;nocookies=1
-   RespHeader     X-Analytics: https=1;nocookies=1
-   RespHeader     X-Client-IP: 84.78.247.11
-   VCL_return     deliver
-   Timestamp      Process: 1592211440.829807 1.500079 0.000057
-   RespHeader     Content-Length: 1795
-   RespHeader     Connection: keep-alive
-   Timestamp      Resp: 1592211440.829838 1.500110 0.000031
-   ReqAcct        355 0 355 532 1795 2327
-   End
Jun 15 2020, 8:58 AM · Traffic, Operations
Vgutierrez added a comment to T255368: noc.wikimedia.org consistently 503s in eqsin and sometimes 503s in esams.

curl --http1.1 -H 'Host: noc.wikimedia.org' https://mwmaint.discovery.wmnet from cp5010 returns a HTTP 200 as expected

Jun 15 2020, 8:36 AM · Traffic, Operations
Vgutierrez added a comment to T255368: noc.wikimedia.org consistently 503s in eqsin and sometimes 503s in esams.

Checking against eqsin with curl --resolve noc.wikimedia.org:443:$(dig +short text-lb.eqsin.wikimedia.org) https://noc.wikimedia.org I do get a 503 and atslog shows the following results on cp5010:

Jun 15 2020, 8:29 AM · Traffic, Operations

Jun 12 2020

Vgutierrez triaged T255249: acme-chief: support for generating a concatenated cert/key file as Medium priority.
Jun 12 2020, 10:38 AM · Patch-For-Review, Acme-chief

Jun 8 2020

Vgutierrez created T254714: ats-backend throttles connections under heavy load.
Jun 8 2020, 7:56 AM · Operations, Traffic

Jun 4 2020

Vgutierrez closed T253807: Create diff.wikimedia.org subdomain as Resolved.
willikins:dns vgutierrez$ dig diff.wikimedia.org.
Jun 4 2020, 1:15 PM · Traffic, DNS, Operations, Domains
Vgutierrez moved T253807: Create diff.wikimedia.org subdomain from Triage to DNS Names on the Traffic board.
Jun 4 2020, 9:52 AM · Traffic, DNS, Operations, Domains

Jun 2 2020

Vgutierrez added a comment to T251726: Certificate *.wikipedia.org valid until 2020-06-20.

can we close this task or at least change the task title to focus on the icinga alerts? there is no issue with cert renewal itself :)

willikins:puppet vgutierrez$ openssl s_client -connect text-lb.eqiad.wikimedia.org:443 2>/dev/null < /dev/null |openssl x509 -noout -dates
notBefore=May 21 09:53:05 2020 GMT
notAfter=Aug 19 09:53:05 2020 GMT
Jun 2 2020, 12:59 PM · Traffic, serviceops, Operations
Vgutierrez triaged T254235: Let ats-tls handle port 80 as Medium priority.
Jun 2 2020, 12:53 PM · Patch-For-Review, Operations, Traffic
Vgutierrez created T254235: Let ats-tls handle port 80.
Jun 2 2020, 12:53 PM · Patch-For-Review, Operations, Traffic
Vgutierrez closed T229097: Provide ensure => absent support for acme_chief::cert define, a subtask of T229091: acme-chief failing in puppet with "Cannot open input file", as Resolved.
Jun 2 2020, 12:49 PM · Operations, Traffic
Vgutierrez closed T229097: Provide ensure => absent support for acme_chief::cert define as Resolved.

This has been automagically solved with 725e7f4eeb37a3742591a3f7357b6862e3b4c361, moving OCSP stapling to the acme-chief server side.

Jun 2 2020, 12:49 PM · Acme-chief, Traffic, Operations

May 22 2020

Vgutierrez closed T251219: cp5012 memory errors as Resolved.

cp5012 seems stable, I'll reopen this task if I see any sign of memory issues.

May 22 2020, 9:50 AM · Operations, ops-eqsin, Traffic

May 21 2020

Vgutierrez changed the status of T251219: cp5012 memory errors from Open to Stalled.

@RobH done. let's see how it goes, thanks!

May 21 2020, 6:05 AM · Operations, ops-eqsin, Traffic

May 18 2020

Vgutierrez added a comment to T252993: prometheus-trafficserver-exporter: InsecureRequestWarning.

in this case it's pretty obvious: ats-tls is the only instance listening on port 443 (TLS), but yeah, +1 to providing unique SyslogIdentifiers

May 18 2020, 9:00 AM · observability, Traffic, Operations

May 15 2020

Vgutierrez committed rOSACeec56e5a454d: acme_chief: Handle OCSP Request issues (authored by Vgutierrez).
acme_chief: Handle OCSP Request issues
May 15 2020, 9:38 PM
Vgutierrez committed rOSACbb5cafee93a1: Release 0.25 (authored by Vgutierrez).
Release 0.25
May 15 2020, 9:38 PM
Vgutierrez committed rOSACb7d5cb70c5dd: debian: Add release 0.25 to the changelog (authored by Vgutierrez).
debian: Add release 0.25 to the changelog
May 15 2020, 9:38 PM
Vgutierrez committed rOSAC5838c907cf29: tests: use unittest.mock instead of the 3rd party mock module (authored by Vgutierrez).
tests: use unittest.mock instead of the 3rd party mock module
May 15 2020, 9:38 PM
Vgutierrez committed rOSAC5d10fc8efa67: Release 0.25 (authored by Vgutierrez).
Release 0.25
May 15 2020, 9:38 PM
Vgutierrez committed rOSAC2c88bc08f0f9: acme_chief: Handle OCSP Request issues (authored by Vgutierrez).
acme_chief: Handle OCSP Request issues
May 15 2020, 9:38 PM
Vgutierrez committed rOSAC1a4786e690ec: tests: use unittest.mock instead of the 3rd party mock module (authored by Vgutierrez).
tests: use unittest.mock instead of the 3rd party mock module
May 15 2020, 9:38 PM
Vgutierrez closed T252901: Let's Encrypt OCSP responders are showing 503 errors as Resolved.
May 15 19:43:27 acmechief1001 acme-chief-backend[30417]: Refreshing live OCSP response for certificate non-canonical-redirect-2 / ec-prime256v1
May 15 19:43:27 acmechief1001 acme-chief-backend[30417]: live OCSP response refreshed successfully for non-canonical-redirect-2 / ec-prime256v1
May 15 19:43:27 acmechief1001 acme-chief-backend[30417]: Refreshing live OCSP response for certificate non-canonical-redirect-2 / rsa-2048
May 15 19:43:27 acmechief1001 acme-chief-backend[30417]: live OCSP response refreshed successfully for non-canonical-redirect-2 / rsa-2048
May 15 2020, 7:52 PM · Acme-chief, Operations, Traffic
Vgutierrez triaged T252901: Let's Encrypt OCSP responders are showing 503 errors as Medium priority.
May 15 2020, 6:03 PM · Acme-chief, Operations, Traffic
Vgutierrez created T252901: Let's Encrypt OCSP responders are showing 503 errors.
May 15 2020, 6:03 PM · Acme-chief, Operations, Traffic
Vgutierrez closed T252881: acme-chief crashes upon OCSP responder errors as Resolved.
May 15 2020, 5:57 PM · Operations, Traffic, Acme-chief
Vgutierrez added a comment to T252881: acme-chief crashes upon OCSP responder errors.

OCSP responder issues reported to LE in https://community.letsencrypt.org/t/ocsp-responder-returning-503-errors/122846

May 15 2020, 3:33 PM · Operations, Traffic, Acme-chief
Vgutierrez triaged T252881: acme-chief crashes upon OCSP responder errors as Medium priority.
May 15 2020, 1:45 PM · Operations, Traffic, Acme-chief
Vgutierrez moved T252881: acme-chief crashes upon OCSP responder errors from Triage to TLS on the Traffic board.
May 15 2020, 1:11 PM · Operations, Traffic, Acme-chief
Vgutierrez added a project to T252881: acme-chief crashes upon OCSP responder errors: Traffic.
May 15 2020, 1:10 PM · Operations, Traffic, Acme-chief
Vgutierrez created T252881: acme-chief crashes upon OCSP responder errors.
May 15 2020, 1:10 PM · Operations, Traffic, Acme-chief

May 8 2020

Vgutierrez added a comment to T251219: cp5012 memory errors.

do we have an ETA on this one? :)

May 8 2020, 3:40 PM · Operations, ops-eqsin, Traffic

May 7 2020

Vgutierrez added a comment to T251726: Certificate *.wikipedia.org valid until 2020-06-20.

if every LE certificate checked by that icinga check is issued by acme-chief then yes, it's good

May 7 2020, 1:55 PM · Traffic, serviceops, Operations

May 5 2020

Vgutierrez added a comment to T251726: Certificate *.wikipedia.org valid until 2020-06-20.

Even if we still use non-LE certs in some DCs i believe this is ok since we should also have other monitoring for the expiration of that cert. We do, right?

the icinga check on cp hosts currently warns 30 days before and goes critical 15 days before cert expiration. IMHO 7 / 3 is not enough for the unified cert even when LE is the issuer considering our anti clock skew measures and that acme-chief should issue the new cert 30 days before the valid one expires
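
The difference between the two threshold pairs can be made concrete with a small sketch. This is not the icinga check itself: `expiry_state` is a hypothetical helper, and treating the thresholds as inclusive is an assumption.

```python
def expiry_state(days_left: float, warn_days: int = 30, crit_days: int = 15) -> str:
    """Classify a certificate by remaining validity (thresholds assumed inclusive)."""
    if days_left <= crit_days:
        return "CRITICAL"
    if days_left <= warn_days:
        return "WARNING"
    return "OK"


# A cert with 10 days left: renewal, expected 30 days before expiry,
# is already 20 days overdue.
print(expiry_state(10))                            # current 30 / 15 thresholds
print(expiry_state(10, warn_days=7, crit_days=3))  # proposed 7 / 3 thresholds
```

With 30 / 15 a renewal that is 20 days overdue is already critical, whereas with 7 / 3 the same certificate would still be reported as OK.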

May 5 2020, 3:09 PM · Traffic, serviceops, Operations