Vgutierrez (Valentín Gutiérrez)
Staff Site Reliability Engineer, Traffic Team

User Details

User Since
Feb 12 2018, 9:51 AM (407 w, 4 d)
Availability
Available
IRC Nick
vgutierrez
LDAP User
Vgutierrez
MediaWiki User
VGutiérrez (WMF) [ Global Accounts ]

Recent Activity

Yesterday

Vgutierrez added a comment to T411781: lvs1018: remove cross-rack links to rows A, C and D.

the assessment is OK and the link can be removed safely

Thu, Dec 4, 2:03 PM · Patch-For-Review, DC-Ops, ops-eqiad, Infrastructure-Foundations, netops, SRE

Wed, Dec 3

Vgutierrez triaged T411584: Refresh trafficserver_backend_requests_seconds histogram as Medium priority.
Wed, Dec 3, 9:34 AM · Patch-For-Review, Traffic
Vgutierrez assigned T411584: Refresh trafficserver_backend_requests_seconds histogram to CDobbins.
Wed, Dec 3, 9:30 AM · Patch-For-Review, Traffic
Vgutierrez created T411584: Refresh trafficserver_backend_requests_seconds histogram.
Wed, Dec 3, 9:29 AM · Patch-For-Review, Traffic

Tue, Dec 2

Vgutierrez updated the task description for T411467: Let's Encrypt Decreasing Certificate Lifetimes to 45 Days.
Tue, Dec 2, 10:18 AM · Patch-For-Review, Acme-chief, Traffic
Vgutierrez triaged T411467: Let's Encrypt Decreasing Certificate Lifetimes to 45 Days as Medium priority.
Tue, Dec 2, 9:58 AM · Patch-For-Review, Acme-chief, Traffic
Vgutierrez created T411467: Let's Encrypt Decreasing Certificate Lifetimes to 45 Days.
Tue, Dec 2, 9:58 AM · Patch-For-Review, Acme-chief, Traffic

Wed, Nov 26

Vgutierrez added a comment to T408062: FY 25/26 WE 5.4.7 Standardize thumbnail sizes.

We are now rate-limiting non-thumbnail-step requests on cache misses when certain X-Is-Browser thresholds are met

Wed, Nov 26, 12:26 PM · MediaViewer, Data-Persistence, Thumbor, SRE-swift-storage, Traffic

Tue, Nov 25

Vgutierrez added a comment to T410944: Reboot cookbook workflow leaves Puppet disabled.

From SREBatchRunnerBase __reboot_action():

Tue, Nov 25, 5:51 PM · Traffic, Infrastructure-Foundations, SRE-tools, SRE

Thu, Nov 13

Vgutierrez moved T410019: alerts should be triggered if druid fails to consume webrequest_sampled kafka topic from Backlog to Radar/Not for Service on the Traffic board.
Thu, Nov 13, 11:04 AM · observability, Data-Platform-SRE (2025.11.07 - 2025.11.28), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Sustainability (Incident Followup), Traffic, SRE
Vgutierrez added a project to T410019: alerts should be triggered if druid fails to consume webrequest_sampled kafka topic: Data-Engineering.
Thu, Nov 13, 10:57 AM · observability, Data-Platform-SRE (2025.11.07 - 2025.11.28), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Sustainability (Incident Followup), Traffic, SRE
Vgutierrez removed a project from T410019: alerts should be triggered if druid fails to consume webrequest_sampled kafka topic: Data-Engineering.
Thu, Nov 13, 10:57 AM · observability, Data-Platform-SRE (2025.11.07 - 2025.11.28), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Sustainability (Incident Followup), Traffic, SRE
Vgutierrez added a project to T410019: alerts should be triggered if druid fails to consume webrequest_sampled kafka topic: Sustainability (Incident Followup).
Thu, Nov 13, 10:56 AM · observability, Data-Platform-SRE (2025.11.07 - 2025.11.28), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Sustainability (Incident Followup), Traffic, SRE
Vgutierrez triaged T410019: alerts should be triggered if druid fails to consume webrequest_sampled kafka topic as High priority.
Thu, Nov 13, 10:55 AM · observability, Data-Platform-SRE (2025.11.07 - 2025.11.28), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Sustainability (Incident Followup), Traffic, SRE
Vgutierrez created T410019: alerts should be triggered if druid fails to consume webrequest_sampled kafka topic.
Thu, Nov 13, 10:55 AM · observability, Data-Platform-SRE (2025.11.07 - 2025.11.28), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Sustainability (Incident Followup), Traffic, SRE

Mon, Nov 10

Vgutierrez moved T409575: Telegram previews broken since unified mobile routing from Backlog to Radar/Not for Service on the Traffic board.
Mon, Nov 10, 9:29 AM · MediaWiki-Platform-Team (Radar), Upstream, Traffic
Vgutierrez added a comment to T409575: Telegram previews broken since unified mobile routing.

a quick check using https://en.wikipedia.org/wiki/Main_Page?vgutierrez=tg resulted in telegram bot visiting https://en.m.wikipedia.org/wiki/Main_Page?vgutierrez=tg and retrying after getting a 301 instead of following the redirect

Mon, Nov 10, 9:28 AM · MediaWiki-Platform-Team (Radar), Upstream, Traffic

Nov 3 2025

Vgutierrez added a comment to T409114: acme-chief doesn't automatically re-create certificates on SAN change.

It does, but acme-chief also respects the staging time, so the certificate is probably under /new instead of /live, or still blocked because it's waiting out the staging time until a previous version gets deployed

Nov 3 2025, 9:11 PM · Acme-chief

Oct 21 2025

Vgutierrez raised the priority of T407787: Alertmanager triggers an alert on IRC and email after the alert has resolved from Low to Medium.

downtiming for prometheus alertmanager seems broken to me. What we are seeing here looks like this:

  • The metric was in an alerting state from 18:45:30 to 18:49:00 per https://grafana.wikimedia.org/goto/BShBVVgDg?orgId=1
  • Downtime (silence) was present, so even if the alert condition was true for 3m, notifications were suppressed.
  • At 18:49:00 the alert condition cleared and at 18:49:19 the downtime was removed.
  • Prometheus’s next evaluation (at 18:49:25) saw that the alert had been pending and satisfied for: 3m, so it fired the alert. This can be verified with this query: https://grafana.wikimedia.org/goto/uJUkS4Rvg?orgId=1 that shows the alert firing till 18:49:30
Oct 21 2025, 1:23 PM · Infrastructure-Foundations, SRE-tools, Spicerack, Traffic, Observability-Alerting
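
The timeline in the T407787 comment above can be checked with a small interval calculation. This is a minimal sketch, not Alertmanager's or Prometheus's actual logic: the function name `unsilenced_firing_window` is hypothetical, and the firing start is assumed to be the start of the alerting state plus the 3m `for` duration. The firing end (18:49:30) and the silence removal time (18:49:19) are taken from the comment.

```python
from datetime import timedelta
from typing import Optional, Tuple


def hms(hours: int, minutes: int, seconds: int) -> timedelta:
    """Time of day expressed as an offset from midnight."""
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def unsilenced_firing_window(
    firing_start: timedelta, firing_end: timedelta, silence_end: timedelta
) -> Optional[Tuple[timedelta, timedelta]]:
    """Return the part of the firing interval that is not covered by the silence.

    Hypothetical helper: it assumes the silence covers everything before
    silence_end and nothing after it.
    """
    start = max(firing_start, silence_end)
    if start >= firing_end:
        return None
    return (start, firing_end)


alerting_start = hms(18, 45, 30)
# Assumption: the alert moves from pending to firing once "for: 3m" has elapsed.
firing_start = alerting_start + timedelta(minutes=3)
firing_end = hms(18, 49, 30)   # last firing sample mentioned in the comment
silence_end = hms(18, 49, 19)  # downtime removed

window = unsilenced_firing_window(firing_start, firing_end, silence_end)
if window is not None:
    print("unsilenced for", (window[1] - window[0]).total_seconds(), "seconds")
```

Under these assumptions the alert is still in the firing state for a few seconds after the downtime is removed, which is the window in which a notification can go out.
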
Vgutierrez updated subscribers of T407826: X-Request-Id response header off by 5000.
Oct 21 2025, 9:17 AM · serviceops, Traffic
Vgutierrez moved T407826: X-Request-Id response header off by 5000 from Backlog to Radar/Not for Service on the Traffic board.

This is pretty weird, according to RFC 9562 for UUID v4, the third block should always start with a 4.

Oct 21 2025, 9:16 AM · serviceops, Traffic
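
The RFC 9562 remark in the T407826 comment can be turned into a quick check. The sketch below uses Python's standard `uuid` module; the helper names `is_uuid_v4` and `third_block_starts_with_4` are hypothetical and are not part of any WMF tooling.

```python
import uuid


def is_uuid_v4(value: str) -> bool:
    """True if value parses as an RFC variant UUID whose version field is 4."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID.version is None unless the variant is the RFC 4122/9562 one.
    return parsed.variant == uuid.RFC_4122 and parsed.version == 4


def third_block_starts_with_4(value: str) -> bool:
    """Textual form of the same rule: the third dash-separated block starts with '4'."""
    blocks = value.split("-")
    return len(blocks) == 5 and blocks[2].startswith("4")


sample = str(uuid.uuid4())
print(sample, is_uuid_v4(sample), third_block_starts_with_4(sample))
```

A header value that fails this check was either not generated as a UUIDv4 or was modified somewhere along the way.
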

Oct 20 2025

Vgutierrez reopened T193473: Add HTTPS support to wdqs-internal service, a subtask of T297555: [epic] Brian's onboarding to the Search Platform team, as Open.
Oct 20 2025, 2:19 PM · Discovery-Search (Current work), Epic
Vgutierrez reopened T193473: Add HTTPS support to wdqs-internal service as "Open".

wdqs-internal-main still has traffic on port 80 in codfw:

TCP  10.2.1.93:80 mh (mh-port)
  -> 10.192.0.85:80               Tunnel  10     67         73        
  -> 10.192.32.155:80             Tunnel  10     46         87        
  -> 10.192.32.156:80             Tunnel  10     52         86
Oct 20 2025, 2:19 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Wikidata-Query-Service, Wikidata

Oct 15 2025

Vgutierrez moved T407320: Package benthos/redpanda for trixie from Backlog to Radar/Not for Service on the Traffic board.
Oct 15 2025, 8:39 AM · Observability-Logging, Traffic
Vgutierrez created T407320: Package benthos/redpanda for trixie.
Oct 15 2025, 8:38 AM · Observability-Logging, Traffic

Oct 14 2025

Vgutierrez added a comment to T407194: Consider using EdDSA rather than RSA for MediaWiki session tokens.

we need to double check that HAProxy supports EdDSA for JWT verification purposes

Oct 14 2025, 4:22 PM · MediaWiki-Platform-Team, Traffic, MediaWiki-Core-AuthManager

Oct 13 2025

Vgutierrez closed T221976: Have CDN edge set the `X-Request-Id` header for incoming external requests, a subtask of T201409: Harmonise the identification of requests across our stack, as Resolved.
Oct 13 2025, 9:53 AM · Traffic-Icebox, Platform Team Legacy (Designing), User-CDanis, TechCom-RFC (TechCom-RFC-Closed), SRE
Vgutierrez closed T221976: Have CDN edge set the `X-Request-Id` header for incoming external requests as Resolved.
Oct 13 2025, 9:53 AM · MediaWiki-Platform-Team (Radar), Traffic, Platform Engineering (Icebox), SRE

Oct 8 2025

Vgutierrez added a comment to T397661: Phabricator videos fail in Firefox ("Range" request gets 503 from Varnish).

Following up on my last comment: the webm file size is 1660261 bytes, so a request asking for a range starting at 1660261 should probably trigger a 416 Range Not Satisfiable response instead of a 503, but it still doesn't look like a valid request to me. If I use a valid range like 1000-1024, the request gets a valid response every time, but the responses are inconsistent: the first one gets a 1660261-byte response back, and the following ones get a 25-byte response, as requested in the Range header

Oct 8 2025, 4:36 PM · Release-Engineering-Team (Radar), Traffic, Phabricator
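
The expected outcomes described in the T397661 comment above follow from the single-range rules in RFC 9110. A minimal sketch, assuming only one `bytes` range per header; `evaluate_range` is a hypothetical helper and does not describe how Varnish or Phabricator evaluate ranges.

```python
import re
from typing import Tuple

# Single byte range only: "bytes=first-last", "bytes=first-" or "bytes=-suffix".
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def evaluate_range(header: str, size: int) -> Tuple[int, int]:
    """Return (status code, number of bytes of the file served) for a file of `size` bytes."""
    match = _RANGE_RE.match(header.strip())
    if match is None or match.group(1) == match.group(2) == "":
        return 200, size  # unparseable Range header: ignore it, serve the whole file
    first, last = match.groups()
    if first == "":
        suffix = int(last)  # "bytes=-N": the last N bytes
        if suffix == 0 or size == 0:
            return 416, 0
        return 206, min(suffix, size)
    start = int(first)
    if start >= size:
        return 416, 0  # first byte position is past the end of the file
    end = min(int(last), size - 1) if last else size - 1
    if end < start:
        return 200, size  # invalid range spec: ignore it
    return 206, end - start + 1


FILE_SIZE = 1660261  # webm size quoted in the comment
print(evaluate_range("bytes=1000-1024", FILE_SIZE))
print(evaluate_range("bytes=1660261-", FILE_SIZE))
```
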
Vgutierrez added a comment to T397661: Phabricator videos fail in Firefox ("Range" request gets 503 from Varnish).

Taking a second look at the curl reproducer, I'm seeing the following behavior:

Oct 8 2025, 4:15 PM · Release-Engineering-Team (Radar), Traffic, Phabricator
Vgutierrez created T406733: registry.cloud.releng.team returning 503s.
Oct 8 2025, 1:10 PM · Patch-For-Review, collaboration-services, GitLab (CI & Job Runners)

Oct 7 2025

Vgutierrez added a comment to T221976: Have CDN edge set the `X-Request-Id` header for incoming external requests.

Do we know what the current behavior is for layers that set X-Request-ID?

Oct 7 2025, 8:24 AM · MediaWiki-Platform-Team (Radar), Traffic, Platform Engineering (Icebox), SRE

Oct 3 2025

Vgutierrez added a comment to T343000: HAProxy metrics go down on config reload.

yes, it's still happening: https://grafana.wikimedia.org/goto/SHdP6s3HR?orgId=1

Oct 3 2025, 2:28 PM · SRE, observability, Traffic

Sep 11 2025

Vgutierrez added a comment to T403767: Add an Allow header on 405 responses.

you got a nice mix of use cases there @A_smart_kitten.

Sep 11 2025, 4:38 PM · Traffic

Sep 9 2025

Vgutierrez added a comment to T390813: Upgrade End Of Support Junos.

@Vgutierrez @ssingh could that be a good opportunity to see how drmrs handles the loss of a switch/rack ?

With the site depooled, and while one ToR switch is upgrading, maybe we could see if the other rack could handle all the traffic properly ?

Sep 9 2025, 10:24 AM · Traffic, netops, Infrastructure-Foundations

Sep 8 2025

Vgutierrez added a comment to T401383: Reduce noise from duplicate sequence-gap alerts on HaProxy-webrequests.

probably unrelated but I've found what could be a HAProxy bug related to %rt being increased twice per request: https://github.com/haproxy/haproxy/issues/3107

Sep 8 2025, 11:00 AM · Traffic, Data-Engineering
Vgutierrez closed T403767: Add an Allow header on 405 responses as Resolved.
vgutierrez@cp6016:~$ curl -X TRACE -i https://en.wikipedia.org
HTTP/2 405 
content-length: 146
cache-control: no-cache
content-type: text/html
server: HAProxy
x-cache: cp6016 int
x-cache-status: int-tls
allow: DELETE, GET, HEAD, OPTIONS, PATCH, POST, PUT
Sep 8 2025, 10:13 AM · Traffic
Vgutierrez added a comment to T401383: Reduce noise from duplicate sequence-gap alerts on HaProxy-webrequests.

Local tests show that HAProxy issued 46410639 but it never reached the Kafka cluster, probably because haproxykafka failed to parse it for some reason. If this happens systematically after a BADREQ, I think we could have a bug in HPK

Sep 8 2025, 10:12 AM · Traffic, Data-Engineering

Sep 5 2025

Vgutierrez closed T403616: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep as Resolved.

puppet is now happy on deployment-cache-text08.

Sep 5 2025, 1:28 PM · Traffic, Beta-Cluster-Infrastructure
Vgutierrez closed T403616: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep, a subtask of T398161: FY 25/26 WE 5.4.3: CDN (text) filtering rationalization, as Resolved.
Sep 5 2025, 1:28 PM · SRE
Vgutierrez added a comment to T403616: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep.

puppet is happier in deployment-cache-text08 but not 100%:

Sep  5 10:22:01 deployment-cache-text08 puppet-agent[1978923]: (/Stage[main]/Profile::Cache::Haproxy/File[/usr/share/GeoIP/datacenter.mmdb]) Could not evaluate: Could not retrieve information from environment production source(s) puppet:///volatile/datacenter_vendors/datacenter.mmdb
Sep 5 2025, 10:45 AM · Traffic, Beta-Cluster-Infrastructure
Vgutierrez added a comment to T221976: Have CDN edge set the `X-Request-Id` header for incoming external requests.

FWIW HAProxy provides UUIDv4 out of the box so it should be as easy as http-request set-header X-Request-Id %[uuid()].

Sep 5 2025, 10:08 AM · MediaWiki-Platform-Team (Radar), Traffic, Platform Engineering (Icebox), SRE

Sep 4 2025

Vgutierrez created T403767: Add an Allow header on 405 responses.
Sep 4 2025, 5:23 PM · Traffic
Vgutierrez reassigned T403695: Grant Access to <wmde and nda>for <mahmoud-abdelsattar> from Vgutierrez to JMeybohm.

assigning the task to @JMeybohm, he is the SRE on clinic duty this week

Sep 4 2025, 9:08 AM · SRE, Wikidata, Wikidata Omega Product, SRE-Access-Requests, LDAP-Access-Requests

Sep 3 2025

Vgutierrez added a comment to T402512: Export development_network_probe data to Puppet servers for CDN deployment.

That's great, thanks @brouberol

Sep 3 2025, 11:01 AM · Infrastructure-Foundations, Data-Engineering, Traffic
Vgutierrez added a comment to T402512: Export development_network_probe data to Puppet servers for CDN deployment.

I think we have 3 upcoming DAGs:

  • the one covered by this task
  • probenet data for the GeoDNS pipeline (T380626)
  • Curate lists of well-known JA3N hashes (part of T400270)
Sep 3 2025, 10:25 AM · Infrastructure-Foundations, Data-Engineering, Traffic

Sep 2 2025

Vgutierrez updated subscribers of T402512: Export development_network_probe data to Puppet servers for CDN deployment.

@brouberol hey! it looks like we split airflow instances by team and we don't currently have an instance for SRE so I'm guessing we would need to create it as well?

Sep 2 2025, 10:42 AM · Infrastructure-Foundations, Data-Engineering, Traffic

Sep 1 2025

Vgutierrez added a comment to T400119: Block traffic from user-agents not honoring our policy.

Re the comment: "Allow user-agents with contact information" - implies blocking UAs with no contact information. Is this referring to a subset of queries? I understood from earlier that a legacy client-side app with a UA modeled on a browser UA would be OK (unless it runs into a rate limit). Still true?

You're totally right, that's referring to library default UAs like python-requests

Sep 1 2025, 3:25 PM · User-notice-archive, Patch-For-Review, Traffic, SRE
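
For clients that currently send a library default User-Agent, replacing it is usually a one-line change. A minimal sketch using Python's standard `urllib`; the tool name, version and contact details are hypothetical placeholders, and the `name/version (contact)` shape is an assumption rather than a quote from the policy. No request is actually sent here.

```python
from urllib.request import Request


def build_request(url: str, tool: str, version: str, contact: str) -> Request:
    """Build a request that identifies the client instead of using the library default UA."""
    user_agent = f"{tool}/{version} ({contact})"
    return Request(url, headers={"User-Agent": user_agent})


# Hypothetical tool name and contact details, for illustration only.
req = build_request(
    "https://en.wikipedia.org/wiki/Main_Page",
    tool="ExampleTool",
    version="1.0",
    contact="https://example.org/exampletool; exampletool@example.org",
)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```
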
Vgutierrez added a comment to T400119: Block traffic from user-agents not honoring our policy.

I still have an error in my unit tests from Gitlab CI when trying to access https://upload.wikimedia.org/wikipedia/commons/d/d2/Epichlorhydrin_vzorec.webp

[ERROR]   ImageUtilsTest.testReadImage:29 » IO GET /wikipedia/commons/d/d2/Epichlorhydrin_vzorec.webp => Forbidden -- [content-length:"92", content-type:"text/plain", x-analytics:"", server:"HAProxy", x-cache:"cp1105 int", x-cache-status:"int-tls"] -- Please set a user-agent and respect our robot policy https://w.wiki/4wJS. See also T400119.

The same test works for two other URLs with the same user agent, why?

Sep 1 2025, 9:32 AM · User-notice-archive, Patch-For-Review, Traffic, SRE

Aug 30 2025

Vgutierrez added a comment to T400119: Block traffic from user-agents not honoring our policy.

Re-upping a question I had earlier - will the servers' "Retry-After" header use seconds, or http-date, or potentially either? Of course it would be easy to figure out in my code, but it would still be good to know.

Aug 30 2025, 6:01 PM · User-notice-archive, Patch-For-Review, Traffic, SRE
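
The feed does not say which form the servers use, so a client can simply accept both forms that RFC 9110 allows for `Retry-After`. A minimal sketch with the standard library; `retry_after_seconds` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay in seconds for a Retry-After value, or None if it cannot be parsed."""
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)  # delay-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    current = now if now is not None else datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())


fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds("120"))
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now))
```
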

Aug 29 2025

Vgutierrez added a comment to T400119: Block traffic from user-agents not honoring our policy.

@Joe Could I ask for a two week exemption for diff.wikimedia.org until we have our next sprint with our devs? Right now folks can't log into the community blog to share their stories and updates and I won't have developer time until the second week of September.

Aug 29 2025, 3:08 PM · User-notice-archive, Patch-For-Review, Traffic, SRE

Aug 26 2025

Vgutierrez closed T402634: varnish-frontend-slowlog service restarts with decoding error as Resolved.
Aug 26 2025, 3:24 PM · Traffic
Vgutierrez closed T401383: Reduce noise from duplicate sequence-gap alerts on HaProxy-webrequests as Resolved.

I'm closing this since we've fixed the wrong behavior on HAProxy regarding sequence numbers, please feel free to re-open it if you're still detecting issues on your side with sequence numbers.

Aug 26 2025, 3:23 PM · Traffic, Data-Engineering
Vgutierrez added a comment to T400119: Block traffic from user-agents not honoring our policy.

Our goal is to block all traffic from unidentified clients and not coming from authorized actors, like Toolforge or our internal APIs.

FWIW, this seems to not be the case 100%. It seems one of our GitLab pipelines was affected by this, see T402801: CI in schemas-event-secondary fails because the tests do not follow WMF's Robot policy. Fortunately, fixing the UA string was simple enough (and reasonable anyway, in case someone runs the test locally), but we do seem to block requests coming from our own infra?

Aug 26 2025, 1:59 PM · User-notice-archive, Patch-For-Review, Traffic, SRE

Aug 25 2025

Vgutierrez updated the task description for T400119: Block traffic from user-agents not honoring our policy.
Aug 25 2025, 2:57 PM · User-notice-archive, Patch-For-Review, Traffic, SRE
Vgutierrez updated the task description for T400119: Block traffic from user-agents not honoring our policy.
Aug 25 2025, 2:57 PM · User-notice-archive, Patch-For-Review, Traffic, SRE
Vgutierrez updated the task description for T400119: Block traffic from user-agents not honoring our policy.
Aug 25 2025, 2:56 PM · User-notice-archive, Patch-For-Review, Traffic, SRE
Vgutierrez triaged T402792: Consider rate limiting non-standard thumbnail sizes as Medium priority.
Aug 25 2025, 12:34 PM · Traffic
Vgutierrez created T402792: Consider rate limiting non-standard thumbnail sizes.
Aug 25 2025, 12:33 PM · Traffic
Vgutierrez added a comment to T352245: Migrate the etcd main cluster to cfssl-based PKI.

I'd like to get this moving again early next week, ideally Tuesday or Wednesday during Europe / Americas overlap hours (starting ~ 14:00 UTC or before).

@Vgutierrez - If you might be available to assist in verifying Liberica control-plane health on either of those days, that would be greatly appreciated.

Aug 25 2025, 10:42 AM · Patch-For-Review, serviceops

Aug 22 2025

Vgutierrez closed T402546: Unable to log into LinguaLibre due to user-agent / rate limit as Resolved.
Aug 22 2025, 6:03 PM · Lingua-Libre-Legacy
Vgutierrez added a comment to T402546: Unable to log into LinguaLibre due to user-agent / rate limit.

From Special:Version it appears Lingua Libre is running mediawiki/oauthclient 1.1.0. The ability to set a custom User-Agent (via Config::setUserAgent) was added in 1.2.0 (commit 81edea5f545ef551cb1fb3e8937fd81c549fa94b, task T293609).

Aug 22 2025, 3:37 PM · Lingua-Libre-Legacy
Vgutierrez added a comment to T402546: Unable to log into LinguaLibre due to user-agent / rate limit.

@Yug thanks, no need to keep appending user reports till @mickeybarber reports back. This is a well-known and expected behavior of the CDN as announced on https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/APW6FQBGIVLCEN7WZ65D4NVZ6XQIWGCW/ back on 2025-07-30

Aug 22 2025, 11:25 AM · Lingua-Libre-Legacy
Vgutierrez lowered the priority of T401383: Reduce noise from duplicate sequence-gap alerts on HaProxy-webrequests from High to Medium.

sampling on webrequest_sampled has been fixed by merging https://gerrit.wikimedia.org/r/1181033

Aug 22 2025, 10:23 AM · Traffic, Data-Engineering
Vgutierrez closed T402022: Superset / LDAP access for aude as Resolved.

looks good, I took care of merging it, thanks for the patch @Dzahn

Aug 22 2025, 8:00 AM · Data-Engineering, SRE-Access-Requests, SRE
Vgutierrez closed T402384: Requesting access to analytics for Dima_Koushha_WMDE as Resolved.

change has been merged, please allow 30 minutes to let puppet apply the changes on the required systems. Thanks

Aug 22 2025, 7:57 AM · SRE, SRE-Access-Requests

Aug 21 2025

Vgutierrez moved T401383: Reduce noise from duplicate sequence-gap alerts on HaProxy-webrequests from Backlog to Actively Servicing on the Traffic board.
Aug 21 2025, 5:57 PM · Traffic, Data-Engineering
Vgutierrez triaged T401383: Reduce noise from duplicate sequence-gap alerts on HaProxy-webrequests as High priority.

flagging as high because this is already making the downsampling in benthos fail (nice catch by @CDanis):

root = if this.ip != "-" && this.sequence != "-" && this.sequence % env("SAMPLING").number() != 0 { deleted() }
Aug 21 2025, 5:56 PM · Traffic, Data-Engineering
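
The effect of a constant sequence number on the mapping quoted above can be seen by mirroring it in Python. This is a sketch, not the Benthos pipeline itself: `keep_record` is a hypothetical helper, the sampling rate of 128 and the addresses are made up, and the reading that a sequence of 0 always survives sampling is inferred from the expression rather than stated in the task.

```python
from typing import Any, Dict


def keep_record(record: Dict[str, Any], sampling: int) -> bool:
    """Mirror of the quoted mapping: drop the record only if all three conditions hold."""
    ip = record.get("ip")
    sequence = record.get("sequence")
    if ip != "-" and sequence != "-" and sequence % sampling != 0:
        return False  # corresponds to deleted()
    return True


SAMPLING = 128  # hypothetical sampling rate

records = [
    {"ip": "192.0.2.1", "sequence": 129},  # regular request, dropped
    {"ip": "192.0.2.1", "sequence": 256},  # regular request, kept (256 % 128 == 0)
    {"ip": "192.0.2.2", "sequence": 0},    # sequence logged as 0, always kept
    {"ip": "192.0.2.3", "sequence": 0},    # same: 0 % N is 0 for any N
]
kept = [r for r in records if keep_record(r, SAMPLING)]
print(len(kept), "of", len(records), "records kept")
```

Because 0 modulo any sampling rate is 0, every record logged with sequence 0 passes the filter, so such records are over-represented in the sampled output.
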
Vgutierrez added a comment to T401383: Reduce noise from duplicate sequence-gap alerts on HaProxy-webrequests.

Right now we get the sequence number from haproxy %rt log format, that's request_counter (HTTP req or TCP session) according to its documentation. In the early stages of the TCP connection it seems like the request counter isn't accessible so it gets logged as 0.

Aug 21 2025, 5:43 PM · Traffic, Data-Engineering
Vgutierrez added a comment to T401383: Reduce noise from duplicate sequence-gap alerts on HaProxy-webrequests.

I think I've identified the issue, right now haproxy always logs sequence: 0 for <BADREQ> requests

Aug 21 2025, 5:22 PM · Traffic, Data-Engineering
Vgutierrez updated the task description for T402512: Export development_network_probe data to Puppet servers for CDN deployment.
Aug 21 2025, 12:40 PM · Infrastructure-Foundations, Data-Engineering, Traffic
Vgutierrez triaged T402512: Export development_network_probe data to Puppet servers for CDN deployment as Medium priority.
Aug 21 2025, 12:36 PM · Infrastructure-Foundations, Data-Engineering, Traffic
Vgutierrez created T402512: Export development_network_probe data to Puppet servers for CDN deployment.
Aug 21 2025, 12:36 PM · Infrastructure-Foundations, Data-Engineering, Traffic
Vgutierrez moved T402384: Requesting access to analytics for Dima_Koushha_WMDE from Awaiting User Input to Patch in Review on the SRE-Access-Requests board.
Aug 21 2025, 12:27 PM · SRE, SRE-Access-Requests
Vgutierrez edited projects for T402014: Add ipblock-source objects and logic, added: Hiddenparma, Traffic; removed SRE.
Aug 21 2025, 12:17 PM · Patch-For-Review, Traffic, Hiddenparma
Vgutierrez updated the task description for T402384: Requesting access to analytics for Dima_Koushha_WMDE.
Aug 21 2025, 12:14 PM · SRE, SRE-Access-Requests
Vgutierrez added a comment to T402384: Requesting access to analytics for Dima_Koushha_WMDE.

SSH key verified out of band via https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/1180839

Aug 21 2025, 12:13 PM · SRE, SRE-Access-Requests
Vgutierrez closed T401118: Grant Access to <wmde and nda>for <sadiyamohammed13> as Resolved.
vgutierrez@ldap-maint1001:~$ ldapsearch -x cn=nda |grep sad
member: uid=sadiyamohammed13,ou=people,dc=wikimedia,dc=org
vgutierrez@ldap-maint1001:~$ ldapsearch -x cn=wmde |grep sad
member: uid=sadiyamohammed13,ou=people,dc=wikimedia,dc=org
Aug 21 2025, 11:14 AM · Wikidata, Wikidata Omega Product, SRE, LDAP-Access-Requests
Vgutierrez claimed T401118: Grant Access to <wmde and nda>for <sadiyamohammed13>.
Aug 21 2025, 9:16 AM · Wikidata, Wikidata Omega Product, SRE, LDAP-Access-Requests
Vgutierrez moved T401118: Grant Access to <wmde and nda>for <sadiyamohammed13> from NDA Pending to Code Review Pending on the LDAP-Access-Requests board.
Aug 21 2025, 9:16 AM · Wikidata, Wikidata Omega Product, SRE, LDAP-Access-Requests

Aug 20 2025

Vgutierrez moved T402384: Requesting access to analytics for Dima_Koushha_WMDE from Untriaged to Awaiting User Input on the SRE-Access-Requests board.

@Dima_Koushha_WMDE could you create a gerrit change that contains your public SSH key (it can be immediately abandoned)? we can use that as a way of verifying the SSH key out-of-band, thanks

Aug 20 2025, 4:08 PM · SRE, SRE-Access-Requests
Vgutierrez closed T402191: Requesting access to analytics-wmde-users and analytics-privatedata-users for dang as Resolved.

The change granting access to the requested groups has been merged, please allow up to 30 minutes to let puppet apply the changes on the impacted servers. Thanks

Aug 20 2025, 4:01 PM · SRE, SRE-Access-Requests
Vgutierrez updated the task description for T402191: Requesting access to analytics-wmde-users and analytics-privatedata-users for dang.
Aug 20 2025, 8:13 AM · SRE, SRE-Access-Requests
Vgutierrez added a comment to T402191: Requesting access to analytics-wmde-users and analytics-privatedata-users for dang.

@Vgutierrez Yes you can re-use it, it's better that way, and that's my account anw

Aug 20 2025, 8:10 AM · SRE, SRE-Access-Requests

Aug 19 2025

Vgutierrez added a comment to T402191: Requesting access to analytics-wmde-users and analytics-privatedata-users for dang.

I'm already seeing an account (https://ldap.toolforge.org/user/dang) requested on T288355 with some privileges:

dang:
  ensure: present
  realname: Tien Dat Nguyen
  email: dat.nguyen@wikimedia.de
  uid: 32183
  gid: 500
  ssh_keys:
    - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICnlB6UtmPKPJZOXl/2fkAC88ccb9dn15upi0SsifFg5 dang@C353
Aug 19 2025, 4:33 PM · SRE, SRE-Access-Requests
Vgutierrez added a comment to T402191: Requesting access to analytics-wmde-users and analytics-privatedata-users for dang.

I'm seeing you have 3 LDAP accounts at the moment:

Aug 19 2025, 3:01 PM · SRE, SRE-Access-Requests
Vgutierrez moved T402191: Requesting access to analytics-wmde-users and analytics-privatedata-users for dang from Untriaged to Awaiting User Input on the SRE-Access-Requests board.

@dang could you create a CR on gerrit with your public SSH key to confirm it? thanks!

Aug 19 2025, 2:52 PM · SRE, SRE-Access-Requests
Vgutierrez triaged T402014: Add ipblock-source objects and logic as Medium priority.
Aug 19 2025, 2:40 PM · Patch-For-Review, Traffic, Hiddenparma

Aug 18 2025

Vgutierrez added a comment to T401246: Impact analysis for Haproxykafka data loss.

sorry for the delay @Mayakp.wiki.

Aug 18 2025, 4:05 PM · Movement-Insights
Vgutierrez closed T401902: Possible SSL certificate expiration as Declined.

Thanks, please feel free to re-open it if needed

Aug 18 2025, 2:53 PM · Traffic
Vgutierrez edited projects for T390087: eqiad: VMs requested for Data Persistence automation and testbeds, added: Infrastructure-Foundations; removed SRE.
Aug 18 2025, 1:46 PM · Infrastructure-Foundations, vm-requests
Vgutierrez triaged T402022: Superset / LDAP access for aude as Medium priority.
Aug 18 2025, 10:24 AM · Data-Engineering, SRE-Access-Requests, SRE

Aug 14 2025

Vgutierrez added a comment to T352956: Handling inbound IPIP traffic on low traffic LVS k8s based realservers.

you got that available as part of the sre.loadbalancer.migrate-service-ipip cookbook on https://gerrit.wikimedia.org/r/plugins/gitiles/operations/cookbooks/+/refs/heads/master/cookbooks/sre/loadbalancer/migrate-service-ipip.py#131:

def _ipip_traffic_accepted(self,  *,
                           outer_src_ip: str, outer_dst_ip: str,
                           inner_src_ip: str, inner_dst_ip: str,
                           dport: int) -> bool:
    """Send a single SYN packet using IPIP encapsulation"""
    s = socket(AF_INET, SOCK_STREAM)
    s.bind((inner_src_ip, 0))
    sport = s.getsockname()[1]
    syn_packet = (
        IP(src=outer_src_ip, dst=outer_dst_ip) /
        IP(src=inner_src_ip, dst=inner_dst_ip) /
        TCP(sport=sport, dport=dport, flags="S", seq=1000)
    )
    response = sr1(syn_packet, timeout=3, verbose=self.dry_run)
    s.close()
    return response is not None
Aug 14 2025, 2:02 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops, Traffic
Vgutierrez triaged T401902: Possible SSL certificate expiration as Low priority.

Do you know which specific hostname the volunteer is asking about?

Aug 14 2025, 10:32 AM · Traffic

Aug 13 2025

Vgutierrez moved T399502: Test/benchmark Mellanox NICs for LVS usage from Backlog to Actively Servicing on the Traffic board.
Aug 13 2025, 3:35 PM · Liberica, Traffic
Vgutierrez added a comment to T352956: Handling inbound IPIP traffic on low traffic LVS k8s based realservers.

Yup, scheduling it for the weeks of either August 11th or August 18th.

Aug 13 2025, 3:24 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops, Traffic
Vgutierrez claimed T401824: Get ready to upgrade liberica LBs to trixie.
Aug 13 2025, 12:31 PM · Traffic, Liberica
Vgutierrez triaged T401824: Get ready to upgrade liberica LBs to trixie as Medium priority.
Aug 13 2025, 12:30 PM · Traffic, Liberica
Vgutierrez created T401824: Get ready to upgrade liberica LBs to trixie.
Aug 13 2025, 12:29 PM · Traffic, Liberica

Aug 7 2025

Vgutierrez added a comment to T400270: Browser behaviour detection at the edge.

https://phabricator.wikimedia.org/P80962 for future reference

Aug 7 2025, 11:22 AM · Patch-For-Review, Hiddenparma, Traffic, SRE