In T356412#9766840, @MatthewVernon wrote:

I think I have two questions:

Where is it defined what should and shouldn't get its own intermediate? (e.g. I see cassandra has one)

Fri, May 3, 2:11 PM · Patch-For-Review, SRE, SRE-swift-storage

CDanis updated the name of F49977482: Brazil RTT broken down by subdivision from "image.png" to "Brazil RTT broken down by subdivision".

Fri, May 3, 1:23 PM

CDanis added a comment to F49974214: Initial user-measured magru latency as violin plots, per country, Latin/South America.

python
import wmfdata
spark = wmfdata.spark.create_session(type='yarn-regular')

Fri, May 3, 1:12 PM

CDanis updated the name of F49974214: Initial user-measured magru latency as violin plots, per country, Latin/South America from "Initial user-measured magru latency, per country, Latin/South America" to "Initial user-measured magru latency as violin plots, per country, Latin/South America".

Fri, May 3, 1:00 PM

CDanis added a comment to T359054: Slowly ramping up traffic to the Brazil data center (magru) and related geo-maps.

magru is a clear win for:
UY, CL, AR, BR, PY

Fri, May 3, 12:59 PM · Infrastructure-Foundations, SRE, Traffic

CDanis updated the name of F49974214: Initial user-measured magru latency as violin plots, per country, Latin/South America from "image.png" to "Initial user-measured magru latency, per country, Latin/South America".

Fri, May 3, 12:55 PM

Thu, May 2

CDanis added a comment to T356412: Consolidate TLS cert puppetry for ms and thanos swift frontends.

That sounds good to me, @elukey. I don't think a new intermediate is needed.

Thu, May 2, 8:47 PM · Patch-For-Review, SRE, SRE-swift-storage

CDanis added a comment to T363971: scap should not run mediawiki-image-download on pooled=inactive servers.

FYI this happened for me again, despite the above patch

19:48:44 /usr/bin/sudo /usr/local/sbin/mediawiki-image-download 2024-05-02-194555-publish (ran as mwdeploy@mw2382.codfw.wmnet) returned [255]: ssh: connect to host mw2382.codfw.wmnet port 22: Connection timed out

Thu, May 2, 8:27 PM · Release-Engineering-Team, Scap

CDanis added parent tasks for T362902: Add probenet configuration for magru: T359054: Slowly ramping up traffic to the Brazil data center (magru) and related geo-maps, T363722: Craft geo-maps file to create lowest-latency routes from south america.

Thu, May 2, 8:09 PM · probenet, MW-1.43-notes (1.43.0-wmf.3; 2024-04-30), Patch-For-Review, netops, Infrastructure-Foundations, SRE

CDanis added a subtask for T359054: Slowly ramping up traffic to the Brazil data center (magru) and related geo-maps: T362902: Add probenet configuration for magru.

Thu, May 2, 8:09 PM · Infrastructure-Foundations, SRE, Traffic

CDanis added a subtask for T363722: Craft geo-maps file to create lowest-latency routes from south america: T362902: Add probenet configuration for magru.

Thu, May 2, 8:09 PM · Traffic

CDanis closed T362902: Add probenet configuration for magru as Resolved.

Thu, May 2, 8:09 PM · probenet, MW-1.43-notes (1.43.0-wmf.3; 2024-04-30), Patch-For-Review, netops, Infrastructure-Foundations, SRE

CDanis closed T362902: Add probenet configuration for magru, a subtask of T362421: magru network setup, as Resolved.

Thu, May 2, 8:08 PM · Patch-For-Review, netops, SRE, Infrastructure-Foundations

CDanis created P61742 (An Untitled Masterwork).

Thu, May 2, 3:43 PM

CDanis assigned T362786: Enable dbctl for parsercache to Scott_French.

+1, omit_replicas_in_mwconfig seems like the right way to begin implementing this.

Thu, May 2, 1:19 PM · Infrastructure-Foundations, Data-Persistence, conftool

Wed, May 1

CDanis added a subtask for T350592: EPIC: migrate in use metrics and dashboards to statslib: T363914: Discrepancy between Graphite & Prometheus editResponseTime counts.

Wed, May 1, 3:09 PM · Epic, SRE Observability (FY2023/2024-Q4), MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics

CDanis added a parent task for T363914: Discrepancy between Graphite & Prometheus editResponseTime counts: T350592: EPIC: migrate in use metrics and dashboards to statslib.

Wed, May 1, 3:09 PM · MediaWiki-Platform-Team, Observability-Metrics

CDanis triaged T363914: Discrepancy between Graphite & Prometheus editResponseTime counts as High priority.

Wed, May 1, 3:08 PM · MediaWiki-Platform-Team, Observability-Metrics

CDanis created T363914: Discrepancy between Graphite & Prometheus editResponseTime counts.

Wed, May 1, 3:08 PM · MediaWiki-Platform-Team, Observability-Metrics

Mon, Apr 29

CDanis added a comment to T363407: Proper service names in trace data.

In T363407#9752049, @JMeybohm wrote:

Ok, understood. The only thing I'm really worried about is that metrics change or get less intuitive with this. For example in here it's pretty clear what the filter means (selecting "local_service"). I think we will lose clarity here if local_service changes to mw-web.eqiad.main. Maybe adding local as a suffix/prefix would help here (and you could strip that out again in OTTL)?

Mon, Apr 29, 2:36 PM · Observability-Tracing

Fri, Apr 26

CDanis added a comment to T363581: Build a machine-readable catalogue of mariadb tables in production.

I like the idea! A few questions

Fri, Apr 26, 2:26 PM · DBA

Thu, Apr 25

CDanis added a comment to T363407: Proper service names in trace data.

BTW in case it was not clear, my intentions here are basically:

deploy something ASAP (like next week) that everyone is reasonably happy with for the interim
don't do anything to get in the way of the badly-needed Envoy upgrade
don't break anything else

Thu, Apr 25, 4:23 PM · Observability-Tracing

CDanis added a comment to T363407: Proper service names in trace data.

In T363407#9743785, @JMeybohm wrote:

Thanks for the write-up!
What is not very clear to me is which part of the work would need to be done anyway (in case we had an envoy version >= 1.24). The reason I'm asking is that envoy 1.23 has been EOL for a year or so, so we need to look at an upgrade anyway.

Thu, Apr 25, 2:19 PM · Observability-Tracing

Wed, Apr 24

Ladsgroup awarded T363407: Proper service names in trace data a Love token.

Wed, Apr 24, 8:06 PM · Observability-Tracing

CDanis added a subtask for T320549: distributed tracing v0 [minimum viable]: T363407: Proper service names in trace data.

Wed, Apr 24, 8:03 PM · Epic, Observability-Tracing

CDanis added a parent task for T363407: Proper service names in trace data: T320549: distributed tracing v0 [minimum viable].

Wed, Apr 24, 8:03 PM · Observability-Tracing

CDanis created T363407: Proper service names in trace data.

Wed, Apr 24, 8:03 PM · Observability-Tracing

Thu, Apr 18

CDanis reopened T360029: Integrate dbctl IP changes as part of VLAN changes. as "Open".

Thu, Apr 18, 2:38 PM · conftool, Data-Persistence, SRE, Infrastructure-Foundations

CDanis reopened T360029: Integrate dbctl IP changes as part of VLAN changes. , a subtask of T354878: Re-IP db servers in codfw row A/B moving to per-rack subnets, as Open.

Thu, Apr 18, 2:37 PM · Data-Persistence, SRE, Infrastructure-Foundations

CDanis closed T360029: Integrate dbctl IP changes as part of VLAN changes. as Resolved.

Anyway I think that all that is needed to unblock VLAN migrations has been done or documented on this ticket? Optimistically closing but please re-open if you disagree.

Thu, Apr 18, 2:20 PM · conftool, Data-Persistence, SRE, Infrastructure-Foundations

CDanis closed T360029: Integrate dbctl IP changes as part of VLAN changes. , a subtask of T354878: Re-IP db servers in codfw row A/B moving to per-rack subnets, as Resolved.

Thu, Apr 18, 2:19 PM · Data-Persistence, SRE, Infrastructure-Foundations

CDanis added a comment to T360029: Integrate dbctl IP changes as part of VLAN changes. .

In T360029#9725627, @Volans wrote:

As for the commit, I advocate adding dbctl support in Spicerack, but IIRC that requires changes in dbctl, as most of its logic is in its CLI part and not exposed as a library; to be checked.

Thu, Apr 18, 2:19 PM · conftool, Data-Persistence, SRE, Infrastructure-Foundations

CDanis triaged T362893: Spicerack support for dbctl as Low priority.

Thu, Apr 18, 2:17 PM · Infrastructure-Foundations, conftool, SRE-tools, Spicerack

CDanis created T362893: Spicerack support for dbctl.

Thu, Apr 18, 2:16 PM · Infrastructure-Foundations, conftool, SRE-tools, Spicerack

Wed, Apr 17

CDanis added a comment to T359054: Slowly ramping up traffic to the Brazil data center (magru) and related geo-maps.

I largely agree with Arzhel's assessment. At a cursory glance, Uruguay or Paraguay look ideal as first candidates.

Wed, Apr 17, 7:12 PM · Infrastructure-Foundations, SRE, Traffic

CDanis added a comment to T360029: Integrate dbctl IP changes as part of VLAN changes. .

I think you should be able to use the existing spicerack interface to confctl to do the set/host_ip=... action -- that should be equivalent to a ConftoolEntity.update call.

Wed, Apr 17, 5:01 PM · conftool, Data-Persistence, SRE, Infrastructure-Foundations

CDanis added a comment to T360029: Integrate dbctl IP changes as part of VLAN changes. .

@Marostegui As it turns out, plain old confctl can be used to do this already.

Wed, Apr 17, 4:07 PM · conftool, Data-Persistence, SRE, Infrastructure-Foundations

CDanis added a comment to T360029: Integrate dbctl IP changes as part of VLAN changes. .

In T360029#9722005, @Ladsgroup wrote:

In T360029#9658042, @CDanis wrote:

Actually the idea is that dbctl should not contain the IPs at all. It should look up the IP via DNS, we should store FQDN instead.
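A minimal sketch of the "store the FQDN, look up the IP via DNS" idea, using only the Python standard library. This is not dbctl code: the function name `resolve_fqdn`, the default port 3306, and the injectable `getaddrinfo` parameter are assumptions made for illustration.

```python
import socket
from typing import Callable, List


def resolve_fqdn(
    fqdn: str,
    port: int = 3306,  # assumed MariaDB port, for illustration only
    getaddrinfo: Callable = socket.getaddrinfo,
) -> List[str]:
    """Return the sorted, de-duplicated IP addresses that fqdn resolves to."""
    # Restrict to TCP so each address family yields one entry per address.
    infos = getaddrinfo(fqdn, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

With this approach a re-IP only requires a DNS change; the stored configuration (the FQDN) stays the same, and consumers pick up the new address on their next lookup.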

Wed, Apr 17, 3:59 PM · conftool, Data-Persistence, SRE, Infrastructure-Foundations

Tue, Apr 16

CDanis updated the task description for T362719: Upgrade Jaeger to 1.56.0 (latest stable).

Tue, Apr 16, 9:16 PM · Patch-For-Review, User-fgiunchedi, Observability-Tracing

CDanis removed a project from T362719: Upgrade Jaeger to 1.56.0 (latest stable): Epic.

Tue, Apr 16, 9:16 PM · Patch-For-Review, User-fgiunchedi, Observability-Tracing

CDanis created T362719: Upgrade Jaeger to 1.56.0 (latest stable).

Tue, Apr 16, 9:16 PM · Patch-For-Review, User-fgiunchedi, Observability-Tracing

CDanis committed rOSCT2474362169a1: force enable etcd v2 proto.

Tue, Apr 16, 3:05 PM

Mon, Apr 15

CDanis committed rOSCTd2ad7ee548ae: add python 3.11.

Mon, Apr 15, 10:40 PM

CDanis committed rOSCTf1dd336c7537: Fix nuisance black diffs.

Mon, Apr 15, 10:40 PM

Thu, Apr 11

CDanis updated the title for P60444 tzdump.py from untitled to tzdump.py.

Thu, Apr 11, 7:07 PM

CDanis created P60444 tzdump.py.

Thu, Apr 11, 6:54 PM

Wed, Apr 10

CDanis created P60266 (An Untitled Masterwork).

Wed, Apr 10, 4:15 PM

Mar 27 2024

CDanis closed T359413: Miniature images from og:image not loading in social media links as Resolved.

Mar 27 2024, 7:26 PM · Traffic, PageImages, Regression, WMF-General-or-Unknown

CDanis added a comment to T359413: Miniature images from og:image not loading in social media links.

This has been fixed with this patch, which I forgot to associate with this bug.

Mar 27 2024, 7:26 PM · Traffic, PageImages, Regression, WMF-General-or-Unknown

Mar 26 2024

CDanis added a member for WMF-NDA: fkaelin.

Mar 26 2024, 2:48 PM

CDanis added a member for WMF-NDA: Pablo.

Mar 26 2024, 2:48 PM

Mar 25 2024

CDanis added a comment to T360029: Integrate dbctl IP changes as part of VLAN changes. .

Just to make sure I understand: the request here is for an easy-to-automate way for dbctl to change an instance's IP address?

Mar 25 2024, 3:37 PM · conftool, Data-Persistence, SRE, Infrastructure-Foundations

CDanis added a project to T360029: Integrate dbctl IP changes as part of VLAN changes. : conftool.

Mar 25 2024, 3:34 PM · conftool, Data-Persistence, SRE, Infrastructure-Foundations

Mar 1 2024

DAlangi_WMF awarded T357050: editResponseTime's port to statslib is not actually backwards-compatible a Barnstar token.

Mar 1 2024, 5:47 PM · MediaWiki-libs-Stats, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13)

Feb 26 2024

CDanis added a comment to T357750: Phase out cergen.

Should this ticket really be "deprecate cergen"? :)

Feb 26 2024, 4:00 PM · Patch-For-Review, Puppet-Infrastructure, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE

CDanis added a comment to T358455: Primary outbound port utilisation over 80% alert muted.

This would best be fixed by extending the haproxy bwlim work done in T317799 -- we've talked about having per-ASN limits in addition to the existing and partially-deployed per-file-URI limits.

Feb 26 2024, 3:45 PM · Traffic, Sustainability (Incident Followup), Infrastructure-Foundations, netops

CDanis claimed T358189: aux-k8s cluster prometheus setup is incomplete.

Feb 26 2024, 3:28 PM · Infrastructure-Foundations, Observability-Tracing

Feb 22 2024

CDanis added a comment to T358111: oauth2-proxy config changes don't cause any change in the helm Deployment.

Sent upstream as https://github.com/jaegertracing/helm-charts/pull/541

Feb 22 2024, 10:15 PM · Observability-Tracing, Patch-For-Review

CDanis closed T358152: troubleshoot why initial pageloads of trace.wikimedia.org are so slow , a subtask of T320549: distributed tracing v0 [minimum viable], as Resolved.

Feb 22 2024, 9:23 PM · Epic, Observability-Tracing

CDanis closed T358152: troubleshoot why initial pageloads of trace.wikimedia.org are so slow as Resolved.

Feb 22 2024, 9:23 PM · Observability-Tracing

CDanis added a comment to T358152: troubleshoot why initial pageloads of trace.wikimedia.org are so slow .

As it turns out, this required a change to the upstream chart:

Feb 22 2024, 9:23 PM · Observability-Tracing

CDanis added a parent task for T358111: oauth2-proxy config changes don't cause any change in the helm Deployment: T321211: distributed tracing v1: tech debt blockers.

Feb 22 2024, 12:46 PM · Observability-Tracing, Patch-For-Review

CDanis added a subtask for T321211: distributed tracing v1: tech debt blockers: T358111: oauth2-proxy config changes don't cause any change in the helm Deployment.

Feb 22 2024, 12:46 PM · Observability-Tracing, Epic

CDanis added a subtask for T320549: distributed tracing v0 [minimum viable]: T358152: troubleshoot why initial pageloads of trace.wikimedia.org are so slow .

Feb 22 2024, 12:45 PM · Epic, Observability-Tracing

CDanis added a parent task for T358152: troubleshoot why initial pageloads of trace.wikimedia.org are so slow : T320549: distributed tracing v0 [minimum viable].

Feb 22 2024, 12:45 PM · Observability-Tracing

Feb 21 2024

CDanis created T358152: troubleshoot why initial pageloads of trace.wikimedia.org are so slow .

Feb 21 2024, 9:53 PM · Observability-Tracing

CDanis updated the task description for T358111: oauth2-proxy config changes don't cause any change in the helm Deployment.

Feb 21 2024, 3:01 PM · Observability-Tracing, Patch-For-Review

CDanis created T358111: oauth2-proxy config changes don't cause any change in the helm Deployment.

Feb 21 2024, 2:56 PM · Observability-Tracing, Patch-For-Review

Feb 16 2024

CDanis added a comment to T320555: cas-sso idp for jaeger-ui on k8s.

I've verified that oauth2-proxy will silently just serve plain HTTP if you specify https_address but don't provide it with TLS key material. So I think I've provided it with such in this patch?

Feb 16 2024, 7:49 PM · User-fgiunchedi, Observability-Tracing

CDanis committed rLPRI6635d0265938: Add faux secret for jaeger in idp.

Feb 16 2024, 4:09 PM