User Details
- User Since
- Nov 5 2018, 2:54 PM (293 w, 4 d)
- Availability
- Available
- IRC Nick
- cdanis
- LDAP User
- CDanis
- MediaWiki User
- CDanis (WMF)
Yesterday
Apologies @dcaro, but I had less time for this than I expected this week; I was only able to do some prep work and wasn't ready to touch anything in production before Friday.
@JMeybohm The best way to fix this is by adding a calico definition to the chart directory in a file whose name starts with wmf-, correct?
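For illustration, here's roughly the shape I'd imagine for such a wmf- file, assuming it holds a plain Calico NetworkPolicy manifest — the name, namespace, selector, and rule below are made up, not a proposal for the real definition:

```yaml
# Purely illustrative sketch of a Calico NetworkPolicy as it might live in a
# wmf-*.yaml file in the chart directory; all values here are placeholders.
apiVersion: crd.projectcalico.org/v1
kind: NetworkPolicy
metadata:
  name: example-egress          # placeholder name
  namespace: example-namespace  # placeholder namespace
spec:
  selector: app == 'example'
  types:
    - Egress
  egress:
    - action: Allow
      protocol: TCP
      destination:
        ports:
          - 443
```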
Thu, Jun 20
From discussion on IRC:
10:14:32 <@cdanis> _joe_ akosiaris claime: my plan is to smoke-test {jobrunner,wikifunctions,parsoid,misc} for a few more minutes, then roll out mw-api-int sampled tracing https://gerrit.wikimedia.org/r/1048011 , and then let that bake over the weekend before doing mw-api-ext and then mw-web
10:14:52 <@akosiaris> 👍
10:15:29 <@claime> ack
10:16:21 <@cdanis> I also gave o11y a heads up about the increased writes to elasticsearch
10:16:24 <@_joe_> seems sensible
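For a sense of what "sampled tracing" can mean on the collector side, a hedged sketch follows — whether the linked Gerrit change works this way is an assumption, and the percentage is made up; the real change is in the patch above.

```yaml
# Illustrative otelcol fragment only; not the contents of the linked patch.
processors:
  probabilistic_sampler:
    sampling_percentage: 1   # assumed value, not the production setting
service:
  pipelines:
    traces:
      processors: [probabilistic_sampler, batch]
```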
Tue, Jun 18
If we need to do a version of this for bare-metal hosts, we will, but for now let's not.
💙cdanis@alert1001.wikimedia.org ~ 🕜☕ sudo statograph -c /etc/statograph/config.yml list_metrics
Metric 'Wiki response time' (id lyfcttm2lhw4) with most recent data at Tue, 18 Jun 2024 17:30:00 +0000 (@1718731800.0)
I think the last step to do here is to validate that any rsync failures will get reported on IRC. Then we can consider all the immediate followups of this incident done, and more slowly continue on with the larger work at T367119: Install a default timeout for systemd::timer::jobs.
Mon, Jun 17
Suggestions from discussion at I/F meeting:
- It's probably not necessary or desirable to add this to all of the contextmanager usages of alertmanager silences, as those are expected to be very short-lived
- "Manual" invocations of sre.hosts.downtime should almost certainly do this. Or in general any process where we don't have a pretty deterministic estimated-time-to-completion.
- We don't have an equivalent of "check optimal" for alertmanager, only Icinga. We should probably have this.
- Would be very good to have a dedicated dashboard for silences that are suppressing active alerts but being auto-extended
Alternatives to consider:
- Make this a required field instead of adding a default [harder up-front but potentially safer]
- Make omitting this field a wmf puppet style guide violation [slower version of the above]
Fri, Jun 14
Mentioning T364280: Add jaeger-ui and other stuff to mwcli here.
Thu, Jun 13
Very helpful, thanks @dcaro and enjoy the pto!
Hi all. @joanna_borun asked me to do some looking into this. I promise I skimmed the above, but I'm sure I missed things, so please pardon me for the pretty basic questions.
Wed, Jun 12
Example trace as processed in codfw production:
https://trace.wikimedia.org/trace/06aabdeeb578a2663034270cf6d4accf
- Remove known PII (sessionstore URL)
- Filter out some very noisy spans (e.g. healthchecks, Special:BlankPage, etc.)
Both of these verified working in production :)
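For a sense of shape, the filtering is roughly along these lines in collector-config terms — a hedged sketch only; the attribute key, span names, and processor names below are assumptions rather than the actual patch contents:

```yaml
# Illustrative otelcol fragment, not the deployed config.
processors:
  attributes/strip-pii:
    actions:
      - key: http.url          # assumed key carrying the sessionstore URL
        action: delete
  filter/drop-noise:
    error_mode: ignore
    traces:
      span:
        - 'name == "healthcheck"'                   # assumed span name
        - 'IsMatch(name, ".*Special:BlankPage.*")'  # assumed name pattern
service:
  pipelines:
    traces:
      processors: [attributes/strip-pii, filter/drop-noise, batch]
```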
Patches as written depend upon otelcol v0.102.0, so upgrading again.
Tue, Jun 11
Mon, Jun 10
Wed, Jun 5
Tue, Jun 4
I discussed this with @Muehlenhoff in his evening/my morning.
I think you are right @Vgutierrez, thanks
Mon, Jun 3
Results after adding BR.ix are in.
Thu, May 30
18:58:42 <+jinxer-wm> RESOLVED: [2x] KubernetesCalicoDown: kubernetes2032.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
As we expected/hoped, the increase in eqiad TX bytes was only about 10-15%.
Wed, May 29
- A few times we triggered a rolling restart of mw-api-int pods in eqiad to deliberately cause the CPU/network TX spikes on the two original masters, so we could observe more about the behavior. We wanted to figure out "what had changed" such that a scap deploy was now creating a ProbeDown page.
- We captured pcaps on a few different k8s masters during such events, and then ran those through Wireshark's Statistics > Conversations feature. Here's one such breakdown. I sorted the list of streams by bytes and manually tagged the top dozen or so streams. I stopped tagging a few entries into a very long run of nearly-identical byte counts from local port 6443 (apiserver) to different node IPs; if you sum up the ones I tagged plus all of those, that makes up 97% of the bytes in the sample.
- Of that 97% portion I inspected, everything sending packets to and from the apiserver machines was reasonable-looking: reading from an etcd's port 2380 was about 5% of overall bytes, then after that, in order, the apiserver sending lots of data to all of: the istiod pod, a calico-kube-controllers pod, a k8s-controller-sidecars pod, various different node IPs (so one of kubelet, kube-proxy, or rsyslogd)... all of them expected, known usages of the API.
- This is 'just' absence of evidence but I'm gonna go ahead and call it evidence of absence here.
Tue, May 28
Posting a short comment now, before I start drafting a much longer comment (and possibly don't finish before my toddler ends my day):
Fri, May 24
Thu, May 23
I tested out simply enabling the k8s attributes processor in the chart's values. The diffs all looked quite reasonable, but of course it doesn't work:
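For reference, enabling it via chart values looks roughly like the fragment below, assuming the upstream chart's config passthrough — the keys and pipeline are illustrative, not the actual diff:

```yaml
# Illustrative values fragment only, not the real chart diff.
config:
  processors:
    k8sattributes:
      extract:
        metadata:
          - k8s.namespace.name
          - k8s.pod.name
          - k8s.node.name
  service:
    pipelines:
      traces:
        processors: [k8sattributes, batch]
```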
Traces are flowing again in eqiad.
helmfile apply went seamlessly, but unfortunately this broke trace collection: I realized only in retrospect that this effectively also changes the DNS name of the collector, and that's vendored into a lot of other charts with the full old name: main-opentelemetry-collector.opentelemetry-collector.svc.cluster.local
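The vendored references look something like the fragment below (the key names and port vary per chart and are assumptions here):

```yaml
# Illustrative only: how the old collector service name is typically hardcoded
# into a consuming chart's values; the surrounding keys are placeholders.
tracing:
  endpoint: main-opentelemetry-collector.opentelemetry-collector.svc.cluster.local:4317
```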
Hi Arzhel, for when I do have time to look at this, do you have a recommended way of reproducing this without breaking anything or actually affecting a network device?