
colewhite (cwhite)
User

User Details

User Since
Aug 21 2018, 6:05 PM (114 w, 4 h)
Availability
Available
LDAP User
Cwhite
MediaWiki User
Unknown

Recent Activity

Fri, Oct 23

colewhite added a comment to T262675: Store Kubernetes events for more than one hour.

This is a known issue with the current Logstash configuration and one of the primary drivers behind adopting a Common Logging Schema (T234565).

Fri, Oct 23, 4:06 PM · Patch-For-Review, observability, Prod-Kubernetes, Kubernetes, serviceops
colewhite claimed T266296: Please create operations/ecs Gerrit repository.

Added request to the above link.

Fri, Oct 23, 12:33 AM · Gerrit

Thu, Oct 22

colewhite created T266296: Please create operations/ecs Gerrit repository.
Thu, Oct 22, 11:19 PM · Gerrit

Thu, Oct 15

colewhite closed T210137: Handle unknown stats in rsyslog_exporter as Resolved.

Updated prometheus-rsyslog-exporter deployed to the fleet. If the log message comes up again please let us know.

Thu, Oct 15, 8:46 PM · User-jijiki, serviceops, observability

Wed, Oct 14

colewhite created T265505: Verification email does not reach my inbox.
Wed, Oct 14, 4:54 PM · VPS-project-Phabricator

Thu, Oct 8

colewhite added a comment to T141520: "MediaWiki exceptions and fatals per minute" alarm is too slow (half an hour delay!).

Indeed, there is a bit of delay due to retries and the default retry_interval of 1 (minute), which seems appropriate for most cases.

Thu, Oct 8, 5:32 PM · Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, Operations, observability
colewhite moved T210137: Handle unknown stats in rsyslog_exporter from Backlog to In progress on the observability board.
Thu, Oct 8, 5:19 PM · User-jijiki, serviceops, observability
colewhite added a comment to T210137: Handle unknown stats in rsyslog_exporter.

Found a new upstream and have deployed it to netbox-dev2001 and centrallog1001 to run for a few days. If all checks out, we'll roll it to the rest of the fleet.

Thu, Oct 8, 5:18 PM · User-jijiki, serviceops, observability
colewhite claimed T210137: Handle unknown stats in rsyslog_exporter.
Thu, Oct 8, 5:17 PM · User-jijiki, serviceops, observability

Wed, Oct 7

colewhite added a comment to T263624: Grafana error: "parse error at char 1: unexpected character: '\\ufeff'" when copy-pasting metric names.

I followed the replication steps and did not see the \\ufeff or <feff> artifacts appear either in the Grafana explore tool or when pasting into the terminal. A few differences though: I'm running Chromium and my locale is en_US.UTF-8.

Wed, Oct 7, 3:35 PM · Operations, observability
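
For readers hitting the same parse error, here is a minimal sketch of how a leading U+FEFF (byte order mark) could be detected and stripped from a copied metric name before it is used in a query. The helper names and the sample metric name are hypothetical and not part of Grafana or Prometheus; the task itself does not say where the character comes from.

```python
# U+FEFF is the byte order mark / zero-width no-break space.
BOM = "\ufeff"


def has_bom(text: str) -> bool:
    """Return True if the text contains a U+FEFF character anywhere."""
    return BOM in text


def strip_bom(text: str) -> str:
    """Return the text with any leading U+FEFF characters removed."""
    return text.lstrip(BOM)


if __name__ == "__main__":
    # Hypothetical metric name as it might arrive from the clipboard.
    pasted = BOM + "node_load1"
    print(has_bom(pasted), repr(strip_bom(pasted)))
```

Because the character is invisible, checking with `repr()` as above is one way to confirm whether a pasted string actually carries it.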

Tue, Oct 6

colewhite closed T263728: mtail 3.0.0-rc35 doesn't support the histogram type in -oneshot mode. as Resolved.

Patched mtail rolling out to the fleet this morning. Please let me know if you encounter any related issue.

Tue, Oct 6, 2:38 PM · observability, Operations, serviceops, Platform Team Initiatives (API Gateway)
colewhite closed T263728: mtail 3.0.0-rc35 doesn't support the histogram type in -oneshot mode., a subtask of T263727: Separate mediawiki latency metrics by endpoint, as Resolved.
Tue, Oct 6, 2:37 PM · Patch-For-Review, Operations, serviceops, Platform Team Initiatives (API Gateway)

Mon, Oct 5

colewhite moved T262920: Indexing errors / malformed logs for aqs on cassandra timeout from Inbox to Backlog on the observability board.
Mon, Oct 5, 11:13 PM · Analytics, observability
colewhite added a comment to T262920: Indexing errors / malformed logs for aqs on cassandra timeout.

@JAllemandou this is a known issue with the current Logstash configuration and one of the primary drivers behind adopting a Common Logging Schema (T234565).

Mon, Oct 5, 11:12 PM · Analytics, observability
colewhite moved T262078: Create per cluster error rate alerts on Mediawiki servers from Inbox to Radar on the observability board.
Mon, Oct 5, 11:05 PM · Operations, observability, User-jijiki, Parsoid, serviceops
colewhite added a comment to T262078: Create per cluster error rate alerts on Mediawiki servers.

As of T256418, we have removed StatsD outputs from Logstash. Prometheus-ES-Exporter accepts an Elasticsearch query and exports metrics based on those queries.

Mon, Oct 5, 11:04 PM · Operations, observability, User-jijiki, Parsoid, serviceops
colewhite closed T256418: Evaluate alternative to Logstash StatsD outputs as Resolved.

This is resolved with the removal of the statsd outputs from logstash.

Mon, Oct 5, 11:01 PM · Patch-For-Review, Wikimedia-Logstash, observability

Thu, Oct 1

colewhite updated subscribers of T236954: Hieradata yaml style checking.

While debugging https://gerrit.wikimedia.org/r/c/operations/puppet/+/631508, @ssingh uncovered a possible bug in how the Puppet YAML parser handles unquoted string values.
Given the YAML:

profile::wikidough::dnsdist::webserver:
  host: 0.0.0.0
  port: 8083
  acl:
    - '0.0.0.0/0'
    - ::/0

the catalog compiler renders:

Error: Evaluation Error: Error while evaluating a Function Call, Lookup of key 'profile::wikidough::dnsdist::webserver' failed: Value for key 'profile::wikidough::dnsdist::webserver', in hash returned from data_hash function 'yaml_data', when using location '/srv/jenkins-workspace/puppet-compiler/25616/change/src/hieradata/role/common/wikidough.yaml', has wrong type, expects Puppet::LookupValue, got Hash[Enum['acl', 'host', 'port'], Any, 3, 3] (file: /srv/jenkins-workspace/puppet-compiler/25616/change/src/modules/profile/manifests/wikidough.pp, line: 6, column: 51) on node malmok.wikimedia.org
Thu, Oct 1, 9:52 PM · Patch-For-Review, Puppet, Operations, User-jbond
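
A style check for this case could look roughly like the sketch below. It rests on an assumption the comment does not confirm: that the unquoted value beginning with a colon (`::/0`) is what triggers the error. The function name is hypothetical, and the scan is a deliberately simple line-based heuristic using only the standard library, not a real YAML parser.

```python
def unquoted_colon_scalars(yaml_text):
    """Return (line_number, value) pairs for unquoted values starting with ':'.

    Assumption: such values are the ones worth flagging. This is a heuristic
    over lines and does not understand multi-line or flow-style YAML.
    """
    findings = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.startswith("- "):
            # List item: everything after the dash is the value.
            value = stripped[2:].strip()
        elif ": " in stripped:
            # Mapping entry: everything after the first "key: " is the value.
            value = stripped.split(": ", 1)[1].strip()
        else:
            continue
        # Quoted values start with a quote character, so they are not flagged.
        if value.startswith(":"):
            findings.append((lineno, value))
    return findings


SAMPLE = """\
profile::wikidough::dnsdist::webserver:
  host: 0.0.0.0
  port: 8083
  acl:
    - '0.0.0.0/0'
    - ::/0
"""

if __name__ == "__main__":
    for lineno, value in unquoted_colon_scalars(SAMPLE):
        print(f"line {lineno}: consider quoting {value!r}")
```

Quoting the flagged value, as is already done for '0.0.0.0/0', would be the corresponding fix under the same assumption.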

Wed, Sep 30

colewhite closed T264111: External Monitoring alerting on 400 Bad Request errors as Resolved.

Notifications re-enabled.

Wed, Sep 30, 2:50 PM · Operations, Traffic
colewhite assigned T264111: External Monitoring alerting on 400 Bad Request errors to ema.
Wed, Sep 30, 2:41 PM · Operations, Traffic

Tue, Sep 29

colewhite added a comment to T264111: External Monitoring alerting on 400 Bad Request errors.

I cannot find any indication that the 400s are originating from our servers, either in the webrequest logs or in Turnilo.

Tue, Sep 29, 5:05 PM · Operations, Traffic
colewhite created T264111: External Monitoring alerting on 400 Bad Request errors.
Tue, Sep 29, 4:31 PM · Operations, Traffic

Mon, Sep 28

colewhite moved T263728: mtail 3.0.0-rc35 doesn't support the histogram type in -oneshot mode. from Inbox to In progress on the observability board.
Mon, Sep 28, 3:08 PM · observability, Operations, serviceops, Platform Team Initiatives (API Gateway)
colewhite claimed T263728: mtail 3.0.0-rc35 doesn't support the histogram type in -oneshot mode..
Mon, Sep 28, 3:08 PM · observability, Operations, serviceops, Platform Team Initiatives (API Gateway)

Sep 16 2020

colewhite added a comment to T240995: AQS is not OpenAPI 3 compliant.

By all means. The patch was generated by a tool, and I applied some manual stylistic formatting you may or may not want. Have a look and do with it what you see fit.

Sep 16 2020, 2:06 AM · Analytics-Kanban, Patch-For-Review, Analytics

Sep 10 2020

colewhite closed T262492: Logstash-next fails to load properly. as Resolved.

It does not appear to be reproducible today. Will reopen if it comes back.

Sep 10 2020, 2:49 PM · observability, Operations

Sep 9 2020

colewhite created T262492: Logstash-next fails to load properly..
Sep 9 2020, 10:52 PM · observability, Operations

Sep 7 2020

colewhite added a comment to T234854: Upgrade ELK Stack to version 7.

@jcrespo Thanks for bringing this to our attention. The filters on that dashboard indicate they are broken because the filter pattern logstash-* cannot be found on logstash-next.

Sep 7 2020, 1:53 PM · Patch-For-Review, Operations, Wikimedia-Logstash

Sep 3 2020

colewhite added a comment to T230835: Investigate janitor, maintenance emails parser.

I tried this today. It was unable to parse Zayo or Telia new scheduled maintenance emails, but successfully parsed NTT and GTT new scheduled maintenance emails. At this point, the project looks like it would need quite a bit of fixing to fit our use case.

Sep 3 2020, 3:37 PM · User-Elukey, Operations

Sep 2 2020

colewhite closed T261607: Mailing list for UG Greece as Resolved.

The list is now available. Administrative interface can be found here. Subscription interface can be found here. The administrative password should be in your inbox.

Sep 2 2020, 11:20 PM · Operations, Wikimedia-Mailing-lists
colewhite closed T261760: Requesting access to Production for lsobanski as Resolved.

Confirmed access to Icinga fixed via IRC.

Sep 2 2020, 4:35 PM · Operations, SRE-Access-Requests

Sep 1 2020

colewhite claimed T261607: Mailing list for UG Greece.
Sep 1 2020, 10:53 PM · Operations, Wikimedia-Mailing-lists
colewhite triaged T261524: decommission mw2135-mw2147, mw2187-mw2214 - physical / datacenter part as Medium priority.
Sep 1 2020, 10:51 PM · ops-codfw, Operations, serviceops
colewhite updated the task description for T261754: Requesting access to deployment for holger.
Sep 1 2020, 10:49 PM · SRE-Access-Requests, Operations
colewhite updated the task description for T261754: Requesting access to deployment for holger.
Sep 1 2020, 10:48 PM · SRE-Access-Requests, Operations
colewhite claimed T261754: Requesting access to deployment for holger.
Sep 1 2020, 10:46 PM · SRE-Access-Requests, Operations
colewhite triaged T261803: Cookie “WMF-Last-Access-Global” has been rejected for invalid domain. as Medium priority.
Sep 1 2020, 10:45 PM · Analytics-Radar, Traffic, Operations, Wikimedia-General-or-Unknown
colewhite added a member for Trusted-Contributors: LSobanski.
Sep 1 2020, 3:51 PM
colewhite added a member for WMF-NDA: LSobanski.
Sep 1 2020, 3:50 PM
colewhite added a member for acl*sre-team: LSobanski.
Sep 1 2020, 3:49 PM
colewhite updated the task description for T261760: Requesting access to Production for lsobanski.
Sep 1 2020, 3:27 PM · Operations, SRE-Access-Requests
colewhite updated the task description for T261760: Requesting access to Production for lsobanski.
Sep 1 2020, 3:19 PM · Operations, SRE-Access-Requests
colewhite updated the task description for T261760: Requesting access to Production for lsobanski.
Sep 1 2020, 3:18 PM · Operations, SRE-Access-Requests
colewhite claimed T261760: Requesting access to Production for lsobanski.
Sep 1 2020, 3:18 PM · Operations, SRE-Access-Requests

Aug 21 2020

colewhite closed T260927: Degraded RAID on backup2001, a subtask of T260764: backup2001 RAID controller failure, unable to post 2020-08-19, as Declined.
Aug 21 2020, 3:26 PM · Operations, ops-codfw
colewhite closed T260927: Degraded RAID on backup2001 as Declined.

Superseded by parent task.

Aug 21 2020, 3:26 PM · Operations, ops-codfw
colewhite added a subtask for T260764: backup2001 RAID controller failure, unable to post 2020-08-19: T260927: Degraded RAID on backup2001.
Aug 21 2020, 3:26 PM · Operations, ops-codfw
colewhite added a parent task for T260927: Degraded RAID on backup2001: T260764: backup2001 RAID controller failure, unable to post 2020-08-19.
Aug 21 2020, 3:25 PM · Operations, ops-codfw
colewhite triaged T260943: Don't set cookies for api.wikimedia.org at the caching layer as Medium priority.
Aug 21 2020, 3:23 PM · Operations, Traffic
colewhite edited projects for T260974: Request to make Ladsgroup a gerrit manager, added: Diffusion-Repository-Administrators; removed LDAP-Access-Requests, Operations.
Aug 21 2020, 3:19 PM · Gerrit-Privilege-Requests, Release-Engineering-Team

Aug 20 2020

colewhite claimed T256418: Evaluate alternative to Logstash StatsD outputs.
Aug 20 2020, 9:36 PM · Patch-For-Review, Wikimedia-Logstash, observability

Aug 12 2020

colewhite added a comment to T259794: Kibana next sending telemetry to elastic.co.

Can we set a hard CSP on this domain at the web server level so that in general our report will be "oh no, there's a request attempt we didn't notice" (possibly with a "hm.. and feature X is partly not working as a result") - as opposed to "oh no, we're actually making requests we don't want."

Aug 12 2020, 2:56 AM · Patch-For-Review, observability, Privacy, Operations, Wikimedia-Logstash

Aug 10 2020

colewhite closed T247820: Decide on `service-runner` aggregated prometheus metrics and use of `service` label, a subtask of T205870: Fully migrate producers off statsd, as Resolved.
Aug 10 2020, 10:57 PM · Performance-Team (Radar), Patch-For-Review, observability, Operations
colewhite closed T247820: Decide on `service-runner` aggregated prometheus metrics and use of `service` label as Resolved.

Change to router metrics in service-template-node merged.

Aug 10 2020, 10:57 PM · Platform Team Workboards (External Code Reviews), Performance-Team (Radar), observability, Operations
colewhite added a comment to T247820: Decide on `service-runner` aggregated prometheus metrics and use of `service` label.

Change to heap metrics merged into service-runner/prometheus_metrics branch. Thanks @Pchelolo!

Aug 10 2020, 10:53 PM · Platform Team Workboards (External Code Reviews), Performance-Team (Radar), observability, Operations
colewhite added a project to T259020: varnishmtail silently stops working if varnishncsa crashes: observability.
Aug 10 2020, 10:08 PM · observability, Patch-For-Review, Traffic, Operations

Aug 5 2020

colewhite added a comment to T258948: Deploy Alertmanager for alerting infrastructure phase 1.

prometheus-icinga-exporter 0.8 deployed

Aug 5 2020, 10:26 PM · Patch-For-Review, User-fgiunchedi, observability

Jul 29 2020

colewhite added a comment to T255776: mtail "syscall spam" / high cpu usage on logstash1023.

I'm not inclined to upstream the patch. The patch is a terrible, terrible hack that happens to fit our use case(s). It is very likely they would not want it as-is (it adds a dependency) and it might break the file rotation handling feature in subtle ways.

Jul 29 2020, 10:09 PM · Patch-For-Review, observability

Jul 28 2020

colewhite added a comment to T259078: Please create operations/debs/prometheus-es-exporter gerrit repository.

Thanks @MarcoAurelio!

Jul 28 2020, 11:38 PM · User-MarcoAurelio, Diffusion-Repository-Administrators
colewhite added a subtask for T256418: Evaluate alternative to Logstash StatsD outputs: T259078: Please create operations/debs/prometheus-es-exporter gerrit repository.
Jul 28 2020, 9:05 PM · Patch-For-Review, Wikimedia-Logstash, observability
colewhite added a parent task for T259078: Please create operations/debs/prometheus-es-exporter gerrit repository: T256418: Evaluate alternative to Logstash StatsD outputs.
Jul 28 2020, 9:05 PM · User-MarcoAurelio, Diffusion-Repository-Administrators
colewhite created T259078: Please create operations/debs/prometheus-es-exporter gerrit repository.
Jul 28 2020, 9:02 PM · User-MarcoAurelio, Diffusion-Repository-Administrators
colewhite added a comment to T258931: hiera_lookup failing to preform lookups after hiera5 upgrade.

sudo puppet lookup works! Thanks!

Jul 28 2020, 1:46 PM · Patch-For-Review, User-jbond, Operations, Puppet

Jul 27 2020

colewhite added a comment to T258931: hiera_lookup failing to preform lookups after hiera5 upgrade.

utils/hiera_lookup shows me the same error.

Jul 27 2020, 1:17 PM · Patch-For-Review, User-jbond, Operations, Puppet

Jul 20 2020

colewhite triaged T256418: Evaluate alternative to Logstash StatsD outputs as Medium priority.
Jul 20 2020, 10:49 PM · Patch-For-Review, Wikimedia-Logstash, observability
colewhite added a comment to T256418: Evaluate alternative to Logstash StatsD outputs.

We (Observability) decided that we would like to explore querying Elasticsearch directly. It has promise due to its flexibility and gives us a clear option to alert on logs.

Jul 20 2020, 3:30 PM · Patch-For-Review, Wikimedia-Logstash, observability

Jul 13 2020

colewhite added a comment to T257679: "Failed to fork" errors on kubernetes100[1,3,4].

/proc/sys/kernel/threads-max and /proc/sys/kernel/pid_max don't look particularly probable as they are well below the limits (5k vs 32k and 512k respectively), but at the same time we don't have good data for those in prometheus. I am thinking of passing --collector.processes to node collector, at least for kubernetes boxes. It's an expensive collector per https://github.com/prometheus/node_exporter/pull/950, so perhaps even for just 1 box (if possible?). @fgiunchedi, @colewhite, @herron: objections?

Jul 13 2020, 6:41 PM · Product-Infrastructure-Team-Backlog (Kanban), Proton, Patch-For-Review, Prod-Kubernetes, serviceops
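
As a stopgap while those values are not in Prometheus, the two kernel limits named above can be read directly on a host. The sketch below assumes a Linux machine; the function names, the `root` parameter, and the sample count of 5000 are illustrative only.

```python
from pathlib import Path


def read_kernel_limit(name, root="/proc/sys/kernel"):
    """Read an integer kernel setting such as 'threads-max' or 'pid_max'."""
    return int((Path(root) / name).read_text().strip())


def headroom(current, limit):
    """Return the fraction of the limit still unused (1.0 means nothing used)."""
    return 1.0 - current / limit


if __name__ == "__main__":
    for name in ("threads-max", "pid_max"):
        limit = read_kernel_limit(name)
        # 5000 is a placeholder for the observed count, echoing the "5k" above.
        print(f"{name}: limit={limit} headroom={headroom(5000, limit):.1%}")
```

The `root` argument exists only so the reader can point the function at a copy of the files when experimenting off-host.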
colewhite triaged T257861: Pipe SAL entries into Logstash as Low priority.
Jul 13 2020, 5:19 PM · observability
colewhite added a subtask for T222826: Leverage Grafana annotations to show events in graphs: T257861: Pipe SAL entries into Logstash.
Jul 13 2020, 5:19 PM · Patch-For-Review, observability, Operations
colewhite added a parent task for T257861: Pipe SAL entries into Logstash: T222826: Leverage Grafana annotations to show events in graphs.
Jul 13 2020, 5:19 PM · observability
colewhite created T257861: Pipe SAL entries into Logstash.
Jul 13 2020, 5:18 PM · observability

Jul 7 2020

colewhite closed T255776: mtail "syscall spam" / high cpu usage on logstash1023, a subtask of T251466: Upgrade mtail to 3.0.0-rc35, as Resolved.
Jul 7 2020, 7:36 PM · Patch-For-Review, observability
colewhite closed T255776: mtail "syscall spam" / high cpu usage on logstash1023 as Resolved.

+wmf2 has been deployed

Jul 7 2020, 7:36 PM · Patch-For-Review, observability

Jul 6 2020

colewhite added a subtask for T222826: Leverage Grafana annotations to show events in graphs: T257226: Please create operations/debs/grafana-loki gerrit repository.
Jul 6 2020, 4:40 PM · Patch-For-Review, observability, Operations
colewhite added a parent task for T257226: Please create operations/debs/grafana-loki gerrit repository: T222826: Leverage Grafana annotations to show events in graphs.
Jul 6 2020, 4:40 PM · Diffusion-Repository-Administrators
colewhite created T257226: Please create operations/debs/grafana-loki gerrit repository.
Jul 6 2020, 4:40 PM · Diffusion-Repository-Administrators
colewhite moved T247732: recommendation api's test on scb nodes are flapping from Inbox to Radar on the observability board.
Jul 6 2020, 3:43 PM · observability, Patch-For-Review, Operations, Research

Jun 30 2020

colewhite closed T230030: Prometheus failing to ingest some mtail samples, a subtask of T251466: Upgrade mtail to 3.0.0-rc35, as Resolved.
Jun 30 2020, 8:31 PM · Patch-For-Review, observability
colewhite closed T230030: Prometheus failing to ingest some mtail samples as Resolved.

This has been cleaned up with the rc35 upgrade. No more instances since 15/Jun.

Jun 30 2020, 8:31 PM · observability
colewhite closed T251466: Upgrade mtail to 3.0.0-rc35 as Resolved.

mtail rc35 is now deployed across the fleet

Jun 30 2020, 8:28 PM · Patch-For-Review, observability
colewhite closed T225604: log spam from mtail 3.0.0~rc19 on wezen, a subtask of T243591: varnishmtail panics on buster, as Resolved.
Jun 30 2020, 8:27 PM · Operations, Traffic
colewhite closed T225604: log spam from mtail 3.0.0~rc19 on wezen as Resolved.

The host wezen is no longer around and mtail has been upgraded to rc35 across the fleet. This message does not appear to be spamming logs on centrallog.

Jun 30 2020, 8:27 PM · Operations, Patch-For-Review, observability
colewhite closed T225604: log spam from mtail 3.0.0~rc19 on wezen, a subtask of T251466: Upgrade mtail to 3.0.0-rc35, as Resolved.
Jun 30 2020, 8:27 PM · Patch-For-Review, observability

Jun 29 2020

colewhite updated the task description for T241176: Review and release service-runner 2.8.0.
Jun 29 2020, 10:17 PM · Platform Team Workboards (Clinic Duty Team), service-runner
colewhite added a comment to T233448: Review prometheus ORES rules for completeness.

There is a redis exporter available and installed on the rdb servers. However, there are no instances of the redis exporter configured to export key length/size.

Jun 29 2020, 10:05 PM · Machine Learning Platform (Current), Patch-For-Review, ORES

Jun 25 2020

colewhite updated subscribers of T84845: improve cron spam visibility.

An option we discussed recently was to ingest mail generated by the servers into Logstash by either pulling events from a mailbox or piping off events at the mail servers. Once in ES, queries could be run and aggregated emails generated as a daily report and/or alerts generated via log alerting.

Jun 25 2020, 7:41 PM · observability, Operations
colewhite updated the task description for T84845: improve cron spam visibility.
Jun 25 2020, 7:32 PM · observability, Operations
colewhite created T256418: Evaluate alternative to Logstash StatsD outputs.
Jun 25 2020, 7:27 PM · Patch-For-Review, Wikimedia-Logstash, observability

Jun 24 2020

colewhite added a comment to T255776: mtail "syscall spam" / high cpu usage on logstash1023.

After installing +wmf2 on logstash1007 CPU usage appears to max around 1.3% as opposed to around 4% on +wmf1.

Jun 24 2020, 10:30 PM · Patch-For-Review, observability

Jun 23 2020

colewhite triaged T255776: mtail "syscall spam" / high cpu usage on logstash1023 as Medium priority.
Jun 23 2020, 10:20 PM · Patch-For-Review, observability

Jun 19 2020

colewhite added a comment to T255776: mtail "syscall spam" / high cpu usage on logstash1023.

Submitted a bug upstream.

Jun 19 2020, 10:03 PM · Patch-For-Review, observability

Jun 15 2020

colewhite added a comment to T255508: The UI timestamp in Kibana should be based on source not Logstash intake (for PHP-FPM and MW).

This sounds a lot like something we identified during the audit phase of T234565: a number of fields are created (and ultimately passed through transparently) that have essentially the same data, just different keys. IIRC, we want to consolidate on the source's timestamp and only provide our own if one is not available.

Jun 15 2020, 11:11 PM · Wikimedia-Logstash, observability, Developer Productivity

Jun 12 2020

colewhite updated the task description for T255044: Many new metrics in Graphite for WDQS-Streaming-Updater-POC.
Jun 12 2020, 3:31 PM · Discovery-Search (Current work), Discovery, Wikidata-Query-Service, Wikidata

Jun 11 2020

colewhite updated subscribers of T255044: Many new metrics in Graphite for WDQS-Streaming-Updater-POC.

From discussion in -analytics, @dcausse indicated that they are safe to remove.

Jun 11 2020, 12:38 AM · Discovery-Search (Current work), Discovery, Wikidata-Query-Service, Wikidata

Jun 10 2020

colewhite created T255044: Many new metrics in Graphite for WDQS-Streaming-Updater-POC.
Jun 10 2020, 5:24 PM · Discovery-Search (Current work), Discovery, Wikidata-Query-Service, Wikidata

Jun 8 2020

colewhite closed T254192: mtail rc35 stops incrementing atsmtail counters, a subtask of T251466: Upgrade mtail to 3.0.0-rc35, as Resolved.
Jun 8 2020, 3:15 PM · Patch-For-Review, observability
colewhite closed T254192: mtail rc35 stops incrementing atsmtail counters as Resolved.

This issue hasn't resurfaced since disabling fsnotify. Moving forward with the upgrade.

Jun 8 2020, 3:15 PM · observability, Operations
colewhite closed T239833: StatsD Exporter drops relayed metrics, a subtask of T233448: Review prometheus ORES rules for completeness, as Declined.
Jun 8 2020, 3:14 PM · Machine Learning Platform (Current), Patch-For-Review, ORES
colewhite closed T239833: StatsD Exporter drops relayed metrics as Declined.

Still a problem, but probably not big enough to warrant the effort.

Jun 8 2020, 3:13 PM · Patch-For-Review, observability, Operations
colewhite moved T247820: Decide on `service-runner` aggregated prometheus metrics and use of `service` label from In progress to Externally blocked on the observability board.
Jun 8 2020, 3:11 PM · Platform Team Workboards (External Code Reviews), Performance-Team (Radar), observability, Operations