
Today

fgiunchedi moved T280257: Thanos compaction stopped due to local filesystem space shortage from Backlog to Doing on the User-fgiunchedi board.
Mon, Apr 19, 9:28 AM · Patch-For-Review, User-fgiunchedi, observability
fgiunchedi closed T280371: creation of raju@wikipedia.org for fundraising team as Resolved.

Hi @MNoorWMF, this is implemented now! Resolving the task but feel free to reopen if something is amiss.

Mon, Apr 19, 8:19 AM · SRE

Fri, Apr 16

fgiunchedi added a project to T280257: Thanos compaction stopped due to local filesystem space shortage: User-fgiunchedi.
Fri, Apr 16, 12:25 PM · Patch-For-Review, User-fgiunchedi, observability
fgiunchedi added a comment to T280257: Thanos compaction stopped due to local filesystem space shortage.

The issue has been mitigated by reimaging thanos-fe2001 (the host that runs thanos-compact) with a RAID0 /srv. I'll reimage the other frontends next week.

Fri, Apr 16, 12:25 PM · Patch-For-Review, User-fgiunchedi, observability
fgiunchedi triaged T279804: Visits to Wikimedia properties should not be used for Google ad targeting (FLoC) as Medium priority.
Fri, Apr 16, 8:45 AM · fundraising-tech-ops, Patch-For-Review, SRE, Traffic, Privacy Engineering, Privacy
fgiunchedi triaged T280203: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) as Medium priority.
Fri, Apr 16, 8:45 AM · Patch-For-Review, SRE, serviceops
fgiunchedi triaged T280210: Package php-ast in {stretch,buster}-wikimedia/component as Medium priority.
Fri, Apr 16, 8:45 AM · Packaging, SRE
fgiunchedi triaged T280232: Uncached wiki requests partially unavailable due to excessive request rates from a bot as High priority.
Fri, Apr 16, 8:44 AM · SRE, Wikimedia-Incident
fgiunchedi triaged T280253: Allow bast1003 in management routers (and drop bast1002) as Medium priority.
Fri, Apr 16, 8:44 AM · SRE, netops
fgiunchedi triaged T280162: NDA for Superset Request from WMDE Employee Manuel as Medium priority.
Fri, Apr 16, 8:39 AM · Patch-For-Review, SRE, LDAP-Access-Requests
fgiunchedi closed T280242: Requesting access to graphite hosts for awight as Resolved.

This is implemented now! @awight I've expanded https://wikitech.wikimedia.org/wiki/Graphite#Deleting_metrics a little on how to delete metrics, please reach out if you have questions!

Fri, Apr 16, 8:37 AM · SRE, SRE-Access-Requests, Graphite, observability
fgiunchedi updated the task description for T280242: Requesting access to graphite hosts for awight.
Fri, Apr 16, 8:34 AM · SRE, SRE-Access-Requests, Graphite, observability
fgiunchedi closed T280177: Requesting deployment access for HMonroy as Resolved.

@HMonroy you are now a member of the deployment group! Resolving task, please reopen if something is amiss.

Fri, Apr 16, 8:26 AM · Release-Engineering-Team, SRE, SRE-Access-Requests
fgiunchedi updated the task description for T280177: Requesting deployment access for HMonroy.
Fri, Apr 16, 8:24 AM · Release-Engineering-Team, SRE, SRE-Access-Requests
fgiunchedi added a comment to T280257: Thanos compaction stopped due to local filesystem space shortage.

Since the frontends are meant to be stateless, I think I prefer #2 over #1 to avoid special-casing a data partition on one of the backends. Double the space for compaction should buy us quite some time.

Fri, Apr 16, 8:16 AM · Patch-For-Review, User-fgiunchedi, observability
fgiunchedi updated the task description for T280257: Thanos compaction stopped due to local filesystem space shortage.
Fri, Apr 16, 8:08 AM · Patch-For-Review, User-fgiunchedi, observability

Thu, Apr 15

fgiunchedi created T280257: Thanos compaction stopped due to local filesystem space shortage.
Thu, Apr 15, 2:28 PM · Patch-For-Review, User-fgiunchedi, observability
fgiunchedi updated the task description for T273064: Setup Analytics team in VO/splunk oncall.
Thu, Apr 15, 9:47 AM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters, User-fgiunchedi, observability
fgiunchedi added a comment to T273064: Setup Analytics team in VO/splunk oncall.

Alright! @fgiunchedi I added the alert to superset, and when it alerted on Icinga, @Ottomata and I got an alert from Splunk OnCall (text message, but now I've customized it to be a push notification).

Thu, Apr 15, 9:46 AM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters, User-fgiunchedi, observability
fgiunchedi closed T279517: Add herald rules to include #observability in related tags/projects as Resolved.

This is done! (see subtask)

Thu, Apr 15, 9:36 AM · User-fgiunchedi, observability
fgiunchedi updated subscribers of T280177: Requesting deployment access for HMonroy.

Thank you @Dzahn ! We're indeed seeking approval from Release-Engineering-Team (cc @thcipriani perhaps?)

Thu, Apr 15, 9:32 AM · Release-Engineering-Team, SRE, SRE-Access-Requests
fgiunchedi updated the task description for T280177: Requesting deployment access for HMonroy.
Thu, Apr 15, 9:30 AM · Release-Engineering-Team, SRE, SRE-Access-Requests

Wed, Apr 14

fgiunchedi updated the task description for T280119: Add #observability to related projects.
Wed, Apr 14, 2:50 PM · User-RhinosF1, phabricator maintenance bot, Phabricator
fgiunchedi moved T279517: Add herald rules to include #observability in related tags/projects from Backlog to Doing on the User-fgiunchedi board.
Wed, Apr 14, 1:02 PM · User-fgiunchedi, observability
fgiunchedi closed T279531: Add Lena Meintrup to the ldap/wmde and ldap/nda group as Resolved.

@Lena_WMDE you are now in the nda and wmde groups, please verify access and reopen the task if something is amiss!

Wed, Apr 14, 12:41 PM · SRE, LDAP-Access-Requests
fgiunchedi closed T280073: Grant Access to wmf for HNordeen as Resolved.

User added to wmf group (chatted on IRC with @jbond), @HNordeenWMF you should have access now!

Wed, Apr 14, 12:37 PM · SRE, LDAP-Access-Requests
fgiunchedi added a parent task for T280119: Add #observability to related projects: T279517: Add herald rules to include #observability in related tags/projects.
Wed, Apr 14, 10:06 AM · User-RhinosF1, phabricator maintenance bot, Phabricator
fgiunchedi added a subtask for T279517: Add herald rules to include #observability in related tags/projects: T280119: Add #observability to related projects.
Wed, Apr 14, 10:06 AM · User-fgiunchedi, observability
fgiunchedi created T280119: Add #observability to related projects.
Wed, Apr 14, 10:06 AM · User-RhinosF1, phabricator maintenance bot, Phabricator
fgiunchedi triaged T280073: Grant Access to wmf for HNordeen as Medium priority.
Wed, Apr 14, 9:57 AM · SRE, LDAP-Access-Requests
fgiunchedi closed T279245: Degraded RAID on ms-be2028 as Resolved.

Thank you @Papaul, all good

Wed, Apr 14, 7:44 AM · User-fgiunchedi, SRE, ops-codfw

Tue, Apr 13

fgiunchedi added a project to T279517: Add herald rules to include #observability in related tags/projects: User-fgiunchedi.
Tue, Apr 13, 3:12 PM · User-fgiunchedi, observability
herron awarded T279517: Add herald rules to include #observability in related tags/projects a Like token.
Tue, Apr 13, 3:12 PM · User-fgiunchedi, observability
fgiunchedi moved T279517: Add herald rules to include #observability in related tags/projects from Inbox to In progress on the observability board.
Tue, Apr 13, 3:12 PM · User-fgiunchedi, observability
fgiunchedi moved T279601: reclaim icinga1001.wikimedia.org from Inbox to In progress on the observability board.
Tue, Apr 13, 3:11 PM · observability, decommission-hardware
fgiunchedi moved T279602: reclaim icinga2001.wikimedia.org from Inbox to In progress on the observability board.
Tue, Apr 13, 3:11 PM · observability, decommission-hardware
fgiunchedi moved T279245: Degraded RAID on ms-be2028 from Backlog to Doing on the User-fgiunchedi board.
Tue, Apr 13, 2:21 PM · User-fgiunchedi, SRE, ops-codfw
fgiunchedi added a project to T279245: Degraded RAID on ms-be2028: User-fgiunchedi.
Tue, Apr 13, 2:12 PM · User-fgiunchedi, SRE, ops-codfw
fgiunchedi triaged T277064: Packaging PostGIS 3.1 for the new Maps stack as Medium priority.
Tue, Apr 13, 1:53 PM · Product-Infrastructure-Team-Backlog, SRE, Packaging, serviceops, Maps
fgiunchedi triaged T279307: Jenkins fails onCI puppet with: EnvironmentError: 404 Client Error: Not Found for url: https://pypi.org/simple/pkg-resources/ as Medium priority.
Tue, Apr 13, 1:53 PM · serviceops, SRE
fgiunchedi triaged T279380: Add Traffic's notion of "from public cloud" to Analytics webrequest data as Medium priority.
Tue, Apr 13, 1:04 PM · Patch-For-Review, SRE, Analytics, Traffic
fgiunchedi triaged T279503: Unable to load en.wikipedia.org from 84.19.61.192/26 as Medium priority.
Tue, Apr 13, 1:03 PM · Traffic, netops, SRE
fgiunchedi triaged T279664: Decide on details of progressive Multi-DC roll out as Medium priority.
Tue, Apr 13, 1:01 PM · SRE, Traffic, serviceops, Performance-Team
fgiunchedi triaged T279701: Figure out mailman3 search index config as Medium priority.
Tue, Apr 13, 1:01 PM · SRE, Wikimedia-Mailing-lists
fgiunchedi moved T279531: Add Lena Meintrup to the ldap/wmde and ldap/nda group from Awaiting User Input to NDA Pending on the LDAP-Access-Requests board.
Tue, Apr 13, 1:00 PM · SRE, LDAP-Access-Requests
fgiunchedi triaged T263027: Missing 'notify' for some Icinga configuration files as Low priority.
Tue, Apr 13, 12:54 PM · SRE, observability
fgiunchedi triaged T279764: Requesting access to deployment for Silvan Heintze as Medium priority.
Tue, Apr 13, 12:51 PM · SRE, SRE-Access-Requests
fgiunchedi added a comment to T279764: Requesting access to deployment for Silvan Heintze.

As a WMDE Engineering Manager I approve this request. Who approves it on WMF's end these days? @thcipriani or @greg?

Tue, Apr 13, 12:34 PM · SRE, SRE-Access-Requests
fgiunchedi updated the task description for T279764: Requesting access to deployment for Silvan Heintze.
Tue, Apr 13, 12:29 PM · SRE, SRE-Access-Requests
fgiunchedi updated the task description for T279764: Requesting access to deployment for Silvan Heintze.
Tue, Apr 13, 12:26 PM · SRE, SRE-Access-Requests

Thu, Apr 8

fgiunchedi merged T279644: Degraded RAID on ms-be2028 into T279245: Degraded RAID on ms-be2028.
Thu, Apr 8, 11:04 AM · User-fgiunchedi, SRE, ops-codfw
fgiunchedi merged task T279644: Degraded RAID on ms-be2028 into T279245: Degraded RAID on ms-be2028.
Thu, Apr 8, 11:04 AM · SRE, ops-codfw
fgiunchedi reopened T279245: Degraded RAID on ms-be2028 as "Open".

@Papaul I'm running into troubles with the disk I haven't seen before (xfs crashes after a while, log below). Can we try another spare disk just to exclude the disk itself as faulty (or just plain old)? Thank you!

Thu, Apr 8, 10:19 AM · User-fgiunchedi, SRE, ops-codfw
fgiunchedi created T279637: Upgrade Swift ms cluster to Buster (or Bullseye) and revisit mkfs.xfs options.
Thu, Apr 8, 9:31 AM · SRE-swift-storage
fgiunchedi added a comment to T279192: Server side upload for Sturm.

Failed for the second time with An unknown error occurred in storage backend "local-swift-eqiad". @fgiunchedi I'm not sure if I can see the detailed logs for this one, could you either find them or help me locate them? Thanks!

Thu, Apr 8, 8:47 AM · User-Urbanecm, video2commons, Commons, Wikimedia-Site-requests
fgiunchedi created T279621: Set up Misc Object Storage Service (moss).
Thu, Apr 8, 7:55 AM · SRE-swift-storage

Wed, Apr 7

fgiunchedi created T279517: Add herald rules to include #observability in related tags/projects.
Wed, Apr 7, 9:35 AM · User-fgiunchedi, observability
fgiunchedi closed T279245: Degraded RAID on ms-be2028 as Resolved.

Thank you @Papaul !

Wed, Apr 7, 8:30 AM · User-fgiunchedi, SRE, ops-codfw

Tue, Apr 6

fgiunchedi moved T278280: Setup monitoring for mailman3 from Inbox to Radar on the observability board.
Tue, Apr 6, 3:21 PM · observability, SRE, Wikimedia-Mailing-lists
fgiunchedi added a project to T278309: Move librenms deployment to Debian package: User-fgiunchedi.
Tue, Apr 6, 3:20 PM · User-fgiunchedi, Patch-For-Review, observability
fgiunchedi moved T278309: Move librenms deployment to Debian package from Inbox to Backlog on the observability board.
Tue, Apr 6, 3:20 PM · User-fgiunchedi, Patch-For-Review, observability
fgiunchedi moved T276697: Implement central logging for mailman3 from Inbox to Backlog on the observability board.
Tue, Apr 6, 3:20 PM · observability, SRE, Wikimedia-Mailing-lists
fgiunchedi moved T275920: icinga login case mismatch from Inbox to Backlog on the observability board.
Tue, Apr 6, 3:19 PM · observability, SRE, Icinga
fgiunchedi added a project to T278514: Wishlist for AlertManager alerts from Grafana: User-fgiunchedi.
Tue, Apr 6, 3:16 PM · User-fgiunchedi, Performance-Team (Radar), observability
fgiunchedi moved T278514: Wishlist for AlertManager alerts from Grafana from Inbox to In progress on the observability board.
Tue, Apr 6, 3:16 PM · User-fgiunchedi, Performance-Team (Radar), observability
fgiunchedi moved T278906: Change cpjobqueue "processing" time metrics from pre-aggregated quantile to native Prometheus histogram bucket from Inbox to Radar on the observability board.
Tue, Apr 6, 3:16 PM · Platform Team Workboards (Clinic Duty Team), observability, WMF-JobQueue
fgiunchedi moved T278923: Resolved emails sometimes as new email threads and sometimes not from Inbox to Radar on the observability board.
Tue, Apr 6, 3:15 PM · Performance-Team (Radar), observability
fgiunchedi moved T278946: Add alerting for Memcached timeout errors from Inbox to Radar on the observability board.
Tue, Apr 6, 3:15 PM · observability, SRE, serviceops, Sustainability (Incident Followup)
fgiunchedi moved T279112: meta.domain in Logstash seems to usually not like doing term matches from Inbox to Radar on the observability board.
Tue, Apr 6, 3:14 PM · Instrument-ClientError, observability, Wikimedia-Logstash
fgiunchedi moved T279342: Migrate colocated kafka-logging brokers to dedicated kafka-logging hosts from Inbox to In progress on the observability board.
Tue, Apr 6, 3:13 PM · Patch-For-Review, observability
fgiunchedi moved T273064: Setup Analytics team in VO/splunk oncall from In progress to Radar on the observability board.
Tue, Apr 6, 2:41 PM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters, User-fgiunchedi, observability
fgiunchedi assigned T279245: Degraded RAID on ms-be2028 to Papaul.

@Papaul please replace the failed 4TB disk, the LED should be blinking, thank you!

Tue, Apr 6, 8:18 AM · User-fgiunchedi, SRE, ops-codfw
fgiunchedi updated the task description for T268435: Add ms-be106[0-3] to swift.
Tue, Apr 6, 7:24 AM · Patch-For-Review, User-fgiunchedi, SRE-swift-storage
fgiunchedi closed T268435: Add ms-be106[0-3] to swift as Resolved.

With the last rebalance the hosts are now fully in service (at weight 8000), netbox is updated.

Tue, Apr 6, 7:24 AM · Patch-For-Review, User-fgiunchedi, SRE-swift-storage
fgiunchedi closed T268435: Add ms-be106[0-3] to swift, a subtask of T266016: Refresh and expand Swift hardware capacity, as Resolved.
Tue, Apr 6, 7:24 AM · User-fgiunchedi, SRE-swift-storage

Thu, Apr 1

fgiunchedi created T279049: Envoy (admin) logs are not rotated/expired.
Thu, Apr 1, 10:51 AM · serviceops
fgiunchedi added a comment to T271140: Some Data Persistence DB clusters apparently do not support IPv6.

@fgiunchedi Is there any process we should follow to test/make sure everything is okay if we add ipv6 DNS for ms-be and ms-fe?

Thu, Apr 1, 7:56 AM · IPv6, DBA, SRE-tools
fgiunchedi added a comment to T265435: codfw: Testing Out Sample PDUs.

Sounds good @Papaul ! So in Icinga we're monitoring each phase to see if it hits 80%/85% of the 30A breaker, and in Prometheus we're collecting most of what we can via snmp (current, voltage, sensors).
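
As a rough illustration of the per-phase check described above, here is a minimal Python sketch. It assumes that 80% maps to a warning and 85% to a critical state, which the comment does not spell out; the function name `phase_status` and its parameters are hypothetical and are not taken from the actual Icinga configuration.

```python
def phase_status(
    current_a: float,
    breaker_a: float = 30.0,   # breaker rating from the comment above
    warn_frac: float = 0.80,   # assumed warning threshold
    crit_frac: float = 0.85,   # assumed critical threshold
) -> str:
    """Classify one PDU phase by its load relative to the breaker rating."""
    load = current_a / breaker_a
    if load >= crit_frac:
        return "CRITICAL"
    if load >= warn_frac:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Example readings in amperes for three phases (made-up values).
    for amps in (20.0, 24.6, 26.0):
        print(f"{amps:.1f} A -> {phase_status(amps)}")
```

With a 30A breaker these assumed thresholds work out to roughly 24A and 25.5A per phase.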

Thu, Apr 1, 7:49 AM · observability, ops-codfw, DC-Ops, SRE

Wed, Mar 31

fgiunchedi added a comment to T278923: Resolved emails sometimes as new email threads and sometimes not .

I think gmail's threading logic groups messages with "similar enough" subjects and "close enough" in time, which would explain the behavior above. Have you experienced counter-examples to this theory ?

Wed, Mar 31, 12:13 PM · Performance-Team (Radar), observability
fgiunchedi closed T278908: Degraded RAID on logstash2022 as Resolved.

Tentatively resolving, will get reopened if it happens again.

Wed, Mar 31, 7:52 AM · SRE, ops-codfw
fgiunchedi added a comment to T247364: Forward port Python2 files to Python3 in Puppet Repository.

Thank you for taking care of the Python 3 migration in Puppet !

Wed, Mar 31, 7:51 AM · Patch-For-Review, User-MoritzMuehlenhoff, User-crusnov, User-jbond, Python3-Porting, SRE-tools, Puppet
fgiunchedi added a comment to T278908: Degraded RAID on logstash2022.

Mhh sdc2 got booted off md0 but stayed in md1, I didn't see any obvious messages/failures about sdc in dmesg so I added the disk back, let's see what happens

Wed, Mar 31, 7:47 AM · SRE, ops-codfw

Tue, Mar 30

fgiunchedi updated the task description for T272453: Create phabricator tasks from Alertmanager alerts.
Tue, Mar 30, 1:55 PM · Patch-For-Review, User-fgiunchedi, observability

Mon, Mar 29

fgiunchedi added a comment to T267650: LibreNMS supports more than one Alertmanager address.

Upstream merged the PR, will be included in the next LibreNMS release \o/

Mon, Mar 29, 8:52 AM · Upstream, User-fgiunchedi, observability
fgiunchedi added a project to T225140: Icinga alerts that should open tasks instead of alerting: User-fgiunchedi.
Mon, Mar 29, 8:26 AM · User-fgiunchedi, observability
fgiunchedi updated the task description for T266016: Refresh and expand Swift hardware capacity.
Mon, Mar 29, 7:48 AM · User-fgiunchedi, SRE-swift-storage

Fri, Mar 26

fgiunchedi added a comment to T225140: Icinga alerts that should open tasks instead of alerting.

Since we've set up task opening for AM alerts this quarter we can definitely tackle some of these.

Fri, Mar 26, 11:28 AM · User-fgiunchedi, observability
fgiunchedi added a comment to T275752: Jobrunner on Buster occasional timeout on codfw file upload.

@fgiunchedi could you re-run your analysis to see if mw1307 (10.64.0.169) is still exhibiting the issue?

Fri, Mar 26, 10:20 AM · Sustainability, serviceops, SRE

Thu, Mar 25

fgiunchedi closed T272977: Self-service deployment for alerting rules as Resolved.

This is complete! Alerts will get deployed from operations/alerts to Prometheus instances

Thu, Mar 25, 7:40 AM · Patch-For-Review, User-fgiunchedi, observability
fgiunchedi updated the task description for T272977: Self-service deployment for alerting rules.
Thu, Mar 25, 7:40 AM · Patch-For-Review, User-fgiunchedi, observability

Wed, Mar 24

fgiunchedi added a comment to T278315: global http_proxy setting.

I also have a shell alias (proxy-on / proxy-off) for convenience and use it in the few cases where building packages requires internet access. +1 to have a shared alias available (since in practice we already have that, just sprinkled in a few places) (and -1 to have proxy enabled by default, for reasons already mentioned)

Wed, Mar 24, 2:10 PM · SRE, Puppet, User-jbond
fgiunchedi created T278309: Move librenms deployment to Debian package.
Wed, Mar 24, 11:12 AM · User-fgiunchedi, Patch-For-Review, observability
fgiunchedi added a comment to T273716: Improve Alertmanager/LibreNMS notifications.

Thank you for the feedback! Unfortunately I think addressing some of the feedback will need a librenms patch

Wed, Mar 24, 11:00 AM · Patch-For-Review, observability
fgiunchedi closed T278210: Change repeat interval for performance team alerts as Resolved.

This is complete! Please reopen if something is amiss

Wed, Mar 24, 7:37 AM · Performance-Team (Radar), User-fgiunchedi, observability
fgiunchedi closed T278210: Change repeat interval for performance team alerts, a subtask of T272979: Onboard Perf Team to new Alerting Toolset , as Resolved.
Wed, Mar 24, 7:37 AM · Performance-Team (Radar), User-fgiunchedi, observability

Mar 18 2021

fgiunchedi added a comment to T184744: Improve access to Commons image data for research and development.

I picked this up again last week, and ran a more substantial test job using 50 workers downloading ~1 million commons images (400px thumbnails) using a spark job. Some more questions before I run a job on the full datasets (~53M image files). Looking at the grafana dashboard,

  • what explains the increase in PUT 201 under object state-changing? cache misses for the thumbnails that get filled?

Possible but hard to say from that graph, when did the job start/finish ? I'm assuming ~23:30 to ~1:40 but best to confirm
Something else to check for thumbnailing activity is the Thumbor dashboard (for the same timeframe):
https://grafana.wikimedia.org/d/Pukjw6cWk/thumbor?orgId=1&from=1615330800000&to=1615341600000

Thanks for that dashboard, that is useful to look at. I should have clarified the period, your assumption is right ~23:30 to ~1:40.

  • client errors chart: we do expect to see a lot of 404s since some images we query for will have been deleted by now. However, I also notice a high number of timeouts with a timeout of 5 seconds. Is this to be expected? I am doing retries and will increase the timeout but it seems high.

Yes, some timeouts are to be expected for sure; were the timeouts for a certain kind of file type? It might be a thumb miss plus long thumb regeneration time, or it might be Swift timing out while fetching the image. Indeed timeout + exponential retries should get you basically all the way there.
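
For illustration, the "timeout + exponential retries" approach could look roughly like the Python sketch below. It is not the code used by the Spark job: `fetch_with_retries`, its parameters and the injected `fetch` callable are all hypothetical, and only `TimeoutError` is retried, so other errors (for example a 404 for a deleted image) would propagate instead of being retried.

```python
import random
import time
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[float], bytes],   # hypothetical downloader taking a timeout
    timeout_s: float = 5.0,
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch(timeout_s), retrying on TimeoutError with exponential backoff.

    Returns the payload, or None if every attempt timed out.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(timeout_s)
        except TimeoutError:
            if attempt == max_attempts - 1:
                return None
            # The base delay doubles on each attempt; random jitter spreads
            # out retries from many workers hitting the same backend.
            delay = base_delay_s * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    return None
```

Passing `sleep` as a parameter keeps the backoff schedule easy to check without actually waiting.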

@fgiunchedi Does this dashboard and approach look ok to you from the swift perspective? If so, I'll kick off the main job this week; it is expected to run for ~6 days.

It seems to generally be fine, although I'm surprised that thumbnailing activity is this high, since 400px thumbnails should be pre-generated at upload time by MW for most wikis ($wgThumbLimits in mediawiki-config).

Looking at the errors, I noticed that I actually used a 3s timeout, but there are still many timeouts: with no retries, almost 25% of requests fail. All errors are timeout errors, and the distribution of file types among successful and failed attempts is roughly the same. Is it possible that somehow the servers are overloaded?

Mar 18 2021, 2:44 PM · User-ArielGlenn
fgiunchedi committed rODGZ92a1a5404e07: Add debian/ packaging (authored by fgiunchedi).
Add debian/ packaging
Mar 18 2021, 2:15 PM
fgiunchedi added a comment to T277739: rsyslog-kubernetes missing in buster-wikimedia.

After a chat with Filippo, IIUC 8.2008.0-1 is used only on centrallog nodes (hence the component) but we might want to use 8.19011 provided on Buster and add the custom bits for rsyslog-kubernetes, uploading all to main. The alternative would be to modify 8.2008.0 and use the component instead (so adding rsyslog-kubernetes to it).

Any preference? :)

I have no idea why 8.2008.0-1 is used on centrallog, but if it is compatible with 8.1901 and can have kubernetes support, I'd just upload it to main.

Mar 18 2021, 2:10 PM · SRE, observability
fgiunchedi updated the task description for T272977: Self-service deployment for alerting rules.
Mar 18 2021, 8:40 AM · Patch-For-Review, User-fgiunchedi, observability
fgiunchedi added a comment to T224579: Migrate irc.wikimedia.org/kraz to Buster.

And connections from Prometheus kept piling up. AFAIK the service/exporter is not owned ATM, I've restarted the exporter but this is obviously bound to happen again.

Mar 18 2021, 8:34 AM · Patch-For-Review, User-notice, Wikimedia-IRC-RC-Server, SRE