Both the processes on alert1001 and alert2001 have been stuck for a while.
Fri, Nov 18
Thu, Nov 17
Would be great to get the struct in place in Puppet for ripeatlas_measurements.
Wed, Nov 16
service-template-node patch merged: https://github.com/wikimedia/service-template-node/commit/c4dc28c699190dec5f95725e454695306f80cabc
Wed, Nov 9
Nov 4 2022
Oct 27 2022
Oct 24 2022
Oct 20 2022
Oct 19 2022
Oct 18 2022
@jhathaway have you had the opportunity to work with our Ganeti installation yet? If not, please take a look at the instructions and start turning up some nodes :) You can file the provisioning tickets as sub-tasks of this one.
Oct 13 2022
Oct 11 2022
It would be good to get this done before there's much further progress on T308371: Migrate node-based services in production to node16
Happy quarterly planning season; I was wondering if there were any updated estimates on when this might happen?
Oct 6 2022
Oct 4 2022
Here's my jupyter notebook with a rough analysis of a very impactful hotlink incident (on 2022-09-13) and our biggest organic traffic surge to date (Queen Elizabeth's passing on 2022-09-08):
Sep 30 2022
Just pinging this task as OKR season is upon us and this might be a useful and fun thing to sneak in
Sep 28 2022
Sep 22 2022
Sep 16 2022
Going to be bold and append to the task description with what we discussed in the cachebust WG meeting today (so that anyone can update it and tick boxes as we go).
Sep 12 2022
As a note, such sites also include "everything on WMCS / toolserver" and it would probably be good to extend NEL to that as well.
Sep 8 2022
Sep 7 2022
Sep 6 2022
Sep 4 2022
Sep 2 2022
Sep 1 2022
This is ready, was tested by hand on cumin2002, and is now deployed to both cumin hosts.
Aug 31 2022
@Marostegui I actually implemented this not as a new flavor, but instead as a boolean attribute omit_replicas_in_mwconfig on the section object. Once the patch is merged and deployed I'll let you know.
I've written a patch, which is hopefully correct.
Aug 29 2022
@Marostegui That looks correct to me.
Aug 24 2022
also cc @EChetty
Aug 23 2022
ping @JAllemandou -- did I put this on the right phab tag? It'd be really awesome to have and I suspect is a pretty easy change
Aug 19 2022
Looks like this is resolved...?
Aug 17 2022
Summary of a conversation that ori, joe, and I had on IRC today:
- You get some of this "for free" once the appservers are on k8s -- you can add labels to your pods that will be automatically propagated to logstash/prometheus
- However, it would be valuable to have a framework like this beyond just the appservers or k8s services
- For instance, Traffic has done a lot of that kind of experimentation on cp nodes with ad-hoc mechanisms in the past, same for some other teams
- Any Puppet+Prometheus plumbing should be reusable, at least
- In Prometheus you can have those same Puppet facts exported by node-exporter after having Puppet generate a textfile for it, and then you can join metrics together at query time
- Logstash might be more difficult than Prometheus (although I don't know for sure, maybe there's an easy mechanism with a filter script)
- Perhaps those tags could be injected via rsyslog (as configured via puppet)?
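To make the textfile idea above concrete, here is a minimal sketch of what the generated file could look like. This is not an existing script: the metric name `node_experiment_info`, the label names, and the output path are all hypothetical, and in practice Puppet would probably render the file from a template instead of running Python. The sketch only assumes the node-exporter textfile collector's documented behaviour: it reads `*.prom` files in the Prometheus text exposition format, and files should be written atomically.

```python
import os
import tempfile


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_info_metric(name: str, labels: dict) -> str:
    """Render one constant-1 'info' style gauge that carries host tags as labels."""
    pairs = ",".join(
        f'{key}="{escape_label_value(str(val))}"'
        for key, val in sorted(labels.items())
    )
    return (
        f"# HELP {name} Host tags exported as labels (hypothetical metric).\n"
        f"# TYPE {name} gauge\n"
        f"{name}{{{pairs}}} 1\n"
    )


def write_textfile(path: str, content: str) -> None:
    """Write via a temp file plus rename so the collector never reads a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except Exception:
        os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    # Hypothetical tags; the real ones would come from Puppet facts or Hiera.
    tags = {"experiment": "tcp_tuning", "cohort": "treatment"}
    print(render_info_metric("node_experiment_info", tags), end="")
    # write_textfile("/path/to/textfile/dir/experiment.prom", ...) would publish it.
```

At query time the join could then look something like `some_metric * on(instance) group_left(experiment, cohort) node_experiment_info`, again with hypothetical metric and label names.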
Aug 11 2022
Aug 9 2022
As a followup to this past weekend's misconfiguration that delayed paging, victorops.py now has a check_esc_policy_config subcommand.