Mon, Mar 23
DogStatsD shows some promise here. It's a StatsD extension that statsd_exporter supports, and it enables dynamic labels. In testing, the statsd proxy didn't support the extension, but translation is trivial if necessary.
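To make the translation idea concrete, here is a minimal sketch, assuming the documented DogStatsD line format (`name:value|type|@rate|#key:value,...`). The scheme of folding tag values into the dotted metric name, the function names, and the sample metric are hypothetical illustrations, not what the proxy or statsd_exporter actually does.

```python
def parse_dogstatsd(line):
    """Split one DogStatsD line into name, value, type, sample rate, and tags."""
    name, _, rest = line.partition(":")
    parts = rest.split("|")
    value, metric_type = parts[0], parts[1]
    sample_rate = None  # kept as the original string to avoid reformatting
    tags = {}
    for section in parts[2:]:
        if section.startswith("@"):
            sample_rate = section[1:]
        elif section.startswith("#"):
            for tag in section[1:].split(","):
                key, _, val = tag.partition(":")
                tags[key] = val
    return name, value, metric_type, sample_rate, tags


def to_plain_statsd(line):
    """Fold tag values into the dotted name (hypothetical scheme) so that a
    tag-unaware relay can still carry the metric."""
    name, value, metric_type, sample_rate, tags = parse_dogstatsd(line)
    # Sort by tag key so the resulting name is stable regardless of tag order.
    suffix = "".join("." + tags[key] for key in sorted(tags))
    out = f"{name}{suffix}:{value}|{metric_type}"
    if sample_rate is not None:
        out += f"|@{sample_rate}"
    return out


print(to_plain_statsd("requests.total:1|c|#method:GET,status:200"))
```

The tag keys are lost in this direction, so the receiving side would need a mapping rule to turn the name components back into labels.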
Fri, Mar 20
Adjusting the timeout resolved the issue. The graphs are clean now.
Thu, Mar 19
Tue, Mar 17
Good idea forking the original task. Thanks for that!
Mon, Mar 16
That CSP works well. I think CAS needs to respond with an appropriate Access-Control-Allow-Origin header. https://apereo.github.io/cas/5.2.x/installation/Configuration-Properties.html#http-web-requests
Fri, Mar 13
It appears a reload does resolve the issue, but it takes some time for Prometheus to fetch and store an update. I used kill -HUP <PID> to reload.
Thu, Mar 12
It looks like most of the issues stem from CSP blocking mixed content. idp.wikimedia.org is redirecting to http per this changeset.
Fri, Mar 6
I created a PR to service-runner for the updates to heapwatch metrics. Thanks for the feedback!
... with 2M docs indexed it looks like the change might only be from ~800 bytes/doc to ~950 bytes/doc.
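Taking the rough figures quoted above at face value, the overhead can be worked out directly. This is only back-of-the-envelope arithmetic on those approximate numbers, not a new measurement.

```python
docs = 2_000_000          # documents indexed in the test
before = 800              # approximate bytes per doc before the change
after = 950               # approximate bytes per doc after the change

extra_per_doc = after - before                     # additional bytes per doc
relative_increase = extra_per_doc / before         # fraction of the original size
extra_total_mb = docs * extra_per_doc / 1_000_000  # total extra, in MB (10^6 bytes)

print(f"{relative_increase:.1%} larger, about {extra_total_mb:.0f} MB extra at {docs:,} docs")
```

On these numbers the growth is well under a fifth, far from a doubling.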
Thu, Mar 5
I have concerns about re-implementing the _all field given that it is no longer "free." If we use copy_to, each log will take twice the disk space, with a matching increase in indexing cost. With stack traces and request/response logs including response bodies, I can see this adding up quickly (unless we omit these from the new _all field).
It looks like the issue has been run into before in the Beats family of software. There is a template setting that allows us to define an array of fields that are default query fields:
Mon, Mar 2
Feb 28 2020
In response to @Joe's concerns:
As I think about it more, the issue is that the wire format is wholly incompatible with the Prometheus format. To make it work, StatsD requires a lot of configuration to convert adequately, and managing that configuration will be burdensome.
Feb 27 2020
One alternative is to adopt a sidecar in the form of statsd_exporter and have it do the heavy lifting of translating MediaWiki and MW Extension metrics into a Prometheus-compatible format. I see two major pain points with this solution. The first is settling on a pattern for mapping StatsD metrics to Prometheus metrics, and the second is managing change over time.
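To illustrate the mapping problem, here is a minimal sketch of glob-style rules that turn a dotted StatsD name into a Prometheus metric name plus labels. It is loosely modeled on the idea behind statsd_exporter's mapping configuration but is not its implementation; the rules, metric names, and function names below are all hypothetical.

```python
import re

# Hypothetical rules: (glob pattern, Prometheus metric name, label templates).
# "*" matches exactly one dot-separated component; "$N" refers to the Nth "*".
MAPPINGS = [
    ("MediaWiki.api.*.executeTiming", "mediawiki_api_execute_timing", {"module": "$1"}),
    ("MediaWiki.*.*.count", "mediawiki_events_total", {"component": "$1", "event": "$2"}),
]


def glob_to_regex(pattern):
    """Compile a dotted glob into a regex with one capture group per '*'."""
    parts = ["([^.]+)" if part == "*" else re.escape(part) for part in pattern.split(".")]
    return re.compile(r"\.".join(parts))


def map_metric(statsd_name, mappings=MAPPINGS):
    """Return (prometheus_name, labels) for the first matching rule, else None."""
    for pattern, prom_name, label_templates in mappings:
        match = glob_to_regex(pattern).fullmatch(statsd_name)
        if match is None:
            continue
        labels = {
            label: re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), template)
            for label, template in label_templates.items()
        }
        return prom_name, labels
    return None


print(map_metric("MediaWiki.api.query.executeTiming"))
print(map_metric("MediaWiki.api.query.count"))
print(map_metric("some.unmapped.metric"))
```

Because the first matching rule wins, rule order matters, and every new or renamed metric needs a rule of its own. That is the change-management cost in miniature.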
Per @fgiunchedi's recommendation, I put together a very basic mockup of how DirectFileStore might look in prometheus_client_php.
Feb 13 2020
I went ahead and updated this dashboard and added the Prometheus version next to the Graphite version as an example. During the process, I amended a couple metrics that were missed or misconfigured.
I see the value in a refactor/cleanup if what is currently being captured is not everything we need to (at least) recreate the current dashboards.
Feb 3 2020
@MoritzMuehlenhoff doing that shouldn't hurt anything AFAIK.
Jan 24 2020
Today we had the same error in varnishmtail on a new buster host (cp4032).
Jan 18 2020
The latest patch appears to help a lot. There is still a discrepancy that I haven't been able to track down.
$ touch forwarded_new.txt && socat -t 0 FILE:forwarded_new.txt udp-listen:9125,fork
$ ./statsd_exporter_gerrit_554544 --statsd.mapping-config=statsd_exporter.conf --statsd.listen-udp=:8125 --statsd.relay-address=127.0.0.1:9125
$ ./udpreplay --pps 2000 --host localhost --port 8125 ores1001.pcap
Jan 17 2020
Dec 20 2019
I've moved ahead and added you to the wmf ldap group.
Dec 19 2019
We recently had a conversation about this.
Great idea. Let's raise it at the next SRE meeting.
Dec 18 2019
@jcrespo that sounds bad to me. Perhaps query monitoring is a great candidate for a more specific and limited group?
Dec 17 2019
The changesets look great and appear to do the right thing.
Dec 16 2019
Dec 13 2019
Dec 12 2019
Dec 11 2019
Dec 10 2019
Dec 9 2019
needs to be done in codfw as well
Dec 6 2019
Dec 5 2019
@Rxy I've added you to the NDA group which should grant you access to Logstash. Please let me know if you encounter any related issue.
@DannyH I've moved ahead and added you to the wmf ldap group on the basis of your status as staff. We still need to know what you need this access for though.
It seems clear that db1062 shouldn't be pooled anywhere. Ran the dbctl depool utility and it's gone from s7.
Since it's not used in dashboards, what do we do with the model? I imagine it's useful, but I'm not sure how.
Dec 4 2019
@Mstyles is now in the wmf ldap group. Please let me know if you encounter any related issue.
I did more research and found a usage pattern that didn't initially occur to me.