In T406005: train-presync failed due to git clone failing with gnutls_handshake() failure, we saw 2 issues:
- some servers are not properly identified as VIP by httpd.
- the mtail configuration for mod_qos is missing some QoS events.
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | ABran-WMF | T406005 train-presync failed due to git clone failing with gnutls_handshake() failure |
| Resolved | | ABran-WMF | T406017 gerrit: mod_qos allowlist and monitoring improvements |
| Resolved | | ABran-WMF | T417615 remove mod_qos on Gerrit |
Change #1192831 had a related patch set uploaded (by Arnaudb; author: Arnaudb):
[operations/puppet@production] gerrit: fix allowlist for mod_qos
mtail was not the source of the observability issue. I've fixed the PromQL query that renders the QoS event rates, using this uptick in the logs as a reference.
Change #1192831 merged by Arnaudb:
[operations/puppet@production] gerrit: fix allowlist for mod_qos
Change #1192839 had a related patch set uploaded (by Arnaudb; author: Arnaudb):
[operations/puppet@production] Revert^2 "gerrit: fix allowlist for mod_qos"
Change #1192839 merged by Arnaudb:
[operations/puppet@production] Revert^2 "gerrit: fix allowlist for mod_qos"
Change #1192845 had a related patch set uploaded (by Arnaudb; author: Arnaudb):
[operations/puppet@production] Revert^4 "gerrit: fix allowlist for mod_qos"
Change #1192845 merged by Arnaudb:
[operations/puppet@production] Revert^4 "gerrit: fix allowlist for mod_qos"
Change #1192854 had a related patch set uploaded (by Arnaudb; author: Arnaudb):
[operations/puppet@production] Revert^6 "gerrit: fix allowlist for mod_qos"
Change #1192854 merged by Arnaudb:
[operations/puppet@production] Revert^6 "gerrit: fix allowlist for mod_qos"
Change #1192882 had a related patch set uploaded (by Arnaudb; author: Arnaudb):
[operations/puppet@production] gerrit: toggle mod_qos log only
Change #1192882 merged by Arnaudb:
[operations/puppet@production] gerrit: toggle mod_qos log only
We had an outage yesterday related to the QoS limit. It was only reported on IRC/Slack. For the record:
16:25:38 <bearloga> Is anyone else experiencing Gerrit being so weird and not always loading today?
17:54 <aude> is it just me or is gerrit slow today and intermittently not loading? (and to lesser extent maybe phabricator too)
19:08:48 <James_F> Also specifically I'm getting "Plugin install error: https://gerrit.wikimedia.org/r/plugins/wm-motd/static/wm-motd.js load error from https://gerrit.wikimedia.org/r/plugins/wm-motd/static/wm-motd.js " errors.
19:42:33 <bearloga> My experience has been a mix of: Gerrit not loading at all, Gerrit loading after a while, Gerrit loading but blank and errors about a bunch of plugins
The cause is that the number of allowed concurrent connections was reduced from 25 to 20 at 09:00 UTC by https://gerrit.wikimedia.org/r/c/operations/puppet/+/1193013
The limit was raised back at 19:45 UTC by https://gerrit.wikimedia.org/r/c/operations/puppet/+/1193212
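
To illustrate why a drop from 25 to 20 can be enough to produce intermittent load failures, here is a minimal Python sketch of a concurrent connection cap with an allowlist. It is a toy model, not mod_qos's implementation and not the Puppet change itself. Assumptions: the cap is counted per client IP, the `ConnectionLimiter` class and `count_rejected` helper are hypothetical names, the addresses come from documentation ranges, and the figure of 22 simultaneous connections from one client is made up for the example.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network


class ConnectionLimiter:
    """Toy per-client concurrent connection cap with an allowlist (hypothetical)."""

    def __init__(self, max_per_ip, allowlist=()):
        self.max_per_ip = max_per_ip
        # Networks that are exempt from the cap (assumption: CIDR notation).
        self.allowlist = [ip_network(net) for net in allowlist]
        self.open_connections = defaultdict(int)

    def is_allowlisted(self, client_ip):
        addr = ip_address(client_ip)
        return any(addr in net for net in self.allowlist)

    def try_open(self, client_ip):
        """Return True if the connection is accepted, False if it is rejected."""
        over_limit = self.open_connections[client_ip] >= self.max_per_ip
        if over_limit and not self.is_allowlisted(client_ip):
            return False
        self.open_connections[client_ip] += 1
        return True

    def close(self, client_ip):
        if self.open_connections[client_ip] > 0:
            self.open_connections[client_ip] -= 1


def count_rejected(limiter, client_ip, attempts):
    """Open `attempts` connections without closing any; count the rejections."""
    return sum(0 if limiter.try_open(client_ip) else 1 for _ in range(attempts))


if __name__ == "__main__":
    # Hypothetical client holding 22 connections open at the same time.
    print(count_rejected(ConnectionLimiter(25), "192.0.2.10", 22))  # 0
    print(count_rejected(ConnectionLimiter(20), "192.0.2.10", 22))  # 2
    allowlisted = ConnectionLimiter(20, allowlist=["198.51.100.0/24"])
    print(count_rejected(allowlisted, "198.51.100.7", 22))  # 0
```

In this model a client that stays under 25 concurrent connections sees no errors at the old limit, but starts losing a few requests at 20. That would be consistent with pages loading only partially, as in the plugin load errors reported above. A client that is not matched by the allowlist is subject to the same cap.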