Tue, Jan 14
Thu, Jan 9
A simple option: if puppet-merge.sh is given a treeish, it operates on *only* the ops repo or the labsprivate repo (depending on which flag was passed).
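For illustration, a minimal sketch of what that flag handling could look like in shell. The --ops/--labsprivate flag names and the merge_one_repo helper are assumptions for the sketch, not puppet-merge.sh's actual interface:

# Hedged sketch: with a treeish, only touch the repo selected by the flag.
merge_one_repo() {  # hypothetical helper: fast-forward the repo at $1 to treeish $2
  git -C "$1" merge --ff-only "$2"
}
treeish="$2"
case "$1" in
  --ops)         merge_one_repo /var/lib/git/operations/puppet "$treeish" ;;
  --labsprivate) merge_one_repo /var/lib/git/labs/private "$treeish" ;;
  *)             echo "usage: puppet-merge.sh [--ops|--labsprivate] <treeish>" >&2; exit 1 ;;
esac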
Believe this has been worked around for now.
Wed, Jan 8
Boldly re-opening this, now that the POPs have Ganeti clusters available.
Tue, Jan 7
Nothing in racadm getsel or racadm lclog view (the latter just has me logging in over SSH).
Mon, Jan 6
The steps outlined in Filippo's comment happened, with the difference that I chose to use the netmon* machines for this role.
Tue, Dec 31
Mon, Dec 23
Sat, Dec 21
I'll keep an eye on this and close if there's no other noise.
Thanks! I had been confused by "Attack protocol: tcp" in the reports.
Fri, Dec 20
Thu, Dec 19
Dec 18 2019
BTW, I'm curious both about 1) the current need (or lack thereof) for cross-DC MediaWiki DB traffic, and 2) the future need. It would be fairly trivial to make dbctl output this if/when we do need it -- but enabling hostsByName from etcd now is even easier.
+1 to doing #1 and revisiting if it becomes a problem again.
Dec 17 2019
Seems to be working well.
Dec 16 2019
Dec 13 2019
I tested this on mwdebug1001 by manually installing my patch there.
Dec 12 2019
Dec 11 2019
We also have two probes on Comcast's network constantly performing pings towards our RIPE Atlas anchor in ulsfo. Their network performance looks relatively stable over the past 24h: https://w.wiki/DiS
I did some ICMP pings and TCP port 443 traceroutes from RIPE Atlas probes on Comcast's network that had IPv6 enabled. There were a few that couldn't reach Phabricator, including one in the Kirkland area (so probably reasonably close in network-space to @brion), but it's hard to be sure what this means -- there's always going to be some ambient background number of probes that are malfunctioning in some way. (And most of the probes that failed weren't on the west coast, but in the midwest or on the east coast, which will be a different part of Comcast's network from the one that shows as congested in your mtr.)
Dec 9 2019
Looking at some data in Grafana Explore, this would have solved most cases of noise in the past few months, so I'm calling it resolved for now.
Dec 6 2019
Dec 5 2019
Dec 4 2019
Another instance of P9808.
Dec 3 2019
Nov 27 2019
Nov 26 2019
Nov 25 2019
Grafana 6.4.4 is now in use at https://grafana.wikimedia.org.
I have a proposal for doing this:
Nov 22 2019
At ~18:36 there was another spike in long-tail latency, but then latency seemed to return to 'normal':
Nov 21 2019
I've heard no complaints, and can verify from the logs that it's seen at least some testing by others. Planning to do a final snapshot and move traffic over on Monday afternoon my time.
Nov 20 2019
Sure, I can create the configuration patch.
I grepped through both the swiftrepl logs on ms-fe1005 and the aggregated Swift mutation-operation logs on centrallog1001, and found no mention of the file.
Nov 19 2019
Tim suggested 2 as a concurrency limit. I think we can start less conservatively than that, though -- let's say 10? It seems pretty hard for that to hurt normal user traffic, while it should still prevent excessive usage.
Nov 18 2019
Upgraded the pie chart plugin to a recent version that actually works with 6.x:
$ sudo http_proxy=http://webproxy.eqiad.wmnet:8080 grafana-cli plugins install grafana-piechart-panel
Nov 15 2019
Nov 14 2019
Nov 12 2019
Nov 8 2019
+1 to @tstarling's proposal.
Nov 6 2019
Generated a list of all $hosts_allow arguments from rsync::server::module invocations across all of Puppet: P9544
Another thing that just came up: not all users of rsync::server::module are actually passing an array to the $hosts_allow argument: https://gerrit.wikimedia.org/r/c/operations/puppet/+/549142
Need to go through PuppetDB and look for this.
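For reference, one way to pull this out of PuppetDB would be a PQL query against the resources endpoint, something like the sketch below; the PuppetDB host name is an assumption, and the jq step just surfaces hosts_allow so non-array values stand out:

# Hedged sketch: list every Rsync::Server::Module resource and its hosts_allow parameter.
curl -sG 'http://puppetdb1001.eqiad.wmnet:8080/pdb/query/v4' \
  --data-urlencode 'query=resources[certname, title, parameters] { type = "Rsync::Server::Module" }' \
  | jq '.[] | {certname, title, hosts_allow: .parameters.hosts_allow}'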