For the record, just pointing out that the question of a new VM versus mwmaint1002 is probably irrelevant here. We can do both with what looks like minimal repercussions.
Fri, Feb 15
Thu, Feb 14
Wed, Feb 13
Tue, Feb 12
Mon, Feb 11
Seems like we forgot to close this one
Fri, Feb 8
Thu, Feb 7
How is the data going to make it from Hadoop, which resides in the analytics cluster and is firewalled at the router level (aka network ACLs), to whichever machine is chosen for this? Has this already been worked out (because I see no mention of it)?
I've removed the memory part because https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1&var-server=mwdebug1001&var-datasource=eqiad%20prometheus%2Fops&var-cluster=appserver&from=now-7d&to=now shows that mwdebug1002 is never pressed for more memory. I've also bumped the vCPU count to 4. I'll resolve this for now; if we need more resources, feel free to reopen.
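For reference, the resize itself is just a couple of commands on the Ganeti master; a sketch, assuming the VM is managed by Ganeti (the FQDN below is an assumption, not taken from this task):

# On the Ganeti master; bump the vCPU backend parameter of the instance
gnt-instance modify -B vcpus=4 mwdebug1002.eqiad.wmnet
# Backend parameter changes only take effect after the instance is restarted
gnt-instance reboot mwdebug1002.eqiad.wmnet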
I've added the capability to the Varnish Puppet code to augment the wikimedia_trust and wikimedia_nets constructs, followed by a patch adding the new WMCS IP space to wikimedia_nets in order to exempt that IP space from rate limiting. @BBlack lemme know what you think.
Wed, Feb 6
Assuming we go for option (1), how would we go about installing these packages? And how would we instruct the app to load them?
It seems like using NODE_PATH is discouraged these days and would in any case require changes to Blubber to set NODE_PATH. We used to have that variable set and moved away from it in https://gerrit.wikimedia.org/r/#/c/blubber/+/460997/
curl -s https://blubberoid.wikimedia.org/?spec | head -5
---
openapi: '3.0.0'
info:
  title: Blubberoid
  description: >
The restart of varnish-frontend on cp3030 indeed resolved the issue. I'll lower the priority but leave the task open. Feel free to resolve it, though.
cp3030 seems to have been in some trouble since approximately 04:30.
The graphs in codfw mail and eqiad mail show that this behavior has not reemerged since Jan 25, so I'll tentatively close this as resolved. Feel free to reopen.
Tue, Feb 5
Wed, Jan 30
Fri, Jan 25
Cleaned up some 10k emails from 2 more hosts with the same pattern as yesterday and blocked them as well.
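For the record, the cleanup boils down to removing the offending messages from the mail queue; a rough sketch, assuming Exim is the MTA and using a placeholder sender pattern rather than the actual hosts:

# List the queued message IDs whose sender matches the pattern, then remove them.
# 'example-host' is a placeholder, not one of the actual offenders.
exiqgrep -i -f 'example-host' | xargs exim -Mrm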
Thu, Jan 24
info-en-c seems to be down to 167 messages now, and the hosts participating in the storm remain blocked. I'll lower the priority for now.
The email storm can be witnessed at https://grafana.wikimedia.org/d/000000451/mail?orgId=1&from=1548346803405&to=1548357438520&var-datasource=codfw%20prometheus%2Fops (this is the secondary DC) and https://grafana.wikimedia.org/d/000000451/mail?orgId=1&from=1548346740714&to=1548358041101&var-datasource=eqiad%20prometheus%2Fops (for the primary DC). The 2 distinct phases are there because I froze a ton of emails, which later got thawed and eventually delivered.
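For completeness, the freezing and thawing was done with the usual Exim message controls; a sketch, assuming Exim and using a placeholder recipient pattern:

# Freeze every queued message addressed to the affected recipient (placeholder pattern)
exiqgrep -i -r 'info-en' | xargs exim -Mf
# Later: thaw the frozen messages and kick off a queue run so they get delivered
exiqgrep -z -i | xargs exim -Mt
exim -q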
Yes indeed. I've switched the ownership of the list to the email address provided and issued a new password for it. It should arrive via email to the new admin address. Let me know if anything goes wrong. I am closing this as resolved. Thanks!
The change has been merged and deployed, but so far no data has been exported. The celery key currently looks empty, so I guess this is expected?
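To spell out what I mean by the key looking empty, the check is roughly the following (the Redis hostname and the 'celery' key name are placeholders, not the actual ones):

# Ask Redis directly for the length of the Celery queue list; 0 (or a missing key)
# means nothing has been queued for export yet.
redis-cli -h redis-host.example LLEN celery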
Wed, Jan 23
Adding that a traceroute to this IP from our esams DC seems to stop before beelive.ru.
Tue, Jan 22
We can probably get away with reusing https://github.com/oliver006/redis_exporter, which we already use. It does have a check-keys parameter that allows us to count a list's elements. The implementation is a bit slower than in the next version, as it uses SCAN, but in my tests it did return within ~1s. I get the following in Prometheus as an example.
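Roughly speaking, and treating this as a sketch rather than the actual test output (the flag syntax and metric name vary between redis_exporter versions, and the address and key name are placeholders):

# Point the exporter at the Redis instance and ask it to check the queue key.
./redis_exporter --redis.addr=redis://redis-host.example:6379 --check-keys='celery'
# The list length then shows up in Prometheus as a gauge along the lines of:
#   redis_key_size{db="db0",key="celery"}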
Mon, Jan 21
Jan 19 2019
Jan 18 2019
Jan 17 2019
The Graph extension could potentially use an MCR slot to store the Vega JSON rather than embedding it in the wikitext inside a <graph> tag. But that wouldn't support the existing uses where templates and modules are being used to generate the Vega JSON.
Jan 16 2019
Adding performance-team and Core Platform Team per SoS recommendation to request help.