@soworu Ping on my question above. :)
Fri, May 22
Hi CPT, adding you back -- is it possible this is due to this section of security_response_header_filter.js in Restbase?
Thu, May 21
Trying to route this -- @ssingh, should this be assigned to you?
@MBinder_WMF On the off chance you'd already tried logging into phab1001, and it didn't work, but you hadn't gotten around to saying anything yet -- that was because of my mistake, which I've just fixed, so try again. :)
Wed, May 20
Thanks! I've removed mnoor but not mshaver from the wmf LDAP group. Let me know if anything suddenly stops working.
Hi @MNoorWMF, just checking on this -- can you please confirm that the mshaver account is working for you? If it is, I'll remove access from mnoor so that we don't leave both lying around.
Hi @MBinder_WMF! I've created your shell account and added it to the group that @Dzahn created for you. I just ran puppet on bast4002 (your nearest bastion) and both Phab hosts, so you should be able to test your access now by sshing to phab1001.eqiad.wmnet, after following the Setting_up_your_SSH_config steps in the page that Daniel linked.
After chatting with @colewhite (thanks!) I've moved Rodolfo from the wmf group over to the nda group, which should still provide all the access necessary here. Let me know if you have any trouble.
This uses your LDAP password -- the same one you use to log into Wikitech. The username should be either your Wikitech username (Rodolfo Valentim) or your uid/"shell username" (rodolfovalentim) -- the login prompt should tell you which.
@dcipoletti Thanks, looks good! All I need you to do now is:
- sign L3 (there should be an option to sign it right on that page) and
- let me know that you've read https://wikitech.wikimedia.org/wiki/Analytics/Data_access#User_responsibilities (by just saying so here).
Tue, May 19
Looks like this comes from docker_pkg/cli.py line 76, which is the last line here:
I see @Nuria's authorization on the parent task, so going ahead with this.
@dcipoletti Can you please ask your manager to comment on this task, approving your access?
Agree we ought to do this, and I think it's something Envoy can do nearly out of the box. If there's an element Envoy doesn't support, I agree that it's the over-limit FIFO queue. (Out of all the reverse proxies we could use, I'm focusing on Envoy because we're actively deploying it in other parts of the infrastructure, and if it's feasible I think we'd like to use it here too for consistency.)
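For anyone following along, here's a minimal sketch of what I mean by an over-limit FIFO queue, assuming the limit in question is a cap on concurrent in-flight requests. This is purely illustrative and isn't Envoy configuration or an Envoy API; the class name, method names, and parameters are all hypothetical.

```python
from collections import deque


class FifoLimiter:
    """Hypothetical concurrency limiter with a bounded FIFO wait queue."""

    def __init__(self, max_in_flight, max_queued):
        self.max_in_flight = max_in_flight
        self.max_queued = max_queued
        self.in_flight = 0
        self.queue = deque()

    def submit(self, request_id):
        """Return 'started', 'queued', or 'rejected' for a new request."""
        if self.in_flight < self.max_in_flight:
            # Under the limit: the request runs immediately.
            self.in_flight += 1
            return "started"
        if len(self.queue) < self.max_queued:
            # Over the limit: wait in arrival order.
            self.queue.append(request_id)
            return "queued"
        # Over the limit and the queue is full: shed the request.
        return "rejected"

    def finish(self):
        """Mark one in-flight request done; return the queued request
        promoted into its slot, or None if nothing was waiting."""
        if self.queue:
            # The freed slot goes straight to the oldest waiter,
            # so in_flight is unchanged.
            return self.queue.popleft()
        self.in_flight -= 1
        return None
```

The point of the queue is that over-limit requests are served in arrival order as slots free up, rather than being rejected outright.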
Mon, May 18
Hi Daniel, welcome to the Foundation! I've added you to the wmf LDAP group -- feel free to reopen if you need anything else.
Hi Carly, I've added you to the wmf LDAP group -- feel free to reopen if you need anything else.
Thanks @bd808! It doesn't sound like there's presently anything for the SRE clinic-duty person to do here, so I'm optimistically removing SRE-Access-Requests, but feel free to add it back if you need anything from me (or subsequent clinicians).
Hi Segun, thanks for the clear and complete request! I've granted you "restricted user" access for almost all those domains, which I think is the right level of access for your needs. If you need "full user," let me know and I'll upgrade you (or file another task, if you find out later).
Thu, May 14
Wed, May 6
Tue, May 5
Mon, May 4
We've ruled out a switchover in Q4. We'll continue to do all the non-user-impacting prep work we can, so we might be ready to go early in Q1, if all goes well -- but obviously that's way past the horizon for predicting the continuing impact of COVID-19 on our work capacity, so I'll keep you updated.
Thu, Apr 30
Wed, Apr 29
Tue, Apr 28
Apr 23 2020
Apr 21 2020
Certs renewed! I still need to merge the script for next time, and maybe set it up to run periodically unattended, but I'm resolving this.
Apr 20 2020
The renewal script works as expected, but the procedure as written caused problems because not every mcrouter host is listed in /etc/cergen/mcrouter.manifests.d/mediawiki-hosts.certs.yaml on puppetmaster1001. That meant the missing hosts didn't have certs re-created by cergen, so they were deleted without replacement, which meant puppet failed on those hosts. I reverted the cert renewal, so it still needs to be done.
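A pre-flight check along these lines would have caught the gap before any certs were deleted. This is only a sketch: it assumes we can get the two host lists from somewhere (I'm not assuming anything about the manifest's YAML structure here), and the function name and the example hostnames are made up for illustration.

```python
def hosts_missing_from_manifest(deployed_hosts, manifest_hosts):
    """Return deployed hosts the manifest would NOT re-create certs for.

    Both arguments are iterables of hostnames. How they are gathered
    (e.g. from the manifest file and from a host inventory) is left to
    the caller and is an assumption of this sketch.
    """
    return sorted(set(deployed_hosts) - set(manifest_hosts))


# Hypothetical hostnames, for illustration only.
deployed = ["host-a.example", "host-b.example", "host-c.example"]
in_manifest = ["host-a.example", "host-c.example"]

missing = hosts_missing_from_manifest(deployed, in_manifest)
if missing:
    # Stop before deleting anything: these hosts would lose their certs
    # without getting replacements.
    print("Refusing to renew; hosts missing from manifest:", missing)
```

If the list comes back non-empty, the renewal should stop before the delete step.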
Apr 17 2020
Apr 14 2020
Apr 13 2020
Apr 10 2020
Yeah, as you can imagine with some folks working reduced hours, SRE is mostly focusing on critical work and we've pushed this off. I haven't asked around in a little bit, but at a guess I'd say that late Q4 is still possible but extremely uncertain.
Apr 3 2020
Welcome Wolfgang! We've already been chatting about some of this stuff, but this Phab task will track progress as we get it taken care of.
Mar 31 2020
Reopening this -- we had some alerts today from Citoid that turned out to be latency associated with only 404s (graph). We suspect the root cause was elevated latency from example.com.
Mar 30 2020
Just snapshot1006 left, it's on my list for this morning.
Mar 26 2020
Mar 25 2020
All Kubernetes services are updated in all clusters. (T246868#6000068 turned out to be operator error; there were no unexpected diffs.)
Mar 23 2020
All hosts updated except snapshot1006, held back until later this week per @ArielGlenn.
Mar 19 2020
All remaining MW hosts are updated. That leaves parsoid, snapshot hosts, and a few other odds and ends.
Marking this done: https://gerrit.wikimedia.org/r/581616 deleted apache-fast-test, as httpbb is now complete.
Mar 18 2020
Deployed 1.13.1 to all hosts where we're using Envoy as a TLS proxy, that is, C:profile::tlsproxy::envoy. Exception: mendelevium.eqiad.wmnet is still running Jessie for now, and Envoy 1.11.1. (FYI @Dzahn)
Mar 17 2020
Belated update: we decided to upgrade to 1.13.1 (not 1.12.3). So far it's deployed to all MW hosts in codfw, plus the MW canaries in eqiad. Monitoring for impact, then we'll proceed to the rest of MW, then Kubernetes hosts.
Mar 10 2020
Mar 4 2020
I'm pretty sure our envoy-build expert in residence just assigned this to me, but I'm happy to give this a shot anyway.
Mar 3 2020
Feb 24 2020
"On Monday" turned out to be two weeks later -- sorry about that. Conclusions from today's SRE meeting, documented for posterity:
Feb 18 2020
Feb 12 2020
Maybe separate from a script, is there any way we can do this via DNS? Something like grafana.cp-ulsfo.wikimedia.org to specify a site rather than accepting geoip routing.
Feb 10 2020
Feb 7 2020
Thanks! I'll bring this up in the SRE meeting on Monday and go ahead if no one objects.