Mon, Oct 19
Note October 27 is also scheduled for the MediaWiki datacenter switchback -- please let's not have both events going at the same time. :) The switchback is scheduled to wrap up by 15:00 UTC, but it's possible we'll still be doing cleanup work for a while after, depending on how things go.
Wed, Oct 7
Tue, Oct 6
This should be done! Thanks one last time for your patience.
Thu, Oct 1
October 27 confirmed, and I just filed T264364. Meet you over there. :)
Tue, Sep 29
Apologies, I've had a hectic few days. :) It turns out the engprod offsite is the week of Oct 26, so there already won't be a MW release train that week -- releng is understandably reluctant to skip the release two weeks in a row, which is a good argument against Oct 21. We don't need anyone from engprod online for the switchover itself, though, so we could run it during their offsite.
Mon, Sep 28
Tue, Sep 22
October 21 looks good, tentatively. Let me confirm with folks.
Sep 17 2020
I see you already put this on the agenda for the next SRE meeting on Monday, thanks. :) I expect it'll be uncontroversial, but if you don't need it urgently, we'll bring it up at the meeting and get it merged that afternoon.
rzl@lists1001:~$ sudo disable_list daily-article-hy
/var/lib/mailman/data/heldmsg-daily-article-hy-314.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-312.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-325.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-309.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-318.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-321.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-320.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-323.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-315.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-319.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-324.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-310.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-316.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-317.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-313.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-311.pck
/var/lib/mailman/data/heldmsg-daily-article-hy-322.pck
daily-article-hy disabled. Archives should be available at current location, all mail should be moderated and the list should not be on the listinfo page.
Thank you @Trizek-WMF! Really appreciate all your work on this, and your team's.
Yeah, this does look like it was related to the eventgate-main deployments at 13:20 and 13:33 -- traffic may not have shifted gracefully from the old instances to the new ones, resulting in a transient spike of errors.
Sep 15 2020
Any list administrator can disable archiving new messages, at https://lists.wikimedia.org/mailman/admin/wikidata-bugs/archive -- just change "Archive messages?" to "no" and submit.
rzl@lists1001:~$ sudo disable_list localisation-team
/var/lib/mailman/data/heldmsg-localisation-team-375.pck
localisation-team disabled. Archives should be available at current location, all mail should be moderated and the list should not be on the listinfo page.
And @bete one request (at the same time, to speed things along): Could you please ask your manager to comment on this Phab ticket, giving approval?
Hello Bereket! I can handle the LDAP changes for you.
Sep 14 2020
rzl@mwmaint2001:~$ ldapsearch -x cn=wmf | grep sudhanshugautam
member: uid=sudhanshugautam,ou=people,dc=wikimedia,dc=org
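The grep above matches any line containing the string, so it would also match a longer uid that happens to include it. A stricter check could parse the `member:` lines and compare the uid exactly. The following is only a minimal sketch: the function names `group_members` and `is_member` are made up for illustration, and it assumes plain, unfolded `member: uid=NAME,...` lines like the one shown (no base64-encoded values, no wrapped lines).

```python
def group_members(ldif_text):
    """Return the uid from every `member:` line in ldapsearch-style output.

    Assumption: each member line looks like
    `member: uid=NAME,ou=people,...` and is not folded or base64-encoded.
    """
    uids = []
    for line in ldif_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "member":
            continue
        # The first RDN of the DN carries the uid, e.g. "uid=NAME".
        first_rdn = value.strip().split(",", 1)[0]
        attr, eq, name = first_rdn.partition("=")
        if eq and attr.strip().lower() == "uid":
            uids.append(name.strip())
    return uids


def is_member(ldif_text, uid):
    """True only if `uid` appears as an exact member uid."""
    return uid in group_members(ldif_text)
```

Feeding it the output of `ldapsearch -x cn=wmf` (for example via a pipe into a small wrapper script) would give an exact-match answer instead of a substring match.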
Added to the agenda for the next SRE meeting, 2020-09-21.
Sep 9 2020
I purged /api/rest_v1/page/mobile-html/ and /api/rest_v1/page/mobile-html-offline-resources/ in ATS, then re-ran Joe's command from T262437#6448216 for both. This should now be fully expunged from cache, although it may persist in your browser cache for up to a day or until you refresh.
Sep 1 2020
Aug 26 2020
Aug 25 2020
Yeah, sorry that's later than I expected -- we're meeting today to confirm the timing details and I'll post the update immediately after, so a little over two hours from now.
Aug 24 2020
@Papaul Over to you, thanks!
Aug 19 2020
I know it's 2020, but Jessie for uniformity please. Upgrading those hosts is a work in progress. If we weren't about to do the switchover, I'd say install something newer and see what happens, but let's do one thing at a time.
Aug 17 2020
Aug 12 2020
I bet we can do 14:00 UTC. I'm finishing up the timeline with my SRE colleagues this week; I'll confirm and get back to you. After that we'll post on Wikitech, and it'll be on the same page.
No, it'll be roughly a month -- there's a variety of maintenance we'd like to do in eqiad while we're serving from codfw.
Aug 5 2020
The simple version of this is done. We might eventually want to do something more elaborate -- the advantage would be that httpbb could be run without explicitly passing a test file, and by default it could deduce (via Puppetry) the appropriate set of tests to run. But this is good enough for now.
Aug 4 2020
Jul 31 2020
The only user-impacting section of the process will be a read-only period for all wikis while we move MediaWiki itself -- that should last about 3-5 minutes, somewhere between 13:30 and 15:30 UTC on 2020-09-01.
Jul 30 2020
Yep, we're all set. Please revert the RAM quota at your convenience, and if there are any Ceph resources to reclaim, feel free to do that too -- we won't be needing it for the foreseeable future. Thanks!
Jul 29 2020
Thanks for the update; I was going to check back in here soon. We've actually rearranged this build a bit, and in the end we won't need the expanded disk after all, so we'll be settling back into a regular m1.xlarge instance.
Jul 28 2020
Jul 23 2020
Jul 21 2020
Thanks @Trizek-WMF! It took a moment to get everything else lined up, but we're moving forward with September 1.
Jul 16 2020
Sorry for the delay, but this is still in progress -- I've checked in with Legal and they're still working on it. Thanks for your continued patience.
Jul 2 2020
Jun 24 2020
It looks like we'll try to do this: ideally we'll aim to do the switchover from eqiad to codfw in either mid-to-late August or early September, and the switch back to eqiad about a month later. (I can promise we won't do anything in July; we wouldn't ask you to work on that kind of short notice, and we won't have our act together yet anyway.)
Jun 23 2020
Side note: This question is also interesting from a DC switchover perspective (T243316) since that will also effectively be a Redis flush. In previous switchovers we only explicitly handled replication for sessions data, and now that's out of Redis. If there's anything else in there that we can't afford to drop and recreate, now would be a great time to know that.
Jun 22 2020
Summarizing here a conversation @elukey and I had in #wikimedia-serviceops:
Jun 5 2020
Thanks for checking -- not sure yet, but as we're planning out Q1 on our side too, I'm starting to take everyone's temperature about it. I'll let you know as soon as I have some idea whether it will happen, and I'll make sure to clear any potential dates with you.
Jun 2 2020
May 26 2020
@soworu Ping on my question above. :)
May 22 2020
Hi CPT, adding you back -- is it possible this is due to this section of security_response_header_filter.js in Restbase?
May 21 2020
Trying to route this -- @ssingh, should this be assigned to you?
@MBinder_WMF On the off chance you'd already tried logging into phab1001 and it didn't work, but you hadn't gotten around to saying anything yet -- that was my mistake, which I've just fixed, so try again. :)
May 20 2020
Thanks! I've removed mnoor but not mshaver from the wmf LDAP group; let me know if anything suddenly stops working.
Hi @MNoorWMF, just checking on this -- can you please confirm that the mshaver account is working for you? If it is, I'll remove access from mnoor so that we don't leave both lying around.
Hi @MBinder_WMF! I've created your shell account and added it to the group that @Dzahn created for you. I just ran puppet on bast4002 (your nearest bastion) and both Phab hosts, so you should be able to test your access now by sshing to phab1001.eqiad.wmnet, after following the Setting_up_your_SSH_config steps in the page that Daniel linked.
After chatting with @colewhite (thanks!) I've moved Rodolfo from the wmf group over to the nda group, which should still provide all the access necessary here. Let me know if you have any trouble.
This uses your LDAP password -- the same one you use to log into Wikitech. The username should be either your Wikitech username (Rodolfo Valentim) or your uid/"shell username" (rodolfovalentim) -- the login prompt should tell you which.
@dcipoletti Thanks, looks good! All I need you to do now is:
- sign L3 (there should be an option to sign it right on that page) and
- let me know that you've read https://wikitech.wikimedia.org/wiki/Analytics/Data_access#User_responsibilities (by just saying so here).
May 19 2020
Looks like this comes from docker_pkg/cli.py line 76, which is the last line here:
I see @Nuria's authorization on the parent task, so going ahead with this.
@dcipoletti Can you please ask your manager to comment on this task, approving your access?
Agree we ought to do this, and I think it's something Envoy can do nearly out of the box. If there's an element Envoy doesn't support, I agree that it's the over-limit FIFO queue. (Out of all the reverse proxies we could use, I'm focusing on Envoy because we're actively deploying it in other parts of the infrastructure, and if it's feasible I think we'd like to use it here too for consistency.)
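To make the "over-limit FIFO queue" idea concrete, here is one possible reading of it as a minimal sketch: requests beyond a concurrency limit wait in arrival order instead of being rejected outright, and only get rejected once the queue itself is full. Everything here is an assumption for illustration (the class name `QueuedLimiter`, the two limits, the return values); it is not Envoy's behavior or configuration, and not necessarily the design proposed on this task.

```python
from collections import deque


class QueuedLimiter:
    """Concurrency limit with a bounded FIFO queue for over-limit requests.

    Hypothetical illustration only; not an Envoy API.
    """

    def __init__(self, max_in_flight, max_queued):
        self.max_in_flight = max_in_flight
        self.max_queued = max_queued
        self.in_flight = set()
        self.waiting = deque()

    def submit(self, request_id):
        """Start the request, queue it, or reject it."""
        if len(self.in_flight) < self.max_in_flight:
            self.in_flight.add(request_id)
            return "started"
        if len(self.waiting) < self.max_queued:
            self.waiting.append(request_id)
            return "queued"
        return "rejected"

    def finish(self, request_id):
        """Mark a request done and start the oldest waiting one, if any.

        Returns the id of the request that was started, or None.
        """
        self.in_flight.remove(request_id)
        if self.waiting:
            next_id = self.waiting.popleft()
            self.in_flight.add(next_id)
            return next_id
        return None
```

With a limit of one in-flight request and two queue slots, a fourth simultaneous request is rejected, and the queued ones are started in the order they arrived as earlier requests finish.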
May 18 2020
Hi Daniel, welcome to the Foundation! I've added you to the wmf LDAP group, feel free to reopen if you need anything else.
Hi Carly, I've added you to the wmf LDAP group -- feel free to reopen if you need anything else.
Thanks @bd808! It doesn't sound like there's presently anything for the SRE clinic-duty person to do here, so I'm optimistically removing SRE-Access-Requests, but feel free to add it back if you need anything from me (or subsequent clinicians).
Hi Segun, thanks for the clear and complete request! I've granted you "restricted user" access for almost all those domains, which I think is the right level of access for your needs. If you need "full user," let me know and I'll upgrade you (or file another task, if you find out later).