Issues with server access, assistance requested
Closed, Resolved (Public)

Description

I need to use the server (sorry, can't be more specific than that because I have no idea what I'm talking about) as part of my Trust & Safety work. I was granted server access in 2018 as karen, but when I try to access the server, all I get is either error messages or no response at all. I apologize for the vagueness of this request; I genuinely have no clue what I'm doing other than trying to blindly follow the documentation T&S has for this.

Here's what I've tried (following instructions written for T&S specialists who need server access) in my terminal:

  • Entering "ssh -J karen@bast4001.wikimedia.org:22" to try to connect. I get an error that says "unknown option -J" (which, as best I can tell, means the system doesn't recognize J as an available flag for the ssh command).
  • If I take out the -J and just put the rest of the command, I get ssh: Could not resolve hostname bast4001.wikimedia.org:22: Name or service not known
  • Switching in my config file from bast4001 to another server (sorry, I can't find the page I took the new server name from, and I'm writing this from my laptop rather than the desktop where I attempted this stuff, but I think it was [some other number]001). Nothing.

  • "ssh mwmaint1001.eqiad.wmnet" gets me no error message, but also no prompt or anything - just nothing happening and a blank line

I'm using an Ubuntu terminal on Windows 10, if that makes a difference.

Please ask me whatever you need to know to actually troubleshoot this; I apologize again for having no idea what I'm talking about and thus not being able to explain well.

Event Timeline

Hi,

You should copy the config from https://wikitech.wikimedia.org/wiki/Production_access#Setting_up_your_SSH_config

The correct bastion would now be bast4003.

You'll then be able to ssh into mwmaint2002 (we're in codfw now).

You seem to have some old documentation lying around from a while back.

It might be easier to say hi on IRC, as then people can help in real time.

https://wikitech.wikimedia.org/wiki/Bastion

You're definitely using the wrong bastion; 4001 has been gone for a while. Based on your location, you might want to use bast1003.wikimedia.org, which is in Virginia.

Also, mwmaint1001 has been gone since July 2018, so you probably want ssh mwmaint1002.eqiad.wmnet. However, we're in a DC switchover, so you actually want ssh mwmaint2002.codfw.wmnet (for now).

Are the T&S docs semi-public, i.e. on officewiki or similar? Feel free to email/ping me on Slack with a link and I can look over it and see what else might be out of date...

ssh -J is ProxyJump, but needs ssh >= 7.3.
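
To make that version check concrete, here is a minimal sketch that parses the banner printed by ssh -V and reports whether it is at least 7.3. The helper name supports_proxyjump is hypothetical (it is not part of OpenSSH), and the sketch assumes a banner of the usual "OpenSSH_MAJOR.MINOR..." form; anything else is reported as unrecognized.

```shell
# Hypothetical helper, not part of OpenSSH: decide from an `ssh -V` banner
# whether the client is new enough for ProxyJump / -J (OpenSSH >= 7.3).
# Returns 0 if supported, 1 if too old, 2 if the banner is not recognized.
supports_proxyjump() {
    banner="$1"
    # Pull "MAJOR MINOR" out of a banner such as "OpenSSH_7.2p2 Ubuntu-..., OpenSSL ..."
    version=$(printf '%s\n' "$banner" |
        sed -n 's/^OpenSSH_\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
    [ -n "$version" ] || return 2
    major=${version%% *}
    minor=${version#* }
    if [ "$major" -gt 7 ] || { [ "$major" -eq 7 ] && [ "$minor" -ge 3 ]; }; then
        return 0
    fi
    return 1
}
```

ssh -V writes its banner to stderr, so a call would look like supports_proxyjump "$(ssh -V 2>&1)". A non-zero result suggests falling back to ProxyCommand instead of -J.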

> I was granted server access in 2018 as karen

For the record, that was T201668: Requesting access to restricted production access and analytics-privatedata-users for Karen Brown.

> 4001 has been gone for a while

I'd love for the decommissioning workflow to set {{obsolete}} on pages like https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/bast4001.wikimedia.org ...

RLazarus triaged this task as Medium priority.
RLazarus moved this task from Untriaged to Awaiting User Input on the SRE-Access-Requests board.
RLazarus added a subscriber: RLazarus.

Claiming this as the SRE on clinic duty -- thanks all for the suggestions.

I chatted with @Kbrown and we agreed troubleshooting this in real-time is probably going to be more effective than via slower roundtrips on Phab, but I'll keep the ticket updated for posterity. For now we're going to leave it until Monday, when Karen will be back on the right machine.

@Kbrown Whenever you have a moment, some troubleshooting information to collect -- please open a terminal on that computer, run each of these commands, and just paste all the results here (or in a phab paste, and paste the link here). None of the output is sensitive or private.

  • First let's make sure your SSH keys are in the right place and your configuration file looks right:
    • ls ~/.ssh
    • cat ~/.ssh/config
  • Then, try this command, which just attempts to connect you as usual but with a little extra debugging information:
    • ssh -v mwmaint2002.codfw.wmnet

Then we'll chat again on Monday and use that information to get this sorted out. (I'm happy to hang onto this ticket after the end of the clinic duty shift.)

After that, I agree with both @Reedy and @Aklapper that we should take a look at both the T&S-internal documentation, and the decom workflow, to make it easier on everyone in the long run. But I'd like to focus on getting Karen's work unblocked first.

All set! For posterity, the issue turned out to be twofold. One, exactly as @Reedy guessed, SSH was one version too old to understand ProxyJump, so we replaced it with the equivalent ProxyCommand (and I verified there are no known security vulnerabilities for the version in use). Two, the SSH key filename didn't match the one in the example config, so we corrected it, and @Kbrown was able to connect. (Please do reopen this, or file another under SRE-Access-Requests, if you have any more trouble!)
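
For anyone landing here with the same "unknown option -J" error, the general shape of a ProxyCommand fallback can be sketched as below. This is an illustration, not the exact configuration used in this case: the helper name proxycommand_ssh is hypothetical, the user, bastion and target are simply the values mentioned earlier in this task, and the command is only printed (not run) so it can be inspected first.

```shell
# Hypothetical helper: print an ssh command line that hops through a bastion
# using ProxyCommand with -W, for clients that predate ProxyJump (-J).
proxycommand_ssh() {
    user="$1"
    bastion="$2"
    target="$3"
    # %h and %p are expanded by ssh itself to the target host and port;
    # they are written as %%h and %%p here so printf emits them literally.
    printf 'ssh -o ProxyCommand="ssh -W %%h:%%p %s@%s" %s\n' "$user" "$bastion" "$target"
}

# Example values taken from this task; adjust to your own user and hosts.
proxycommand_ssh karen bast4003.wikimedia.org mwmaint2002.codfw.wmnet
```

For the key filename half of the problem, running ssh -G with the target host and looking at the identityfile lines shows which key paths the client will actually try, which can then be compared against the output of ls ~/.ssh.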

Karen has also already taken care of updating the T&S docs to explain about using mwmaint1002 most of the time but mwmaint2002 during a switchover when the "do not use this server" MOTD pops up.

Finally, I'm resolving this access ticket, and I'll open a separate task to see if we can improve documentation updates in the decom workflow. I'm not sure if {{obsolete}} on the bast4001 fingerprint page specifically would have helped much in this case, but it couldn't have hurt. In the case of mwmaint1001, maybe we could have searched for and updated references on wikitech and officewiki (or maybe we did that, and we just missed one).