
Access request on cumin[1-2]001 for John Clark
Closed, Resolved, Public

Description

As DC-Ops, I think it would be great for John to be able to do OS installs, since we now use a script for this, and to pull up iDRAC/iLO reports for technical support.

Please see below for the list of commands that he needs.

On cumin*:

  • Tunnel to iDRAC/iLO

ssh -L 8000:hostname.mgmt.codfw/eqiad.wmnet:443 cumin[1-2]001.codfw/eqiad.wmnet

  • Doing OS installs

sudo -i wmf-auto-reimage-host
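A minimal sketch of the tunnel step above, assuming the usual numbering where cumin1001 is in eqiad and cumin2001 is in codfw, and management names of the form <host>.mgmt.<dc>.wmnet as in the command. The function name mgmt_tunnel_cmd and the host db1001 are hypothetical. The function only prints the command, so it can be checked before running it, and it adds -N (forward the port without opening a remote shell).

```shell
#!/bin/sh
# Hypothetical helper: print the ssh command that forwards local port 8000 to
# the iDRAC/iLO web UI (HTTPS, port 443) of <host> through the cumin host of <dc>.
# Assumes cumin1001 -> eqiad and cumin2001 -> codfw, as in cumin[1-2]001 above.
mgmt_tunnel_cmd() {
  host="$1"
  dc="$2"
  case "$dc" in
    eqiad) cumin="cumin1001.eqiad.wmnet" ;;
    codfw) cumin="cumin2001.codfw.wmnet" ;;
    *) echo "unknown datacenter: $dc" >&2; return 1 ;;
  esac
  # -N: open no remote shell, only keep the port forward alive.
  # -L 8000:<mgmt name>:443: the mgmt name is resolved and reached from the cumin host.
  echo "ssh -N -L 8000:${host}.mgmt.${dc}.wmnet:443 ${cumin}"
}
```

For example, mgmt_tunnel_cmd db1001 eqiad prints ssh -N -L 8000:db1001.mgmt.eqiad.wmnet:443 cumin1001.eqiad.wmnet; while that command runs, the iDRAC/iLO web interface should be reachable at https://localhost:8000.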

Event Timeline

Papaul created this task. Apr 10 2020, 2:37 PM
Restricted Application added a subscriber: Aklapper. Apr 10 2020, 2:37 PM
Papaul triaged this task as Medium priority. Apr 10 2020, 2:50 PM
Restricted Application added a project: Operations. Apr 10 2020, 2:51 PM
Volans added a subscriber: faidon. Apr 10 2020, 2:57 PM
Papaul updated the task description. Apr 10 2020, 3:01 PM

@Volans I realized that running sudo -i wmf-auto-reimage-host could be allowed by granting the user the right to run only that command on cumin and no other command. Thanks

Dzahn added a comment. Apr 10 2020, 3:06 PM

Can we use the existing dcops admin group and just adjust the sudo privs to include the wmf-auto-reimage commands?

It's not up to me to decide.
For context: with the current setup, being able to run the reimage script is equivalent to having global root. There has been some initial work towards making it possible to run some of those tasks without root (see T244840), but we're still far from it at the moment.

Dzahn added a comment. Apr 10 2020, 3:08 PM

John is already in that group, and the group has the following sudo privileges.

So this is not about a new shell user; it's just about adding a command to the list below.

    members: [wpao, jclark]
    privileges: ['ALL = NOPASSWD: /usr/local/bin/install_console *',
                 'ALL = NOPASSWD: /usr/sbin/megacli *',
                 'ALL = NOPASSWD: /usr/sbin/hpssacli *',
                 'ALL = NOPASSWD: /usr/bin/puppet cert *',
                 'ALL = NOPASSWD: /usr/bin/puppet agent -t -v',
                 'ALL = NOPASSWD: /bin/journalctl *',
                 'ALL = (syslog) NOPASSWD: ALL']

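To see why the reimage script is not covered today, here is a rough sketch (POSIX sh; the function name allowed_as_root is hypothetical) that checks a command line against the root-level patterns listed above. It approximates sudo's wildcard matching with shell glob matching and leaves out the (syslog) Runas entry; on a real host, sudo -l is the authoritative way to list what you may run.

```shell
#!/bin/sh
# Rough approximation only: check a command line against the root-level dcops
# sudo patterns quoted above using shell glob matching. sudo's own matching
# has more rules (Runas users such as (syslog), escaping, etc.).
DCOPS_PATTERNS='/usr/local/bin/install_console *
/usr/sbin/megacli *
/usr/sbin/hpssacli *
/usr/bin/puppet cert *
/usr/bin/puppet agent -t -v
/bin/journalctl *'

allowed_as_root() {
  cmd="$1"
  # One pattern per line. An unquoted $pattern in `case` is a glob pattern,
  # and '*' there also matches spaces and slashes, much like a wildcard in
  # sudo's command-line arguments.
  while IFS= read -r pattern; do
    case "$cmd" in
      $pattern) return 0 ;;
    esac
  done <<EOF
$DCOPS_PATTERNS
EOF
  return 1
}
```

With the list as quoted, allowed_as_root '/usr/bin/puppet agent -t -v' succeeds while allowed_as_root wmf-auto-reimage-host fails, which is the gap this task asks to close.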
Dzahn removed Dzahn as the assignee of this task. Apr 14 2020, 9:52 AM
Dzahn added a subscriber: Dzahn.
faidon added a subscriber: jbond. Apr 14 2020, 6:04 PM

So breaking down the (very reasonable!) ask, I think there are a few different things at play here:

  • Access to iDRAC/iLO so that John can e.g. look at HW status and get reports that vendors ask for. This in turn requires:
    • Access to the password store. There is already a "dcops" group with the right access, so we can have John added there. Should be simple, as far as I can tell.
    • Access to the mgmt IP network remotely. Right now that's firewalled to the cumin hosts, access to which ties to a bigger project (see below). However, that's perhaps an unnecessary dependency and maybe we can easily work around that (e.g. with a separate bastion for mgmt?). @MoritzMuehlenhoff, @jbond any thoughts here?
  • Access to execute cumin cookbooks, like reimaging. That right now is tied to global root, which is a privilege that we can't easily grant. Fixing that limitation has been on our radar, including the PoC work that was part of our Q3 OKRs (T244840). It's definitely not there yet and it's going to take a few months to fully materialize, unfortunately.

> So breaking down the (very reasonable!) ask, I think there are a few different things at play here:
>
>   • Access to iDRAC/iLO so that John can e.g. look at HW status and get reports that vendors ask for. This in turn requires:
>     • Access to the password store. There is already a "dcops" group with the right access, so we can have John added there. Should be simple, as far as I can tell.

That's indeed straightforward. @Jclark-ctr simply create a PGP key and get it signed by an existing SRE, then I'll add the key to pwstore.

>   • Access to the mgmt IP network remotely. Right now that's firewalled to the cumin hosts, access to which ties to a bigger project (see below). However, that's perhaps an unnecessary dependency and maybe we can easily work around that (e.g. with a separate bastion for mgmt?). @MoritzMuehlenhoff, @jbond any thoughts here?

We're limiting access to the mgmt network to minimise the impact of security issues in the iDRAC/iLO implementations. What we can do is add datacenter-ops as an access group for the cumin hosts, which allows its members to log in there and then access the mgmt tooling. That adds virtually no risk, but unblocks this use case.

>>   • Access to the mgmt IP network remotely. Right now that's firewalled to the cumin hosts, access to which ties to a bigger project (see below). However, that's perhaps an unnecessary dependency and maybe we can easily work around that (e.g. with a separate bastion for mgmt?). @MoritzMuehlenhoff, @jbond any thoughts here?
>
> We're limiting access to the mgmt network to minimise the impact of security issues in the iDRAC/iLO implementations. What we can do is add datacenter-ops as an access group for the cumin hosts, which allows its members to log in there and then access the mgmt tooling. That adds virtually no risk, but unblocks this use case.

I was about to make a patch and noticed that this setting is already present; @Jclark-ctr, can you confirm that you can log into cumin1001.eqiad.wmnet? Then you should also be able to tunnel to iDRAC/iLO.

Also, please create a PGP key for access to pwstore, see https://wikitech.wikimedia.org/wiki/PGP_Keys and sync up with a colleague to sign the key. Then we'll add you to the pwstore.

@MoritzMuehlenhoff I am unable to ssh to cumin1001:

ssh: Could not resolve hostname cumin1001.eqiad.wmnet: nodename nor servname provided, or not known

Below is the only host I have been able to ssh to previously:

ssh -vvv bast1002.wikimedia.org

CDanis added a subscriber: CDanis. May 6 2020, 7:42 PM

@Jclark-ctr You should edit your ~/.ssh/config file as detailed at https://wikitech.wikimedia.org/wiki/Production_access#Setting_up_your_SSH_config and then it should work :)
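A minimal sketch of the kind of ~/.ssh/config fragment that page describes, assuming bast1002.wikimedia.org as the bastion (as used in this task) and an OpenSSH client recent enough for ProxyJump (7.3 or later). The function name prod_ssh_config, the user name and the key path are placeholders, and the wikitech page remains the authoritative reference. The function prints the fragment so it can be reviewed before being appended.

```shell
#!/bin/sh
# Hypothetical helper: print an ~/.ssh/config fragment that reaches *.wmnet
# hosts by jumping through a bastion, always offering the same production key.
# IdentitiesOnly yes makes ssh offer only the configured IdentityFile, even if
# ssh-agent holds other keys.
# Usage: prod_ssh_config <shell-user> <key-path> [bastion]
prod_ssh_config() {
  user="$1"
  key="$2"
  bastion="${3:-bast1002.wikimedia.org}"
  cat <<EOF
Host ${bastion}
    User ${user}
    IdentityFile ${key}
    IdentitiesOnly yes

Host *.wmnet
    User ${user}
    IdentityFile ${key}
    IdentitiesOnly yes
    ProxyJump ${bastion}
EOF
}
```

After reviewing the output, it could be appended with prod_ssh_config jclark '~/.ssh/your_production_ssh_key' >> ~/.ssh/config (the quotes keep ~ literal; ssh expands it in IdentityFile). For a one-off test without touching the config, ssh -J bast1002.wikimedia.org cumin1001.eqiad.wmnet uses the same jump host.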

@CDanis I have edited the config file, but I am still having issues connecting; it is requesting a password.

Dzahn added a comment (edited). May 14 2020, 8:35 AM

@Jclark-ctr Could you paste your SSH config into a paste (e.g. https://phabricator.wikimedia.org/paste/)? Posting it right here on the ticket is also OK.

Yes, and please also paste the output of:

ssh -v bast1002.wikimedia.org

and

ssh -v cumin1001.eqiad.wmnet

We only use SSH public keys, and we do not set passwords on accounts.

Dzahn added a comment. May 21 2020, 7:06 AM

Hi @Jclark-ctr,

From the log files on bast1002 and cumin1001, I can see that two different keys are involved.

The first one is the one in /Users/jclark/.ssh/id_rsa with the SHA256 fingerprint g/h8MFTuDioZAH80iPzkjbhCu4dOd0xTLdOgVtDi6y4.

This is _not_ the right one. It appears as rejected in logs on both bast1002 and cumin1001.

Then there is another one, with the SHA256 fingerprint gBLZyFzko2+x+ENOd2eLtOeki75nKtb5q94UDyP9IqM. (I can't see a file system path this gets loaded from on your side, so I assume it is loaded in an agent.)

This _is_ the right one. It gets accepted on bast1002.

But this key only appears on bast1002 and never showed up on cumin1001.

This makes me think the issue is still with the ProxyCommand or ProxyJump config. If you can paste your config we can verify that.

Other things you can do:

  • move the file /Users/jclark/.ssh/id_rsa out of the way so that ssh does not try that wrong key before the right one.
  • verify which key is the correct one, i.e. the one with the fingerprint gBLZyFzko2+x+ENOd2eLtOeki75nKtb5q94UDyP9IqM. You can do this with the command ssh-keygen -lf <path to key> and compare the fingerprint. Or you can open the public key in a text editor and compare it to this:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC+1duGbT11VE4IV3KKFzdmHhSl2fAA0CkL93edalw2yqroMxzHjah7GwKB5csjrrbqhn+po0478jsU8OG8hgJBRKSq2cG04ryQk8MVSIy6gnqQ75/5gC4U6wJ50y8MKeMyZCHzMsjs+4xdh9WvJH4cfliPRWYp1JBJpE6E22KE+HK07HYX0TkvyfMf2cLaA0pz1Ovbll8gWb9L9vyKDRmv8+NkaJcLTuKoqSFpxz/UCjVGyBJckDyJbX9FEUyjjMclg+c6C8s2aNgfe3gMKmkKxSEfEqbXuNWfJEVAqJ667MhtsGV92pp6rSWdtAgm6IqGE19hUNFqAy3XiX8vEQY3

Once you have identified it, add it to your config explicitly with an IdentityFile ~/.ssh/your_production_ssh_key line, together with IdentitiesOnly yes, to tell your ssh client to try this (and only this) key.
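The fingerprint comparison from the second bullet can also be scripted. This is a sketch under the assumption of OpenSSH 6.8 or later, where both ssh-add -l and ssh-keygen -lf print lines shaped like "<bits> SHA256:<hash> <comment> (<type>)"; the function name match_fingerprint is hypothetical.

```shell
#!/bin/sh
# Sketch: read fingerprint listings on stdin, i.e. lines shaped like
#   <bits> SHA256:<hash> <comment> (<type>)
# as printed by `ssh-add -l` (keys in the agent) or `ssh-keygen -lf <file>`,
# print the lines whose fingerprint equals the expected one, and exit 0 only
# if at least one line matched.
match_fingerprint() {
  # Accept the expected value with or without the "SHA256:" prefix.
  expected="SHA256:${1#SHA256:}"
  awk -v want="$expected" '
    $2 == want { print; found = 1 }
    END { exit (found ? 0 : 1) }'
}
```

For keys held in the agent: ssh-add -l | match_fingerprint gBLZyFzko2+x+ENOd2eLtOeki75nKtb5q94UDyP9IqM. For key files on disk: for f in ~/.ssh/*.pub; do ssh-keygen -lf "$f" | match_fingerprint gBLZyFzko2+x+ENOd2eLtOeki75nKtb5q94UDyP9IqM >/dev/null && echo "$f"; done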

@CDanis If you want to take a look as well, the comments were based on the rotated auth.log.1:

root@bast1002:~# grep jclark /var/log/auth.log.1 | grep publickey
vs
root@cumin1001:~# grep jclark /var/log/auth.log.1 | grep publickey
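A sketch for condensing those grep results into one line per key and outcome, so the two hosts can be compared side by side. It assumes sshd log lines of the form "Accepted publickey for <user> from <ip> port <port> ssh2: RSA SHA256:<hash>" and the matching "Failed publickey" lines (sshd typically logs the latter only at LogLevel VERBOSE); the function name summarize_publickey_log is hypothetical.

```shell
#!/bin/sh
# Sketch: read sshd auth.log lines on stdin and print one
# "<Accepted|Failed> <fingerprint>" line per distinct pair for <user>,
# in first-seen order.
summarize_publickey_log() {
  awk -v user="$1" '
    {
      status = ""
      if (index($0, "Accepted publickey for " user " ")) status = "Accepted"
      else if (index($0, "Failed publickey for " user " ")) status = "Failed"
      if (status == "") next
      fp = "?"
      # The key fingerprint is the field starting with "SHA256:".
      for (i = 1; i <= NF; i++) if ($i ~ /^SHA256:/) fp = $i
      pair = status " " fp
      if (!(pair in seen)) { seen[pair] = 1; print pair }
    }'
}
```

Run it on each host, e.g. summarize_publickey_log jclark < /var/log/auth.log.1. In the situation described above, the right key's fingerprint would show up as Accepted on bast1002 but not at all on cumin1001.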

ayounsi moved this task from In Discussion to Awaiting User Input on the SRE-Access-Requests board.
jbond added a comment. Jun 8 2020, 2:54 PM

@Jclark-ctr ping: are you able to respond to the comments and questions from Daniel above? Thanks.

@jbond I have recently had issues with my computer; I have reached out to IT and it will be reimaged.

Dzahn changed the task status from Open to Stalled. Jul 9 2020, 5:36 PM
herron closed this task as Resolved. Jul 28 2020, 4:54 PM
herron added a subscriber: herron.

Assuming no news is good news, and transitioning this to resolved. If any assistance is still needed please re-open. Thanks!