
Permissions to upload data to the analytics cluster from a machine at Drexel
Closed, Resolved · Public

Description

As a requirement for formally publishing a dataset of rich citation contexts we've been working on (see parent task, one of our Q2 goals), we'll need to host a large JSON dump on the analytics cluster, in order to expose it via https://analytics.wikimedia.org/datasets/
(this is a dataset processed from the dumps that doesn't contain any PII)

@Halfak has a copy of the data on his machine at UMN. Could we get Ops approval for the following one-off process:

  • whitelist the university's external IP address for ssh in the analytics cr1/cr2 firewall rules
  • allow @Halfak to copy his temporary ssh key (the one used to log in to the university host) to stat1006

(relaying this from a discussion on IRC)

Details

Related Gerrit Patches:
operations/puppet : production · Removes temp ssh key for halfak.
operations/puppet : production · Removes outdated ssh keys for halfak.
operations/puppet : production · Add temp ssh key for halfak to copy data from university server

Event Timeline

Restricted Application added a subscriber: Aklapper. · Oct 5 2017, 5:06 PM

Adding a bit of context here after a chat with @MoritzMuehlenhoff:

  • For Aaron it would be a bit of a problem to use his workstation as an ssh proxy (so UMN -> workstation -> stat1006), since he currently has some connection limitations/agreements with his ISP.
  • The Analytics firewall on cr1/cr2 currently blocks outgoing tcp connections to port 22 (except to a few internal hosts), so running scp from stat1006 is not viable right now. Not sure whether it would work even without this firewall rule; I'm not familiar with our border rules.
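The outbound restriction in the second point can be checked empirically from the host in question. A minimal sketch, assuming Python 3 is available there; the function name `can_reach_tcp` is hypothetical and not part of any existing tooling:

```python
import socket


def can_reach_tcp(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and attempts a full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False
```

Calling `can_reach_tcp("university-host.example.edu")` (a placeholder hostname) from stat1006 would be expected to return False while the rule is in place. A False result alone does not say which hop dropped the connection, so it cannot distinguish the Analytics firewall from the border rules.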

So the last viable solution would be to create a temporary ssh key for Aaron's account in puppet, let him use it on UMN hosts only for the scp transfer, and then delete it immediately after the 1TB of data is copied. Since this is a potentially risky operation, we are asking for feedback/advice/approval from @mark or @faidon :)
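For illustration, the key-and-copy part of that proposal could look roughly like the sketch below. It only builds the command lines and runs nothing. The function names, the ed25519 key type, and every path and hostname passed in are assumptions for the example, not details from this task:

```python
from __future__ import annotations

from pathlib import Path


def keygen_command(key_path: Path, comment: str) -> list[str]:
    """Build an ssh-keygen invocation for a dedicated, single-purpose key pair."""
    return [
        "ssh-keygen",
        "-t", "ed25519",       # key type (an assumption; any supported type works)
        "-f", str(key_path),   # private key path; the public key is written to <path>.pub
        "-C", comment,         # comment that marks the key as temporary
    ]


def scp_command(key_path: Path, source: str, user: str, host: str, dest: str) -> list[str]:
    """Build an scp invocation that authenticates with the temporary key only."""
    return [
        "scp",
        "-i", str(key_path),   # identity file to use for this transfer
        "-r",                  # copy directories recursively
        source,
        f"{user}@{host}:{dest}",
    ]
```

The lists can be passed to `subprocess.run(cmd, check=True)` or printed with `shlex.join(cmd)` for review. Since `-N` is not given, ssh-keygen prompts for a passphrase interactively. Once the copy completes, the private key would be deleted on the source host and the public key removed from puppet, as described above.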

Thanks in advance!

Halfak renamed this task from Permissions to upload data to the analytics cluster from a machine at UMN to Permissions to upload data to the analytics cluster from a machine at Drexel. · Oct 10 2017, 3:16 PM
mark added a comment. · Oct 12 2017, 5:32 PM

So the last viable solution would be to create a temporary ssh key for Aaron's account in puppet, let him use it on UMN hosts only for the scp transfer, and then delete it immediately after the 1TB of data is copied. Since this is a potentially risky operation, we are asking for feedback/advice/approval from @mark or @faidon :)

Yes, that's fine.

@Ottomata, @Halfak - I believe that it would be easier for you guys to coordinate during your US daytime on one of the following days, but let me know if you need help!

@Halfak cool, ya post the public key on office wiki somewhere, let me know, and I'll get on it.

Change 384045 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add temp ssh key for halfak to copy data from university server

https://gerrit.wikimedia.org/r/384045

Change 384045 merged by Ottomata:
[operations/puppet@production] Add temp ssh key for halfak to copy data from university server

https://gerrit.wikimedia.org/r/384045

@Halfak, your key is added, it should be available on all relevant hosts within 30 mins. Let me know when the transfer is complete so I can remove it.

BTW, you've got a few keys already. Can you confirm that you still need them all?

https://github.com/wikimedia/puppet/blob/production/modules/admin/data/data.yaml#L944

Change 384058 had a related patch set uploaded (by Halfak; owner: halfak):
[operations/puppet@production] Removes outdated ssh keys for halfak.

https://gerrit.wikimedia.org/r/384058

I don't need the following keys anymore:

  • halfak@halfak@tako-umh
  • halfak@carbon
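Pruning keys like this amounts to filtering public key lines by their trailing comment. A minimal sketch, assuming the two labels above are the comment fields of the keys and that each line has the plain `type base64 comment` form with no options prefix; the helper names are hypothetical:

```python
def key_comment(public_key_line: str) -> str:
    """Return the comment field of an OpenSSH public key line ('type base64 comment')."""
    # Split into at most three fields so a comment containing spaces stays intact.
    parts = public_key_line.strip().split(None, 2)
    return parts[2] if len(parts) == 3 else ""


def drop_keys(keys, unwanted_comments):
    """Return the keys whose comment is not listed in unwanted_comments."""
    unwanted = set(unwanted_comments)
    return [k for k in keys if key_comment(k) not in unwanted]
```

Calling `drop_keys(all_keys, ["halfak@halfak@tako-umh", "halfak@carbon"])` would keep every other key unchanged and in its original order.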

See https://gerrit.wikimedia.org/r/384058

I should note that the xfer has started and I expect it to finish by EOD for @Ottomata.

Change 384058 merged by Ottomata:
[operations/puppet@production] Removes outdated ssh keys for halfak.

https://gerrit.wikimedia.org/r/384058

xfer finished. Keys deleted from University VM. Will submit a patchset shortly to delete the key from puppet data.

Change 384084 had a related patch set uploaded (by Halfak; owner: halfak):
[operations/puppet@production] Removes temp ssh key for halfak.

https://gerrit.wikimedia.org/r/384084

Change 384084 merged by Ottomata:
[operations/puppet@production] Removes temp ssh key for halfak.

https://gerrit.wikimedia.org/r/384084

DarTar closed this task as Resolved. · Jan 17 2018, 2:31 AM