Design how we'll train models which depend on private data
Open, Low, Public

Assigned To

None

Authored By

	awight
	Jun 26 2017, 7:41 PM

Description

For example, the draftquality models require non-public, deleted article content. We shouldn't be copying that to labs boxes.

One workaround might be to calculate features on our development machine, then export just the evaluated feature values to ores-compute, omitting the article text.
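A minimal Python sketch of that workaround follows. It assumes each observation is a dict with hypothetical `rev_id`, `text` and `label` keys, and that features come from some extractor function passed in as `extract_features` (also hypothetical, not an ORES/revscoring API). Features are computed where the deleted text is available. Only the revision ID, the label and the feature values are written out as JSON lines.

```python
import json
from typing import Callable, Iterable, Mapping


def export_features(
    observations: Iterable[Mapping],
    extract_features: Callable[[str], list],
    out_path: str,
) -> int:
    """Write feature vectors to out_path (JSON lines) without the article text.

    Each observation is assumed (hypothetically) to look like
    {"rev_id": int, "text": str, "label": str}.
    Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for obs in observations:
            # Compute features locally, where the private text is accessible.
            values = extract_features(obs["text"])
            # Export only non-text fields; "text" is deliberately omitted.
            record = {"rev_id": obs["rev_id"], "label": obs["label"], "features": values}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

For example, you could call `export_features(observations, lambda t: [len(t), t.count("http")], "features.jsonl")` on the development machine and copy only `features.jsonl` to ores-compute.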

A longer-term solution would be to secure the training compute box, but that would have to be in the production cluster. I'm not sure if the security tradeoff makes sense there.

Related Objects

Mentioned Here: T165366: rack/setup/install replacement stat1006 (stat1003 replacement)

Event Timeline

awight created this task. Jun 26 2017, 7:41 PM

Restricted Application added a subscriber: Aklapper. Jun 26 2017, 7:41 PM

I checked on the new incoming stat* boxes. They will be running Debian Stretch, so we'll be able to use them to train models. In the meantime, we should keep using the labs boxes and rely on file permissions there.

We discussed a two-step solution:

1. For now, protect the files on labs by making them readable only by a Unix group that includes only NDA'd users.
2. In the future, extract features and build these models on the stat machines.
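The first step could look roughly like the Python sketch below. It assumes the NDA-only group already exists. The group name you pass in, for example a hypothetical `ores-nda`, is illustrative and not an existing labs group. Group ownership is switched on every file and directory, and everyone else loses access. Files get mode 0640 and directories get 0750, because group members need the execute bit to enter a directory.

```python
import grp
import os


def restrict_to_group(path: str, group_name: str) -> None:
    """Make path accessible only to its owner and members of group_name."""
    gid = grp.getgrnam(group_name).gr_gid
    # A uid of -1 leaves the owner unchanged; only the group is switched.
    os.chown(path, -1, gid)
    if os.path.isdir(path):
        os.chmod(path, 0o750)  # rwxr-x---: group may list and enter
    else:
        os.chmod(path, 0o640)  # rw-r-----: group may read, others get nothing


def restrict_tree(root: str, group_name: str) -> None:
    """Apply restrict_to_group to root and everything beneath it."""
    restrict_to_group(root, group_name)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            restrict_to_group(os.path.join(dirpath, name), group_name)
```

A non-root user can only switch files to a group they belong to. Also, new files won't inherit the group unless the parent directory has the setgid bit set, so the sketch would need to be rerun after adding data.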

The new stat machines are ready. See T165366: rack/setup/install replacement stat1006 (stat1003 replacement). We should be able to start training models there. Note that this machine is in the production cluster, and everyone with access is under NDA.

We'll need to double-check our Enchant dictionaries; they are the only dependency that is *really* OS-dependent.
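One way to do that double-check is to compare the language tags the models need against what Enchant reports on each host. The sketch below assumes the pyenchant bindings, whose `enchant.dict_exists(tag)` returns a boolean. The availability check is injectable, so the logic can be exercised without Enchant installed.

```python
from typing import Callable, Iterable, List, Optional


def missing_dictionaries(
    required_tags: Iterable[str],
    dict_exists: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Return the sorted language tags that have no installed Enchant dictionary."""
    if dict_exists is None:
        # Imported lazily so callers can supply their own checker.
        import enchant  # pyenchant

        dict_exists = enchant.dict_exists
    # De-duplicate the requested tags, then keep the ones Enchant cannot find.
    return sorted(tag for tag in set(required_tags) if not dict_exists(tag))
```

Running `missing_dictionaries(["en_US", "de_DE"])` on both a labs box and a stat machine would show whether the two hosts differ. That tag list is only illustrative; the real one depends on which wikis' models are trained there.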

Ladsgroup triaged this task as Low priority. Nov 26 2018, 2:41 PM

awight unsubscribed. Mar 21 2019, 4:01 PM

ACraze moved this task from Maintenance/cleanup to Backlog/Other on the Machine-Learning-Team board. Jan 19 2021, 10:32 PM