Move private wikis to a dedicated cluster
Closed, Declined (Public)

Description

In talking through ways we could make extracting sensitive data from the cluster harder for an attacker, segmenting the data for private wikis seemed like a potential project.

The rough idea would be:

  • Move private wikis to a dedicated group of app servers, which can hold a different set of db/cache credentials
  • Point private wikis at their own redis/memcache servers
  • Move private wikis to a new database cluster
  • Limit network connections to caching / db servers to the set of dedicated app servers
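The four steps above can be sketched as a small routing and access policy. This is only an illustration of the idea, not a description of the production setup: every host name, the wiki name, and the `Cluster`, `cluster_for`, and `connection_allowed` helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """One isolated group: app servers plus the backends only they may reach."""
    app_servers: frozenset
    db_hosts: frozenset
    cache_hosts: frozenset


# Hypothetical host names, for illustration only.
PUBLIC = Cluster(
    app_servers=frozenset({"mw-public-1", "mw-public-2"}),
    db_hosts=frozenset({"db-public-1"}),
    cache_hosts=frozenset({"cache-public-1"}),
)
PRIVATE = Cluster(
    app_servers=frozenset({"mw-private-1"}),
    db_hosts=frozenset({"db-private-1"}),
    cache_hosts=frozenset({"cache-private-1"}),
)

# Hypothetical list of private wikis.
PRIVATE_WIKIS = frozenset({"exampleprivatewiki"})


def cluster_for(wiki: str) -> Cluster:
    """Pick the cluster that should serve a wiki (the routing step)."""
    return PRIVATE if wiki in PRIVATE_WIKIS else PUBLIC


def connection_allowed(source: str, dest: str) -> bool:
    """Allow a backend connection only from app servers in the same cluster."""
    for cluster in (PUBLIC, PRIVATE):
        if dest in cluster.db_hosts | cluster.cache_hosts:
            return source in cluster.app_servers
    return False  # unknown destinations are denied by default
```

With this policy a compromised public app server cannot open a connection to the private database or cache hosts, which is the property the last bullet is after.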

Joe thought the varnish setup to direct private wikis to their own app servers wouldn't be too difficult. @Springle, do you know roughly how much work setting up a new db cluster would be, if we decided to do this?

Event Timeline

csteipp raised the priority of this task from to Low.
csteipp updated the task description.
csteipp added a project: Security-Other.
csteipp added subscribers: csteipp, Springle.

This would need a bit of downtime to migrate data, but on the order of hours, not days. Not especially difficult, and it would slightly simplify the sanitarium/labs setup (though most of the complexity there is not the private wikis, which are easy to blanket-filter). +1

Would they be getting different code and deployer access too?

Yes

So they'd also not be getting security patches at the same time as everything else

And you'd be breaking the ability to set some user rights on private wikis

And probably other things

I haven't planned out the separation yet; this is mostly a task that I want to get to in Q3 or Q4 of this year.

I should have said, "it depends". It depends somewhat on how we design the whole thing, and if we have a decent secret management system in the cluster by the time we do this work.

If everything is as it currently is, then it would be best to deploy these wikis with a separate set of deployers, or to give only a subset of the deployers access to the password for the backend DB. Or we can put the secrets on the subset of machines that can access the separate DB cluster (preferably with access audited too), and let code deployments flow as normal, but those machines will have a different DB config.
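A minimal sketch of that last option, where the same code is deployed everywhere but the DB config depends on which machine it runs on. The host names, the `db_config` helper, and the `PRIVATE_DB_PASSWORD` / `PUBLIC_DB_PASSWORD` variable names are all assumptions made up for illustration; a real setup would read from whatever secret management system ends up existing.

```python
import os


def db_config(hostname, private_hosts, env=os.environ):
    """Return the DB settings for this machine.

    The private-cluster secret is expected to exist only on the dedicated
    app servers, so identical code deployed elsewhere never sees it.
    """
    if hostname in private_hosts:
        password = env.get("PRIVATE_DB_PASSWORD")  # hypothetical secret name
        if password is None:
            raise RuntimeError("private DB secret missing on " + hostname)
        return {"cluster": "private", "password": password}
    # Every other machine only ever gets the public-cluster credentials.
    return {"cluster": "public", "password": env.get("PUBLIC_DB_PASSWORD", "")}
```

For example, `db_config("mw-private-1", {"mw-private-1"}, {"PRIVATE_DB_PASSWORD": "s3cret"})` selects the private cluster, while any host outside `private_hosts` falls through to the public config even if the code is identical.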

But in general, deployers shouldn't have access to the data inside those wikis, and probably shouldn't be able to deploy security-patch style code changes to those app servers. That assumes I can convince Ops to dedicate app servers to this, none of which have been allocated.

I think you're going to have to get significant buy-in from RelEng, since it'll mean a lot of duplicated work, in some cases having to be done by different people.

Nemo_bis changed the task status from Open to Stalled. Dec 5 2015, 3:21 PM
Nemo_bis added a subscriber: Nemo_bis.

At this stage this seems a mere idea, tagging as stalled. If this happened to become a goal of the WMF then it would need a whole series of blockers in DBA, Deployments, Trust-and-Safety (to set up a new SRP system) and who knows how many others.

What would be the requirements to be a deployer inside this separate cluster? Would it mean anything would be different for non-root volunteers?

Boldly tagging DBA to hopefully receive feedback if this task (and its parent task) is still a good and feasible idea nowadays, and if this task should still be stalled by definition ("If a report is waiting for further input (e.g. from its reporter or a specific third party) and can currently not be acted on") or should just be open with low priority instead. Thanks in advance for any input.

Marostegui added a subscriber: Marostegui.

Unfortunately, what was said in 2015 (T101915#1351273) no longer applies.
This would be a complicated effort nowadays (see how difficult and risky it is to move wikis from s3 to s5 (T226950)), and would require extra hardware as well.

Thanks for the explanation! Does that mean this task should be declined? Or be open with lowest priority?
(Asking as tasks shouldn't have "stalled" status for unclear reasons for years.)

I would go for declining. I don't think we have enough private wikis to justify more hardware just for them (with the same production specs). Not to mention the difficulty and risk, described earlier, that the move itself would involve.

Thanks! Boldly declining.