
Plan for access control with OpenSearch
Open, Needs Triage, Public

Description

Currently the Elasticsearch clusters can be read from or written to by more places than strictly necessary, as Elasticsearch didn't build in any form of access control. OpenSearch comes with access control enabled by default; this can be disabled to retain the old behaviour.

We have a few use cases for access control:

  • cloudelastic is a read-only service, currently enforced by an HTTP proxy. This could likely be solved with OpenSearch anonymous authentication. The HTTP proxy also has no other use now: its primary use case was TLS termination, which OpenSearch now does directly.
  • Limiting write access provides stronger guarantees about where writes came from.
  • The clusters contain private data in the form of both private wikis and deleted title archives. It would be reasonable and prudent to limit read access from our infra to use cases that require it.
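As a rough illustration of the cloudelastic case, the sketch below builds request bodies for the OpenSearch security plugin's REST API (`PUT _plugins/_security/api/roles/<role>` and `PUT _plugins/_security/api/rolesmapping/<role>`) that would give anonymous requests read-only access. It assumes anonymous authentication has been enabled (`anonymous_auth_enabled: true` under `config.dynamic.http` in the security `config.yml`) and that anonymous requests carry the `opendistro_security_anonymous_backendrole` backend role; both should be checked against the deployed version. The role name and index patterns are hypothetical placeholders, and nothing is sent over the network.

```python
import json

# Backend role that the security plugin assigns to anonymous requests once
# anonymous authentication is enabled (verify against the deployed version).
ANONYMOUS_BACKEND_ROLE = "opendistro_security_anonymous_backendrole"


def read_only_role(index_patterns):
    """Body for PUT _plugins/_security/api/roles/<role>: read-only access."""
    return {
        # Built-in action group for read-only cluster operations (e.g. msearch).
        "cluster_permissions": ["cluster_composite_ops_ro"],
        "index_permissions": [
            {
                "index_patterns": list(index_patterns),
                # Built-in "read" action group: search/get style actions, no writes.
                "allowed_actions": ["read"],
            }
        ],
    }


def anonymous_mapping():
    """Body for PUT _plugins/_security/api/rolesmapping/<role>."""
    return {"backend_roles": [ANONYMOUS_BACKEND_ROLE]}


# Hypothetical role name and index patterns, for illustration only.
ROLE_NAME = "cloudelastic_anonymous_ro"
api_calls = {
    f"_plugins/_security/api/roles/{ROLE_NAME}": read_only_role(["*_content", "*_general"]),
    f"_plugins/_security/api/rolesmapping/{ROLE_NAME}": anonymous_mapping(),
}

for path, body in api_calls.items():
    print("PUT", path)
    print(json.dumps(body, indent=2))
```

Keeping the index patterns explicit also bears on the private-data point: whichever patterns are chosen would need to leave out private wikis and the deleted-title archive indices.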

A few questions to answer regarding our migration:

  • Do we want to enable access control, or continue with the status quo? Access control can always be enabled at a later date.
  • If we enable access control, how should we apply it?
    • We could have a single account, separate accounts for reading and writing, or a separate account for each configured use case. Cirrus will need connections configured for both purposes, but is likely flexible enough to do that entirely through configuration.
  • If we enable access control, how do we manage the accounts? Puppet may need some abstraction added.
    • This is mostly an operational question; Cirrus only needs to be given the appropriate credentials in the named connections it uses for each operation.
  • What are all the places that need credentials? cirrus, sup, mjolnir, translate, api-feature-usage, puppet cluster settings updates, more?
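To make the account options more concrete, here is a sketch of the most granular one: a separate account per consumer and purpose. For each pair it produces an internal user (`PUT _plugins/_security/api/internalusers/<user>`), a role, and a role mapping tying the two together. The `<consumer>-<purpose>` naming scheme, which consumers get write access, and the index patterns are all hypothetical; real passwords would come from whatever secret management Puppet ends up using, not be generated like this.

```python
import secrets

# Built-in security-plugin action groups per purpose. Real writers (e.g. index
# rebuilds) would likely also need groups such as "create_index" or "manage".
INDEX_ACTIONS = {"read": ["read"], "write": ["write"]}
CLUSTER_ACTIONS = {"read": ["cluster_composite_ops_ro"], "write": ["cluster_composite_ops"]}


def account_requests(consumers, index_patterns):
    """Return {path: body} security-API requests: one user, role and mapping per consumer/purpose."""
    calls = {}
    for consumer, purposes in sorted(consumers.items()):
        for purpose in sorted(purposes):
            name = f"{consumer}-{purpose}"  # hypothetical naming scheme
            # Placeholder password; real credentials would be managed by Puppet.
            calls[f"_plugins/_security/api/internalusers/{name}"] = {
                "password": secrets.token_urlsafe(24),
            }
            calls[f"_plugins/_security/api/roles/{name}"] = {
                "cluster_permissions": list(CLUSTER_ACTIONS[purpose]),
                "index_permissions": [{
                    "index_patterns": list(index_patterns[consumer]),
                    "allowed_actions": list(INDEX_ACTIONS[purpose]),
                }],
            }
            # Map the role directly to the internal user of the same name.
            calls[f"_plugins/_security/api/rolesmapping/{name}"] = {"users": [name]}
    return calls


# Hypothetical consumers, purposes and index patterns.
consumers = {"cirrus": {"read", "write"}, "api-feature-usage": {"read"}}
patterns = {"cirrus": ["*_content", "*_general"], "api-feature-usage": ["apifeatureusage-*"]}

for path in account_requests(consumers, patterns):
    print("PUT", path)
```

The coarser options are just fewer entries in `consumers`; for example, a single shared consumer with both purposes gives the plain read/write split.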

Event Timeline

Initial proposal (I didn't deeply consider everything):

  • Enable the security plugin on the initial deployment.
    • The docs claim you can rolling-restart from Elasticsearch to OpenSearch with security enabled, though they suggest turning security off. They also claim you cannot rolling-restart from OpenSearch to OpenSearch with security enabled. By implication, the only opportunity to turn on security in a rolling fashion is during the migration.
    • We need to verify in a test environment whether and how inter-node transport works during the Elasticsearch-to-OpenSearch transition. The docs don't say.
  • Configure anonymous authentication to grant full access by default to a pre-selected list of index patterns that match our use cases.
    • We need to verify how this gets configured. Do we migrate one node, then create the auth through that node? Does the master have to be OpenSearch?
    • We can migrate use cases to individual accounts one at a time, removing the corresponding index patterns from the anonymous account until it eventually has none remaining.
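The phased plan in the last bullet amounts to simple bookkeeping: the anonymous role starts with every use case's index patterns, and each migration moves one use case's patterns to its own account. This hypothetical sketch models only that bookkeeping; the use case names and patterns are placeholders, and in practice each step would be applied through the security plugin's roles API.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPlan:
    """Tracks which index patterns the anonymous account must still grant."""

    use_cases: dict  # use case name -> set of index patterns it needs
    migrated: set = field(default_factory=set)

    def anonymous_patterns(self):
        """Patterns still needed by use cases that have not moved to their own account."""
        return sorted({
            pattern
            for name, patterns in self.use_cases.items()
            if name not in self.migrated
            for pattern in patterns
        })

    def migrate(self, name):
        """Move one use case to its own account; return the patterns its new role needs."""
        if name not in self.use_cases:
            raise KeyError(f"unknown use case: {name}")
        self.migrated.add(name)
        return sorted(self.use_cases[name])

    def done(self):
        """True once the anonymous account grants no index patterns at all."""
        return not self.anonymous_patterns()


# Hypothetical use cases and index patterns.
plan = AccessPlan({"cirrus": {"*_content", "*_general"}, "translate": {"ttmserver*"}})
print(plan.anonymous_patterns())  # ['*_content', '*_general', 'ttmserver*']
print(plan.migrate("translate"))  # ['ttmserver*']
print(plan.anonymous_patterns())  # ['*_content', '*_general']
plan.migrate("cirrus")
print(plan.done())                # True
```

Computing the anonymous patterns from the remaining use cases, rather than deleting patterns one at a time, keeps a pattern shared by two use cases granted until both have moved.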

Scratch the above: testing shows that migrating from Elasticsearch to OpenSearch with security enabled would require Elasticsearch to already be using TLS for inter-node transport (likely via X-Pack). We will need to do the initial migration with the security plugin disabled and plan to turn it on in the future.

This will not get done until after the first deployment of OpenSearch, so let's move it back to the backlog.

> testing shows that to migrate from elasticsearch -> opensearch with security enabled you would need elastic to already be using tls for inter-node transport

Yup. I ran into that when I put up my instance on opensearch2.spi-tools.eqiad1.wikimedia.cloud. The first time through, I tried to skip all the TLS/cert stuff and eventually discovered I was running into a brick wall. So I tossed all that and started from scratch with TLS enabled.

It's a little unfortunate that they require all the overhead of generating certificates and running TLS just to turn on access control, but it is what it is.