
Support change propagation for private wikis
Closed, Declined · Public

Description

ChangeProp normally sends unauthenticated update requests to RESTBase & backend services. For private wikis, this won't work as access to content is restricted to specific authenticated users.

Requirements

  • All content is updated: We need to make sure that stored content is kept up to date, regardless of the exact access restrictions that apply to this content.
  • Minimum privilege: Restrict elevated access to services that actually need it, like ChangeProp. Do not open up access to "all internal services" or similar.

Candidate solutions

Authenticate changeprop as "super user"

  • Create a user for the changeprop service, and give it all the rights needed to access content stored in RESTBase.
  • Use this user to authenticate requests from ChangeProp, using credentials from the private puppet repo.
  • In RESTBase, continue to:
    • enforce user-specific restrictions on each read access, and
    • drop authentication information for end points that do not have any restrictions set up.
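The last point can be illustrated with a minimal sketch. It assumes that credentials travel in the Cookie or Authorization header and that the caller already knows whether an end point is restricted; the function name and the header set are hypothetical and not taken from RESTBase.

```python
from typing import Dict

# Assumption: these are the headers that can carry credentials.
AUTH_HEADERS = {"cookie", "authorization"}


def headers_for_backend(headers: Dict[str, str], endpoint_restricted: bool) -> Dict[str, str]:
    """Return the headers to pass on to a backend request.

    Restricted end points keep the caller's credentials so that access
    can be checked per user; unrestricted end points get them stripped.
    """
    if endpoint_restricted:
        return dict(headers)
    return {k: v for k, v in headers.items() if k.lower() not in AUTH_HEADERS}
```

Stripping credentials on unrestricted end points keeps the elevated ChangeProp session from reaching services that do not need it, in line with the minimum privilege requirement.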

Event Timeline

Restricted Application added a subscriber: Zppix.

There was a discussion with @dpatrick regarding possible solutions to this, and the conclusion was that a super-user approach is the best we can do.

Here's an outline of what needs to be done:

  • Create a change-prop user in all private wikis, and store its username and password in private puppet.
  • Create a special HTTP filter for change-prop that supports logging in to the wiki and supplying the session cookie with the request. The filter would contain a config with a list of private wiki domains, so that it doesn't authenticate when it's not needed.
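A rough sketch of such a filter follows. It is an illustration only, not the actual ChangeProp code: the class name, the `login` callback (which is assumed to take a domain and return a cookie string), and the example domains are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional
from urllib.parse import urlsplit


class PrivateWikiAuthFilter:
    """Attach a session cookie only to requests aimed at private wikis."""

    def __init__(self, private_domains: Iterable[str], login: Callable[[str], str]):
        self._private_domains = set(private_domains)
        # Hypothetical callback: logs in on the given domain, returns a cookie string.
        self._login = login
        # One cached session cookie per private domain.
        self._cookies: Dict[str, str] = {}

    def apply(self, url: str, headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
        out = dict(headers or {})
        domain = urlsplit(url).hostname or ""
        if domain not in self._private_domains:
            # Public wiki: send the request unauthenticated, as before.
            return out
        if domain not in self._cookies:
            self._cookies[domain] = self._login(domain)
        out["Cookie"] = self._cookies[domain]
        return out
```

With a filter built for the single private domain `private.example.org`, a request to `https://public.example.org/...` leaves the headers untouched and never calls `login`, while repeated requests to `https://private.example.org/...` log in once and reuse the cached cookie. A real filter would also need to handle session expiry, which this sketch leaves out.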

After my experience with managing system users during SULF, I would like to object to introducing new ones that do not go through User::newSystemUser(), which this one will not. Even for just private wikis, auditing and keeping track of what's a system user or not is hard.

Instead, I would propose doing something similar to what we did in BounceHandler and whitelisting a specific IP address (or set of addresses) in configuration, so that no credentials need to be verified; we just ensure the request is coming from a trusted source.

@Legoktm, from what I have seen User::newSystemUser() is targeted at CLI tools, and doesn't actually set up a full user in the user table. What we are looking for here is closer to a special bot user, ideally created by the system as part of the install / update scripts.

"auditing and keeping track of what's a system user or not is hard"

The idea we had in mind is to have a specific group for such system users, which should make it relatively easy to list & audit them.

Overall, a major goal is to minimize privileges across the cluster. While limiting IPs to individual addresses is better than nothing, this would still be problematic given the fact that this IP is currently shared between many services in the SCB cluster. Furthermore, other services like Parsoid will need to authenticate with MW for specific changeprop initiated requests, but should not generally be trusted with unfettered access outside the context of a request with a specific user session.

User::newSystemUser() does set up a proper user, but it disables all login methods so it is only useful via CLI.

You probably want to use bot passwords to make access granular (well; somewhat granular: it will still be able to read everything, but not edit) and limit to a specific IP range.

"Instead, I would propose doing something similar to what we did in BounceHandler and whitelisting a specific IP address (or set of addresses) in configuration, so that no credentials need to be verified; we just ensure the request is coming from a trusted source."

The request chain looks like this: ChangeProp -> RESTBase -> Parsoid -> MediaWiki. Neither Parsoid nor RESTBase IPs should be whitelisted, so we'd need to track the X-Forwarded-For header and whitelist the ChangeProp IPs. But ChangeProp is located on the SCB cluster, and a lot of requests that don't need elevated permissions will have these IPs in the XFF header. So an IP-based whitelist doesn't really seem feasible, and making a bot user seems like a safer and cleaner solution.
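To make the problem concrete, here is a minimal sketch of an XFF-based check. The function names and the IP addresses used with it are made up for illustration; it assumes the usual convention that the leftmost X-Forwarded-For entry is the originating client.

```python
from typing import Iterable, List


def parse_xff(header_value: str) -> List[str]:
    """Split an X-Forwarded-For value into its hops, leftmost first."""
    return [part.strip() for part in header_value.split(",") if part.strip()]


def originates_from_trusted(xff: str, trusted_ips: Iterable[str]) -> bool:
    """True if the originating (leftmost) hop is one of the trusted IPs."""
    hops = parse_xff(xff)
    return bool(hops) and hops[0] in set(trusted_ips)
```

If ChangeProp and an unrelated service share the hypothetical SCB address `10.0.0.5`, both produce an XFF value such as `10.0.0.5, 10.0.0.9`, and `originates_from_trusted` returns `True` for both. The check cannot tell which of the two services sent the request, which is why the whitelist would grant elevated access more widely than intended.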

However, if you have strong objections we can try to investigate the XFF header option more.

It's actually an open question which login API to use.

The PR implements the login flow from https://www.mediawiki.org/wiki/API:Login, which is deprecated for everything except bot passwords. However, on office wiki, where we want to enable this first, bot passwords are not enabled.
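For reference, the two-step `action=login` flow can be sketched as the request parameters below. This is not the code from the PR; the helper names are hypothetical, and the sketch only builds parameters and reads responses, leaving the HTTP calls (a request for the token, then a POST that keeps the cookies from the first response) to the caller.

```python
from typing import Any, Dict


def login_token_params() -> Dict[str, str]:
    """Step 1: ask the wiki's api.php for a login token."""
    return {"action": "query", "meta": "tokens", "type": "login", "format": "json"}


def extract_login_token(response: Dict[str, Any]) -> str:
    """Pull the token out of the step 1 JSON response."""
    return response["query"]["tokens"]["logintoken"]


def login_params(username: str, password: str, token: str) -> Dict[str, str]:
    """Step 2: POST these parameters together with the cookies from step 1."""
    return {
        "action": "login",
        "lgname": username,
        "lgpassword": password,
        "lgtoken": token,
        "format": "json",
    }


def login_succeeded(response: Dict[str, Any]) -> bool:
    """Check the result field of the step 2 JSON response."""
    return response.get("login", {}).get("result") == "Success"
```

The session cookies set by a successful step 2 response are what the service would attach to later requests.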

As I understand it, a better, more recent way of authenticating a bot (ChangeProp in our case) is to use OAuth with an owner-only consumer (https://www.mediawiki.org/wiki/OAuth/Owner-only_consumers). However, again, the OAuth extension is not installed on office wiki. Should we install the extension there and use that more modern way of authenticating?

I thought this was about all private WMF-hosted wikis, not just office?

@AlexMonk-WMF Right, office wiki is just the beginning (a testing ground). I haven't looked at the others yet.

Enabling bot passwords is just a config flag plus DB table creation; enabling OAuth is more complicated on both ends. OAuth is slightly more secure (no passwords to intercept), but I don't know how cautious we want to be with officewiki.

A requirement is the ability to easily forward per-request credentials on behalf of the user / update job through services like Parsoid. We would like to avoid entrusting services like Parsoid with blanket access to all trusted content outside a specific request.

This requirement is currently satisfied by forwarding the Cookie header, if set.
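A minimal sketch of that forwarding step, assuming headers are held in a plain dictionary; the function name is hypothetical and this is not the services' actual code.

```python
from typing import Dict, Optional


def forward_credentials(incoming: Dict[str, str],
                        outgoing: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the Cookie header, if set, from the incoming request to the outgoing one."""
    out = dict(outgoing or {})
    for name, value in incoming.items():
        # Header names are case-insensitive.
        if name.lower() == "cookie":
            out["Cookie"] = value
            break
    return out
```

Because the intermediate service only passes on what arrived with the request, it holds no credentials of its own and gains no access outside that request.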

"A requirement is the ability to easily forward per-request credentials on behalf of the user / update job through services like Parsoid. We would like to avoid entrusting services like Parsoid with blanket access to all trusted content outside a specific request."

Sure, the question is about obtaining those credentials. There are two ways to do that: OAuth or bot passwords. Neither is enabled on office wiki, so the question is which one we want to use. From @Tgr's comment it seems that bot passwords are simpler, so I propose going with that option if there are no objections.

To clarify, bot passwords would entail forwarding a plain-text static password through some header?

They work like normal passwords, they just skip various checks at authentication to preserve B/C with the old login API. See Manual:Bot passwords.

If you want stateless authentication via request headers, you should use OAuth.

@Tgr, thanks for the background. That sounds great, and will just work™ with the existing session forwarding setup.

Pchelolo edited projects, added Services (next); removed Services.

@Aklapper: any idea why Herald is adding Analytics here? I tried in vain to search for any rules that would apply.

I don't think we will be doing this, given the changed strategy regarding the future of RESTBase.