Page MenuHomePhabricator

Contributor updates a page
Closed, ResolvedPublic5 Estimated Story Points

Description

As a Contributor, I want to update a page, so that I can include more information or restructure the content.

PUT /page/{title}

Creates a new page or updates an existing page.

Notable request headers: none

Payload: JSON

  • source: source code (usually wikitext) of page
  • comment: summary of the update, to be set for the new revision
  • latest: object with the following element
    • id: ID of the revision this new source was based on; will be used for a three-way merge if applicable. If not present, treat like creating a new page
  • content_model: content model of the main slot of the page; defaults to 'wikitext' for new pages or existing content model for updates to pages
  • csrf_token: Optional CSRF token; required when using Cookie authentication; otherwise forbidden

Status:
200 – it worked
401 – not authenticated
403 – not authorized
404 - no such page (only if revision ID was submitted)
409 – a page with that title already exists (if no revision ID was submitted)
409 - the page has been edited since the revision provided, and an automatic merge failed (see below)
500 - server error, such as when a three-way merge failed

Notable response headers: none

Body: JSON

  • id: numeric id of the page
  • key: prefixed DB key of the page, like "Talk:Main_Page"
  • title: title for display, like "Talk:Main Page"
  • latest: latest revision of the page, object with these properties
    • id: revision ID
    • timestamp: revision timestamp
  • license: Object for the preferred license of the page, including these properties:
    • spdx: SPDX code
    • url: URL for the license
    • title: title of the license
  • other_licenses: array of objects with {spdx, url, title} for other licenses the page is available under
  • content_model: content model of the main slot of the page
  • source: preferred source of the page; usually wikitext

Edit conflicts

If the client tries to create a page that already exists by leaving latest empty, there's a conflict.

To update a page, the client needs to submit the ID of the last revision it's seen. That provides a base revision. If there have been other revisions since that revision (while the user was editing, for example), the system should make an effort to automatically do a 3-way merge of the submitted changes.

If the 3-way merge fails, the error code should be 409 (conflict), and the body should be a regular error, but with the following additional properties:

  • base: revision ID of the base revision
  • local: the difference between the wikitext submitted and the base revision, in the same JSON diff format as T231347
  • remote: the difference between the latest revision of the page and the base revision, in the same JSON diff format as T231347

Clients can use this response data to present a 3-way merge resolution UI to the end user if needed.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

csrf token: I don't believe this is necessary if we only support OAuth authentication

  1. Given that we don't have it now, exposing this without a csrf token would be a security issue.
  2. OAuth and CSRF are not really related. OAuth2 is NOT magically making API invulnerable to CSRF. The mechanism of the token delivery should be different, the RFC is suggesting utilizing the state parameter.
  1. OAuth and CSRF are not really related. OAuth2 is NOT magically making API invulnerable to CSRF.

Probably not, but I think it makes things harder than when there's session cookie authorization.

I think the typical CSRF attack is having a link like this:

<a href="https://wiki.example/action-on-get">Click here for big savings!</a>

or in a form,

<form action="https://wiki.example/action-on-post">
<input type="hidden">bad value</input>
<input type="submit">Click here for savings!</input>
</form>

If the endpoints don't support session cookie auth, I think these attacks are hard to get to work. Where would the OAuth access token come from?

The mechanism of the token delivery should be different, the RFC is suggesting utilizing the state parameter.

That's specifically for the authorization flow, not for using the token when making an API call.

I'm having a hard time finding any other public API (besides Action API and RESTBase) that include CSRF tokens. I'll see if I can do some more research to come up with how other providers do it.

I'm having a hard time finding any other public API (besides Action API and RESTBase) that include CSRF tokens. I'll see if I can do some more research to come up with how other providers do it.

It seems to me it would be up to the Security-Team to decide whether we can allow edits based on OAuth without CSRF protection.

@daniel asked that we include a previous revision ID so that the server can do a 3-way merge if there's an edit conflict; I added that in. It also makes a convenient differentiator for the user; if you post without a version ID, you are signalling that you want to create a new page, which will cause a conflict if the page doesn't already exist already exists.

I also tried to bring the output inline with the schema.

if you post without a version ID, you are signalling that you want to create a new page, which will cause a conflict if the page doesn't already exist.

I like that! Cal we also say that PUT requires the base revision ID, while POST doesn't support it?

if you post without a version ID, you are signalling that you want to create a new page, which will cause a conflict if the page doesn't already exist.

I like that! Cal we also say that PUT requires the base revision ID, while POST doesn't support it?

Yes. I think that is the case.

if you post without a version ID, you are signalling that you want to create a new page, which will cause a conflict if the page doesn't already exist.

Re-reading that, I assume you mean it will cause a conflict if the page *does* already exist. Posting (putting) *with* a base revision would cause a conflict if the page does not yet exist (or, more likely, has been deleted concurrently).

if you post without a version ID, you are signalling that you want to create a new page, which will cause a conflict if the page doesn't already exist.

Re-reading that, I assume you mean it will cause a conflict if the page *does* already exist.

Yes, I got it backwards. I'll change my comment to get it correct.

Posting (putting) *with* a base revision would cause a conflict if the page does not yet exist (or, more likely, has been deleted concurrently).

Yes.

@tstarling I added a parameter to pass in the base revision ID, requirements for doing an automated merge if possible, and notification of an edit conflict with data needed to manage a client-side manual merge if necessary. Please give a thumbs up if this is what you were thinking about.

I also added a content_model property to the page

Regarding edit conflicts, after talking with @daniel, providing the diffs in the case of merge conflict may be a significant effort. Can we spin that portion off into a followup task? A "3-way merge resolution UI" sounds like a cool feature, I can understand why some clients would like to provide it, and I would like to support that. But it also sounds like something that we could add separately as an enhancement to the endpoint, thereby increasing the likelihood that we can complete the basic implementation of both this endpoint and T230842 in one sprint.

WDoranWMF changed the point value for this task from 6 to 2.Feb 26 2020, 7:39 PM

The payload is specified to be JSON. However, MediaWiki\Rest\Validator\Validator currently only supports two content types for the POST body:
application/x-www-form-urlencoded and multipart/form-data.

Was the intention here to change this in the REST framework, or to encode the JSON in a from field?

However, MediaWiki\Rest\Validator\Validator currently only supports two content types for the POST body

If I understand correctly, these are NOT passed to BodyValidator as they are parsed by PHP into $_POST, see this code. For other content-types, you override Handler::getBodyValidator and validate there. See how Clara did it in EventBus::ExecuteJob handler.

However, MediaWiki\Rest\Validator\Validator currently only supports two content types for the POST body

If I understand correctly, these are NOT passed to BodyValidator as they are parsed by PHP into $_POST, see this code. For other content-types, you override Handler::getBodyValidator and validate there. See how Clara did it in EventBus::ExecuteJob handler.

Thanks, yea. I realized my mistake, but Phab didn't let me delete my post. Had to reset 2FA first :P And then Phab went down for a couple of minutes. Gotta love technology.

Change 576121 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] page/update endpoint

https://gerrit.wikimedia.org/r/576121

source: preferred source of the page; usually wikitext

It's not clear to me why this is here. Do we have to send the entire page content back to the client after an edit? Why?

WDoranWMF changed the point value for this task from 2 to 5.Mar 4 2020, 7:46 PM

401 – not authenticated

When should this be raised? Normally, anon users are allowed to edit.
When anon users are not allowed to edit, should this cause a 401 or a 403?

409 - the page has been edited since the revision provided, and an automatic merge failed (see below)
500 - server error, such as when a three-way merge failed

500 indicates an internal error, e.g. a bug or a service failure. That is, it requires attention from engineers. That seems to be a bad fit for a failed 3-way merge, which is expected behavior.
Also, "failed 3-way merge" seems to be covered by the 409 above.

Change 577345 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] ApiEditPage: add baserevid parameter

https://gerrit.wikimedia.org/r/577345

By the way - inside a REST API module, can we assume that a user is authenticated with OAuth? Or do we have to check that?

The handling of edit conflicts was to be split out of this task. Did that happen?

I've added the CSRF token for use when the client is using Cookie authentication. It's required in that case, and forbidden in other cases (using this parameter with OAuth is an error).

My understanding from T237852 is that this is only necessary when a) CORS headers aren't set up and b) there's Cookie authentication.

I think this should be configurable and disabled for WMF servers, since our CORS headers *are* set up correctly.

The handling of edit conflicts was to be split out of this task. Did that happen?

Not yet. I left on vacation before it did.

Is it OK with you if I make two sub-tasks? I usually try to keep the entire behaviour in one user story. I understand that it can be easier to manage separate tasks, though.

Change 577345 merged by jenkins-bot:
[mediawiki/core@master] ApiEditPage: add baserevid parameter

https://gerrit.wikimedia.org/r/577345

Change 579260 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] REST: page/ endpoints: don't use tokens with OAuth

https://gerrit.wikimedia.org/r/579260

Some questions about the response body:

key: prefixed DB key of the page, like "Talk:Main_Page"

The DB key is really an internal representation, why would we want to expose it via the API? The action API doesn't use it anywhere.

title: title for display, like "Talk:Main Page"

Is this needed? Is the expectation that the title is returned in the form it was in the request, or in canonical form? Since the current implementation forward sto the existing action API module, a canonical form of the title is not readily available. It would have to be computed specifically for the output.

spdx: SPDX code

Where would I get this from? And why include it here, as opposed to when loading page content?

other_licenses: array of objects with {spdx, url, title} for other licenses the page is available under

where would this come from?

content_model: content model of the main slot of the page
source: preferred source of the page; usually wikitext

Is it essential that we echo back the entire source? Unless automatic conflict resolution was applied, this will be the same data the client submitted. If conflict resolution applied, the client will usually not care. If it cares, it can load the content with a separate request...

If we don't echo back the source, do we need to include the content_model in the response? Wouldn't it be more useful to include the serialization format (mime type)?

Change 579568 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] page/update: return diffs on conflict

https://gerrit.wikimedia.org/r/579568

I spoke to @daniel by phone, but I want to capture the discussion here:

  • key is supposed to be for plugging into other URLs, so you could do something like sprintf('page/%s/history', page->key) and it should Just Work. I realise our DB keys aren't perfect for this, but they're better than otherwise.
  • title is for display. Canonical form.
  • License info as in T229663. Ideally we'd have the main license for the page as well as all other licenses, and I'd like to keep that part of the page signature. We can only get some of that data right now, and that's OK.
  • Since the source can be modified on update (for example, template substitution, signature expansion, etc.) we echo it back to the client so they can reflect it in any user interface they have.

key is supposed to be for plugging into other URLs, so you could do something like sprintf('page/%s/history', page->key) and it should Just Work. I realise our DB keys aren't perfect for this, but they're better than otherwise.

This will not "Just Work" cause at least you MUST apply URI component encoding to the 'key'. With uri component encoding, using 'title' should be exactly interchangeable with using the 'key'

This will not "Just Work" cause at least you MUST apply URI component encoding to the 'key'. With uri component encoding, using 'title' should be exactly interchangeable with using the 'key'

If you use spaces in titles in the URL, you'll trigger a redirect. Would be nice to give clients a way to avoid that without some custom string manipulation.

But you are right that uri component encoding must be applied. This makes it harder to communicate what key really is. It's the URI component "precursor" form of the title, I guess...

content_model: content model of the main slot of the page; defaults to 'wikitext' for new pages or existing content model for updates to pages

The default content model for new pages is determined based on the title. E.g. the default model for User:Duesentrieb/common.js is JavaScript, not wikitext.

Change 576121 merged by jenkins-bot:
[mediawiki/core@master] page/update endpoint

https://gerrit.wikimedia.org/r/576121

Change 579568 merged by jenkins-bot:
[mediawiki/core@master] page/update: return diffs on conflict

https://gerrit.wikimedia.org/r/579568

@daniel it seems to me that https://gerrit.wikimedia.org/r/576121 introduces an issue with running integration tests:
composer.phar phpunit:integration -- --exclude-group Broken,ParserFuzz,Stub falls with error:

[529604d548fabc5d52a16612] [no req]   Wikimedia\Services\ContainerDisabledException from line 411 of /Users/peter/work/Wiki/gerrit/core/includes/libs/services/ServiceContainer.php: Container disabled!

Seems like T241638

@daniel it seems to me that https://gerrit.wikimedia.org/r/576121 introduces an issue with running integration tests:

Can you tell me which test triggers the error, or provide a full stack trace?
Can you check whether this happens when you run only the tests in tests/phpunit/integration/includes/Rest ?

Also, can you try if this patch fixes the issue? https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/583739

I'm resolving this task. If the testing issue persists, please file a separate task for it.

Change 579260 merged by jenkins-bot:
[mediawiki/core@master] REST: page/ endpoints: don't use tokens with OAuth

https://gerrit.wikimedia.org/r/579260

@daniel I've put together a draft for the docs for this endpoint on mediawiki.org. Feel free to make corrections directly to the wiki page or post your feedback here.

A few questions:

  • It seems like this endpoint can be used to update or create pages. Is there anything we should add to the docs about when to use this endpoint for creating pages as opposed to POST /page?
  • Is there anything we can add to the docs regarding CSRF tokens, authentication, or a Wikimedia policy for using this endpoint? Such as: how to get a CSRF token?
  • The docs are missing an example of the edit conflict response, as well as the 415 and 500 errors. Do you have an example of what these responses looks like?

A few questions:

  • It seems like this endpoint can be used to update or create pages. Is there anything we should add to the docs about when to use this endpoint for creating pages as opposed to POST /page?

I guess POST /page should generally be preferred. But I really don't know, ask Evan.

  • Is there anything we can add to the docs regarding CSRF tokens, authentication, or a Wikimedia policy for using this endpoint? Such as: how to get a CSRF token?

The only way to get a CSRF token is currently from the action API, with action=query&meta=token&type=csrf. But CSRF tokens are not needed when OAuth is used (via the Authorization head), and OAuth is preferred over cookie-based authentication for the REST API.

  • The docs are missing an example of the edit conflict response, as well as the 415 and 500 errors. Do you have an example of what these responses looks like?

Status 500 could be anything - a JSON body with a message key, or with a messageTranslation key, or an empty body, or a HTML page. With a 500, all bets are off, because it can be triggered by any kind of unexpected error anywhere. The behavior for a 500 is generally not specific to an endpoint.

Status 415 is currently an untranslated error, triggered when the request does not specify Content-Type: application/json. The structure is the same as other untranslated errors, plus the content_type field, containing the unexpected type. E.g.:

{
"message":"Unsupported Content-Type",
"content_type":"text/xml",
"httpCode":415,
"httpReason":"Unsupported Media Type"
}

Status 409 with conflict resolution info (which is only available if wikidiff2 is installed in the correct version) looks as follows:

{
"messageTranslation":{
  "de": "Bearbeitungskonflikt.",
  "en": "Edit conflict."
}
"httpCode":409,
"httpReason":"Conflict",
"base": 13,
"current": 14,
"local" => {...},
"remote" => {...},
}

The base and current fields are base revision as specified in the latest section of the request, and the ID of the current revision on the server. The local and remote fields contains diffs (between base and the text we sent, and between the base and the current version on the server) in the same format returned by the revision/{from}/compare/{to} endpoint in the diff field. I don't know the details of that structure, but you can see an example e.g. at https://www.mediawiki.org/w/rest.php/v1/revision/3763409/compare/3763389.

Thanks, @daniel! I've made a few updates based on your feedback. Let me know if you have any additional feedback, or feel free to make changes directly to the wiki page: https://www.mediawiki.org/wiki/API:REST_API/Pages#Update_page

A few questions:

  • It seems like this endpoint can be used to update or create pages. Is there anything we should add to the docs about when to use this endpoint for creating pages as opposed to POST /page?

I guess POST /page should generally be preferred. But I really don't know, ask Evan.

I suspect @eprodromou would point us to this:
https://www.mediawiki.org/wiki/Core_Platform_Team/Initiative/Core_REST_API_in_Mediawiki/Design_principles#Operations

  • Is there anything we can add to the docs regarding CSRF tokens, authentication, or a Wikimedia policy for using this endpoint? Such as: how to get a CSRF token?

The only way to get a CSRF token is currently from the action API, with action=query&meta=token&type=csrf. But CSRF tokens are not needed when OAuth is used (via the Authorization head), and OAuth is preferred over cookie-based authentication for the REST API.

+1

I'm a little unsure whether to consider the possibility of cookie-based authentication to be a bug or a feature. Among other things, rate limiting will depend on the presence of OAuth tokens. That doesn't matter for calls directly to wikis, which skip the API Gateway. And of course, the API Gateway doesn't actually exist yet, so some of this is speculative.

I'm not a fan of undocumented loopholes, so I'm not suggesting that the documentation avoid mentioning this. The current wording in the documentation is:

This endpoint is designed to be used with the OAuth extension authorization process. However, this endpoint also supports cookie-based authentication with the addition of a CSRF token to the request body. See the Action API docs to learn how to retrieve a CSRF token.

As a suggestion, maybe change this slightly to something like:

This endpoint is designed to be used with the OAuth extension authorization process. Callers who instead use cookie-based authentication must add a CSRF token to the request body. See the Action API docs to learn how to retrieve a CSRF token.

I'm trying to make cookie-based authentication sound less attractive, without hiding the fact that it exists.

Thanks, @BPirkle! I've updated the docs to use your suggested wording.

@apaskulin, who needs to signoff this task? You or Evan?

Documentation for this endpoint has been reviewed and completed. (See the staging area for docs for coredev endpoints.) PM signoff should be done by Evan.