RFC: HTML and wikitext save API end-points
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	• mobrovac
	Jun 5 2015, 10:11 AM

Description

In order to provide efficient HTML section editing (amongst other things), we need to expose an end-point in RESTBase allowing clients to save a page in wikitext. Here's the proposed end-point:

POST /{domain}/v1/page/wikitext/{title}{/revision}
{
  minor: boolean,
  bot: boolean,
  comment: text,
  text: text,
  token: token
}

This information, together with the cookie sent in the header, will be forwarded to the MW API's edit action. The basetimestamp field will be retrieved from the supplied revision's information (if given).

Default values:

Field	Required?	Default
`minor`	no	`false`
`bot`	no	`false`
`comment`	no	Supplied by Action API?
`text`	yes
`token`	yes

Related Objects
Search...

Status	Assigned	Task
Resolved	• GWicke	T144814 Services Team Goals 2016/2017 Q2: October - December
Resolved	• mobrovac	T136942 Services Team Goals July - September 2016 (Q1 2016-17)
Resolved	• mobrovac	T118871 Services team goals April - June 2016 (Q4 2015/16)
Resolved	None	T118868 Services team goals January - March 2016 (Q3 2015/16)
Resolved	• GWicke	T111819 Services team goals October - December 2015 (Q2 2015/16)
Resolved	• GWicke	T102306 Services team roadmap July - September 2015 (Q1 2015/16)
Resolved	None	T92468 Services Roadmap April - June 2015 (Q4 2014/2015)
Resolved	• GWicke	T91533 Services team Q3 (Jan - March 2015) quarterly goal tracking
Invalid	None	T87556 Thoughts on element IDs, sections, incremental parsing and fast section editing
Resolved	• Pchelolo	T94890 RFC: API for retrieval and saving of top-level HTML elements / sections by element ID
Resolved	• mobrovac	T101501 RFC: HTML and wikitext save API end-points

Event Timeline

• mobrovac created this task.Jun 5 2015, 10:11 AM

• mobrovac claimed this task.

• mobrovac raised the priority of this task from to Unbreak Now!.

• mobrovac updated the task description. (Show Details)

• mobrovac added projects: RESTBase, RESTBase-API.

• mobrovac added subscribers: • mobrovac, • GWicke, Eevans, Anomie.

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 5 2015, 10:11 AM

• mobrovac moved this task from Backlog to In progress on the RESTBase board.Jun 5 2015, 10:21 AM

• mobrovac added a parent task: T94890: RFC: API for retrieval and saving of top-level HTML elements / sections by element ID.

If the revision number is not given, the latest for the page will be retrieved, from which the basetimestamp field will be used. Likewise, if no token is provided, a new one will be obtained from the MW API.

This would be a bad idea. CSRF protections are there to prevent malicious sites from abusing somebody's authenticated browser to perform mutating actions like edits on their behalf. This means that the browser needs to retrieve those tokens by visiting the site, and not RESTBase.

Similarly, the base revision is tracked to detect edit conflicts. RESTBase should not assume the latest revision if nothing is passed in.

The use of PUT is normally reserved to operations that let you retrieve the same entity later via GET. For wikitext, we'll probably want to return the plain wikitext as the body, and not some JSON blob. This means that we could use PUT if we move the extra metadata into headers, but not if we send a JSON body as in the provided example. As edit comments and minor edit flags are really a property of the revision and not the wikitext, I think that POST is more appropriate here.

In T101501#1340778, @GWicke wrote:

This would be a bad idea. CSRF protections are there to prevent malicious sites from abusing somebody's authenticated browser to perform mutating actions like edits on their behalf. This means that the browser needs to retrieve those tokens by visiting the site, and not RESTBase.

True that.

Similarly, the base revision is tracked to detect edit conflicts. RESTBase should not assume the latest revision if nothing is passed in.

So make revision a required arg?

So make revision a required arg?

I think we should simply pass through the parameter to the PHP API, and resist the temptation to make it up. There is already handling for the new page case in the API, and otherwise the revision is required.

• mobrovac updated the task description. (Show Details)Jun 5 2015, 11:21 AM

• mobrovac set Security to None.

• mobrovac removed a subscriber: Aklapper.

• mobrovac updated the task description. (Show Details)Jun 5 2015, 11:27 AM

In T101501#1340778, @GWicke wrote:

The use of PUT is normally reserved to operations that let you retrieve the same entity later via GET. For wikitext, we'll probably want to return the plain wikitext as the body, and not some JSON blob. This means that we could use PUT if we move the extra metadata into headers, but not if we send a JSON body as in the provided example. As edit comments and minor edit flags are really a property of the revision and not the wikitext, I think that POST is more appropriate here.

I'm kind of torn between the two. By definition, PUT is idempotent, while POST isn't, so if the MW API is idempotent (i.e. nothing happens if the same text is sent twice), I'm inclined to go with PUT.

On the other hand, POST is used to create sub-resources defined by the server (in our case revisions - we send a new revision, but the server decides on its ID). The exception to that would be creating a new page, as in that case the client is the one setting the ID (the page name).

But yeah, let's go with POST, simply because the server doesn't accept the information as-is, but manipulates it.

• mobrovac updated the task description. (Show Details)Jun 5 2015, 11:44 AM

POST can be idempotent too, but gets to deviate from plain PUT semantics (store an entity body at fixed URL). I think the POST end point should also not include the revision in the path, and instead POST to /{domain}/v1/page/wikitext/{title}. It will then follow fairly common POST behavior of creating sub-resources whose revision can't be known in advance, as you say.

In T101501#1340874, @GWicke wrote:

I think the POST end point should also not include the revision in the path, and instead POST to /{domain}/v1/page/wikitext/{title}. It will then follow fairly common POST behavior of creating sub-resources whose revision can't be known in advance, as you say.

Except for creating new pages, the users know (or should know) on which revision the text they're submitting is based, so we should use it to retrieve the basetimestamp. If we don't, but instead let clients supply it, e.g. in the body, then what is the added value provided here?

I think the current path, with the optional revision, fits both use-cases rather well.

Passing in some indication of "what revision did the user start with?" is needed for proper edit conflict handling, and should be included in this Restbase API. Currently that's basetimestamp in the action API, although @awight has some WIP patches to convert editing to use the revision id directly instead. In the action API this parameter is currently optional (in which case it defaults to "the current revision"), which really should be deprecated even though we'll probably never actually get to make it required.

There's also starttimestamp, which is needed to avoid problems if the page gets deleted in the time between when the editing process starts (i.e. the current page content is fetched) and the edit is submitted. Last I looked @awight was having trouble figuring out what to do to replace starttimestamp in his patches; I recommended using MAX(log_id), but that doesn't seem to have landed as of Gerrit 94584 PS37.

In T101501#1341396, @Anomie wrote:

Passing in some indication of "what revision did the user start with?" is needed for proper edit conflict handling, and should be included in this Restbase API. Currently that's basetimestamp in the action API, although @awight has some WIP patches to convert editing to use the revision id directly instead. In the action API this parameter is currently optional (in which case it defaults to "the current revision"), which really should be deprecated even though we'll probably never actually get to make it required.

If I'm reading the patch right, nothing is stopping us from supplying both basetimestamp and parentrevid, even before this WIP gets merged, right? If so, we can stay compatible with the API changes ahead of time, and simply remove the timestamp once the patch is merged. All of that without changing the RESTBase end-point spec. Pretty neat!

There's also starttimestamp, which is needed to avoid problems if the page gets deleted in the time between when the editing process starts (i.e. the current page content is fetched) and the edit is submitted.

The reason I haven't included starttimestamp in the first place is because I'm thinking that might be complicated for clients to supply (modulo VE & other MW-based clients). Nevertheless, we might consider including it as an optional parameter, but if this is bound to change soon-ish, I wonder whether it's worth it.

Last I looked @awight was having trouble figuring out what to do to replace starttimestamp in his patches; I recommended using MAX(log_id), but that doesn't seem to have landed as of Gerrit 94584 PS37.

If log_ids refer to log entries of when a page was fetched and when the edit button was clicked, then MAX(log_id) does seem like a viable solution, as it guarantees to return a value basetimestamp <= MAX(log_id) < current_timestamp. Note, however, that if clients (VE does that, e.g.) fetch the page via RESTBase, these logs will not be available.

In T101501#1341444, @mobrovac wrote:

If I'm reading the patch right, nothing is stopping us from supplying both basetimestamp and parentrevid, even before this WIP gets merged, right? If so, we can stay compatible with the API changes ahead of time, and simply remove the timestamp once the patch is merged. All of that without changing the RESTBase end-point spec. Pretty neat!

Assuming parentrevid doesn't get renamed to baserevid or something like that, anyway ;)

I wonder whether [starttimestamp]'s worth it.

It prevents accidentally recreating the page if it gets deleted while you're editing it, or reposting the deleted top revision if the page gets deleted then partially undeleted.

If log_ids refer to log entries of when a page was fetched and when the edit button was clicked, then MAX(log_id) does seem like a viable solution,

log_ids refers to the logging database table. The idea is that you'd fetch the current maximum log_id (likely the API would just return you the value from SELECT MAX(log_id) FROM logging) when starting the edit, then when submitting if the edited page's most recent deletion log entry has a greater log_id you know there was a conflict.

That's how starttimestamp works now, actually, except it uses log_timestamp instead (and the curtimestamp parameter just does time() instead of querying the database at all). The potential drawbacks of that are basically clock skew (especially if the client calculates starttimestamp locally) and the possibility of a need for sub-second resolution.

In T101501#1341512, @Anomie wrote:

In T101501#1341444, @mobrovac wrote:

Assuming parentrevid doesn't get renamed to baserevid or something like that, anyway ;)

Right, but we can easily change that as that would be internal to the way RESTBase calls the MW API, so wouldn't affect anything really.

I wonder whether [starttimestamp]'s worth it.

It prevents accidentally recreating the page if it gets deleted while you're editing it, or reposting the deleted top revision if the page gets deleted then partially undeleted.

I meant due to its possible and probable disappearance, is it worth to put it in the RB API now, not whether it's useful.

log_ids refers to the logging database table. The idea is that you'd fetch the current maximum log_id (likely the API would just return you the value from SELECT MAX(log_id) FROM logging) when starting the edit, then when submitting if the edited page's most recent deletion log entry has a greater log_id you know there was a conflict.

That's how starttimestamp works now, actually, except it uses log_timestamp instead (and the curtimestamp parameter just does time() instead of querying the database at all). The potential drawbacks of that are basically clock skew (especially if the client calculates starttimestamp locally) and the possibility of a need for sub-second resolution.

Yeah, so, basically it's bad to rely on the starttimestamp provided by clients, as they might have no relation at all to MW.

FWIW, I stumbled over starttimestamp when I learned that restored revisions actually have the revision ID of the old record they were restored from, so you can't assume that the IDs are incrementing, and you can't estimate the "starttimestamp" from the revision, cos that timestamp could predate delete events that we should be ignoring. I don't expect to get rid of that parameter, it would require refactoring how restore happens.

I'm not sure how to get the start timestamp in your case. For the web UI, it's a hidden input included with in the edit form. Perhaps you can do something similar, the original API response that includes the parent revision text would include a server timestamp... :-/

In T101501#1342277, @awight wrote:

FWIW, I stumbled over starttimestamp when I learned that restored revisions actually have the revision ID of the old record they were restored from, so you can't assume that the IDs are incrementing, and you can't estimate the "starttimestamp" from the revision, cos that timestamp could predate delete events that we should be ignoring. I don't expect to get rid of that parameter, it would require refactoring how restore happens.

Ah, it gets the old rev id? That does indeed complicate things...

I'm not sure how to get the start timestamp in your case. For the web UI, it's a hidden input included with in the edit form. Perhaps you can do something similar, the original API response that includes the parent revision text would include a server timestamp... :-/

Since RESTBase is a stateless service, the main problem is that we cannot trust any of the indicators supplied to us (they may be genuine, i.e. originally provided by RESTBase, but may also be made up by the client).

PR #251 ready for review.

The timestamp & base revision business is still a bit unclear to me.

you can't estimate the "starttimestamp" from the revision, cos that timestamp could predate delete events that we should be ignoring

Let me rephrase to see if I understand this right: User A starts an edit, user B edits & saves, but that edit is deleted, and finally user A saves the edit. Requirement is to ignore the deleted edit. Is this right?

When a revision is restored, is the timestamp associated with that revision updated to reflect the restore?

• GWicke updated the task description. (Show Details)Jun 15 2015, 6:37 PM

In T101501#1365945, @GWicke wrote:

you can't estimate the "starttimestamp" from the revision, cos that timestamp could predate delete events that we should be ignoring

Let me rephrase to see if I understand this right: User A starts an edit, user B edits & saves, but that edit is deleted, and finally user A saves the edit. Requirement is to ignore the deleted edit. Is this right?

No.

User A starts an edit. Meanwhile User B deletes the page and/or undeletes some deleted revisions. User A should receive an edit conflict so they can make sure their edit is still valid after what User B did to the page.

When a revision is restored, is the timestamp associated with that revision updated to reflect the restore?

No. It's not necessarily even made the "top" revision.

User A starts an edit. Meanwhile User B deletes the page and/or undeletes some deleted revisions. User A should receive an edit conflict so they can make sure their edit is still valid after what User B did to the page.

Ah, thank you for the clarification! Makes sense now.

To me it sounds like this case can actually be covered with a slightly different definition of a base revision. In this scheme, the 'baserevision' would be the latest revision of the page at the time the edit started, or null if the page does not exist yet. The edit conflict check is for a change of this latest revision; any change (not just an increment) can trigger an edit conflict.

Consequences of this scheme:

a delete of the latest revision without a restore triggers a conflict
a delete followed by a restore of the latest revision is ignored, since the revision would match again
a revision newer than the latest revision triggers a conflict
a deleted page triggers an edit conflict, as there is no more current revision
the parent revision can still be passed around for informational purposes, but would not be used for edit conflict detection

Does this make sense?

For that to work, you'll have to be removing the automatic merging of edit conflicts when they don't edit the same part of the page. Which means people will be coming after you with torches and pitchforks for drastically increasing the rate of edit conflicts on busy pages or noticeboards. ;)

Flow is not the magic answer to that, BTW, since it also applies to articles when news is breaking.

@Anomie, I didn't mean to imply that conflict resolution should be removed. We need to reliably detect that there is a conflict before we can try to resolve it automatically.

The PR providing RESTBase with the edit action functionality has been merged, so lowering the priority so the conversation can continue :)

• mobrovac mentioned this in rGRES68952dcb5f1f: Comment out the save public API endpoints.Jun 25 2015, 8:31 PM

Relevant thread: http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/83422

Fannon subscribed.Jul 31 2015, 7:12 PM

Ricordisamoa subscribed.Aug 1 2015, 4:21 AM

Here is a proposal for a REST-style save API, for either HTML or Wikitext:

Edit the latest revision

Retrieve the original HTML:

GET /page/wikitext/{title}
-> Etag: "675818910/78f2b73f-418b-11e5-b7b5-20b71bcf9823"

POST /page/wikitext/{title}
If-Match: "675818910/78f2b73f-418b-11e5-b7b5-20b71bcf9823"

{
  minor: boolean,
  bot: boolean,
  summary: text,
  body: text,
  token: token
}

Edit an old revision

Retrieve the old revision:

GET /page/wikitext/{title}/{revision}

Get the Etag of the latest revision

HEAD /page/wikitext/{title}
-> Etag: "675818910/78f2b73f-418b-11e5-b7b5-20b71bcf9823"

Save the modified old revision

POST /page/wikitext/{title}
If-Match: "675818910/78f2b73f-418b-11e5-b7b5-20b71bcf9823"

{
  minor: boolean,
  bot: boolean,
  summary: text,
  body: text,
  token: token
}

In both cases, conflicts can be ignored by omitting the If-Match header. On save, RESTBase would convert the revision information (first half of the Etag) into timestamps to be used with the PHP API by hitting action=query.

I'm reading the proposition with the assumption that we are going to be able to identify / mitigate bad or incorrect ETags. I like the idea, but I'm not entirely sure I understand what you mean with:

In T101501#1538276, @GWicke wrote:

In both cases, conflicts can be ignored by omitting the If-Match header.

Are you referring to RESTBase conflicts? Or MW API conflicts? I'm inclined to think that the opposite should be true - if clients/users want to avoid conflicts, they should go the extra mile and provide a valid ETag (If-Match, that is). Without a valid header, we could perhaps assume it's the latest render of the latest revision?

@mobrovac, in the Action API, clients can currently opt to ignore edit conflicts by choosing to not pass in basetimestamp and/or starttimestamp. The equivalent in this API proposal is to omit the If-Match header.

We'll definitely need to support new page creation. To figure out whether it is wise to continue supporting edits to existing pages without conflict detection we'd need to look at the use cases. I could well imagine that there aren't many legitimate ones, but haven't actually checked. @Anomie, what are your thoughts on this?

Right. https://gerrit.wikimedia.org/r/#/c/94584/ sounds like would bring MW API's editing action closer to RESTBase's logic by using the parent revision ID to detect conflicts, but we need to see the developments there.

Until then, I guess this opt-out feature forces us to do the same, but I can already see clients neglecting the ETag header. (This is not a factual statement, but rather a human-nature observation).

@mobroc, agreed on human nature. I'm definitely in favor of revisiting support for ignoring conflicts.

@Anomie, @awight: Does T101501#1538276 sound sane to you?

As long as the ETag contains enough information to serve as both basetimestamp and starttimestamp, it should work for detecting conflicts. I'm guessing the UUID-looking bit in there is equivalent to basetimestamp?

For the "edit an old revision" flow, I'd recommend doing step 1 either before or while loading the data to decide whether to edit and which old revision to edit, rather than waiting until after you've already made that decision, since a subsequent edit might change that decision.

The RFC-defined semantics of If-Match don't seem too conducive to automatically-resolving edit conflicts, though, as was discussed earlier.

In T101501#1571159, @Anomie wrote:

As long as the ETag contains enough information to serve as both basetimestamp and starttimestamp, it should work for detecting conflicts. I'm guessing the UUID-looking bit in there is equivalent to basetimestamp?

It is the equivalent of starttimestamp, and also equivalent to basetimestamp when editing the latest revision. It is fine for detecting conflicts, but you rightly point out that it is not sufficient to resolve them in the "edit old revision" case. To fix this issue, we should perhaps pass along the etag of the base revision in a request parameter.

For the "edit an old revision" flow, I'd recommend doing step 1 either before or while loading the data to decide whether to edit and which old revision to edit, rather than waiting until after you've already made that decision, since a subsequent edit might change that decision.

I think you mean that it's good to fetch the etag of the latest revision as early as possible (request 2), to make sure that edit conflicts (which might have changed the decision on whether to edit) are caught. I completely agree with that recommendation.

The RFC-defined semantics of If-Match don't seem too conducive to automatically-resolving edit conflicts, though, as was discussed earlier.

RFC 7232 leaves some wiggle room:

An origin server MUST NOT perform the requested method if a received
If-Match condition evaluates to false; instead, the origin server
MUST respond with either a) the 412 (Precondition Failed) status code
or b) one of the 2xx (Successful) status codes if the origin server
has verified that a state change is being requested and the final
state is already reflected in the current state of the target
resource (i.e., the change requested by the user agent has already
succeeded, but the user agent might not be aware of it, perhaps
because the prior response was lost or a compatible change was made
by some other user agent).

While the "already reflected in the current state" might not be strictly true in our case, I am inclined to read the "compatible change" to apply to our use case in spirit, especially for POST. It is clear that the intention is to prevent the lost update problem, and conservative merging of conflicts generally preserves updates.

(If the new end-point is supposed to be accessed from client-side JS, you should ensure that ETags are easy to get/set with mw.Api/jQuery.ajax)

In T101501#1574110, @GWicke wrote:

I think you mean that it's good to fetch the etag of the latest revision as early as possible (request 2),

Yes. Too much revising with not enough re-reading.

The RFC-defined semantics of If-Match don't seem too conducive to automatically-resolving edit conflicts, though, as was discussed earlier.

While the "already reflected in the current state" might not be strictly true in our case, I am inclined to read the "compatible change" to apply to our use case in spirit, especially for POST. It is clear that the intention is to prevent the lost update problem, and conservative merging of conflicts generally preserves updates.

The "compatible change" doesn't apply at all, since it's only being given as an example for "and the final state is already reflected in the current state of the target resource" [emphasis added]. In other words, it's only addressing the case where merging the edit conflict results in a null edit, with no wiggle room for a general merge.

I do agree that the intention of If-Match as a whole is to avoid the lost update problem, and merging edit conflicts would be an entirely reasonable extension of RFC 7232 that we should probably do (and possibly even propose to the IETF). But we should document it as an extension to the RFC rather than pretending it's allowed by the RFC.

@Ricordisamoa: XHR, jquery.ajax and https://fetch.spec.whatwg.org/ support retrieving and setting request headers. A convenience wrapper in the mw module shouldn't be too hard to construct based on this.

• GWicke mentioned this in T111491: VE adds <i> tags due to RB sending the wrong data-parsoid to Parsoid(?).Sep 5 2015, 4:40 PM

PR at https://github.com/wikimedia/restbase/pull/319.

I'm having some second thoughts on the parameter naming. Since we are creating a new API, we should see if we can improve the naming, so that future api users have an easier time finding their way around.

Boolean flags like minor or bot might be clearer as is_minor and is_bot.
I fear that the token parameter is easy to confuse with other tokens. Something like csrf_token might avoid this.

Things we have already adjusted:

We use comment instead of summary, which I think is more consistent with expectations and other places in the MW codebase.
We also use wikitext or html instead of text. This lets us add multi-part content types later, but feels a little bit dirty relative to a structure specifying a mime type & body per content part. It does make form-based submission easier, though.

Speaking of form submissions, we should probably offer a fall-back method of supplying the if-match header. One generic option we could consider is treating all post fields named http_override.headers.* or _http.headers.* as header overrides. In RESTBase, we could even implement this generically after post parsing. Examples:

http_override.method: put
http_override.headers.if-match: 1234567/effffff

Alternative, more compact syntax:

_http.method: put
_http.headers.if-match: 1234567/effffff

Prior art in this space looks thin:

X-HTTP-Method-Override header and variants
[_method post parameter](https://github.com/expressjs/method-override#override-using-a-query-value)
nothing for headers I'm aware of

In T101501#1614873, @GWicke wrote:

Boolean flags like minor or bot might be clearer as is_minor and is_bot.

I fear that the token parameter is easy to confuse with other tokens. Something like csrf_token might avoid this.

Why inconsistencies with the Action API?

Things we have already adjusted:

We use comment instead of summary, which I think is more consistent with expectations and other places in the MW codebase.

Not necessarily. T62442#670958

• GWicke mentioned this in T62442: Inconsistent arguments comment/reason/summary for edit messages.Sep 7 2015, 10:04 PM

• GWicke updated the task description. (Show Details)

Why inconsistencies with the Action API?

While inconsistencies do have a cost, the introduction of a new API is a moment where their cost is relatively low, as there aren't any existing users. Confusing naming does have a cost in the longer term as well, so it's worth considering clarifications, taking into account their longer-term cost and benefits.

• GWicke mentioned this in T111748: [RFC] Generalize POST parameter to JSON structure and header mapping in REST APIs.Sep 8 2015, 1:04 AM

In T101501#1614873, @GWicke wrote:

Boolean flags like minor or bot might be clearer as is_minor and is_bot.

While I don't like the is_* naming convention, it is true that it makes it clearer that for the bot parameter a Boolean is sought. minor isn't controversial, IMHO, but for consistency's sake we should employ the same convention.

An alternative could be minor and bot_used.

I fear that the token parameter is easy to confuse with other tokens. Something like csrf_token might avoid this.

edit_token might be even clearer while making it future-proof (in case CRSF is replaced by something else).

Alternative, more compact syntax:

_http.method: put

_http.headers.if-match: 1234567/effffff

That could prove really useful (if unpacked and replaced at one of the edges, e.g. RESTBase). But I think that this approach introduces an ambiguity for _http.body. Consider:

req:
  method: post
  headers:
    h1: v1
    h2: v2
  body:
    _http:
      method: patch
      headers:
        h3: k3
      body:
        b1: bv1
    b2: bv2

Should here the new body contain b1, b2 or both? This can be easily dealt with by saying that if _http contains a body field, then no other fields in the original request's body are allowed, but that looks a bit confusing to me. Perhaps disallowing the definition of _http.body altogether might be better (but that leave the solution somewhat incomplete in the same sense where now there are ways to redefine the method, but not the headers).

Prior art in this space looks thin:

X-HTTP-Method-Override header and variants

[_method post parameter](https://github.com/expressjs/method-override#override-using-a-query-value)

nothing for headers I'm aware of

We should support these as well since there is no current standard for this kind of thing.

In T101501#1615988, @mobrovac wrote:

I fear that the token parameter is easy to confuse with other tokens. Something like csrf_token might avoid this.

edit_token might be even clearer while making it future-proof (in case CRSF is replaced by something else).

Actually MW Tokens API deprecated edittoken and renamed it to csrftoken, so if we call it edit token, we make a lot of misunderstanding if a deprecated one should be used with RESTBase or a new one.

_http.body

I don't think it makes sense to support overriding a body this way. This is the body, after all.

Edit: On reflection, maybe there could be a use case for specifying a body this way for PUT requests. However, I think we can defer that decision until we need it.

We should support these as well since there is no current standard for this kind of thing.

I think we should define one clear and coherent mechanism, and stick with it. Many ways to do the same thing is hard to document and confusing to users. If there are popular tools that favor one of the existing options then we could consider picking that as the one way to do it, but I'm not aware of any such tool.

So, we've settled on using snake_case for POST parameters, but there are still some naming questions that needs to be resolved:

minor vs is_minor
bot vs bot_used vs is_bot
token vs csrf_token vs csrftoken vs edit_token vs edittoken
comment vs summary vs reason

I personally vote for is_minor, is_bot, csrf_token and comment.

Other left-overs are bodyOnly deprecation, which should be done separately, and http header/method overrides, which should also be handled separately under T111748

How are the minor/bot parameters going to be interpreted? Is the presence of the parameters enough, or do they need to be set to a truthy-value?

In T101501#1614873, @GWicke wrote:

We also use wikitext or html instead of text. This lets us add multi-part content types later, but feels a little bit dirty relative to a structure specifying a mime type & body per content part. It does make form-based submission easier, though.

What about content types that are neither? Like JavaScript/CSS/JSON/etc. pages?

I personally vote for is_minor, is_bot, csrf_token and comment.

How are the minor/bot parameters going to be interpreted? Is the presence of the parameters enough, or do they need to be set to a truthy-value?

I think a truth-y value should be required. We should also standardize what we consider truthy/falsy. Perhaps '', '0' or 'false' for falsy, everything else truthy? Only accepting 'true' or 'false' could be attractive as well.

In T101501#1616959, @Legoktm wrote:

In T101501#1614873, @GWicke wrote:

We also use wikitext or html instead of text. This lets us add multi-part content types later, but feels a little bit dirty relative to a structure specifying a mime type & body per content part. It does make form-based submission easier, though.

What about content types that are neither? Like JavaScript/CSS/JSON/etc. pages?

In the current scheme, they'd get 'javascript', 'css' or 'json' entries.

In T101501#1617193, @GWicke wrote:

How are the minor/bot parameters going to be interpreted? Is the presence of the parameters enough, or do they need to be set to a truthy-value?

I think a truth-y value should be required. We should also standardize what we consider truthy/falsy. Perhaps '', '0' or 'false' for falsy, everything else truthy? Only accepting 'true' or 'false' could be attractive as well.

I think it's better to stick with real true/false. I a swagger spec/docs there's no concept of truth-y, so we have now way to document that we also accept 1 or empty object. And supporting undocumented options doesn't seem to have any value.

Fannon unsubscribed.Sep 8 2015, 5:42 PM

@Pchelolo, you make a good point about swagger defaulting to "true" or "false".

@mobrovac, @Eevans, @Legoktm, @Anomie: What's your take on requiring exactly "true" or "false"?

I'd recommend also accepting 0/1, since that's what's typically used in languages that lack a dedicated boolean type. Unless you're going for some sort of SOAP-style XSD that defines the structure and type of every function that every client is expected to download and parse (or else just hard-code) to re-encode the input coming from the user.

We should support these as well since there is no current standard for this kind of thing.

I think we should define one clear and coherent mechanism, and stick with it. Many ways to do the same thing is hard to document and confusing to users.

I meant silently support them for compatibility with other frameworks / REST APIs, but promote (i.e. officially document) only our way(TM).

If there are popular tools that favor one of the existing options then we could consider picking that as the one way to do it, but I'm not aware of any such tool.

X-HTTP-Method(-Override) seems to be accepted and understood by most frameworks.

I personally vote for is_minor, is_bot, csrf_token and comment.

My favourites would be minor, bot_used, write_token and comment but don't feel too strong about it.

I think it's better to stick with real true/false.

Idem. No need to be too liberal here, everybody can understand true vs false. That said, silently accepting "true" / "false" (strings) and 0 / 1 (integers) can go a long way when it comes to compatibility. Ideally:

boolean_field = case boolean_value
  when true, "true", 1 then true
  when false, "false", 0 then false
  else default_value
end

It sounds like we have a quorum for "true", "1" vs. "false", "0". This is proving to be a very clarifying thread ;)

The save API implementation in https://github.com/wikimedia/restbase/pull/319 is now merged. It will likely be deployed to production next week.

Why comment, is_minor, is_bot, csrf_token?

In T101501#1629506, @Ricordisamoa wrote:

Why comment, is_minor, is_bot, csrf_token?

comment is, in our opinion, much clearer than, e.g. summary. The same goes for the others. Do you have any concerns about this naming scheme?

In T101501#1629866, @mobrovac wrote:

In T101501#1629506, @Ricordisamoa wrote:

Why comment, is_minor, is_bot, csrf_token?

comment is, in our opinion, much clearer than, e.g. summary. The same goes for the others. Do you have any concerns about this naming scheme?

Not only I find the chosen scheme not noticeably clearer than core's, but you've taken your own stance without any apparent ambition for standardization.

The PR for page save API has been deployed, so HTML and Wikitext save endpoints are now public and this ticket can be closed. Please feel free to reopen if you encounter any issues.

• Pchelolo closed this task as Resolved.Sep 21 2015, 8:25 AM

Aklapper removed a subscriber: Anomie.Oct 16 2020, 5:41 PM

RFC: HTML and wikitext save API end-pointsClosed, ResolvedPublicActions