Page MenuHomePhabricator

Edit API does not always normalize line endings
Closed, ResolvedPublic

Description

This results in edits making large byte changes, without anything actually changing in the diff. Both of these are null edits of the same page, one via the API, the other via the edit form.

firefox_2016-08-12_15-14-11.png (46×156 px, 1 KB)

Whether the API normalizes line endings or not depends on the page's content model. More specifically, whether the content model calls the parser to do a pre-save transform (which is one place where line endings get normalized). This differs from editing via the usual edit form, where line ending normalization is done as part of importing the form data, and is done regardless of the content model.

Ultimately this is caused by ApiEditPage directly calling EditPage\attemptSave, skipping EditPage\edit, which normalizes the line endings (via EditPage\importFormData/EditPage\safeUnicodeInput/WebRequest\getText). This doesn't actually make any difference for wikitext, since saving will do a pre-save transform, and thus always normalize the line endings. However for content models such as Scribunto's, it doesn't do any pre-save transform, and thus only normalizes line endings when saved via the edit form.

Note for testing: The usual application/x-www-form-urlencoded format seems to normalize line endings on the client, you have to use multipart/form-data, which doesn't (at least on Firefox).

new mw.Api( {
  ajax: { contentType: 'multipart/form-data' }
} ).postWithToken( 'edit', {
  action: 'edit',
  pageid: mw.config.get( 'wgArticleId' ),
  text: $( '#wpTextbox1' ).val()
} );

Event Timeline

Anomie subscribed.

I don't think that "always change line endings in the API" would be the best thing, though, since it prevents anything that does care about line endings from using the edit action (probably such a thing would override the usual text-based edit form).

Instead, I'd suggest that the Content's preSaveTransform() should do the line ending normalization when appropriate. It would probably also be appropriate to have TextContent::preSaveTransform() do that normalization. ScribuntoContent would also need to stop overriding that method.

My main concern is this inconsistency is a pitfall that is easy to fall into for extension developers. Either the API should normalize line endings like the edit form does, or the edit form shouldn't normalize line endings, and leave it entirely up to the content model to handle.

Change 304841 had a related patch set uploaded (by Anomie):
ScribuntoContent: Use base class preSaveTransform()

https://gerrit.wikimedia.org/r/304841

Change 304844 had a related patch set uploaded (by Anomie):
TextContent: Normalize newlines in preSaveTransform()

https://gerrit.wikimedia.org/r/304844

Change 304841 merged by jenkins-bot:
ScribuntoContent: Use base class preSaveTransform()

https://gerrit.wikimedia.org/r/304841

Change 304844 merged by jenkins-bot:
TextContent: Normalize newlines in preSaveTransform()

https://gerrit.wikimedia.org/r/304844

Anomie claimed this task.

Scribunto will now normalize line endings. Wikidata and Flow use their own editors for their content types, I don't know if they even allow action=edit to work. I don't see anything else that won't normalize line endings via TextContent or some other mechanism.