Page MenuHomePhabricator

Consider accepting the empty string for required parameters (esp. wikitext)
Closed, ResolvedPublic

Description

From T112286#1790669:

  1. Go to https://en.wikipedia.org/wiki/Jonathan_Mendelsohn.
  2. Click on the redlink labeled 'Nicola' in the final paragraph.
  3. Click on 'Create'.

I reliably get an HTTP 400 error this way.

The request is an HTTP POST to https://en.wikipedia.org/api/rest_v1/transform/wikitext/to/html/Nicola_Mendelsohn/0 with form data:

titleNicola_Mendelsohn
oldid0
wikitext
stashtrue

The response is:

{"type":"https://restbase.org/errors/invalid_request","method":"post","detail":"Missing request parameter: wikitext","uri":"/en.wikipedia.org/v1/transform/wikitext/to/html/Nicola_Mendelsohn/0"}

Event Timeline

GWicke raised the priority of this task from to Needs Triage.
GWicke updated the task description. (Show Details)
GWicke subscribed.
GWicke renamed this task from Accept empty string for required parameters (esp. wikitext) to Consider accepting the empty string for required parameters (esp. wikitext).Nov 7 2015, 7:33 AM
GWicke added a project: RESTBase.
GWicke set Security to None.
GWicke edited subscribers, added: Pchelolo; removed: Aklapper.

The issue happens when switching from a blank wikitext editor to VE, using the new stash end point. Although there is clearly no content to preserve, I think it would be nicer to relax our input validation to accept the empty string as an input there, as it would save clients from implementing a special case for blank pages.

There is another issue when using the stash API for new pages:

curl -X POST --header "Con-www-form-urlencoded" --header "Accept: text/html; profile="mediawiki.org/specs/html/1.1.0"" -d "wikitext=''&stash=true" "https://en.wikipedia.org/api/rest_v1/transform/wikitext/to/html/SomeNoneExistingTitle/0"
{"type":"https://restbase.org/errors/not_found#page_revisions","title":"Not found.","method":"post","detail":"Page or revision not found.","uri":"/en.wikipedia.org/v1/transform/wikitext/to/html/Foobar/0"}

This might be the revision deletion check triggering: things work when referencing revision 1. We should suppress that for stashing, and perhaps this transform end point in general.

The problem is actually way more interesting and not limited to stashing. Here's an example:

curl -X GET "https://rest.wikimedia.org/en.wikipedia.org/v1/page/html/Foobar/689534914"

The revision number 689534914 belongs to a Dog article, and although we've asked for a Foobar article, content for the Dog article will be returned. The problem here is that we cannot find content with the title+revision combination, and start generating it. page_revisions module can't find content too, so it requests it from MW API, but by revision attribute only. MW API returns it a for the Dog title, as this revision number belongs to it. Parsoid makes an assumption, that different title of the returned article is an normalisation mismatch, and requests content again for the Dog article - and returns it.

In the example from task description with 0 revision, 404 is returned because there's no revision with 0 id in MW. Zero is not special in any sense here, for example 3 will return 404, while 5 will return content.

The revision number 689534914 belongs to a Dog article, and although we've asked for a Foobar article, content for the Dog article will be returned.

This is a known issue. We should work on this part in the page_revisions and parsoid modules once the page table feature is merged.

This is a known issue. We should work on this part in the page_revisions and parsoid modules once the page table feature is merged.

The page table feature wouldn't really help us here. The page table redirects only requests for the latest revision, while here we request a specific historic revision, so it will not be redirected even if a page is renamed. Also page table doesn't do anything with normalisation. So, it will not be involved at all.

Normalisation contain capitalising the first letter, replacing underscores with spaces, and replacing namespace name with the localised form in that wiki. Although we can do first 2 things in RESTBase, for the last one we need to run a request to MW API. To resolve the normalisation part, I propose to make a new internal-only endpoint in page_revisions, something like normalise. It will do the following:

  1. Make local normalisation (replace underscore and capitalise). Check storage for the title+revision pair
  2. If found - all is fine, return normalised title+revision === original title+revision
  3. If not found - make a call to MW API title normalisation endpoint.

Another problem here is that our normalizeTitle function does exactly the opposite of what MW does. We replace space with an underscore, while MW replaces underscore with a space. We can't really change that, because it would mean invalidation all the storage..

Although there is clearly no content to preserve

This is not true if the user started from an existing page and then deleted all the content. The empty content is this case is meaningful and should be preserved.

@matmarex: Technically, any HTML content will differ on save, and no selective serialization will happen either way. Whether an empty page was stashed or not won't make a difference to the save result.

In any case, I think the more important point is to avoid special cases for the client.

I submitted a workaround patch for VE (see T118152).

@Pchelolo, we shouldn't really care too much about the specified title. It is passed to Parsoid to make sure that page-dependent things render as expected, but otherwise it's an internal issue of how we store those stashes. We could avoid having to deal with titles for stash storage if we used uuids exclusively, as proposed in T118128.

If this is not going to be worked around in VE, it should have higher priority, as currently VE's interface literally prevents (non-power-)users from starting new articles using it.

@matmarex, I think getting out a temporary fix in VE like https://gerrit.wikimedia.org/r/#/c/252014/ would be good to stop this impacting users.

Independently, we are looking into making RESTBase work with current VE, but the sooner we can stop this broken feature impacting users the better.

A change that makes RESTBase API behavior conform to VE's assumptions is available at https://github.com/wikimedia/restbase/pull/404.

In detail:

  • Accept requests with an empty body.
  • Fix a bug in rbUtil.cloneRequest() that replaced an empty-string body with null.
  • Accept stash requests without a version, or with a fake '0' version.

This is now merged & will likely be deployed tomorrow.

mobrovac claimed this task.

Deployed, resolving.