
On-demand generation of HTML and data-parsoid
Closed, Resolved · Public

Description

TBD

  • Who the first round of users of this feature will be
  • Request content type, headers, body model
  • Response status, content type, headers, body model

Implementation notes

Request to RESTBase: GET /{domain}/v1/page/{name}/html/{revision}

RESTBase checks whether the HTML for that revision is found in storage. If not, it asks Parsoid to generate it.
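
Roughly, this lookup-or-generate step could look like the following sketch (the bucket and helper names are hypothetical, not the actual RESTBase code):

// Sketch only: htmlBucket and generateWithParsoid are hypothetical helpers.
declare const htmlBucket: {
    get(domain: string, title: string, rev: number): Promise<string | null>;
};
declare function generateWithParsoid(domain: string, title: string, rev: number): Promise<string>;

async function getHtml(domain: string, title: string, revision: number): Promise<string> {
    // 1. Look for the stored HTML of this exact revision.
    const stored = await htmlBucket.get(domain, title, revision);
    if (stored !== null) {
        return stored;
    }
    // 2. Cache miss: ask Parsoid to generate it, as described below.
    return generateWithParsoid(domain, title, revision);
}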

For a normal GET request, RESTBase checks whether the predecessor revision is found in storage (currently the predecessor revision is passed in an x-parsoid header, but we could also try the pagecontent revision table). If it is, it retrieves that revision's data and posts:

POST /v2/{domain}/html/{name}/{revision}

{
    previous: {
        revid: 12345, // The previous revision ID
        html: {
            headers: {
                'content-type': 'text/html;profile=mediawiki.org/specs/html/1.0.0'
            },
            body: "the original HTML"
        },
        'data-parsoid': {
            headers: {
                'content-type': 'application/json;profile=mediawiki.org/specs/data-parsoid/0.0.1'
            },
            body: {}
        }
    }
}

For a no-cache request, RESTBase instead first checks whether the *current* revision is found in storage. If it is, it sends that revision's data under the original key:

POST /v2/{domain}/html/{name}/{revision}

{
    original: {
        revid: 12345, // The original revision ID
        html: {
            headers: {
                'content-type': 'text/html;profile=mediawiki.org/specs/html/1.0.0'
            },
            body: "the original HTML"
        },
        'data-parsoid': {
            headers: {
                'content-type': 'application/json;profile=mediawiki.org/specs/data-parsoid/0.0.1'
            },
            body: {}
        }
    }
}
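
Putting the two cases together, the request body could be assembled roughly as follows (a sketch only; the helper names and the way a no-cache request is detected are assumptions, not the actual implementation):

// Sketch only: hypothetical types and helpers for choosing `previous` vs `original`.
interface StoredRender {
    revid: number;
    html: { headers: Record<string, string>; body: string };
    'data-parsoid': { headers: Record<string, string>; body: object };
}
declare function getStoredRender(domain: string, title: string, rev: number): Promise<StoredRender | null>;

async function buildParsoidRequestBody(
    domain: string, title: string, revision: number,
    predecessorRevision: number | null, // currently taken from the x-parsoid header
    noCache: boolean                    // assumed to reflect a no-cache request header
): Promise<object> {
    if (!noCache && predecessorRevision !== null) {
        // Normal GET: reuse the predecessor revision's render, if stored.
        const prev = await getStoredRender(domain, title, predecessorRevision);
        if (prev) { return { previous: prev }; }
    }
    if (noCache) {
        // no-cache: reuse the *current* revision's render, if stored.
        const current = await getStoredRender(domain, title, revision);
        if (current) { return { original: current }; }
    }
    // Nothing usable in storage: fall back to a plain request (not specified above).
    return {};
}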

This entry point returns both the HTML and data-parsoid in one JSON blob, which RESTBase stores in the html and data-parsoid buckets and also returns to the client.

Example response from Parsoid:

{
    revid: 12346, // The new revision ID (maybe?)
    html: {
        headers: {
            'content-type': 'text/html;profile=mediawiki.org/specs/html/1.0.0'
        },
        body: "the new HTML"
    },
    'data-parsoid': {
        headers: {
            'content-type': 'application/json;profile=mediawiki.org/specs/data-parsoid/0.0.1'
        },
        body: {}
    }
}
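
Splitting this blob into the two buckets could look roughly like this (hypothetical storage API, for illustration only):

// Sketch only: hypothetical bucket API for persisting the combined response.
interface RenderPart { headers: Record<string, string>; body: unknown }
declare const htmlBucket: {
    put(domain: string, title: string, rev: number, part: RenderPart): Promise<void>;
};
declare const dataParsoidBucket: {
    put(domain: string, title: string, rev: number, part: RenderPart): Promise<void>;
};

async function storeAndReturnHtml(domain: string, title: string, parsoidResponse: {
    revid: number;
    html: RenderPart;
    'data-parsoid': RenderPart;
}) {
    const { revid, html } = parsoidResponse;
    const dataParsoid = parsoidResponse['data-parsoid'];
    // Persist each part in its own bucket, keyed by title and revision.
    await htmlBucket.put(domain, title, revid, html);
    await dataParsoidBucket.put(domain, title, revid, dataParsoid);
    // Return the HTML part to the client that triggered the generation.
    return { status: 200, headers: html.headers, body: html.body };
}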

Event Timeline

Jdouglas claimed this task.
Jdouglas raised the priority of this task to High.
Jdouglas updated the task description.
Jdouglas added subscribers: mobrovac, fgiunchedi, ssastry and 4 others.
Restricted Application added a subscriber: Aklapper. · Feb 3 2015, 6:47 PM
This comment was removed by Jdouglas.
Jdouglas renamed this task from On-demand generation of HTML and data-parsoid to On-demand generation of HTML and data-parsoid. · Feb 4 2015, 5:20 PM
Jdouglas updated the task description.
Jdouglas set Security to None.
Jdouglas added a comment. · Edited · Feb 4 2015, 5:29 PM

This task has a few TBDs:

  • Who the first round of users of this feature will be
  • Request headers, body model
  • Response status, content type, headers, body model

Based on the most recent conversations, I've put the following together:

Initial users

Editors using VisualEditor, via the VisualEditor team.

Request/response example: html

Request

  • Method: GET
  • URL: /{domain}/v1/page/{title}/html/{revision}
  • Accept: text/html;profile=mediawiki.org/specs/html/1.0.0

Response

  • Status: 200
  • Content-Type: text/html;profile=mediawiki.org/specs/html/1.0.0
  • Body: (the raw HTML; see the example call below)
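
For illustration, a client call against this endpoint could look roughly like this (the host, domain and title are placeholders; fetch as in a browser or Node 18+):

// Hypothetical client call; restbase.example.org, en.wikipedia.org and Foo are placeholders.
const res = await fetch(
    'https://restbase.example.org/en.wikipedia.org/v1/page/Foo/html/12345',
    { headers: { accept: 'text/html;profile=mediawiki.org/specs/html/1.0.0' } }
);
const html = await res.text(); // the raw HTML for that revision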

Request/response example: data-parsoid

Request

  • Method: GET
  • URL: /{domain}/v1/page/{title}/data-parsoid/{revision}
  • Accept: application/json;profile=mediawiki.org/specs/data-parsoid/0.0.1

Response

  • Status: 200
  • Content-Type: application/json;profile=mediawiki.org/specs/data-parsoid/0.0.1
  • Example body:
{
  "counter": 1,
  "ids": {
    "mwAA": {
      "dsr": [
        0,
        34,
        0,
        0
      ]
    },
    "mwAQ": {
      "src": "#REDIRECT ",
      "a": {
        "href": "./Left_coronary_artery"
      },
      "sa": {
        "href": "Left coronary artery"
      },
      "dsr": [
        0,
        34,
        null,
        null
      ]
    }
  }
}
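
A matching data-parsoid request could look like this (same placeholder host and title as above):

// Hypothetical client call for the data-parsoid of the same revision.
const dpRes = await fetch(
    'https://restbase.example.org/en.wikipedia.org/v1/page/Foo/data-parsoid/12345',
    { headers: { accept: 'application/json;profile=mediawiki.org/specs/data-parsoid/0.0.1' } }
);
const dataParsoid = await dpRes.json(); // e.g. { counter: 1, ids: { ... } }
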
GWicke added a comment. · Edited · Feb 4 2015, 7:30 PM

@Jdouglas, initial users are also the other current users of the Parsoid v1 API. See T88319.

Regarding the API, I think the Accept headers should be optional (as they are right now). The transform API is also important. From a CR comment:


We were talking about a flat POST API as in the current Parsoid v1 API, possibly slightly generalized / cleaned up for the /transform hierarchy.

Perhaps something like this:

POST /{domain}/v1/transform/wikitext/to/html/{title}/{oldid}
Content-type: multipart/form-data

wikitext: '== Foo =='
bodyOnly: 'true'

POST /{domain}/v1/transform/html/to/wikitext/{title}/{oldid}
Content-type: multipart/form-data

html: '<html>...</html>'
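
For illustration, such a transform request could be issued roughly like this (placeholder host, title and oldid; a client-side sketch, not part of the proposal itself):

// Hypothetical call to the proposed wikitext-to-html transform endpoint.
const form = new FormData();
form.append('wikitext', '== Foo ==');
form.append('bodyOnly', 'true');

const resp = await fetch(
    'https://restbase.example.org/en.wikipedia.org/v1/transform/wikitext/to/html/Foo/12345',
    { method: 'POST', body: form } // fetch supplies the multipart boundary itself
);
const renderedHtml = await resp.text();
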
Jdouglas added a comment. · Edited · Feb 5 2015, 11:35 PM

Here's the latest, based on our conversations. @GWicke, @mobrovac please comment with any corrections you'd like to make.


This task has a few TBDs:

  • Who the first round of users of this feature will be
  • Request headers, body model
  • Response status, content type, headers, body model

Based on the most recent conversations, I've put the following together:

Initial users

Nobody in particular.

Request/response example: html

Request

  • Method: GET
  • URL: /{domain}/v1/page/html/{title}/{revision}

Response

  • Status: 200
  • Content-Type: text/html;profile=mediawiki.org/specs/html/1.0.0
  • Body: (the raw HTML)

Request/response example: data-parsoid

Request

  • Method: GET
  • URL: /{domain}/v1/page/data-parsoid/{title}/{revision}

Response

  • Status: 200
  • Content-Type: application/json;profile=mediawiki.org/specs/data-parsoid/0.0.1
  • Body: (a JSON object)
GWicke closed this task as Resolved. · Feb 12 2015, 11:54 PM

This is now implemented and quite well tested. Resolving.