
Implement Parsoid's API / URL routing in MediaWiki core
Closed, Duplicate · Public

Description

  • Ensure a clean separation between HTTP handling and Parsoid internals
    • The idea is that all the same info that comes from RESTBase (content-type, etc.) will probably be present, but maybe in a different form (i.e., query parameters instead of HTTP headers). So draw a line between the “extract information from the HTTP request” part and the “do something useful with the extracted info” part.
  • Figure out what a reasonable “action API” implementation of Parsoid’s interface would be
  • Determine how much of the Parsoid API is actively used by RESTBase, with the idea of simplifying the port by supporting only the minimal necessary API.
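
To illustrate the first bullet, here is a minimal sketch of the "extract" / "do something useful" split. It is written in Python purely for illustration; every name in it (ParseRequest, from_http_headers, from_query_params, handle, and the "contenttype" query parameter) is hypothetical and not taken from Parsoid, RESTBase, or MediaWiki.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ParseRequest:
    """Transport-neutral description of the request (hypothetical type)."""
    title: str
    revision: Optional[int]
    content_type: str


def from_http_headers(path_params: Mapping[str, str],
                      headers: Mapping[str, str]) -> ParseRequest:
    """REST-style extraction: the content type arrives as an HTTP header."""
    rev = path_params.get("revision")
    return ParseRequest(
        title=path_params["title"],
        revision=int(rev) if rev else None,
        content_type=headers.get("Content-Type", "text/html"),
    )


def from_query_params(query: Mapping[str, str]) -> ParseRequest:
    """Action-API-style extraction: the same info arrives as query parameters.

    The "contenttype" parameter name is an assumption made for this sketch.
    """
    rev = query.get("revision")
    return ParseRequest(
        title=query["title"],
        revision=int(rev) if rev else None,
        content_type=query.get("contenttype", "text/html"),
    )


def handle(request: ParseRequest) -> str:
    """Stand-in for the 'do something useful' half; it never sees HTTP objects."""
    rev = request.revision if request.revision is not None else "latest"
    return f"{request.title}@{rev} as {request.content_type}"
```

Both extractors produce the same ParseRequest value, so everything behind handle() can stay unchanged whichever front end ends up carrying the request.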

Event Timeline

ssastry created this task. Sep 24 2018, 8:39 PM
Restricted Application added a subscriber: Aklapper. Sep 24 2018, 8:39 PM
ssastry renamed this task from Evaluate ease / difficult of implementing Parsoid's API / URL routing in MediaWiki core to Evaluate ease / difficulty of implementing Parsoid's API / URL routing in MediaWiki core. Sep 24 2018, 8:39 PM
ssastry triaged this task as Normal priority.
ssastry moved this task from Backlog to Prototype / Evaluation on the Parsoid-PHP board.
Anomie added a subscriber: Anomie. Edited Sep 25 2018, 2:09 PM

I'm really excited to see there's a Parsoid-PHP board!

Figure out what a reasonable “action API” implementation of Parsoid’s interface would be

Looking over https://www.mediawiki.org/wiki/Parsoid/API from the perspective of the action API:

  • The common HTTP headers Accept-Encoding and Cookie will be handled by MediaWiki. X-Request-Id becomes the requestid GET or POST parameter to the action API.
  • Most of the common path parameters and common payload parameters will need to be considered for each endpoint.
    • "domain" is determined by which wiki you hit, rather than all wikis being accessed through one shared endpoint.
  • For POST in general, I note MediaWiki doesn't support application/json for the form data, only application/x-www-form-urlencoded and multipart/form-data. I also note gerrit:388486 exists. Is this necessary?
  • Both the GET and POST Wikitext -> HTML should likely be integrated into the existing ApiParse module.
    • Equivalents of "title", "revision", and "from" all already exist in ApiParse.
    • "format" as HTML is generally prop=text, but the default should probably continue to omit the RDFa metadata, with a new flag to enable it (much like the existing flags).
    • "format" as pagebundle could be done as a prop (for the split-out attributes) or a flag, along the lines of prop=parsetree or the generatexml flag. I'd recommend a prop.
    • "body_only" is the default. Is the ability to get a full HTML page of some sort needed, or can it be pieced together by the client from existing props such as headhtml?
    • I see there's mention of other possible fields, but they're not documented so I can't really comment on them.
  • HTML -> Wikitext should be a new action, which for now I'll call "ApiUnparse" (action=unparse).
  • HTML -> HTML could be the same ApiUnparse endpoint or a different one. At first glance I'd lean towards a different one since the parameters seem largely different.
  • Wikitext -> Lint could be a new prop on ApiParse, or it could be a separate module. I'd lean towards the former for this one.
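
As a rough illustration of the mapping described in the list above, the following Python sketch translates a Parsoid-style Wikitext -> HTML request into action=parse query parameters. The parameters action, format, page, oldid, prop=text and requestid are existing action API parameters; the prop value "parsoidbundle" is a made-up placeholder for the suggested pagebundle prop, and the /w/api.php path is an assumption based on the usual Wikimedia layout.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def parsoid_to_action_query(title: str,
                            revision: Optional[int] = None,
                            request_id: Optional[str] = None,
                            pagebundle: bool = False) -> Dict[str, str]:
    """Map Parsoid-style request fields onto action=parse parameters."""
    params = {"action": "parse", "format": "json"}
    if revision is not None:
        # A revision ID identifies the content on its own.
        params["oldid"] = str(revision)
    else:
        params["page"] = title
    props = ["text"]
    if pagebundle:
        # Hypothetical prop name; no such prop exists in ApiParse today.
        props.append("parsoidbundle")
    params["prop"] = "|".join(props)
    if request_id is not None:
        # X-Request-Id becomes the requestid parameter.
        params["requestid"] = request_id
    return params


def build_url(domain: str, params: Dict[str, str]) -> str:
    """'domain' selects which wiki's endpoint is hit; it is not a parameter.

    The /w/api.php path is an assumption, not something every wiki uses.
    """
    return f"https://{domain}/w/api.php?{urlencode(params)}"
```

For example, build_url("en.wikipedia.org", parsoid_to_action_query("Main_Page", request_id="abc")) yields a GET URL against that wiki's own endpoint, with the request ID carried as an ordinary parameter rather than a header.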

I'm really excited to see there's a Parsoid-PHP board!

I am sure you are. :-)

Figure out what a reasonable “action API” implementation of Parsoid’s interface would be

Looking over https://www.mediawiki.org/wiki/Parsoid/API from the perspective of the action API:
...

Thanks @Anomie! This would be useful if we want to integrate with the Action API.

I didn't get around to fully updating the task with the other offsite discussion; I just did a copy-paste dump from offsite notes. But, in the interest of documenting all possibilities, here are the alternatives:

  • Client integration can happen internally via VirtualRESTAPI without exposing an HTTP API. The HTTP API would be exposed by the REST API layer (which platform/services is probably looking at).
  • We expose the Parsoid API as a REST API. In that case, we need to figure out how that would work. Right now, Parsoid's code (which works with express) has a URL router that extracts params from the REST API URL and routes the request to the appropriate internal endpoint. The question would be whether there is an equivalent router in core and, if not, what it would take to build one.
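
For reference, the kind of router described in the second bullet can be sketched in a few lines. This is a Python illustration only, not Parsoid's express code or anything in core. The two URL shapes are assumptions modeled on the v3 paths documented at https://www.mediawiki.org/wiki/Parsoid/API, and the names ROUTES and route are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed URL shapes (modeled on the documented v3 API):
#   /{domain}/v3/page/{format}/{title}[/{revision}]
#   /{domain}/v3/transform/{from}/to/{format}[/{title}[/{revision}]]
# The {from} segment is captured as "src" in this sketch.
ROUTES = [
    ("page", re.compile(
        r"^/(?P<domain>[^/]+)/v3/page/(?P<format>[^/]+)"
        r"/(?P<title>[^/]+)(?:/(?P<revision>\d+))?/?$")),
    ("transform", re.compile(
        r"^/(?P<domain>[^/]+)/v3/transform/(?P<src>[^/]+)/to/(?P<format>[^/]+)"
        r"(?:/(?P<title>[^/]+)(?:/(?P<revision>\d+))?)?/?$")),
]


def route(path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (endpoint name, extracted params), or None if nothing matches."""
    for name, pattern in ROUTES:
        match = pattern.match(path)
        if match:
            # Optional segments that were absent come back as None; drop them.
            params = {k: v for k, v in match.groupdict().items() if v is not None}
            return name, params
    return None
```

The sketch assumes titles arrive URL-encoded, so a title never contains a literal "/". An equivalent in core would need the same two pieces: a table of path patterns and a step that hands the extracted params to an internal endpoint.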
cscott added a subscriber: cscott. Sep 25 2018, 3:54 PM

@Anomie note that we already have includes/libs/virtualrest/VirtualRESTService.php in core. Ideally we could use this to transparently redirect requests, so that clients are unaware of VE using (a) "old JS Parsoid" in a separate process, (b) RESTBase (indirectly invoking a separate JS Parsoid), (c) "new PHP Parsoid" as a loopback request or in-process function call with no RESTBase caching, or (d) RESTBase (indirectly invoking the new PHP Parsoid as an action API call or REST API call).

We deliberately do *not* want to mix the current ApiParse and the new Parsoid parse in the initial implementation. The porting plan is to initially preserve the existing API as much as possible so that "in PHP" is the only difference. We'd still use RESTBase to call back into a Parsoid cluster, the Parsoid cluster would just be running PHP code. Then we can incrementally make the changes you describe above to unify the APIs -- returning Parsoid output instead of PHP output for action=parse for instance. But we don't want to change the world all at once if we can help it, and it would be best if RESTBase didn't have a complicated API change to manage at the same time.

ssastry renamed this task from Evaluate ease / difficulty of implementing Parsoid's API / URL routing in MediaWiki core to Implement Parsoid's API / URL routing in MediaWiki core. Dec 6 2018, 6:49 PM