The extracts endpoint is one of the most popular query endpoints in the MediaWiki PHP API, averaging 450 req/s. The information it returns is fairly static and cacheable. The cache invalidation conditions are the same as for Parsoid HTML data and are already tracked by RESTBase. With a Varnish-cacheable REST API we could significantly improve response latency and lower the overall load on the PHP API.
However, the PHP endpoint provides multiple options, which make it hard to convert to REST. This task is created to discuss which of the options are useful and which could be dropped.
Currently, the following options are supported:
- exchars - How many characters to return. This could easily be dropped, as the client can truncate the full extract itself. However, dropping it forces the client to download the whole extract, which is inefficient if the requested length is normally much smaller than the content length. If that is the case, we could offer a fixed set of content lengths instead, such as short, long, and full.
- exsentences - How many sentences to return. Mostly identical to the previous one.
- exlimit - How many extracts to return. The REST API wouldn't allow batch requests, so this option would definitely be dropped.
- exintro - Return only content before the first section. If it's needed, this could become a separate endpoint. However, to support it, RESTBase would need to make a second request to the PHP API on invalidation, so ideally this should be dropped as well.
- explaintext - Return extracts as plain text instead of limited HTML. If this is used, the REST endpoint could provide a format parameter. However, we should consider returning only simplified HTML, as stripping all the tags is fairly easy to do on the client.
- exsectionformat - How to format sections in plaintext mode: plain, wikitext, raw. Likely to be dropped, as RESTBase doesn't serve wikitext, and raw doesn't seem to be useful for clients.
- excontinue - When more results are available, use this to continue. As batch requests wouldn't be supported, this would be dropped.
- exvariant - Convert content into this language variant. The language variant would be controlled by the request domain, so this would also be dropped.
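The client-side alternatives to exchars and explaintext discussed in the list above could look roughly like the following. This is only a minimal sketch: the function names `strip_tags` and `truncate_chars` are hypothetical, and it assumes the REST endpoint returns a simplified HTML fragment as a string.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Hypothetical client-side stand-in for explaintext: drop tags, keep text."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flushes any text still buffered by the parser
    return "".join(collector.parts)


def truncate_chars(text, max_chars):
    """Hypothetical client-side stand-in for exchars.

    Cuts at max_chars, then backs up to the last space so that a word
    is not split in half.
    """
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    boundary = cut.rfind(" ")
    if boundary > 0:
        cut = cut[:boundary]
    return cut.rstrip()
```

Note that this only shows the operations are cheap for the client. It does not address the bandwidth concern raised for exchars, since the full extract still has to be downloaded first.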
In case none of the options are strictly required, the API could look like this:
- /page/extract/{title} - returns the page extract for the latest revision of the title
- /page/extract/{title}/ - lists all revision/tid pairs available in storage for text extracts
- /page/extract/{title}{/revision}{/tid} - gets an extract for a historic revision of a page
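To make the proposed routes concrete, here is a small sketch of how a client might build request paths for them. The helper names are hypothetical, and percent-encoding the whole title (including any slashes) is an assumption; the task does not specify how titles are to be encoded.

```python
from urllib.parse import quote


def extract_path(title, revision=None, tid=None):
    """Build a path for /page/extract/{title}{/revision}{/tid}.

    Hypothetical helper. Encoding "/" inside titles is an assumption.
    """
    if tid is not None and revision is None:
        raise ValueError("tid can only be given together with a revision")
    path = "/page/extract/" + quote(title, safe="")
    if revision is not None:
        path += "/" + str(revision)
    if tid is not None:
        path += "/" + quote(str(tid), safe="")
    return path


def extract_listing_path(title):
    """Build the path that lists the stored revision/tid pairs for a title."""
    return extract_path(title) + "/"
```

With the options dropped, a request is fully described by its path, which is what would make the responses straightforward to cache in Varnish.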
This task is created to gather information about real-world usage of the API and to decide which options clients actually need.