
Expose page properties as an API
Closed, Resolved, Public

Description

Various users will need access to page properties that are set in the PHP parser during the parse process. We do have information about everything that's in the wikitext (categories, interwikis etc), but don't have access to private properties set by extensions. It would be great if these could be exposed through the API on:

  • action=parse (used by Parsoid to expand extension tags), and
  • action=expandtemplates

In the longer term, we should think about how we'd like to evolve page metadata in general. One option is to move metadata out of the wikitext as discussed in T55508 and https://www.mediawiki.org/wiki/User:GWicke/PageProperties.

Event Timeline

GWicke raised the priority of this task from to Needs Triage.
GWicke updated the task description.
GWicke changed Security from none to None.
GWicke added a project: Web-Team-Backlog.
GWicke edited subscribers, added: MaxSem; removed: Aklapper.
GWicke subscribed.

The "page properties" in MediaWiki terms are available through action=query&prop=pageprops, or by setting prop=properties for action=parse.

Note that getting properties on expandtemplates is unlikely because it just preprocesses wikitext while properties are usually (or rather universally?) added later, at the parse step.

@Umherirrender, @MaxSem: Thanks for the info!

I completely agree that expandtemplates would be less useful. We also really need to get the properties for the entire page, as some properties (like the wikidata item) are not actually added by parser tags. Here is a full example query: http://en.wikipedia.org/w/api.php?action=parse&format=json&page=Foobar&prop=properties
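The example query above can be built, and its result flattened, roughly as follows. This is a minimal sketch rather than a definitive client: the helper names `build_parse_url` and `properties_to_dict` are made up for illustration, and the two response shapes handled (a list of `{"name": ..., "*": ...}` entries, or a plain name-to-value mapping) are assumptions about how the API serializes `properties`.

```python
from urllib.parse import urlencode

# Endpoint taken from the example query above.
API = "http://en.wikipedia.org/w/api.php"


def build_parse_url(page, api=API):
    """Build an action=parse URL that asks only for page properties."""
    params = {
        "action": "parse",
        "format": "json",
        "page": page,
        "prop": "properties",
    }
    return api + "?" + urlencode(params)


def properties_to_dict(response):
    """Flatten parse.properties from a decoded JSON response into {name: value}.

    Assumption: the value is either a list of {"name": ..., "*": ...}
    entries or already a mapping; other shapes are not handled.
    """
    props = response.get("parse", {}).get("properties", [])
    if isinstance(props, dict):
        return dict(props)
    return {entry["name"]: entry.get("*") for entry in props}
```

Fetching the URL and decoding the JSON is left to whatever HTTP client is at hand; only the request parameters come from this thread.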

For the current revision this actually performs relatively well thanks to the page cache, so I think we can just call this & cache the info if we need faster access. For old revisions this will be an expensive re-parse, so it'll be important to save the information.

There are still some more bits of information from ParserOutput that we might want to expose, especially head items and resourceloader modules.

action=parse includes headitems and modules (not config vars: T67015), but not inline scripts. For inline scripts, it is better to migrate the script to a module than to add it to action=parse.

Everything derived from parser output is expensive for old revisions. For the current revision there are also the action=query&prop= modules, which allow access without a reparse and for multiple pages at once, but updates may be delayed by the job queue.
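For the multi-page case, a request against action=query&prop=pageprops might look like the sketch below. The helper names are hypothetical, the endpoint is the one used elsewhere in this thread, and the assumed response layout (`query.pages` keyed by page ID, or a list of page objects) should be checked against the actual API output.

```python
from urllib.parse import urlencode


def build_pageprops_url(titles, api="http://en.wikipedia.org/w/api.php"):
    """Build an action=query&prop=pageprops URL for several titles at once."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "pageprops",
        "titles": "|".join(titles),  # multiple values are pipe-separated
    }
    return api + "?" + urlencode(params)


def pageprops_by_title(response):
    """Map each returned page title to its stored page properties.

    Assumption: query.pages is either a dict keyed by page ID or a list
    of page objects; pages without properties get an empty dict.
    """
    pages = response.get("query", {}).get("pages", {})
    if isinstance(pages, dict):
        pages = pages.values()
    return {p["title"]: p.get("pageprops", {}) for p in pages if "title" in p}
```

Because these values come from the database rather than a fresh parse, they can lag behind the latest edit until the job queue catches up.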

http://en.wikipedia.org/w/api.php?modules=parse

headitems: Gives items to put in the <head> of the page.
headhtml: Gives parsed <head> of the page.
modules: Gives the ResourceLoader modules used on the page.

Change 181442 had a related patch set uploaded (by Anomie):
API: Add page properties to action=expandtemplates output

https://gerrit.wikimedia.org/r/181442

Patch-For-Review

To summarize the situation so far:

  • action=parse already returns properties for the parsed page or wikitext.
  • action=query&prop=pageprops already returns properties from the database.
  • action=expandtemplates could still use prop=properties; see Gerrit change 181442.
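Assuming the Gerrit change exposes the properties through prop=properties on action=expandtemplates, as the last point suggests, a request could be assembled like this. It is only a sketch: `build_expandtemplates_url` is a hypothetical helper, and the shape of the returned properties is not shown because it is not described in this task.

```python
from urllib.parse import urlencode


def build_expandtemplates_url(text, api="http://en.wikipedia.org/w/api.php"):
    """Build an action=expandtemplates URL that also requests properties.

    Assumption: prop accepts a pipe-separated list that includes
    "wikitext" and "properties".
    """
    params = {
        "action": "expandtemplates",
        "format": "json",
        "text": text,
        "prop": "wikitext|properties",
    }
    return api + "?" + urlencode(params)
```

As noted earlier in the thread, properties set only at the parse step may not show up here, since expandtemplates just preprocesses the wikitext.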

Change 181442 merged by jenkins-bot:
API: Add page properties to action=expandtemplates output

https://gerrit.wikimedia.org/r/181442

Anomie claimed this task.

There doesn't seem to be anything concrete left to do here.