
displaytitle page props should contain html in representation
Open, Medium, Public

Description

This information is available in data-parsoid,

Π01 class

<meta property="mw:PageProp/displaytitle" content="Π01 class" data-parsoid='{"src":"{{DISPLAYTITLE:&amp;Pi;&lt;sup>0&lt;/sup>&lt;sub style=\"margin-left:-0.5em\">1&lt;/sub> class}}","a":{"content":"Π01 class"},"sa":{"content":"&amp;Pi;&lt;sup>0&lt;/sup>&lt;sub style=\"margin-left:-0.5em\">1&lt;/sub> class"},"dsr":[0,78,null,null]}'/>

However, data-parsoid is stripped for template-generated content.

'Til Death

<meta property="mw:PageProp/displaytitle" content="'Til Death" about="#mwt4"/>

See the expected results in,
https://en.wikipedia.org/w/api.php?action=query&prop=info&inprop=displaytitle&titles=1983+World+Artistic+Gymnastics+Championships%7C%27Til+Death%7C%CE%A001+class&format=jsonfm
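
For reference, the query above can be rebuilt programmatically. This is only a sketch: the helper name `build_displaytitle_query` is made up for illustration, and it only constructs the URL (no request is sent).

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_displaytitle_query(titles, fmt="jsonfm"):
    """Build the action=query URL that asks for each page's displaytitle.

    Hypothetical helper; the parameter names mirror the URL quoted in
    the task description.
    """
    params = {
        "action": "query",
        "prop": "info",
        "inprop": "displaytitle",
        # Multiple titles are separated by a pipe in a single parameter.
        "titles": "|".join(titles),
        "format": fmt,
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = build_displaytitle_query([
    "1983 World Artistic Gymnastics Championships",
    "'Til Death",
    "Π01 class",
])
print(url)
```

Passing `fmt="json"` instead of `jsonfm` gives the machine-readable variant of the same response.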

The current content is only useful for stuff like,

IPhone

<meta property="mw:PageProp/displaytitle" content="iPhone" about="#mwt3"/>
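
To make the limitation concrete, here is a rough sketch of how a client might read the page prop from Parsoid HTML today, using only the Python standard library. The class and function names are illustrative assumptions, not part of any Parsoid client library.

```python
from html.parser import HTMLParser

DISPLAYTITLE_PROP = "mw:PageProp/displaytitle"


class DisplayTitleFinder(HTMLParser):
    """Remember the content attribute of the displaytitle <meta> tag."""

    def __init__(self):
        super().__init__()
        self.display_title = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> are routed here as well.
        attr_map = dict(attrs)
        if tag == "meta" and attr_map.get("property") == DISPLAYTITLE_PROP:
            self.display_title = attr_map.get("content")


def extract_display_title(parsoid_html):
    """Return the plain-text display title, or None if the tag is absent."""
    finder = DisplayTitleFinder()
    finder.feed(parsoid_html)
    finder.close()
    return finder.display_title


snippet = '<meta property="mw:PageProp/displaytitle" content="iPhone" about="#mwt3"/>'
print(extract_display_title(snippet))
```

Because only the `content` attribute is available, a client like this can recover a case change such as "iPhone" but not italics, superscripts, or other markup.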

Event Timeline

Arlolra raised the priority of this task from to Medium.
Arlolra updated the task description.
Arlolra added a project: Parsoid.
Arlolra added subscribers: Arlolra, Bianjiang.

Any semantic information that is needed in the HTML should be surfaced out of data-parsoid even if it is present there, since data-parsoid is considered private and we should retain the freedom to change its format and contents without having to worry about breaking Parsoid HTML clients.

Do we really want to include styling in the value here? It's just being used for italics in the example, but I'm not completely convinced that <meta property="mw:PageProp/displaytitle" content="<i>'Til Death</i>"> is actually helpful here. If we were to include HTML, we'd want to include a small subset of HTML, not blindly copy through any <span> tags corresponding to internal template markers, etc.
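
As a sketch of what "a small subset of HTML" could mean in practice, the snippet below keeps a handful of inline tags and drops everything else, including all attributes. The allowed set (`i`, `b`, `sup`, `sub`) is purely an assumption for illustration; nothing in this task fixes the actual list, and this is not how Parsoid sanitizes anything.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlist, for illustration only.
ALLOWED_TAGS = {"i", "b", "sup", "sub"}


class TitleSanitizer(HTMLParser):
    """Re-emit a title fragment, keeping only allowlisted tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Attributes (style, about, typeof, ...) are always dropped.
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is kept even when its enclosing tag is dropped.
        self.parts.append(escape(data, quote=False))


def sanitize_title(html_fragment):
    parser = TitleSanitizer()
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.parts)


print(sanitize_title('<span about="#mwt4"><i>\'Til Death</i></span>'))
```

Note that dropping attributes would also discard the `margin-left` style used in the "Π01 class" example, so whether any attributes should survive is part of the same design question.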

Stripping HTML is "as designed" here... although we can discuss whether the design should be changed.

I was using "extracting displaytitle" as an example for my statement "we constantly have backfill requirements, with more and more development happening around APIs".

From an information extraction point of view, a really hard problem is the "C#" language:

https://en.wikipedia.org/w/api.php?action=query&prop=info&inprop=displaytitle&titles=C_Sharp_(programming_language)&format=jsonfm

where we expect to get the string "C#", as given in {{Correct title}}. It would be better if Parsoid could help.

@cscott:
For the styling, from the articles I've seen so far, style by itself often implies semantics.
E.g. according to Wikipedia's style guide [1], an italic title usually implies that the underlying article is about a "major work" rather than a general name. So it would be better to have a way to keep such information (e.g. if there is a concern about using <i> in a plain string, maybe you can introduce an additional field such as is_italic to save it).

[1] https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Titles#Major_works
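
A minimal sketch of the additional-field idea, assuming the HTML form of the display title is available to work from. The `DisplayTitle` record and the rule "italic if all visible text sits inside <i>" are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class DisplayTitle:
    text: str        # plain-text title with markup removed
    is_italic: bool  # True when all visible text is inside <i>


class _ItalicScanner(HTMLParser):
    """Collect text chunks and whether each one is inside an <i> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0
        self.chunks = []  # list of (text, inside_italic)

    def handle_starttag(self, tag, attrs):
        if tag == "i":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "i" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        self.chunks.append((data, self.depth > 0))


def describe_display_title(html_fragment):
    scanner = _ItalicScanner()
    scanner.feed(html_fragment)
    scanner.close()
    text = "".join(chunk for chunk, _ in scanner.chunks)
    # Whitespace-only chunks do not count for or against italics.
    visible = [italic for chunk, italic in scanner.chunks if chunk.strip()]
    return DisplayTitle(text=text, is_italic=bool(visible) and all(visible))


print(describe_display_title("<i>'Til Death</i>"))
```

A partially italic title would come out as `is_italic=False` under this rule, which may or may not be the right call for titles that mix styles.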

This is required for us to be able to preview the title of a document in VE, for example if I add {{italic title}} to my page, and want to preview the output using Parsoid...