Parsoid: displaytitle HTML now appearing in <title> element rather than page title
Closed, Resolved (Public)

Description

Visit https://en.wikipedia.org/api/rest_v1/page/html/...And_Justice_for_All_(film) and in the HTML you'll see <title>&lt;i>...And Justice for All&lt;/i> (film)</title>.

Previously, the actual page title was in the title element, not the displaytitle's HTML (not really sure when this broke). Aside from being a regression, the displaytitle is a bad fit here because the title element cannot contain HTML tags (per https://developer.mozilla.org/en-US/docs/Web/HTML/Element/title).

If this is the intended behavior, what's the correct way to get a page's title out of its HTML? Parse it from <link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/...And_Justice_for_All_(film)"/>?

Event Timeline

Legoktm renamed this task from displaytitle HTML now appearing in <title> element rather than page title to Parsoid: displaytitle HTML now appearing in <title> element rather than page title. Dec 5 2022, 2:28 AM

> not really sure when this broke

Maybe with the port to PHP? See T294621

Some other open displaytitle tasks are T293514 and T122976

~~I can't find the code in Parsoid which actually sets the title element (!).~~ It was originally thought that the title element would be a good way to convey the title metadata, but as @Legoktm points out, some additional sanitization happens to the value on its way into the title element, which complicates this. This isn't really an issue for 'modern' Parsoid, since we communicate the title via ParserOutput/ContentMetadataCollector metadata, not the title element. The actual page title (as opposed to the displaytitle) is passed in a <link rel="dc:isVersionOf"> element in the head, as @Legoktm mentioned. Perhaps we should be passing the displaytitle that way as well.
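For clients that only have the HTML, a minimal sketch of reading the page title from the <link rel="dc:isVersionOf"> element might look like the following. It uses only the Python standard library. The names `VersionOfLinkFinder` and `page_title_from_parsoid_html` are hypothetical, and the `/wiki/` path prefix is an assumption taken from the example URL in the description; wikis configured with a different article path would need a different prefix.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class VersionOfLinkFinder(HTMLParser):
    """Record the href of the first <link rel="dc:isVersionOf"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link .../> tags are also routed here by HTMLParser.
        if tag != "link" or self.href is not None:
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list; a missing value becomes "".
        rels = (attr_map.get("rel") or "").split()
        if "dc:isVersionOf" in rels:
            self.href = attr_map.get("href")


def page_title_from_parsoid_html(html, article_prefix="/wiki/"):
    """Return the page title with spaces, or None if it cannot be found.

    article_prefix is an assumption based on the URL shown in this task.
    """
    finder = VersionOfLinkFinder()
    finder.feed(html)
    if not finder.href:
        return None
    # urlsplit handles protocol-relative hrefs such as //en.wikipedia.org/...
    path = urlsplit(finder.href).path
    if not path.startswith(article_prefix):
        return None
    # Undo percent-encoding, then show underscores as spaces.
    return unquote(path[len(article_prefix):]).replace("_", " ")
```

Called on the HTML from the description, this should yield the plain title regardless of what the title element contains, since the title element is never consulted.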

Change #1068802 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] Parsoid <title> element should be actual title, not displaytitle

https://gerrit.wikimedia.org/r/1068802

Change #1068802 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Parsoid <title> element should be actual title, not displaytitle

https://gerrit.wikimedia.org/r/1068802

Change #1070043 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a19

https://gerrit.wikimedia.org/r/1070043

Change #1070051 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a19

https://gerrit.wikimedia.org/r/1070051

Change #1070043 abandoned by Jgiannelos:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a19

https://gerrit.wikimedia.org/r/1070043

Change #1070051 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a19

https://gerrit.wikimedia.org/r/1070051