hywiki: some doi links in refs render with https with legacy parser and http with Parsoid
Open, Needs Triage | Public | BUG REPORT

Description

Compare the Parsoid output with the legacy output. The first link in the first ref, which is an extlink, has an http:// protocol in Parsoid HTML and an https:// protocol in legacy HTML.

I haven't investigated the reason, but it is likely either a difference in the output of the Cite journal template or some functionality in the Cite extension that behaves differently in Parsoid. The template is the more likely cause.

Event Timeline

I would guess that SecureLinkFixer (https://www.mediawiki.org/wiki/Extension:SecureLinkFixer), which is deployed on WMF wikis, doesn't work with Parsoid.

@Legoktm says: "it's using whatever Linker hook for changing external URLs. presumably parsoid should call that or SLF could just do a post-processing DOM pass I guess." One of us on CTT will poke around; it should be a straightforward fix.

I don't have a strong opinion on how this is implemented (really, I haven't thought about it), but I do want to flag one potential area for improvement. The signature for the hook is:

public static function onLinkerMakeExternalLink( &$url, &$text, &$link, &$attribs, $linktype ) { ... }

where $url is a string, which means callers often need to parse the URL, do something with it, and then re-assemble it. On Wikimedia wikis this happens twice, as both SecureLinkFixer and Wikibase use the hook and call parse_url.

If this ends up being reimplemented as a new hook, I would suggest having the hook provide a pre-parsed version of the URL so it can be examined/manipulated in structured form and then re-assembled after the hooks are done.
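A minimal sketch of that idea, not an existing MediaWiki hook: the URL is parsed once with parse_url, every handler receives the resulting array by reference, and the URL is re-assembled once at the end. The names assembleUrl, runUrlHandlers and $upgradeToHttps are hypothetical, and the hard-coded host list is only a stand-in for whatever lookup a real extension performs.

```php
<?php
declare(strict_types=1);

/**
 * Re-assemble a URL from the array returned by parse_url().
 * Handles the components parse_url() can produce; hypothetical helper.
 */
function assembleUrl(array $parts): string {
    $url = '';
    if (isset($parts['scheme'])) {
        $url .= $parts['scheme'] . '://';
    } elseif (isset($parts['host'])) {
        $url .= '//'; // protocol-relative URL
    }
    if (isset($parts['user'])) {
        $url .= $parts['user'];
        if (isset($parts['pass'])) {
            $url .= ':' . $parts['pass'];
        }
        $url .= '@';
    }
    $url .= $parts['host'] ?? '';
    if (isset($parts['port'])) {
        $url .= ':' . $parts['port'];
    }
    $url .= $parts['path'] ?? '';
    if (isset($parts['query'])) {
        $url .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $url .= '#' . $parts['fragment'];
    }
    return $url;
}

/**
 * Parse once, let each handler edit the structured form, re-assemble once.
 *
 * @param callable[] $handlers each accepts array &$parts (assumed contract)
 */
function runUrlHandlers(string $url, array $handlers): string {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return $url; // leave unparseable or host-less URLs untouched
    }
    foreach ($handlers as $handler) {
        $handler($parts);
    }
    return assembleUrl($parts);
}

// Hypothetical handler: upgrade http to https for a stand-in host list.
$upgradeToHttps = function (array &$parts): void {
    $secureHosts = ['doi.org', 'dx.doi.org'];
    if (($parts['scheme'] ?? '') === 'http'
        && in_array(strtolower($parts['host']), $secureHosts, true)
    ) {
        $parts['scheme'] = 'https';
    }
};
```

With this in place, runUrlHandlers('http://dx.doi.org/10.1000/182', [$upgradeToHttps]) returns 'https://dx.doi.org/10.1000/182', while a URL whose host is not on the list comes back unchanged. A second handler would reuse the same $parts array instead of calling parse_url again.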

Parsoid doesn't use the Linker at all, so this is likely to be implemented as a postprocessing pass; as such there's no good way to avoid re-parsing the href from the <a> tag each time. I probably need the new hook that Isabelle is going to add to the OutputTransform pipeline for DiscussionTools, though, so I'm waiting for that to land.

There is a DOM postprocessing pass supported by the Parsoid Extension API, and this should be implemented as part of that rather than in the OutputTransformPipeline, because we want the canonical HTML served by the REST API to have https, not just read views.
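To illustrate what such a pass would do, here is a standalone sketch using PHP's built-in DOM extension. It is not the Parsoid Extension API (how a pass is registered there is not covered in this task), and upgradeExternalLinks and the host list are hypothetical. It also shows the re-parsing cost mentioned above: each href has to be parsed again from its string form.

```php
<?php
declare(strict_types=1);

/**
 * Upgrade http:// hrefs to https:// in place for hosts in $secureHosts.
 * Hypothetical helper; returns the number of links rewritten.
 */
function upgradeExternalLinks(DOMDocument $doc, array $secureHosts): int {
    $changed = 0;
    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        // The href is only available as a string, so it is re-parsed here.
        $scheme = parse_url($href, PHP_URL_SCHEME);
        $host = parse_url($href, PHP_URL_HOST);
        if (!is_string($scheme) || !is_string($host)) {
            continue; // relative, empty, or malformed href
        }
        if (strtolower($scheme) === 'http'
            && in_array(strtolower($host), $secureHosts, true)
        ) {
            // Swap only the scheme; the rest of the href stays byte-for-byte.
            $link->setAttribute('href', 'https' . substr($href, strlen($scheme)));
            $changed++;
        }
    }
    return $changed;
}

// Usage with a stand-in host list.
$doc = new DOMDocument();
$doc->loadHTML(
    '<p><a href="http://dx.doi.org/10.1000/182">doi</a> '
    . '<a href="http://example.org/">other</a></p>'
);
$count = upgradeExternalLinks($doc, ['dx.doi.org']);
```

Because the pass mutates the DOM before serialization, any consumer of the resulting HTML would see the upgraded links, which is the property wanted for the REST API output.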