Page MenuHomePhabricator

Previewing a link to a File in Flow's VE-mode, opens a broken destination due to /w/
Closed, ResolvedPublic0 Estimated Story Points

Event Timeline

Quiddity raised the priority of this task from to Needs Triage.
Quiddity updated the task description. (Show Details)
Quiddity subscribed.
EBernhardson subscribed.

This looks to be a base href issue, this might also be an issue on subpages. We have a href fixer that adjusts things as necessary in the parsoid output if needed.

Can you dump the Parsoid output? I'd be surprised if this was a Parsoid bug; double-checking parsoid output for both [[File:...]] and [[:File:...]] looks fine to be (relative link with appropriate base href). Perhaps somebody (Visual Editor? Flow?) is not paying attention to the base href Parsoid is providing?

In particular, it looks like the editing page is being served from /w/ -- but that is *not* the <base href> provided in the Parsoid output. You should be using the Parsoid-provided base href to resolve URLs, not the base href of the editing or flow page.

The content coming from parsoid should be fine, the issue for flow is that we render content from multiple pages onto the same page, so can't just arbitrarily lift one of the base href's into <head>. Instead for most content we have an output type called 'html-fixed', in this mode the parsoid output has the base href's pre-resolved (the html-fixed also applies redlinking and stripping of images rejected by wfIsBadImage). For VE editing we skip all that and give it the raw parsoid content, but that means the href's are not based in reality. The short of it is flow pages do not define a base href because not all the content is guarantee'd to have the same base hef afaiu.

The content coming from parsoid should be fine, the issue for flow is that we render content from multiple pages onto the same page, so can't just arbitrarily lift one of the base href's into <head>.

Now that Parsoid uses a consistent base href regardless of the page, e.g.:

<base href="//en.wikipedia.org/wiki/"/>

we could in theory do that. I didn't before since I was a little concerned that changing the base href in the <head> would break non-Flow content.

I forgot that we can't use the fixers for VE currently. That does complicate things a little.

VE for articles doesn't seem to use a top-level <base href> either, though they do have some code changing their internal documents' <base> attributes. I'll ask them to comment.

VE expects its input documents to have a <base> with a sensible href. It then performs minimal magic to make things work from there, but you shouldn't notice any of this happening as long as you provide VE with input HTML that has a sensible <base href>.

The magical things we perform are:

  1. If the input document has no <base> (common case: new page creation, input HTML is empty string), we interpolate one from wgArticlePath and write that in
  2. If the base href is relative or protocol-relative (Parsoid does this) and the browser we're in doesn't grok protocol-relative <base href> artificially created documents (works in Firefox, not in Chrome) we resolve the <base href> against the outer document's base URL. (In practice that means resolving // to http:// or https:// as appropriate.)
  3. In places where relative hrefs from the input HTML end up being used in the UI (e.g. link previews), we ensure that the HTML that ends up in the UI isn't a relative URL (because the browser would resolve that against the main document's base URL) but pre-resolved against the input's base URL. In practice that means that a preview of <a href="./Foo"> ends up being rendered as <a href="https://en.wikipedia.org/wiki/Foo"> because what if we're at /w/index.php or /wiki/OS/2 or whatever.

Unfortunately, magic things #1 and #2 are in ve.init.mw.Target. For #1 that's defensible (because interpolation from wgArticlePath is MW-specific) but for #2 it's not, and that code should be moved.

This is a reason to look at a shared base class that is MW-specific (so it can assume things like wgArticlePath), but doesn't assume the page model.

@Catrope, there is a similar bug in article VE. See updated description.

We probably want to stop passing body=true to Parsoid, so we get the <head> and can benefit from the <base href> in there.

body=true can also lead to data corruption if the first element is a <meta>:

document.documentElement.innerHTML = '<meta>foo'
"<meta>foo"
document.documentElement.innerHTML
"<head><meta></head><body>foo</body>"

This is a quirk in the HTML5 parsing spec, which causes <meta> elements to move to the <head> if not wrapped into a <body>. We should probably return body.outerHTML instead of .innerHTML.

It looks we may need to revisit e4ab381fff5da49acf21f8a0533517ed201bca5b, and extract the body contents as a ContentFixer.

Change 206468 had a related patch set uploaded (by Catrope):
Use VE's fixBase utility so link previews point to the right place

https://gerrit.wikimedia.org/r/206468

Change 206468 merged by jenkins-bot:
Use VE's fixBase utility so link previews point to the right place

https://gerrit.wikimedia.org/r/206468