
VisualEditor automatically converts backslashes to forward slashes in wikilinks
Open, Needs Triage | Public | BUG REPORT

Description

A minimal example for the problem is the wikitext

[[o\i]]

Steps to replicate the issue:

What happens?:

  • VisualEditor converts the test case to
[[o/i|o\i]]

What should have happened instead?:

  • No change for "[[o\i]]"

Impact:

  • Links are silently corrupted without user action

Other information:

Event Timeline

zoe subscribed.

I think this is Parsoid. I've clipped out the HTML that VE builds its model from:

<a rel="mw:WikiLink" href="./O/i" title="O/i" class="mw-redirect" id="mwAw">o\\i</a>

For a fuller reference, here's what VE is reading on my machine:

<!DOCTYPE html>\n<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/"><head prefix="mwr: https://de.wikipedia.org/wiki/Special:Redirect/"><meta charset="utf-8"/><meta property="mw:pageId" content="13752163"/><meta property="mw:pageNamespace" content="2"/><meta property="mw:revisionSHA1" content="d13caf4bd8cd5e774f11e09490f120829cad3cf7"/><meta property="mw:htmlVersion" content="2.8.0"/><meta property="mw:html:version" content="2.8.0"/><link rel="dc:isVersionOf" href="//de.wikipedia.org/wiki/Benutzer%3AKallichore/VE"/><base href="//de.wikipedia.org/wiki/"/><title>Benutzer:Kallichore/VE</title><link rel="stylesheet" href="/w/load.php?lang=de&amp;modules=mediawiki.skinning.content.parsoid%7Cmediawiki.skinning.interface%7Csite.styles&amp;only=styles&amp;skin=vector"/><meta http-equiv="content-language" content="de"/><meta http-equiv="vary" content="Accept"/></head><body lang="de" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr" data-mw-parsoid-version="0.23.0.0-alpha12" data-mw-html-version="2.8.0" id="mwAA"><section data-mw-section-id="0" id="mwAQ"><p id="mwAg"><a rel="mw:WikiLink" href="./O/i" title="O/i" class="mw-redirect" id="mwAw">o\\i</a></p>\n</section></body></html>

I've tagged in the team for Parsoid, who hopefully can help more.

The answer might be browsers.

Try http:\\www.google.com\search?q=example in your browser, for example.

  • Going to /w/index.php?Title=O\I behaves (the backslash is preserved)
  • Going to /wiki/O\I translates the \ to /
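The difference between the two cases above can be reproduced outside the address bar. This is a minimal sketch using the standard `URL` class (available in browsers and Node.js), on the assumption that it follows the same URL parsing rules as the browser's navigation. The base URL is only an illustration, taken from the `<base href>` in the Parsoid output quoted earlier.

```javascript
// Sketch: for http(s) URLs, the parser treats "\" like "/" in the
// authority and path, but leaves it untouched in the query string.

// The address-bar example: backslashes instead of slashes.
const typed = new URL('http:\\\\www.google.com\\search?q=example');
console.log(typed.href); // http://www.google.com/search?q=example

// Illustrative base, mirroring the <base href> in the Parsoid HTML.
const base = 'https://de.wikipedia.org/wiki/';

// Title in the path: the backslash becomes a path separator.
const viaPath = new URL('./O\\I', base);
console.log(viaPath.pathname); // /wiki/O/I

// Title in the query string: the backslash survives.
const viaQuery = new URL('/w/index.php?Title=O\\I', base);
console.log(viaQuery.search); // ?Title=O\I
```

This would explain why the index.php form behaves while the /wiki/ form does not: only the path component is normalized.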

Also investigating from the VE side:

  • Raw fetch() behaves
  • $.ajax() behaves

BROWSERS

// Browser console: compare the raw href attribute with the resolved URL.
const a = document.createElement('a');
a.setAttribute('href', './O\\i');
[ a.getAttribute('href'), a.href ];
// (2) ['./O\\i', 'https://de.wikipedia.org/w/O/i']

I think this might be best handled by having Parsoid escape \ to %5C. If you visit the read page with Parsoid turned on, [[o\i]] is also rendered incorrectly.
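A rough sketch of why escaping would help, again using the standard `URL` class. The helper `encodeBackslashes` and the base URL are hypothetical names for illustration only; this is not Parsoid's code, which is not shown in this task.

```javascript
// Hypothetical helper (not Parsoid's implementation): percent-encode
// backslashes so URL parsing cannot turn them into path separators.
function encodeBackslashes(href) {
  return href.replace(/\\/g, '%5C');
}

// Illustrative base, mirroring the <base href> in the Parsoid HTML.
const wikiBase = 'https://de.wikipedia.org/wiki/';

// Unescaped: the title O\i is silently resolved as O/i.
const raw = new URL('./O\\i', wikiBase);
console.log(raw.pathname); // /wiki/O/i

// Escaped: the percent-encoded backslash is kept as-is.
const escaped = new URL(encodeBackslashes('./O\\i'), wikiBase);
console.log(escaped.pathname); // /wiki/O%5Ci

// Decoding the path recovers the original title.
console.log(decodeURIComponent(escaped.pathname)); // /wiki/O\i
```

With the backslash encoded, the resolved URL still decodes to the intended title, so a consumer of the href no longer sees `O/i`.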

The fact that it changes the behavior between Parsoid and legacy (and breaks Parsoid read views) is indeed a sign that we (CTT) should do something there.

It's not entirely clear to me if it should be a full escaping of \ to %5C on all hrefs (internal *and* external), and what to do with query strings (and what legacy does with these, as well). It's also not entirely clear to me if there's any risk associated with fully escaping the href if it's not necessary (I'd expect not, but I'm not confident there's not a surprise in there), or if there's a place where we'd rely on transforming \ to / in VE, or or or.

(It feels like it'd be fine, but I'm always a bit nervous about URL transformations). Any thoughts?

Change #1235157 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] URL-encode backslashes in title hrefs

https://gerrit.wikimedia.org/r/1235157

This should be safe, patch above: we already have a "sanitize link" method, this just adds one character to it. We don't use the href for round tripping, so adding a bit of URL encoding is relatively harmless.

Change #1235157 merged by jenkins-bot:

[mediawiki/services/parsoid@master] URL-encode backslashes in title hrefs

https://gerrit.wikimedia.org/r/1235157

Change #1235865 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a14

https://gerrit.wikimedia.org/r/1235865

Change #1235865 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a14

https://gerrit.wikimedia.org/r/1235865

I verified on test.wikipedia.org with 1.46.0-wmf.14 that the problem with the minimal example has been fixed.

However, I also found that [[o%5Ci]] becomes [[o\i]] after "round tripping", while [[Hello%20World]] remains unchanged. Is this intentional?