Page MenuHomePhabricator

Fix base href and red link rewriter to handle new Parsoid relative links
Closed, ResolvedPublic

Description

We need to update our code for T72743: Parsoid base URL should be independent of page. Symptom is that page points to the wrong URL. E.g. first version of header at https://ca.wikipedia.org/wiki/Viquip%C3%A8dia:La_taverna/Tecnicismes points to https://ca.wikipedia.org/wiki/Viquip%C3%A8dia:La_taverna/Viquip%C3%A8dia:La_taverna/Tecnicismes/Arxius_2 .

C. Scott provided some help in IRC:

[11/13/14 16:53] <cscott> superm401: if you actually properly resolve the given URL with the given base HREF, your pages will continue to work both before and after

I don't know if we preserve old base hrefs given by parsoid though.

He also said:

[11/13/14 16:54] <cscott> superm401: and if you ignore base href and just replace(/([.][.]?\/)*/, '') on your hrefs to get article titles, your pages will also continue to work both before and after

Since I think we're doing some server-side processing, I think the second one should be doable.

Event Timeline

Mattflaschen-WMF raised the priority of this task from to High.
Mattflaschen-WMF updated the task description. (Show Details)
Mattflaschen-WMF changed Security from none to None.
Mattflaschen-WMF added subscribers: Unknown Object (MLST), Quiddity, DannyH and 7 others.

Now internal links work! Except this case:

Change 179007 had a related patch set uploaded (by EBernhardson):
Gracefully handle non-ascii hrefs in redlinker

https://gerrit.wikimedia.org/r/179007

Patch-For-Review

Change 179024 had a related patch set uploaded (by EBernhardson):
Add ConfirmEdit dependency to Flow phpunit

https://gerrit.wikimedia.org/r/179024

Patch-For-Review

Change 179024 merged by jenkins-bot:
Add ConfirmEdit dependency to Flow phpunit

https://gerrit.wikimedia.org/r/179024

The problem looks to be related to non-ascii in the title, flow was not appropriately calling urldecode on the value taken from an href before passing it into the Title class. This resulted in the original parsoid anchors being output instead of the rewritten anchors from Linker::link.

We can also look into converting red linker back into only processing red links. It was switched to handle all links instead of only non-existant due to issues we had with subpage links, but the changes to parsoid anchor output since then should have fixed that.

Mattflaschen-WMF renamed this task from Fix base href to handle new Parsoid relative links to Fix base href and red link rewriter to handle new Parsoid relative links.Dec 12 2014, 7:02 PM

Change 179512 had a related patch set uploaded (by Mattflaschen):
Dont output base hrefs

https://gerrit.wikimedia.org/r/179512

Patch-For-Review

Just for the record, there's one more patch possibly coming down the pipe here:

Currently Parsoid emits a leading ./ for every link, because otherwise browsers would try to parse File:Foo.jpg as the url *protocol* file: (ie, instead of http). Initial colons are special in urls. So currently we emit <a href="./File:Foo.jpg"> -- the slash ensures that the colon isn't misinterpreted as a protocol marker.

We are considering eventually URL-encoding these initial colons (which don't otherwise need to be URL-encoded) so that links to File:Foo.jpg would become <a href="File%3AFoo.jpg">. This would make it even easier for folks to consume our HTML -- you just need to URL-decode the href to get the article title -- at the cost of making it a little less pretty for humans to read.

So if you are feeling forward thinking you might want to double-check that your href-parsing will handle File%3A:Foo.jpg just as well as ./File:Foo.jpg.

So if you are feeling forward thinking you might want to double-check that your href-parsing will handle File%3A:Foo.jpg just as well as ./File:Foo.jpg.

Thanks for the heads-up. Erik's fix (https://gerrit.wikimedia.org/r/#/c/179007/) adds a urldecode which is required anyway for non-ASCII characters, but I think will address this too.

Change 179007 merged by jenkins-bot:
Gracefully handle non-ascii hrefs in redlinker

https://gerrit.wikimedia.org/r/179007

Thanks! When do you expect this change to be deployed in ca.wiki?

Change 179512 merged by jenkins-bot:
Dont output base hrefs

https://gerrit.wikimedia.org/r/179512

In T78090#849953, @Qgil wrote:

Thanks! When do you expect this change to be deployed in ca.wiki?

It should be deployed in tomorrow's release train, as it's in 1.25wmf12. (I asked in IRC)

Change 180303 had a related patch set uploaded (by EBernhardson):
Gracefully handle non-ascii hrefs in redlinker

https://gerrit.wikimedia.org/r/180303

Patch-For-Review

Change 180303 merged by jenkins-bot:
Gracefully handle non-ascii hrefs in redlinker

https://gerrit.wikimedia.org/r/180303