Page MenuHomePhabricator

Incorrect page parsing in DiscussionTools due to red links in Parsoid HTML – can't reply, bad permalinks
Closed, ResolvedPublic

Description

I saw a weird test failure today:

15:36:11 1) MediaWiki\Extension\DiscussionTools\Tests\ThreadItemStoreTest::testInsertThreadItems with data set #0 ('cases/ThreadItemStore/1simple-example')
15:36:11 Failed asserting that two arrays are equal.
15:36:11 --- Expected
15:36:11 +++ Actual
15:36:11 @@ @@
15:36:11      'discussiontools_items' => Array (
15:36:11          0 => Array (
15:36:11              'it_id' => '1'
15:36:11 -            'it_itemname' => 'h-X-20220720010100'
15:36:11 +            'it_itemname' => 'h-X?action=edit&redlink=1-20220720010100'
15:36:11              'it_timestamp' => null
15:36:11              'it_actor' => null
15:36:11          )
15:36:11          1 => Array (
15:36:11              'it_id' => '2'
15:36:11 -            'it_itemname' => 'c-X-20220720010100'
15:36:11 +            'it_itemname' => 'c-X?action=edit&redlink=1-20220720010100'
15:36:11              'it_timestamp' => '20220720010100'
15:36:11 -            'it_actor' => '2'
15:36:11 +            'it_actor' => null
15:36:11          )
...

This indicates that we're storing bogus data for permalinks. It looks like Parsoid HTML now has red links, and we don't handle that.

As a result of this, it's also impossible to reply to comments where the user and talk page are red links.

Event Timeline

matmarex triaged this task as Unbreak Now! priority.

Option 1: Maybe you need to request "no redlinks" when discussion tools asks Parsoid for the HTML?
Option 2: Maybe includes/utils/UrlUtils.php has the functions you need to properly parse the URL?

Option 1 doesn't exist right now, i.e. Parsoid doesn't implement that ... Since this is a train blocker, this will need fixing without that. Or, we need to revert Parsoid to the previous version if that is the path we want to go now.

Change 861900 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@master] Only match article path until first '?' when parsing links

https://gerrit.wikimedia.org/r/861900

In that patch I'm just changing some regexps to account for the URL query being present.

In the longer term, we should probably switch our URL parsing to a library implementing the URL Standard (https://url.spec.whatwg.org/). Here's one that looks promising: https://github.com/esperecyan/url. (I can't promise that we'll actually work on this though.)

Change 861900 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Only match article path until first '?' when parsing links

https://gerrit.wikimedia.org/r/861900

Option 1 doesn't exist right now, i.e. Parsoid doesn't implement that ... Since this is a train blocker, this will need fixing without that. Or, we need to revert Parsoid to the previous version if that is the path we want to go now.

We probably want to do that shortly, since redlinks are "read view" not "edit view". We don't have that toggle yet but need it.

In that patch I'm just changing some regexps to account for the URL query being present.

In the longer term, we should probably switch our URL parsing to a library implementing the URL Standard (https://url.spec.whatwg.org/). Here's one that looks promising: https://github.com/esperecyan/url. (I can't promise that we'll actually work on this though.)

UrlUtils::parse(string $url) from core doesn't do what you want?

UrlUtils::parse(string $url) from core doesn't do what you want?

It doesn't handle relative URLs, like ./Foo or /wiki/Foo.

Change 861475 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@wmf/1.40.0-wmf.12] Only match article path until first '?' when parsing links

https://gerrit.wikimedia.org/r/861475

Change 861475 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@wmf/1.40.0-wmf.12] Only match article path until first '?' when parsing links

https://gerrit.wikimedia.org/r/861475

(I'm resolving the task myself to remove it as a deployment blocker.)

This will still break if the wiki doesn’t use clean URLs (I assume Parsoid can handle that?), as those contain the question mark before the article title, not after it. (For example, on my local wiki, $wgArticlePath is /wiki/index.php?title=$1; a red link would look like /wiki/index.php?title=Foo&action=edit&redlink=1.) Of course, fixing it is not a train blocker, but it should be fixed either here (unresolving this task and removing it as a train blocker) or in a followup task.

The code checks for the 'title' query parameter first (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/861900/2/includes/CommentUtils.php#484), so I think that case will actually work correctly.

Another thing to be aware of (... which I wasn't up until yesterday): you can have ? in a fragment/anchor. So if you do NOT have a redlink, but you have a ? in your fragment, you'll lose the end of said fragment, I think.

The code checks for the 'title' query parameter first (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/861900/2/includes/CommentUtils.php#484), so I think that case will actually work correctly.

I see. Yes, I it should work in that case.