
Special page DeadEndPages lists plain text pages as dead-end pages, even though they can't have links added to them
Open, Needs Triage | Public | Bug Report

Description

Steps to replicate the issue:

  • Create a sandbox page in the main namespace of a wiki.
  • Change its content model to plain text. You can do this via Special:ChangeContentModel/pagename.
  • Go to Special:DeadEndPages.

What happens?:

The page you created gets listed in Special:DeadEndPages.

What should have happened instead?:

This is technically a dead-end page, as it has no wikilinks. However, that is an inherent property of the plain text content model and cannot be fixed by editing: you can even add pseudolinks like [[Foo]] to the page, and they're not parsed. So the page arguably should not be listed.

Other thoughts:

If a page cannot contain links for some other reason, it would presumably behave the same way.

If a page uses another content model, such as Markdown or HTML, but actually contains links, would it be reported as a false positive?

So should the special page check that a page is wikitext before checking for missing links? That seems to be the best approach.
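The proposed ordering (check the content model first, then check for outgoing links) can be sketched as a simple filter. This is a minimal Python illustration, not MediaWiki code: the `Page` record and its fields are hypothetical stand-ins for a row of the page table plus a count of its pagelinks rows.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical stand-in for a row of the page table.
    title: str
    content_model: str   # e.g. "wikitext", "text"
    outgoing_links: int  # number of pagelinks rows with pl_from = this page


def dead_end_pages(pages: list[Page]) -> list[str]:
    """Return titles of wikitext pages that have no outgoing links.

    The content-model check comes first, so a plain text page is never
    reported: it cannot contain parsed links in the first place.
    """
    return sorted(
        p.title
        for p in pages
        if p.content_model == "wikitext" and p.outgoing_links == 0
    )


pages = [
    Page("Sandbox", "text", 0),            # the plain text page from the report
    Page("Orphan_article", "wikitext", 0),
    Page("Linked_article", "wikitext", 3),
]
print(dead_end_pages(pages))  # ['Orphan_article']
```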

Event Timeline

Gourebimarc99 subscribed.

Hi,

I would like to work on this task (T396305) regarding the Special page DeadEndPages and the plain text pages issue.

Could you please assign this task to me?

Thank you!

Best regards,
Gourebimarc99

You've assigned it to yourself.

Good luck.

Hello,

I have submitted a patch via Gerrit to fix Special:DeadEndPages listing plain text pages.

Here is the link to my Gerrit review:
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1154434

Please review this change.

Thank you!

Best regards,
Gourebimarc99

Change #1154475 had a related patch set uploaded (by Gourebimarc99; author: Gourebimarc99):

[mediawiki/core@master] Fix: make formatResult() public to comply with parent class

https://gerrit.wikimedia.org/r/1154475

Change #1154475 had a related patch set uploaded (by Gourebimarc99; author: Gourebimarc99):

[mediawiki/core@master] Fix T396305: Correct filtering for Dead-end pages

https://gerrit.wikimedia.org/r/1154475

Change #1154434 had a related patch set uploaded (by Aklapper; author: Gourebimarc99):

[mediawiki/core@master] Make formatResult() public to comply with parent class

https://gerrit.wikimedia.org/r/1154434

Dragoniez subscribed.

Change #1215999 had a related patch set uploaded (by Dragoniez; author: Dragoniez):

[mediawiki/core@master] Exclude plaintext pages from Special:DeadendPages

https://gerrit.wikimedia.org/r/1215999

There are two possible approaches to this issue:

  1. Restrict the results to non-plaintext pages.
  2. Restrict the results to wikitext pages.

The first approach is guaranteed to work. The second is more aggressive, in that (IMO) it might unintentionally exclude pages using other content models.
I'd be happy to go with the second approach if we can be confident it won't accidentally filter out pages with other content models that should be included, but I'm not entirely sure that's the case.
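The practical difference between the two approaches is which content models each one still reports. A quick Python illustration with a few example model IDs (the list is illustrative only; the models actually present depend on a wiki's configuration and extensions):

```python
# Example content model IDs as stored in page_content_model (illustrative).
SAMPLE_MODELS = ["wikitext", "text", "javascript", "css", "json"]


def approach_1(model: str) -> bool:
    """Approach 1: keep everything except plain text pages."""
    return model != "text"


def approach_2(model: str) -> bool:
    """Approach 2: keep only wikitext pages."""
    return model == "wikitext"


kept_1 = {m for m in SAMPLE_MODELS if approach_1(m)}
kept_2 = {m for m in SAMPLE_MODELS if approach_2(m)}

# Models that approach 2 drops but approach 1 would still report.
print(sorted(kept_1 - kept_2))  # ['css', 'javascript', 'json']
```

Approach 1 only ever removes plain text pages; approach 2 additionally drops every other non-wikitext model, which is the risk described above.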

Maybe you can use wgTextModelsToParse for this, to include wikitext pages and pages whose content is parsed like wikitext.
I'm not sure about the performance impact on the database query, though, because there is no index on page_content_model.
See also T230607: stop using page_content_model.
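One way to act on the wgTextModelsToParse suggestion is to turn the configured list into an `IN (...)` condition on page_content_model. The sketch below assumes the list holds what I understand to be the MediaWiki defaults (wikitext, JavaScript and CSS); `TEXT_MODELS_TO_PARSE` is a hypothetical Python stand-in for the real setting, and the tables are a simplified SQLite schema, not MediaWiki's actual one.

```python
import sqlite3

# Assumption: mirrors the default wgTextModelsToParse (wikitext, JavaScript,
# CSS). A real wiki may configure a different list.
TEXT_MODELS_TO_PARSE = ["wikitext", "javascript", "css"]

conn = sqlite3.connect(":memory:")
# Simplified schema: only the columns the query touches.
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER NOT NULL,
        page_title TEXT NOT NULL,
        page_is_redirect INTEGER NOT NULL DEFAULT 0,
        page_content_model TEXT
    );
    CREATE TABLE pagelinks (
        pl_from INTEGER NOT NULL,
        pl_target_id INTEGER NOT NULL,  -- simplified link target column
        PRIMARY KEY (pl_from, pl_target_id)
    );
""")
conn.executemany(
    "INSERT INTO page (page_id, page_namespace, page_title, page_content_model)"
    " VALUES (?, 0, ?, ?)",
    [(1, "Sandbox", "text"), (2, "Orphan", "wikitext"), (3, "Linked", "wikitext")],
)
conn.execute("INSERT INTO pagelinks VALUES (3, 2)")  # "Linked" has one link

# One bound parameter per allowed content model.
placeholders = ", ".join("?" for _ in TEXT_MODELS_TO_PARSE)
rows = conn.execute(
    f"""
    SELECT page_title
    FROM page
    LEFT JOIN pagelinks ON page_id = pl_from
    WHERE pl_from IS NULL
      AND page_namespace = 0
      AND page_is_redirect = 0
      AND page_content_model IN ({placeholders})
    ORDER BY page_title
    LIMIT 51
    """,
    TEXT_MODELS_TO_PARSE,
).fetchall()
print(rows)  # [('Orphan',)]
```

As with the `!= 'text'` variant, there is no index on page_content_model here, so the condition acts as a per-row filter rather than changing which index is used.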

Thanks for the suggestion. Using wgTextModelsToParse sounds like a reasonable choice.

As for the performance impact, I tested this by inserting ~50k dummy rows into the page table (and some into pagelinks) in my local MediaWiki setup, which uses SQLite. The results are below.

Query without page_content_model (current behaviour)
SELECT page_namespace, page_title
FROM page
LEFT JOIN pagelinks ON page_id = pl_from
WHERE pl_from IS NULL
  AND page_namespace = 0
  AND page_is_redirect = 0
ORDER BY page_title
LIMIT 51;
QUERY PLAN
|--SEARCH page USING INDEX page_name_title (page_namespace=?)
`--SEARCH pagelinks USING COVERING INDEX sqlite_autoindex_pagelinks_1 (pl_from=?)
Run Time: real 0.007 user 0.000000 sys 0.002826
Run Time: real 0.001 user 0.000000 sys 0.000782
Run Time: real 0.001 user 0.000000 sys 0.000562

Query with page_content_model != 'text'
SELECT page_namespace, page_title
FROM page
LEFT JOIN pagelinks ON page_id = pl_from
WHERE pl_from IS NULL
  AND page_namespace = 0
  AND page_is_redirect = 0
  AND page_content_model != 'text'
ORDER BY page_title
LIMIT 51;
QUERY PLAN
|--SEARCH page USING INDEX page_name_title (page_namespace=?)
`--SEARCH pagelinks USING COVERING INDEX sqlite_autoindex_pagelinks_1 (pl_from=?)
Run Time: real 0.005 user 0.000000 sys 0.002559
Run Time: real 0.001 user 0.000000 sys 0.000695
Run Time: real 0.001 user 0.000000 sys 0.000717

Overall, the query plan remains unchanged and the timing differences are within noise, so I believe the performance impact should be negligible.
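The comparison above can be repeated with a small self-contained script. This is a sketch, not the benchmark that was actually run: it uses an in-memory SQLite database, a simplified schema (only the columns the two queries touch, with a hypothetical `pl_target_id` standing in for the real link target columns) and 1,000 dummy pages instead of ~50k. It prints each query's plan and first few results so they can be compared; it makes no timing claims.

```python
import sqlite3

BASE_QUERY = """
SELECT page_namespace, page_title
FROM page
LEFT JOIN pagelinks ON page_id = pl_from
WHERE pl_from IS NULL
  AND page_namespace = 0
  AND page_is_redirect = 0
  {extra}
ORDER BY page_title
LIMIT 51
"""
CURRENT_QUERY = BASE_QUERY.format(extra="")
PATCHED_QUERY = BASE_QUERY.format(extra="AND page_content_model != 'text'")


def build_db(n_pages: int = 1000) -> sqlite3.Connection:
    """Create a simplified page/pagelinks schema filled with dummy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER NOT NULL,
            page_title TEXT NOT NULL,
            page_is_redirect INTEGER NOT NULL DEFAULT 0,
            page_content_model TEXT NOT NULL
        );
        CREATE UNIQUE INDEX page_name_title ON page (page_namespace, page_title);
        CREATE TABLE pagelinks (
            pl_from INTEGER NOT NULL,
            pl_target_id INTEGER NOT NULL,
            PRIMARY KEY (pl_from, pl_target_id)
        );
    """)
    # Every 5th page is plain text; every even-numbered page has one link.
    conn.executemany(
        "INSERT INTO page (page_id, page_namespace, page_title, page_content_model)"
        " VALUES (?, 0, ?, ?)",
        [
            (i, f"Page_{i:05d}", "text" if i % 5 == 0 else "wikitext")
            for i in range(1, n_pages + 1)
        ],
    )
    conn.executemany(
        "INSERT INTO pagelinks VALUES (?, 1)",
        [(i,) for i in range(2, n_pages + 1, 2)],
    )
    return conn


conn = build_db()
results = {}
for label, sql in (("current", CURRENT_QUERY), ("patched", PATCHED_QUERY)):
    # The 4th column of EXPLAIN QUERY PLAN output is the readable detail.
    plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    results[label] = [row[1] for row in conn.execute(sql)]
    print(label)
    for step in plan:
        print("  ", step)
    print("   first results:", results[label][:3])
```

With this data, the current query lists odd-numbered plain text pages such as Page_00005, while the patched query skips them.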
Meanwhile, I'm starting to wonder: is this worth fixing at all, if content namespaces aren't really expected to contain plaintext pages?

Hi @Dragoniez, thanks for looking at this.

I think this is a legitimate question. To try to answer it, I suggest that:

  1. MediaWiki should be able to handle non-wikitext content models if site admins wish to set up their wiki that way, for example XML, Markdown, or something more exotic.
  2. The best way to accommodate (1) is to ensure that plain text pages are handled properly, since plain text then serves as a base for any other model.

As for pages with the wrong content model in the wrong namespace, I'd say that's a fundamentally different issue from a lack of wikilinks. It could be better controlled by placing more restrictions on changing content models or on moving such pages from one namespace to another. It could also be reported on, but I think that would belong on a different special page.