
incorrect sitelink deletion behavior when page is moved to excluded (unsupported) namespace with "suppress redirect" option
Closed, Resolved · Public · 8 Estimated Story Points

Description

Problem:
Wikibase handles some sitelink maintenance tasks automatically, such as updating the sitelink when an article is moved or removing it when an article is deleted. This fails in some specific cases and leaves behind a sitelink that points to a nonexistent article. We should fix this and remove the sitelink in these cases.

To reproduce:

  1. have an Item with a sitelink to an article on a client wiki
  2. move the article, with the "suppress redirect" option checked, to a namespace that is not supported for sitelinks (e.g. User: or Draft: in the case of Wikidata)
  3. observe that the sitelink is neither updated nor removed
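To check step 3 without opening the Item page, the repo's `wbgetentities` API module can return just the sitelinks. The sketch below assumes the Wikidata endpoint; the helper names, the Item ID `Q123` and the sample response in the usage note are hypothetical placeholders, not real data.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed repo endpoint; point this at your own Wikibase repo when testing locally.
API = "https://www.wikidata.org/w/api.php"


def sitelinks_url(item_id: str, site: str) -> str:
    """Build a wbgetentities request that returns only one site's sitelink."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "sitelinks",
        "sitefilter": site,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def sitelink_title(response: dict, item_id: str, site: str):
    """Return the linked page title, or None if the Item has no sitelink for `site`."""
    # Treat a missing or empty "sitelinks" field as "no links".
    sitelinks = response["entities"][item_id].get("sitelinks") or {}
    link = sitelinks.get(site)
    return link["title"] if link else None


def fetch_sitelink_title(item_id: str, site: str):
    """Fetch the sitelink over the network (Wikimedia APIs expect a descriptive User-Agent)."""
    req = Request(sitelinks_url(item_id, site),
                  headers={"User-Agent": "sitelink-check-sketch/0.1"})
    with urlopen(req) as resp:
        return sitelink_title(json.load(resp), item_id, site)
```

Run `fetch_sitelink_title` before and after the move: if it still returns the old article title after a suppressed-redirect move into an excluded namespace, the bug is reproduced.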

Links to existing discussions/reports:

Acceptance criteria:

  • The sitelink is removed from the Item if the article is moved to an unsupported namespace with the "suppress redirect" option

Notes:

  • The setting for unsupported namespaces is $wgWBClientSettings['excludeNamespaces'] (for local testing, it can be set to [ NS_USER ] or [ NS_PROJECT ])
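The setting holds namespace IDs, so deciding whether a move target is excluded comes down to resolving the target title's namespace and testing membership. A minimal sketch, assuming a deliberately simplified title parser; the prefix table and function names are hypothetical, and real MediaWiki title parsing (localized names, aliases, case rules) is more involved.

```python
# MediaWiki's built-in namespace IDs: NS_MAIN = 0, NS_USER = 2, NS_PROJECT = 4.
NS_MAIN, NS_USER, NS_PROJECT = 0, 2, 4

# Hypothetical prefix table for a local test wiki; real wikis have localized
# names and aliases (e.g. the project namespace is "Wikipedia:" on Wikipedias).
NAMESPACE_PREFIXES = {"User": NS_USER, "Project": NS_PROJECT}


def namespace_of(title: str) -> int:
    """Resolve a title's namespace ID; unknown prefixes stay in the main namespace."""
    prefix, sep, _ = title.partition(":")
    if sep and prefix in NAMESPACE_PREFIXES:
        return NAMESPACE_PREFIXES[prefix]
    return NS_MAIN


def is_excluded(title: str, exclude_namespaces) -> bool:
    """True if pages in this title's namespace may not carry a sitelink."""
    return namespace_of(title) in exclude_namespaces
```

With `exclude_namespaces = [NS_USER]`, a move to `User:Example/Sandbox` counts as a move into an excluded namespace, while a main-namespace title containing a colon (such as `Star Wars: Episode IV`) does not.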

Expected behavior for each case:

| | page is moved to a non-excluded namespace | page is moved to an excluded namespace |
| --- | --- | --- |
| suppress redirect is checked | sitelink on the Item is updated to the new target | sitelink is removed from the Item |
| suppress redirect is not checked | sitelink on the Item is updated to the new target | sitelink is untouched |
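Read as a decision rule, the table depends on two inputs: whether the target namespace is excluded, and whether the redirect was suppressed. A minimal Python sketch of that rule; the names are hypothetical and this is not the PHP code in Wikibase.

```python
from enum import Enum


class SitelinkAction(Enum):
    UPDATE = "sitelink on the Item is updated to the new target"
    REMOVE = "sitelink is removed from the Item"
    KEEP = "sitelink is untouched"


def action_after_move(target_excluded: bool, redirect_suppressed: bool) -> SitelinkAction:
    """Return the expected sitelink change for a page move on a client wiki."""
    if not target_excluded:
        # Supported target namespace: the sitelink follows the page, redirect or not.
        return SitelinkAction.UPDATE
    if redirect_suppressed:
        # Nothing remains at the old title, so keeping the link would leave it dangling.
        return SitelinkAction.REMOVE
    # The old title survives as a redirect, so the link still points at an existing page.
    return SitelinkAction.KEEP
```

Only the excluded/suppressed combination removes the sitelink, which is exactly the case the acceptance criterion covers.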

Event Timeline

I'm not sure this is the best way to frame the problem. The acceptance criteria don't cover the case where an article gets deleted but the redirect stays.

I think the core problem is that Wikidata currently doesn't know whether a sitelink points to a normal page, a redirect, or a nonexistent page. Ideally, this would be visible both in the list of sitelinks on the Item page and via SPARQL. Then a normal Wikidata bot could clean up.
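Part of that information is already exposed by the client wiki's Action API: `action=query&prop=info` marks nonexistent titles with `missing` and redirects with `redirect`. A bot could classify each sitelink target along the lines below. This is a hedged sketch with hypothetical helper names; it does not follow a redirect to check whether its target exists.

```python
from urllib.parse import urlencode


def page_info_url(api: str, titles) -> str:
    """Build an Action API query returning basic page info for several titles."""
    params = {
        "action": "query",
        "prop": "info",
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",
    }
    return f"{api}?{urlencode(params)}"


def _flag(page: dict, key: str) -> bool:
    # formatversion=1 marks a set flag with "" (falsy), formatversion=2 with true;
    # an absent key (or an explicit false) means the flag is not set.
    return key in page and page[key] is not False


def classify(page: dict) -> str:
    """Classify one entry of response["query"]["pages"] as missing, redirect or normal."""
    if _flag(page, "missing") or _flag(page, "invalid"):
        return "missing"
    if _flag(page, "redirect"):
        return "redirect"
    return "normal"
```

A bot would feed each sitelinked title through `classify` and remove links classified as `missing`; `redirect` results could be reviewed or badged instead.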

It's worth noting that not every Wikipedia admin who deletes a page necessarily has the right to edit the respective Wikidata item: they might not be autoconfirmed on Wikidata while the item is protected. As such, their account can't remove the link in every case where a bot on Wikidata could.

Reopening this as it seems people disagree with the merge.

> I'm not sure this is the best way to frame the problem. The acceptance criteria don't cover the case where an article gets deleted but the redirect stays.

Hmmmm. Is it even possible to delete an article and leave a redirect? Where would it be?

> I think the core problem is that Wikidata currently doesn't know whether a sitelink points to a normal page, a redirect, or a nonexistent page. Ideally, this would be visible both in the list of sitelinks on the Item page and via SPARQL. Then a normal Wikidata bot could clean up.

> It's worth noting that not every Wikipedia admin who deletes a page necessarily has the right to edit the respective Wikidata item: they might not be autoconfirmed on Wikidata while the item is protected. As such, their account can't remove the link in every case where a bot on Wikidata could.

Yeah, agreed. That's an issue, but my understanding is that it's unrelated to suppressing the redirect when moving to an unsupported namespace.

Lydia_Pintscher renamed this task from incorrect sitelink deletion behavior when page is moved to non-supported namespace with "suppress redirect" option to incorrect sitelink deletion behavior when page is moved to excluded (unsupported) namespace with "suppress redirect" option. Nov 10 2020, 2:12 PM
Lydia_Pintscher updated the task description.

@Lydia_Pintscher task inspection question:
Having looked into this, our assumption is that the checkbox to suppress redirects should in fact play no role, and that only whether the target namespace is supported determines whether the sitelink gets updated or deleted. Can you confirm this?

This is how I was thinking it should be:

| | page is moved to a non-excluded namespace | page is moved to an excluded namespace |
| --- | --- | --- |
| suppress redirect is checked | sitelink on the Item is updated to the new target | sitelink is removed from the Item |
| suppress redirect is not checked | sitelink on the Item is updated to the new target | sitelink is untouched |

But now I'm questioning myself on the bottom right cell.

Thoughts from others?

Personally, I would like to see us go in a different direction: we should stop excluding namespaces. For the 'Draft' namespace in particular, I don't like it, but it's unlikely to go away any time soon. So if we want to encourage people to use Wikidata information in articles without having to constantly define the QID, then we should allow sitelinks to draft articles. They would only be temporary after all, since enwp seems to periodically delete drafts.

But to answer @Lydia_Pintscher's specific question: I think it's fine to leave the sitelink to a redirect. If the draft gets deleted, then the redirect will also be deleted, and the sitelink removed. Probably the same will also apply to other excluded namespaces. The important thing is that sitelinks are removed if the page they are linking to no longer exists at all.

> I'm not sure this is the best way to frame the problem. The acceptance criteria don't cover the case where an article gets deleted but the redirect stays.

> Hmmmm. Is it even possible to delete an article and leave a redirect? Where would it be?

Yes. For example, I created https://www.wikidata.org/wiki/User:Mike_Peel/Redirect_sandbox then moved it to https://www.wikidata.org/wiki/User:Mike_Peel/Redirect_sandbox_example and deleted the moved page. https://www.wikidata.org/wiki/User:Mike_Peel/Redirect_sandbox still exists as a redirect to a deleted page. Normally a bot would then delete the redirect, but this doesn't happen automatically.
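The sequence above can be replayed in a toy in-memory model to show why a dangling redirect survives: a move that leaves a redirect makes the old title point at the new one, and deleting the new title does not touch that redirect. The class and method names are hypothetical; MediaWiki itself is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWiki:
    """Toy model (not MediaWiki): maps each title to its redirect target, or None for content."""
    pages: dict = field(default_factory=dict)

    def create(self, title: str) -> None:
        self.pages[title] = None

    def move(self, old: str, new: str, suppress_redirect: bool = False) -> None:
        self.pages[new] = self.pages.pop(old)
        if not suppress_redirect:
            self.pages[old] = new  # leave a redirect behind at the old title

    def delete(self, title: str) -> None:
        # Deleting a page leaves redirects that point to it untouched.
        del self.pages[title]

    def broken_redirects(self) -> list:
        return sorted(t for t, target in self.pages.items()
                      if target is not None and target not in self.pages)


if __name__ == "__main__":
    wiki = ToyWiki()
    wiki.create("User:Mike_Peel/Redirect_sandbox")
    wiki.move("User:Mike_Peel/Redirect_sandbox", "User:Mike_Peel/Redirect_sandbox_example")
    wiki.delete("User:Mike_Peel/Redirect_sandbox_example")
    print(wiki.broken_redirects())  # prints ['User:Mike_Peel/Redirect_sandbox']
```

Any sitelink still pointing at such a redirect would target a page that exists but leads nowhere, which is the case raised earlier about an article being deleted while the redirect stays.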

> | | page is moved to a non-excluded namespace | page is moved to an excluded namespace |
> | --- | --- | --- |
> | suppress redirect is checked | sitelink on the Item is updated to the new target | sitelink is removed from the Item |
> | suppress redirect is not checked | sitelink on the Item is updated to the new target | sitelink is untouched |

> But now I'm questioning myself on the bottom right cell.

I agree with all four. The bottom-right cell is indeed the complicated one: there, the untouched sitelink now points to a redirect whose target might not be very useful content-wise (otherwise the page would not have been moved). Still, I think the software should reliably ensure that only existing pages are connected, not necessarily only useful pages.

Once redirects are permitted as regular sitelinks (without that ugly hack), we could also look into adding the already available redirect badges to those sitelinks. This would make the outcome much more manageable for the community, which could then decide whether a remaining redirect sitelink is actually useful and not just existing.

Thanks folks :) I'll update the ticket description accordingly.

@Mike_Peel about getting rid of the concept of unsupported namespaces altogether: I'm not opposed to it, but IIRC it was introduced back then based on community demand, so reversing it would probably need some community discussion.

@Lydia_Pintscher I started looking into the message Wikibase shows the user after the page move is complete, but this led me down a terrible rabbit hole (T268135). Can I assume that updating that message is not part of the AC here, and we’ll eventually tackle that task separately? (I think this would mean that Wikibase would still tell the user that “your move should now be reflected in the item language link, we ask you to check” whether or not the sitelink was deleted.)

Change 641733 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Delete sitelink when moving to excluded namespace without redirect

https://gerrit.wikimedia.org/r/641733

> @Lydia_Pintscher I started looking into the message Wikibase shows the user after the page move is complete, but this led me down a terrible rabbit hole (T268135). Can I assume that updating that message is not part of the AC here, and we’ll eventually tackle that task separately? (I think this would mean that Wikibase would still tell the user that “your move should now be reflected in the item language link, we ask you to check” whether or not the sitelink was deleted.)

Yeah I think it's ok to keep it out of this task.

Change 643906 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Restructure logic in UpdateRepoHookHandler

https://gerrit.wikimedia.org/r/643906

Change 641733 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Delete sitelink when moving to excluded namespace without redirect

https://gerrit.wikimedia.org/r/641733

Change 643906 abandoned by Lucas Werkmeister (WMDE):
[mediawiki/extensions/Wikibase@master] Restructure logic in UpdateRepoHookHandler

Reason:
I prefer the existing code

https://gerrit.wikimedia.org/r/643906

For verification: NS_USER is an excluded namespace on every production wiki (along with some less useful ones, such as the LiquidThreads namespaces).