tgwiki sitelinks remain on Wikidata after the article has been deleted
Closed, Resolved (Public)

Description

When articles are deleted on tgwiki, the sitelinks to them from Wikidata should be removed automatically. However, this doesn't seem to be happening reliably.

A specific example is at https://www.wikidata.org/wiki/Q30023365 - where https://tg.wikipedia.org/wiki/%D0%9A%D0%B0%D0%BC%D1%80%D0%B0%D0%BD_%D0%90%D0%BB%D0%B8%D0%B9%D0%B5%D0%B2 was deleted in 2017.

More examples are listed at https://www.wikidata.org/wiki/User:Mike_Peel/tgwiki_sitelink_problems

Event Timeline

This looks like it's only happening on tgwiki. I checked on the Albanian Wikipedia, and there the sitelink was automatically removed: https://www.wikidata.org/w/index.php?title=Q7455202&diff=722408289&oldid=722408219

There are quite a few examples now on my user subpage, but from what I can see they are all deletions by VASHGIRD on 17-18 October 2017. Perhaps there was a specific issue at that time?

Should I leave the examples live, or can I remove them from the wikidata entries?

I would suggest doing a test: ask one of the administrators to create a page, link it to Wikidata, and then delete it, to see whether this was a specific issue at that time. Then you can remove the entries and we can close the ticket here.

Presumably this test happens quite often whenever tgwiki admins delete a page, and there must be some way to access the removal history of sitelinks for a particular project... I've asked at:
https://www.wikidata.org/wiki/Wikidata:Request_a_query#Recently_removed_sitelinks_to_a_particular_project
in case anyone there can make a suggestion for how to query for recently removed sitelinks to tgwiki.

The query approach didn't find any, but looking through VASHGIRD's contributions on Wikidata I found several pages that were deleted from tgwiki where the sitelink was automatically removed from Wikidata, e.g. at Q9729338 and Q13201301. So this must have been a temporary glitch. I'll work on removing the broken links, then this can be closed.

Thank you @Mike_Peel. Let us know when you're finished

I've submitted a bot request:
https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Pi_bot_9
It will take a few days to be approved, so I'll probably run this over the weekend. The bot will be quite general, so if this happens again in the future (with tgwiki or elsewhere) I'll be able to fix those cases too.
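
For anyone who wants to spot-check a handful of sitelinks by hand, here is a minimal sketch of the existence check involved. This is not the Pi bot code. It assumes the standard MediaWiki Action API (`action=query` with `formatversion=2`, which marks nonexistent pages with `missing`), and the helper names `build_query_url` and `missing_titles` are made up for illustration.

```python
from urllib.parse import urlencode


def build_query_url(site_host, titles):
    """Build an action=query URL asking a wiki about the given page titles.

    site_host is e.g. "tg.wikipedia.org"; titles is a list of page titles.
    """
    params = {
        "action": "query",
        "titles": "|".join(titles),  # the API takes pipe-separated titles
        "format": "json",
        "formatversion": "2",
    }
    return f"https://{site_host}/w/api.php?{urlencode(params)}"


def missing_titles(api_response):
    """Return the titles that a formatversion=2 response reports as missing.

    These are the candidates whose sitelinks would need removing.
    """
    pages = api_response.get("query", {}).get("pages", [])
    return [page["title"] for page in pages if page.get("missing")]
```

Fetch the URL with any HTTP client, parse the JSON, and pass the result to `missing_titles`. A real run would also need to batch the titles (the API limits how many can be sent per request) and to check the deletion log before removing anything, since a page can be missing for reasons other than deletion.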

Hi @Mike_Peel, how are the edits going?

It's turned out to be a bigger problem than I expected. So far the bot's removed 18,000 sitelinks, logs at:
https://www.wikidata.org/wiki/User:Mike_Peel/tgwiki_sitelink_fixes
https://www.wikidata.org/wiki/User:Mike_Peel/tgwiki_sitelink_fixes2

I've also been having connection issues with the script, so I've had to restart it a number of times. It's now running on a more stable internet connection to try to finish this.

However, there are more cases in other languages - Robby's found a few in enwiki, see:
https://www.wikidata.org/wiki/User_talk:Mike_Peel#Link_to_deleted_page_on_en-wikipedia_remains_here_on_wikidata

So I think some investigation needs to be done into this - at the very least there should be some sort of constraint violation report for when sitelinks point to deleted pages. What I'm doing will not scale to checking all sitelinks across all languages...

@Lydia_Pintscher @Addshore Matěj Suchánek wrote a Quarry query that looks for bad sitelinks, see:
https://www.wikidata.org/wiki/Wikidata:Request_a_query#Identifying_interwiki_links_that_no_longer_exist

It looks like there are around 13,000 bad enwiki sitelinks, see the query results at:
https://quarry.wmflabs.org/query/29907
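
As a rough illustration of the comparison such a check comes down to (this is not the Quarry query itself), here is a sketch that takes sitelink titles and a list of existing page titles and returns the sitelinks with no matching page. It assumes main-namespace titles only and treats underscores and spaces as equivalent. The function names and the item IDs in the usage note are placeholders.

```python
def normalize(title):
    """Use spaces rather than underscores so titles from both sources compare equal."""
    return title.replace("_", " ").strip()


def dangling_sitelinks(sitelinks, existing_titles):
    """Return the sitelinks whose target title is not among the existing pages.

    sitelinks: dict mapping item ID -> linked page title on one wiki.
    existing_titles: iterable of page titles that currently exist on that wiki.
    """
    existing = {normalize(title) for title in existing_titles}
    return {
        item_id: title
        for item_id, title in sitelinks.items()
        if normalize(title) not in existing
    }
```

For example, `dangling_sitelinks({"Q1": "Foo bar", "Q2": "Gone page"}, ["Foo_bar"])` leaves only the `"Q2"` entry. Redirects and pages outside the main namespace would need extra handling.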

These don't seem to be as clear-cut as the bulk deletion on tgwiki is - different usernames and different dates. Can you / someone on the Wikidata tech team please have a look into this?

I've been re-running my bot code today, and it's found a few more from 2017 on tgwiki, but nothing more recent. So I'm closing this for now. Note that there are still cases on enwp, but I'm not sure that will be resolved in this ticket anyway.