Remove Moldovan Wikipedia from WDQS
Closed, ResolvedPublic

Description

Moldovan Wikipedia was deleted in 2010 (see proposal and T169450), but WDQS still knows the triple <https://mo.wikipedia.org/> wikibase:wikiGroup "wikipedia". This should probably be removed.

(Note that the <https://mo.wikipedia.org/> IRI also exists in two other triples, as the official website of Q3568049. This is unrelated and doesn’t need to be changed.)
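
To check whether the stale triple is still present, one could send an ASK query to the query service. The sketch below only builds the query text. The helper name `build_wikigroup_ask` is made up for illustration, and the `wikibase:` prefix is assumed to expand to the standard Wikibase ontology namespace `http://wikiba.se/ontology#`.

```python
# Assumed expansion of the wikibase: prefix (standard Wikibase ontology namespace).
WIKIBASE_PREFIX = "http://wikiba.se/ontology#"


def build_wikigroup_ask(site_iri: str, group: str = "wikipedia") -> str:
    """Return a SPARQL ASK query that tests for the site's wikiGroup triple.

    Hypothetical helper: it only formats a string and does not contact any endpoint.
    """
    return (
        f"PREFIX wikibase: <{WIKIBASE_PREFIX}>\n"
        f'ASK {{ <{site_iri}> wikibase:wikiGroup "{group}" }}'
    )


query = build_wikigroup_ask("https://mo.wikipedia.org/")
print(query)
```

Because the query matches on the `wikibase:wikiGroup` predicate, it does not touch the two official-website triples that use the same IRI as an object.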

Event Timeline

Restricted Application added projects: Wikidata, Discovery.Mar 2 2018, 12:02 PM
Restricted Application added a subscriber: Aklapper.
Strainu added a subscriber: Strainu.Mar 2 2018, 2:16 PM

I agree this should be removed.
Links, labels and descriptions could perhaps be kept, but at the very least adding an interwiki sitelink to a wiki that doesn't exist should not be possible.

thiemowmde triaged this task as Low priority.Mar 2 2018, 3:48 PM
thiemowmde moved this task from incoming to needs discussion or investigation on the Wikidata board.
thiemowmde added subscribers: daniel, thiemowmde, Addshore.

From what I see, the following conditions must be met for this triple to appear in a dump:

The actual bug here seems to be that the Wikibase code still thinks "mowiki" is a valid wiki, while it is a redirect in reality. You can see that "mo" can still be found when trying to add a sitelink, but doesn't work properly.

There are 10 (!) calls to \SiteLookup::getSites in the Wikibase code, and they all have this issue. Unfortunately, the sites table does not seem to record whether a domain is a redirect. Where is this stored?

There used to be sitelinks for this wiki until relatively recently, when @VIGNERON removed them. I assume that when the sitelinks were removed, the WDQS updater removed those triples from the query service, but never removed the wikibase:wikiGroup triple for it, even though I suspect that it would no longer appear in a full Wikibase dump after the last sitelink was removed.

Yes, @Lucas_Werkmeister_WMDE is completely correct: when the sitelinks were removed, the wikiGroup statements remained, since they were present in the dump before and do not belong to any specific item. I can manually delete the orphan wikiGroup statements, but I wonder: does it hurt anything to leave them alone? Is this breaking something? We don't have automatic detection of such cases because wiki deletion is an extremely rare event.
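
To make the "orphan" condition concrete, here is a minimal sketch over an in-memory list of triples. It assumes that sitelinks are represented as article nodes pointing to their site via `schema:isPartOf`, and that a site counts as orphaned when it has a `wikibase:wikiGroup` triple but no article pointing to it. The function name and the sample triples are hypothetical, and nothing like this exists in the updater according to this thread.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed predicate IRIs (Wikibase ontology and schema.org namespaces).
WIKI_GROUP = "http://wikiba.se/ontology#wikiGroup"
IS_PART_OF = "http://schema.org/isPartOf"


def orphan_wiki_groups(triples: Iterable[Triple]) -> Set[str]:
    """Return site IRIs that declare a wikiGroup but have no sitelink pointing at them."""
    declared: Set[str] = set()  # sites that are the subject of a wikiGroup triple
    in_use: Set[str] = set()    # sites referenced by at least one article node
    for subject, predicate, obj in triples:
        if predicate == WIKI_GROUP:
            declared.add(subject)
        elif predicate == IS_PART_OF:
            in_use.add(obj)
    return declared - in_use


# Illustrative data only: one wiki still in use, one whose sitelinks are all gone.
sample = [
    ("https://ro.wikipedia.org/", WIKI_GROUP, "wikipedia"),
    ("https://ro.wikipedia.org/wiki/Example", IS_PART_OF, "https://ro.wikipedia.org/"),
    ("https://mo.wikipedia.org/", WIKI_GROUP, "wikipedia"),
]
orphans = orphan_wiki_groups(sample)
print(orphans)
```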

I can't do much about mowiki appearing in the Wikidata interface's sitelinks widget; I'd suggest making a separate task for it.

IMHO, a manual deletion would be the best course of action (unless it’s very cumbersome?). Wiki deletion shouldn’t be common enough to require any automatic detection in the updater, but I don’t think we should wait for the next full dump reload of the query service either to fix this problem – the query service is out of sync with Wikidata, and we can fix it (well, this part…), so why not do it?
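
For illustration, a manual deletion of this single triple could be expressed as a SPARQL `DELETE DATA` update. The sketch below only formats the update text. The helper name `build_wikigroup_delete` is hypothetical, the `wikibase:` prefix expansion is assumed, and how such an update would actually be submitted to the query service backend is not covered in this thread.

```python
# Assumed expansion of the wikibase: prefix (standard Wikibase ontology namespace).
WIKIBASE_PREFIX = "http://wikiba.se/ontology#"


def build_wikigroup_delete(site_iri: str, group: str = "wikipedia") -> str:
    """Return a SPARQL Update string deleting exactly one wikiGroup triple.

    Hypothetical helper: it builds text only and performs no update itself.
    """
    return (
        f"PREFIX wikibase: <{WIKIBASE_PREFIX}>\n"
        "DELETE DATA {\n"
        f'  <{site_iri}> wikibase:wikiGroup "{group}" .\n'
        "}"
    )


update = build_wikigroup_delete("https://mo.wikipedia.org/")
print(update)
```

`DELETE DATA` takes a fully specified triple, so the official-website triples that merely mention the same IRI are left alone.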

Shouldn't we also remove mo from supported monolingual language codes?

Unclear. I removed the 337 sitelinks as they were clearly going nowhere, but I didn't touch the labels or descriptions (which are more numerous and a bit messy: some are written in Cyrl, some in Latn :/ ).

FYI, there was a discussion on the langcom mailing list in mid-February: https://lists.wikimedia.org/pipermail/langcom/2018-February/001934.html ; we should probably ask them for a clear decision.

@VIGNERON Well, I mean that how to remove the mo support from


?

Languages do not map 1-to-1 to wikis, but given that the mo language code is retired (https://en.wikipedia.org/wiki/Moldovan_language), it may make sense to retire it from Wikimedia too, at least as an option for new data.

Smalyshev closed this task as Resolved.Mar 22 2018, 9:46 PM
Smalyshev claimed this task.