
The new en.uncyclopedia copy off wikia is missing 1.5 million revisions
Open, Needs Triage, Public


It is unclear whether these are deleted revisions, regular revisions, or a combination of both. It is also unclear whether this is caused by Wikia in general or by something specific to this particular wiki. Something similar happened with fr, but to a much smaller extent: under 10% of revisions were missing there, as opposed to almost 25% here.

We might want to find out why.

Event Timeline

Isarra created this task. Apr 11 2019, 8:58 PM
Restricted Application added a subscriber: Aklapper. Apr 11 2019, 8:58 PM

If you have the XML dump, you can use grep, gawk or something similar to count the number of revisions present in the XML, and compare that with the reported count (they should be the same).
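A minimal sketch of that count in Python, assuming the dump uses the standard MediaWiki export format (a `<mediawiki>` root with `<page>` elements containing `<revision>` elements). It streams the file instead of loading it whole, and matches tags regardless of which export schema version the namespace declares. A plain `grep -c '<revision>' dump.xml` gives the same number only as long as every opening tag sits on its own line.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def count_revisions(source) -> int:
    """Count <revision> elements in a MediaWiki XML dump.

    `source` is a filename or a binary file object. Each element is cleared
    when it closes, which keeps memory use low on multi-gigabyte dumps.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "revision":
            count += 1
        elem.clear()  # drop the element's text and children once counted
    return count


def count_revisions_in_bytes(data: bytes) -> int:
    """Convenience wrapper for small dumps or excerpts already in memory."""
    return count_revisions(io.BytesIO(data))
```

For a file on disk, `count_revisions("dump.xml")` is enough; the filename here is only a placeholder.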

Isarra added a comment (edited). Apr 11 2019, 10:35 PM

Wikia's XML dumps are terrible. We tried that with he:, and the XML dump was already missing about 10k revisions.

Carlb added a subscriber: Carlb. Apr 12 2019, 1:21 AM

[[Special:Statistics]] includes both live and deleted revisions in the "revisions since wiki inception" count; an XML dump will be missing every deleted revision, which pretty much ensures the numbers will not match.

In fact, Special:Statistics can report a number lower than the largest revision ID. If I move a page and leave a redirect, the edit count goes up by 1 but the number of revisions goes up by 2 (one row added to the moved page's history recording the move, and one row for the new redirect page). On the other hand, log actions that don't create new revisions, like deleting or restoring a page, each add 1 to the wiki's "number of edits".
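Taking the accounting in that comment at face value, and ignoring imports and failed inserts, the two counters can be written out explicitly. Here $n_e$ is the number of ordinary edits, $n_m$ the number of page moves that leave a redirect, $n_d$ the number of deletions and $n_u$ the number of restorations; these symbols are introduced only for this illustration.

```latex
\begin{aligned}
E &= n_e + n_m + n_d + n_u && \text{(edit count on Special:Statistics)}\\
R &= n_e + 2n_m            && \text{(revision rows created)}\\
R - E &= n_m - n_d - n_u
\end{aligned}
```

So a wiki with many moves and few deletions ends up with more revision rows than counted edits, and a wiki with many deletions shows the opposite. For example, $n_e = 1000$, $n_m = 50$, $n_d = 20$, $n_u = 0$ gives $E = 1070$ and $R = 1100$. $R$ equals the largest revision ID only if the IDs have no gaps.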

Without database access to Wikia it's hard to tell what could be happening. Maybe the revision IDs aren't even consecutive: if an error caused an insert into the table to be rolled back, the ID it was assigned is skipped rather than reused. If the original wiki was hosted somewhere else, maybe Wikia screwed up during the initial transfer.
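If the revision IDs are available from a dump or from grabber output (how they are collected is left open here), gaps in the ID sequence can at least be measured. This is a minimal sketch under that assumption. It treats IDs as starting at 1. Note that a gap cannot say *why* an ID is missing: a deleted revision, a rolled-back insert and a row lost in transfer all look the same from outside the database.

```python
from collections.abc import Iterable


def id_gaps(rev_ids: Iterable[int]) -> list[tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges of absent IDs.

    IDs are assumed to start at 1, so a sequence beginning above 1
    reports the leading range as a gap too.
    """
    ids = [0] + sorted(set(rev_ids))  # sentinel 0 exposes a leading gap
    gaps = []
    for prev, cur in zip(ids, ids[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


def missing_count(rev_ids: Iterable[int]) -> int:
    """Number of IDs between 1 and the largest ID that never appear."""
    ids = set(rev_ids)
    return max(ids) - len(ids) if ids else 0
```

Comparing `missing_count` with the size of the discrepancy gives a rough sense of how much of it could be explained by gaps alone.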

The best you can do is select some pages and check whether the number of revisions on the new wiki matches the number of revisions on the old one.
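A minimal sketch of that spot check, assuming per-page revision counts for both wikis have already been collected into `{title: count}` mappings (gathering them, e.g. from each wiki's API or dump, is not shown). A fixed seed makes the sample reproducible, so the same pages can be rechecked later.

```python
import random
from collections.abc import Mapping


def compare_sample(old_counts: Mapping[str, int],
                   new_counts: Mapping[str, int],
                   sample_size: int,
                   seed: int = 0) -> dict[str, tuple[int, int]]:
    """Spot-check per-page revision counts on a random sample of pages.

    Samples titles from the old wiki and returns {title: (old, new)} for
    every sampled page whose counts differ. A page absent from the new
    wiki is treated as having 0 revisions there. Pages that exist only
    on the new wiki are not examined.
    """
    rng = random.Random(seed)
    titles = sorted(old_counts)  # sort so the sample depends only on the seed
    picked = rng.sample(titles, min(sample_size, len(titles)))
    return {
        title: (old_counts[title], new_counts.get(title, 0))
        for title in picked
        if old_counts[title] != new_counts.get(title, 0)
    }
```

An empty result for a reasonably large sample suggests the loss is concentrated in a few pages (or in deleted revisions) rather than spread evenly.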

Well, even just comparing live revisions fetched via the grabber against live revisions in the dumps (which is what we compared for he:), we've been getting some very weird numbers. I just don't know what to make of it.
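When the totals disagree in confusing ways, comparing the two sources revision ID by revision ID may be more informative than comparing counts, since it shows exactly which revisions each side has. This sketch assumes both the grabber output and the dump expose revision IDs; the function name and result keys are hypothetical.

```python
from collections.abc import Iterable


def diff_revision_ids(grabbed: Iterable[int],
                      dumped: Iterable[int]) -> dict[str, list[int]]:
    """Split revision IDs by which source contains them."""
    g, d = set(grabbed), set(dumped)
    return {
        "only_grabbed": sorted(g - d),  # fetched live but absent from the dump
        "only_dumped": sorted(d - g),   # in the dump but not fetched live
        "shared": sorted(g & d),
    }
```

A handful of IDs from each `only_*` list can then be looked up on the source wiki by hand to see what they have in common.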