list=alldeletedrevisions returns pageid=0 instead of ar_page_id
Open, Lowest, Public

Description

I don't see any reason to return deleted revisions with pageid set to 0 when ar_page_id is non-zero. If there is a reason, another way to make ar_page_id visible through the API should be devised.
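For illustration, the kind of request in question can be sketched as below. This is only a minimal sketch: the endpoint URL, the helper names `build_query` and `page_ids`, and the sample response are hypothetical, and the response shape is assumed to follow the usual `query.alldeletedrevisions` layout. No network call is made.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the api.php URL of the wiki in question.
API_URL = "https://wiki.example.org/w/api.php"


def build_query(limit=10):
    """Build a list=alldeletedrevisions request URL (sketch only)."""
    params = {
        "action": "query",
        "list": "alldeletedrevisions",
        "adrprop": "ids|timestamp",
        "adrlimit": str(limit),
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def page_ids(response):
    """Return (title, pageid) pairs from a response-shaped dict."""
    return [(p["title"], p["pageid"])
            for p in response["query"]["alldeletedrevisions"]]


# Made-up sample illustrating the reported behaviour: the title has no
# live page, so pageid is 0 and the archived page ID is not exposed.
sample = {"query": {"alldeletedrevisions": [
    {"pageid": 0, "ns": 0, "title": "Deleted page",
     "revisions": [{"revid": 123, "parentid": 0,
                    "timestamp": "2018-02-12T17:04:00Z"}]},
]}}

url = build_query(5)
pairs = page_ids(sample)
```

Nothing in such a response carries ar_page_id, which is the gap this task describes.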

Event Timeline

Anomie subscribed.

The pageid being returned is the page ID, if any, associated with the deleted revision's title.

I see no point to returning ar_page_id. If you have an actual use case beyond "I want all the data", you'll have to describe it.

I'm writing client software that will synchronize the content of a wiki into a local database. This is useful, e.g., for caching data for bots that don't have direct access to the wiki's database. So I think that, in the end, "I want all the data" is a pretty good reason :-)

To be more specific, the synchronization should replicate the delete/undelete actions in the local database based on the log events, to avoid expensive queries requesting the full history of deleted/undeleted pages. But without ar_page_id available this is difficult, and I'm still not sure whether my code for undeletion always does the right thing when the delay between synchronizations is arbitrary.

That seems like a very inefficient and error-prone way to try to back up a wiki, even if you did have the ar_page_id.

It's not intended as a backup solution, but as a cache for bots and other client tools. And since incremental dumps are not supported, it's actually much more efficient than downloading the full dumps again and again. Plus, the synchronization period is controlled by the client, not by the administrator releasing the dumps.

Since the system actually allowed me to reopen the task myself, I just did that because the extra information requested by @Anomie has been provided.

Anomie triaged this task as Lowest priority. (Feb 12 2018, 5:04 PM)
Anomie moved this task from Unsorted to Needs Code on the MediaWiki-Action-API board.

Another use case: in Wikimedia-Takedown-Tools we store the affected pages by page ID (in case a page is moved, undeleted later, etc.). However, we can't get a page ID for pages that don't exist (see T181570). I suppose the workaround would be to store the page title in addition to the page ID?

The tricky part with trying to track pages across undeletion by the page_id is that you can get some unexpected situations:

  • Undeleting a subset of revisions, moving that page elsewhere, then undeleting the rest will assign a new page ID to the second batch even though they're at the old title.
  • Recreating the page at the same title will assign a new ID for the title.
    • Then undeleting the old revisions will keep that new page ID.
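The situations above can be walked through with a toy model. This is a rough sketch and not MediaWiki's actual implementation: the class `ToyWiki` and its methods are hypothetical, and it assumes that undeletion reuses the archived page ID when that ID is free and the title has no live page.

```python
from itertools import count


class ToyWiki:
    """Toy model of page-ID assignment (illustrative assumption only)."""

    def __init__(self):
        self._ids = count(1)
        self.live = {}     # title -> {"id": page_id, "revs": [...]}
        self.archive = []  # rows: {"rev", "title", "ar_page_id"}

    def create(self, title, revs):
        # Creating a page always allocates a fresh page ID.
        self.live[title] = {"id": next(self._ids), "revs": list(revs)}
        return self.live[title]["id"]

    def delete(self, title):
        # Archive every revision together with the old page ID.
        page = self.live.pop(title)
        for rev in page["revs"]:
            self.archive.append(
                {"rev": rev, "title": title, "ar_page_id": page["id"]})

    def move(self, old, new):
        # A move keeps the page ID and only changes the title.
        self.live[new] = self.live.pop(old)

    def undelete(self, title, revs):
        rows = [r for r in self.archive
                if r["title"] == title and r["rev"] in revs]
        self.archive = [r for r in self.archive if r not in rows]
        if title not in self.live:
            in_use = {p["id"] for p in self.live.values()}
            old_id = rows[0]["ar_page_id"]
            # Assumption: reuse the old ID if free, else allocate a new one.
            new_id = old_id if old_id not in in_use else next(self._ids)
            self.live[title] = {"id": new_id, "revs": []}
        # If the title is live, restored revisions join that page's ID.
        self.live[title]["revs"] += [r["rev"] for r in rows]
        return self.live[title]["id"]


# Case 1: partial undelete, move, then undelete the rest.
w1 = ToyWiki()
original = w1.create("A", ["r1", "r2"])
w1.delete("A")
first_batch = w1.undelete("A", {"r1"})
w1.move("A", "B")
second_batch = w1.undelete("A", {"r2"})

# Case 2: recreate at the same title, then undelete the old revisions.
w2 = ToyWiki()
w2.create("A", ["r1"])
w2.delete("A")
recreated = w2.create("A", ["r2"])
restored = w2.undelete("A", {"r1"})
```

In this model the second batch in case 1 ends up under a different ID than the first, and in case 2 the restored revisions sit under the recreated page's ID, so a client cannot rely on the page ID alone to follow a page across undeletion.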