I don't see any reason to return deleted revisions with pageid set to 0 when ar_page_id is non-zero. If there is a reason, another way to make ar_page_id visible through the API should be devised.
The pageid being returned is the page ID, if any, associated with the deleted revision's title.
I see no point in returning ar_page_id. If you have an actual use case beyond "I want all the data", you'll have to describe it.
I'm writing client software that will synchronize the content of a wiki into a local database. This is useful, e.g., for caching data for bots that don't have direct access to the wiki's database. So I think that in the end, "I want all the data" is a pretty good reason :-)
To be more specific, the synchronization should replicate the delete/undelete actions in the local database based on the log events, to avoid expensive queries requesting the full history of deleted/undeleted pages. But without ar_page_id being available, this is rather difficult, and I'm still not sure whether my undeletion code always does the right thing given an arbitrary delay between synchronizations.
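For illustration, here is a rough sketch of the replay idea, not the actual client code. The `list=logevents` query parameters are standard MediaWiki API parameters. The local cache layout (two dicts keyed by title, holding revision IDs) and the helper names `delete_log_params` and `replay` are hypothetical. Keying by title and treating every restore as a full restore are simplifications, and they are exactly where things get shaky without ar_page_id.

```python
def delete_log_params(since: str) -> dict:
    """Query parameters for delete/restore log events starting at `since`."""
    return {
        "action": "query",
        "format": "json",
        "list": "logevents",
        "letype": "delete",
        "ledir": "newer",  # oldest first, so events replay in order
        "lestart": since,
        "leprop": "ids|title|type|timestamp",
        "lelimit": "max",
    }


def replay(events, pages, archive):
    """Apply delete/restore events to a hypothetical local cache.

    `pages` and `archive` map a title to a list of revision IDs.
    Assumption: a restore brings back every archived revision of the title,
    which is not true for partial undeletions.
    """
    for ev in events:
        title = ev["title"]
        if ev["action"] == "delete" and title in pages:
            archive.setdefault(title, []).extend(pages.pop(title))
        elif ev["action"] == "restore" and title in archive:
            pages.setdefault(title, []).extend(archive.pop(title))


# Example: one deletion followed by one restore of the same title.
pages = {"Foo": [1, 2]}
archive = {}
replay([{"action": "delete", "title": "Foo"}], pages, archive)
after_delete = (dict(pages), dict(archive))
replay([{"action": "restore", "title": "Foo"}], pages, archive)
```

Fetching the events themselves (an HTTP GET against the wiki's `api.php` with these parameters, plus continuation handling) is left out to keep the sketch self-contained.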
It's not intended as a backup solution, but caching for bots and other client tools. And since incremental dumps are not supported, it's actually much more efficient than downloading the full dumps again and again. Plus, the synchronization period is controlled by the client, not by the administrator releasing the dumps.
The tricky part with trying to track pages across undeletion by the page_id is that you can get some unexpected situations:
- Undeleting a subset of revisions, moving that page elsewhere, then undeleting the rest will assign a new page ID to the second batch even though they're at the old title.
- Recreating the page at the same title will assign a new ID for the title.
- Then undeleting the old revisions will keep that new page ID.
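The situations above can be reproduced with a toy model. This is only a sketch of the behaviour as described in this comment, not MediaWiki's actual code: the `ToyWiki` class and its methods are hypothetical, revisions are reduced to a count, and the model assumes that undeleting to a title with no live page always allocates a fresh ID.

```python
from itertools import count


class ToyWiki:
    """Toy model of page ID assignment across delete, move and undelete."""

    def __init__(self):
        self._next_id = count(1)
        self.live = {}     # title -> page ID of the live page
        self.archive = {}  # title -> number of deleted revisions

    def create(self, title):
        self.live[title] = next(self._next_id)
        return self.live[title]

    def delete(self, title, revisions):
        del self.live[title]
        self.archive[title] = self.archive.get(title, 0) + revisions

    def move(self, old, new):
        # The page ID follows the page to its new title.
        self.live[new] = self.live.pop(old)

    def undelete(self, title, revisions):
        self.archive[title] -= revisions
        # Assumption: attach to the live page if one exists at the title,
        # otherwise allocate a new page ID.
        if title not in self.live:
            self.live[title] = next(self._next_id)
        return self.live[title]


# Case 1: partial undelete, move away, undelete the rest.
w = ToyWiki()
w.create("A")
w.delete("A", 4)
first_batch = w.undelete("A", 2)
w.move("A", "B")
second_batch = w.undelete("A", 2)  # same title, different page ID

# Case 2: recreate at the same title, then undelete the old revisions.
w2 = ToyWiki()
old_id = w2.create("A")
w2.delete("A", 3)
new_id = w2.create("A")
kept_id = w2.undelete("A", 3)  # old revisions end up under the new ID
```

In both cases the archived revisions end up under a page ID that differs from the one they had before deletion, so a client cannot match them up by the current page ID alone.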