Problem
I was under the assumption that the pageid was a stable identifier (stable in the sense that it remains the same regardless of user action). However, that does not seem to be the case.
If I am a permissioned user and have a pageid, I can request the title:
/api.php?action=query&format=json&pageids=4&formatversion=2
{ "batchcomplete": true, "query": { "pages": [ { "pageid": 4, "ns": 0, "title": "Gotham" } ] } }
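A minimal Python sketch of the request above. It only builds the query URL and parses a formatversion=2 response (no network I/O); the endpoint passed in is a placeholder, and the helper names build_pageid_query and title_for_pageid are hypothetical. It treats a page flagged "missing" as having no title.

```python
import json
from urllib.parse import urlencode


def build_pageid_query(api_url, pageid):
    """Build an action=query URL that looks up a single pageid (formatversion=2)."""
    params = {
        "action": "query",
        "format": "json",
        "pageids": str(pageid),
        "formatversion": "2",
    }
    return f"{api_url}?{urlencode(params)}"


def title_for_pageid(response_text, pageid):
    """Return the title for pageid, or None if the API reports it as missing."""
    data = json.loads(response_text)
    for page in data["query"]["pages"]:
        if page.get("pageid") == pageid:
            # With formatversion=2, an absent page has "missing": true and no title.
            if page.get("missing"):
                return None
            return page["title"]
    return None


found = '{ "batchcomplete": true, "query": { "pages": [ { "pageid": 4, "ns": 0, "title": "Gotham" } ] } }'
missing = '{ "batchcomplete": true, "query": { "pages": [ { "pageid": 4, "missing": true } ] } }'
print(build_pageid_query("https://example.org/w/api.php", 4))
print(title_for_pageid(found, 4))    # Gotham
print(title_for_pageid(missing, 4))  # None
```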
If I delete that page, I get a response that it's missing:
/api.php?action=query&format=json&pageids=4&formatversion=2
{ "batchcomplete": true, "query": { "pages": [ { "pageid": 4, "missing": true } ] } }
While the page is deleted, the pageid does not exist (even for a user with permission to see all of the deleted revisions).
If I restore the page, all of a sudden it's back again:
/api.php?action=query&format=json&pageids=4&formatversion=2
{ "batchcomplete": true, "query": { "pages": [ { "pageid": 4, "ns": 0, "title": "Gotham" } ] } }
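However, there are many ways in which restoring a page may not result in the same pageid. For example, if a new page has been created at the same title since the deletion, the restored revisions are merged into the history of that new page, which has a different pageid.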
The problem is that we can store the pageid or the title in a database, but there isn't a way to ensure that either one will refer to the same page in the future. I realize this raises the question of whether a page is still the same page if everything about it changes; what I mean is what users would consider to be the same page. If a page can be deleted, restored, and moved and still be the same page, then it should keep a stable id throughout all of those processes.
Solution
We could change our page deletion strategy from a hard delete (where the page row is removed from the table) to a soft delete. This would involve adding a page_deleted column that would be either a nullable datetime recording when the page was deleted, or a boolean indicating whether or not the page is deleted. I think the former is better, since it also records when the page was deleted.
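A minimal sketch of the proposed soft delete, using an in-memory SQLite database purely for illustration. The table is a simplified stand-in loosely modeled on the page table; page_deleted is the proposed column (it does not exist today) and is stored as a nullable ISO 8601 text timestamp because SQLite has no dedicated datetime type. The functions make_db, soft_delete and lookup are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def make_db():
    conn = sqlite3.connect(":memory:")
    # Simplified page table; page_deleted is the proposed nullable timestamp.
    conn.execute(
        """CREATE TABLE page (
               page_id INTEGER PRIMARY KEY,
               page_namespace INTEGER NOT NULL,
               page_title TEXT NOT NULL,
               page_deleted TEXT NULL,
               UNIQUE (page_namespace, page_title)
           )"""
    )
    return conn


def soft_delete(conn, page_id):
    """Mark the page as deleted instead of removing its row."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "UPDATE page SET page_deleted = ? WHERE page_id = ? AND page_deleted IS NULL",
        (now, page_id),
    )


def lookup(conn, page_id):
    """Return (title, deleted_at) for page_id; deleted_at is None for live pages."""
    return conn.execute(
        "SELECT page_title, page_deleted FROM page WHERE page_id = ?", (page_id,)
    ).fetchone()


conn = make_db()
conn.execute("INSERT INTO page VALUES (4, 0, 'Gotham', NULL)")
soft_delete(conn, 4)
title, deleted_at = lookup(conn, 4)
print(title, deleted_at is not None)  # Gotham True
```

Because the row survives the deletion, pageid 4 still resolves to its title; only the timestamp distinguishes a deleted page from a live one.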
This change would fix the API endpoints, since the page would no longer be missing (although the API should perhaps report that it has been deleted). If a user were to re-create the page, it would be re-created with the same id and its deleted status would be cleared (although all of the existing revisions would remain deleted). Effectively, a deleted page would be equivalent to a page with no visible revisions.
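A self-contained sketch of the re-creation behavior described above, using an in-memory SQLite database with a simplified page table that includes the proposed page_deleted column. Creating a page at a soft-deleted title reuses the existing row, so the pageid stays the same and only page_deleted is cleared, while older revisions stay hidden. The revision table, its rev_hidden flag, the function names, and the "missing"/"deleted" status strings are all hypothetical.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER NOT NULL,
            page_title TEXT NOT NULL,
            page_deleted TEXT NULL,
            UNIQUE (page_namespace, page_title)
        );
        -- rev_hidden is a hypothetical flag for revisions hidden by a page deletion.
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY,
            rev_page INTEGER NOT NULL REFERENCES page (page_id),
            rev_hidden INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def create_page(conn, namespace, title):
    """Create a page, or revive a soft-deleted one at the same title. Returns page_id."""
    row = conn.execute(
        "SELECT page_id, page_deleted FROM page WHERE page_namespace = ? AND page_title = ?",
        (namespace, title),
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO page (page_namespace, page_title) VALUES (?, ?)",
            (namespace, title),
        )
        page_id = cur.lastrowid
    else:
        page_id, deleted_at = row
        if deleted_at is None:
            raise ValueError("page already exists")
        # Reuse the existing row: same page_id, deleted status cleared.
        conn.execute("UPDATE page SET page_deleted = NULL WHERE page_id = ?", (page_id,))
    # Add the new visible revision; older revisions keep rev_hidden = 1.
    conn.execute("INSERT INTO revision (rev_page) VALUES (?)", (page_id,))
    return page_id


def api_status(conn, page_id):
    """Illustrative API-level status: 'missing', 'deleted', or the current title."""
    row = conn.execute(
        "SELECT page_title, page_deleted FROM page WHERE page_id = ?", (page_id,)
    ).fetchone()
    if row is None:
        return "missing"
    return "deleted" if row[1] is not None else row[0]


conn = make_db()
conn.execute("INSERT INTO page VALUES (4, 0, 'Gotham', '2018-01-01T00:00:00+00:00')")
conn.execute("INSERT INTO revision (rev_id, rev_page, rev_hidden) VALUES (10, 4, 1)")
print(api_status(conn, 4))             # deleted
print(create_page(conn, 0, "Gotham"))  # 4
print(api_status(conn, 4))             # Gotham
```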
Alternatively, if we don't want to change the way page deletion works, we could abstract this in the API, which would also look up missing pageids in the archive table. However, this adds a lot of overhead to API endpoints without actually fixing the underlying issue.
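A rough sketch of this API-only alternative, with plain Python structures standing in for the page and archive tables. It assumes the archived rows still carry the original pageid in ar_page_id (alongside ar_namespace and ar_title); the function query_pageid and the "deleted" flag in its output are hypothetical and not part of the current API. The extra archive lookup on every miss is where the overhead comes from.

```python
def query_pageid(pageid, page_table, archive_rows):
    """Resolve pageid, falling back to archived rows for deleted pages.

    page_table: dict mapping pageid -> (ns, title) for live pages.
    archive_rows: list of dicts with ar_page_id, ar_namespace and ar_title keys.
    """
    if pageid in page_table:
        ns, title = page_table[pageid]
        return {"pageid": pageid, "ns": ns, "title": title}
    # Extra lookup on every miss: scan archived rows for the old page id.
    for row in archive_rows:
        if row["ar_page_id"] == pageid:
            return {
                "pageid": pageid,
                "ns": row["ar_namespace"],
                "title": row["ar_title"],
                "deleted": True,  # hypothetical flag, not in the current API
            }
    return {"pageid": pageid, "missing": True}


archive = [{"ar_page_id": 4, "ar_namespace": 0, "ar_title": "Gotham"}]
print(query_pageid(4, {}, archive))  # {'pageid': 4, 'ns': 0, 'title': 'Gotham', 'deleted': True}
print(query_pageid(5, {}, archive))  # {'pageid': 5, 'missing': True}
```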
Use Cases
Workaround
Store both the pageid and the title, and assume that at least one of them hasn't changed.
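A sketch of this workaround, assuming a stored record that holds both values and two caller-supplied lookup functions (for example, ones built on the pageids and titles parameters of the query API). The function resolve and both lookup callables are hypothetical. If both the pageid and the title have changed, nothing can be matched, which is exactly the gap described in this task.

```python
from typing import Callable, Optional, Tuple

# Hypothetical lookups: current title for a pageid, or current pageid for a title.
TitleByPageid = Callable[[int], Optional[str]]
PageidByTitle = Callable[[str], Optional[int]]


def resolve(stored_pageid: int, stored_title: str,
            title_by_pageid: TitleByPageid,
            pageid_by_title: PageidByTitle) -> Optional[Tuple[int, str]]:
    """Return a current (pageid, title) pair, assuming at most one of them changed."""
    title = title_by_pageid(stored_pageid)
    if title is not None:
        # The pageid still exists; the page may have been moved (title changed).
        return stored_pageid, title
    pageid = pageid_by_title(stored_title)
    if pageid is not None:
        # The pageid is gone (e.g. restored under a new id); the title is unchanged.
        return pageid, stored_title
    # Both changed, or the page is currently deleted: the reference is lost.
    return None


pages = {7: "Gotham"}
by_title = {title: pid for pid, title in pages.items()}
print(resolve(4, "Gotham", pages.get, by_title.get))  # (7, 'Gotham')
```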