Page MenuHomePhabricator

Special:Undelete doesn't use ar_page_id
Closed, ResolvedPublic

Description

When using Special:Undelete, it does not take into account the value of ar_page_id when inserting into the page table. This is nasty. If the page doesn't exist ($makepage = true) and the ID isn't already in use (for whatever reason), Special:Undelete should attempt to insert into the page table using the value of ar_page_id.


See Also:
T60986: When new pages are created (by Special:Undelete or other means), the page_id should be set to highest archive.ar_page_id associated with that namespace and title

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 11:21 PM
bzimport set Reference to bz26123.
bzimport added a subscriber: Unknown Object (MLST).

As pointed out by Brion on another bug, this becomes problematic when a deleted title has multiple old page IDs.

It makes sense placing the deleted revision in different blocks if they have different ar_page_id.

The page id is less important than the revision ids.

Additional work will be needed after this bug is fixed to determine where (if anywhere) on WMF projects an OOO risk in history occurs.

(In reply to comment #4)

What'a an OOO risk?

Fairly sure he means "out of order."

  • Bug 28553 has been marked as a duplicate of this bug. ***

When reuse the ar_page_id, please also look at the ar_parent_id. Thanks.

(In reply to comment #3)

The page id is less important than the revision ids.

Additional work will be needed after this bug is fixed to determine where (if
anywhere) on WMF projects an OOO risk in history occurs.

This has nothing to do with order, and out-of-order issues cannot happen. Order is determined by timestamp, not at all by page ID.

Reassure me that the time stamp is sufficiently granular that the revision ID is NEVER needed as a tie breaker, which is what I was referring to. Clearly, if page ID was the only method of determining order, things would be horribly broken, equally clearly they are not, so my assumption was that the time stamp was at least primarily used. This was all implicit in my statement "to determine where (if anywhere)" was the request above. Does the time stamp suffice everywhere? And of course determining this may not be trivial, since the source of the time stamp needs to be taken into account, even two machines running from the same local ntp service will generate a percentage of out-of-order timestamps, depending on the granularity.

(In reply to comment #9)

Reassure me that the time stamp is sufficiently granular that the revision ID
is NEVER needed as a tie breaker, which is what I was referring to.

It's not, but then that issue exists already.

Clearly,
if page ID was the only method of determining order, things would be horribly
broken,

Yes, they would be. Which is why no one is proposing the page ID be used to determine the order of anything. It's simply proposed that the page we create to give restored revisions a home get its old page ID back.

To reiterate: this proposal is about restoring page IDs when restoring revisions. It wouldn't change the existing behavior concerning timestamps or revision IDs or anything else that's used for ordering. Therefore, there are no out-of-order risks because ordering isn't touched.

equally clearly they are not, so my assumption was that the time stamp

was at least primarily used. This was all implicit in my statement "to
determine where (if anywhere)" was the request above. Does the time stamp
suffice everywhere? And of course determining this may not be trivial, since
the source of the time stamp needs to be taken into account, even two machines
running from the same local ntp service will generate a percentage of
out-of-order timestamps, depending on the granularity.

Yes, timestamps aren't perfect, but that's an existing problem that won't be made any worse (or better) by the requested feature.

OK so my original comment was valid (if ill-informed).

What I am still unclear about is whether undeleted pages use the old revision IDs (I can't see why they wouldn't), if so my issue goes aaway.

(In reply to comment #11)

What I am still unclear about is whether undeleted pages use the old revision
IDs (I can't see why they wouldn't), if so my issue goes aaway.

Yes, they do, for all revisions deleted in MediaWiki 1.5 or later.

Is there any risk (performance or replication or whatever) in doing an insert with the primary key specified instead of relying on the autoincrement? Other than that, this task seems straightforward enough to be a GCI candidate. The oldid is already retrieved for undelete (needed by the ArticleRevisionUndeleted hook), it just needs to be used in the insert query.

Change 263080 had a related patch set uploaded (by MtDu):
Use ar_page_id in Special:Undelete

https://gerrit.wikimedia.org/r/263080

Change 263080 abandoned by MtDu:
Use ar_page_id in Special:Undelete

https://gerrit.wikimedia.org/r/263080

I might not completely understand this, but since the pageid column uses auto_increment , how could the ID of a deleted page not be in use since it once was there? Suppose I have:

+--------+-------------+
| pageid | title       |
+--------+-------------+
| 1      | Page 1      |
+--------+-------------+
| 2      | Page 2      |
+--------+-------------+
| 4      | Fourth page |
+--------+-------------+
| 5      | Fieth page  |
+--------+-------------+

The Third page was deleted but the database still increases the auto increment index. Could anyone, please, explain this?

@Victorbarbu: The autoincrement only works when you omit the autoincrement field in an insert. It will be autopopulated with a new value. If you specify a value for that field on the insert, it will be used instead of the autoincrement

I know that, that's why I don't understand why grabbing the page id may be problematic (as @Catrope, @Platonides and @Rich_Farmbrough stated)

That was mostly a misunderstanding about whether the page ID is used for sorting, I think.

Change 263509 had a related patch set uploaded (by Victorbarbu):
Use ar_page_id on undeletion

https://gerrit.wikimedia.org/r/263509

Victorbarbu claimed this task.

Change 263509 merged by jenkins-bot:
Use ar_page_id on undeletion

https://gerrit.wikimedia.org/r/263509

Change 263576 had a related patch set uploaded (by Gergő Tisza):
Fall back to autoincrement when page cannot be recreated with old ID

https://gerrit.wikimedia.org/r/263576

Change 263576 merged by jenkins-bot:
Fall back to autoincrement when page cannot be recreated with old ID

https://gerrit.wikimedia.org/r/263576