
Over-reliance on timestamps can lead to incorrect counts
Open, Normal, Public

Description

In several places in the codebase, timestamps are used to determine the order of edits and other actions, which can lead to issues in situations where several actions happen at nearly the same time. One example of this is countRevisionsBetween in Title.php. If multiple revisions of a page are made within 2 seconds, it will return an incorrect count when used on those revisions. Shouldn't this and all similar uses be changed to compare based on the table's primary key instead (revision ID in this case)?
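To illustrate the problem described above, here is a minimal Python sketch. It is not the actual code of countRevisionsBetween; the `Revision` class, both helper functions, and the sample data are hypothetical. It assumes timestamps are stored with one-second resolution and that the count uses a strict "between" comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    rev_id: int      # primary key, assigned at insert time
    timestamp: str   # 14-digit YYYYMMDDHHMMSS string, one-second resolution (assumed)


def count_between_by_timestamp(revs, old, new):
    """Count revisions whose timestamp lies strictly between old and new."""
    return sum(1 for r in revs if old.timestamp < r.timestamp < new.timestamp)


def count_between_by_id(revs, old, new):
    """Count revisions whose ID lies strictly between old and new."""
    return sum(1 for r in revs if old.rev_id < r.rev_id < new.rev_id)


# Four edits made within two seconds; pairs of them share a timestamp.
revs = [
    Revision(101, "20140101120000"),
    Revision(102, "20140101120000"),
    Revision(103, "20140101120001"),
    Revision(104, "20140101120001"),
]
old, new = revs[0], revs[3]

by_timestamp = count_between_by_timestamp(revs, old, new)
by_id = count_between_by_id(revs, old, new)
print(by_timestamp, by_id)
```

In this hypothetical data set two revisions (102 and 103) sit between the endpoints, but the timestamp comparison finds none, because each intermediate revision shares a timestamp with one of the endpoints.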

Details

Reference
bz59609

Event Timeline

bzimport raised the priority of this task from to Normal. Nov 22 2014, 2:42 AM
bzimport added a project: MediaWiki-General.
bzimport set Reference to bz59609.
bzimport added a subscriber: Unknown Object (MLST).

Maybe related: bug 2930 / bug 17591

(In reply to comment #0)

> In several places in the codebase

Thanks for reporting this. If you are aware of more places, please list them.

I've just now noticed that edit conflict detection in EditPage.php does the same thing.

624dfd884e683dddc3fb2d86e6764277e8546bef by @aaron changed countRevisionsBetween and countAuthorsBetween to use rev_timestamp instead of rev_id.

It took me some time to figure out why diffs like this don't show the multinotice.

Restricted Application added a subscriber: Aklapper. Aug 18 2015, 6:56 AM
He7d3r added a subscriber: He7d3r. Sep 7 2015, 12:14 PM
saper added a subscriber: saper. Oct 21 2015, 1:51 PM
Krinkle added a subscriber: Krinkle. Edited May 10 2016, 3:59 PM

The reason our code often uses timestamps (instead of primary keys) is because revisions can be inserted out of sequence. Most commonly as a result of importing revisions and pages via Special:Import.

It may also happen as a result of a history merge (unsure?).

Timestamps tend to be preferred because they match the way we view revisions on the history page.
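A small Python sketch of the out-of-sequence case described in this comment. The revision data is made up, and the assumption is that an imported revision receives the next free revision ID while keeping its original, older timestamp.

```python
# Hypothetical page history. Revision 12 was imported after revision 11
# was saved, so it has a higher ID but an earlier timestamp.
revisions = [
    {"rev_id": 10, "timestamp": "20150101000000"},
    {"rev_id": 11, "timestamp": "20150301000000"},
    {"rev_id": 12, "timestamp": "20150201000000"},  # imported later
]

# Order by primary key: reflects insertion order, not edit time.
by_id = [r["rev_id"] for r in sorted(revisions, key=lambda r: r["rev_id"])]

# Order by timestamp: reflects when each edit was originally made.
by_time = [r["rev_id"] for r in sorted(revisions, key=lambda r: r["timestamp"])]

print(by_id)
print(by_time)
```

Under these assumptions the two orderings disagree, so a range selected by revision ID would not match the chronological view of the history.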

Krinkle updated the task description.
Krinkle moved this task from Untriaged to Usage problem on the Wikimedia-Rdbms board.
Krinkle removed a subscriber: wikibugs-l-list.

As described, I think the task should be declined. But it would be reasonable to extend the resolution of timestamps to the point where they are unique, for example, using a time UUID like UIDGenerator::newTimestampedUID88(). Revision IDs are not monotonic in time and should never be used to select ranges for display.
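As a rough illustration of what "extending the resolution of timestamps to the point where they are unique" could look like, here is a minimal Python sketch. It is not the algorithm used by UIDGenerator; the class name, the 16-bit sequence width, and the single-process assumption are all hypothetical choices made for the example.

```python
import time


class TimestampedIdGenerator:
    """Hypothetical single-process generator of unique, time-ordered IDs.

    Each ID is a millisecond timestamp shifted left by 16 bits, with a
    sequence counter in the low bits to separate IDs from the same millisecond.
    """

    SEQ_BITS = 16
    SEQ_MAX = (1 << SEQ_BITS) - 1

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so the behaviour can be tested
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        now_ms = int(self._clock() * 1000)
        if now_ms > self._last_ms:
            # New millisecond: restart the sequence.
            self._last_ms = now_ms
            self._seq = 0
        else:
            # Same millisecond (or clock went backwards): bump the sequence.
            self._seq += 1
            if self._seq > self.SEQ_MAX:
                # Sequence exhausted: borrow the next millisecond.
                self._last_ms += 1
                self._seq = 0
        return (self._last_ms << self.SEQ_BITS) | self._seq


# Five IDs requested at the same instant are still distinct and ordered.
gen = TimestampedIdGenerator(clock=lambda: 1400000000.0)
ids = [gen.next_id() for _ in range(5)]
print(ids == sorted(ids), len(set(ids)))
```

With IDs like these, a strict "between" comparison no longer loses actions that fall within the same second, while the values still sort chronologically. Coordinating such a generator across multiple servers is outside the scope of this sketch.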