[S] Explore how many rows in the file tables have duplicated timestamp values on WMF wikis
Closed, Resolved (Public)

Description

Files in the image, oldimage, and filearchive tables can share the same upload timestamp value. This task is to determine how often such duplication occurs and, as a result, what safeguards are needed to handle files with the same upload timestamp.

To do this, queries will be run to find the maximum number of rows sharing a single upload timestamp on commonswiki.
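
The task does not record the exact queries that were run. As a rough sketch of the kind of query described above, the following groups rows by upload timestamp and returns the most-duplicated value. It runs against an in-memory SQLite table with made-up rows; only the table name `image` and the column `img_timestamp` match the MediaWiki schema, and the helper name `max_timestamp_duplicates` is hypothetical. On the production replicas the same GROUP BY would presumably be run per table (`img_timestamp`, `oi_timestamp`, `fa_timestamp`).

```python
import sqlite3


def max_timestamp_duplicates(conn, table, column):
    """Return (timestamp, row_count) for the most-duplicated timestamp, or None.

    Hypothetical helper: table and column are interpolated into the SQL,
    so they must come from trusted code, never from user input.
    """
    query = (
        f"SELECT {column}, COUNT(*) AS uses FROM {table} "
        f"GROUP BY {column} HAVING COUNT(*) > 1 "
        f"ORDER BY uses DESC, {column} ASC LIMIT 1"
    )
    return conn.execute(query).fetchone()


# Toy stand-in for the image table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image (img_name TEXT PRIMARY KEY, img_timestamp TEXT)")
conn.executemany(
    "INSERT INTO image VALUES (?, ?)",
    [
        ("A.jpg", "20231116233200"),
        ("B.jpg", "20231116233200"),
        ("C.jpg", "20231116233200"),
        ("D.jpg", "20231117000000"),
    ],
)

result = max_timestamp_duplicates(conn, "image", "img_timestamp")
print(result)
```

Returning only the top row keeps the output small; dropping the `LIMIT 1` would list every duplicated timestamp with its count.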

Acceptance criteria
  • Run these queries and explore what the results mean for code that treats the upload timestamp as unique

Event Timeline

Dreamy_Jazz closed this task as Resolved. (Edited Nov 16 2023, 11:32 PM)
Dreamy_Jazz claimed this task.

Results:

This suggests that if the batch size is small (say, less than 50) and an upload timestamp used by more than 50 rows is found, raising the batch size to the number of rows using that timestamp plus 1 would not be a problem.
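
The adjustment described above can be sketched as a small function. This is an illustration under assumptions, not the code used in MediaWiki: the name `adjusted_batch_size` and its parameters are hypothetical, and the reading of "plus 1" (so that a batch holds every row for the shared timestamp plus at least one row beyond it, letting timestamp-based paging move forward) is an inference from the comment rather than something the task states.

```python
def adjusted_batch_size(batch_size, rows_sharing_timestamp):
    """Return a batch size large enough to get past a heavily shared timestamp.

    Hypothetical helper. If one timestamp is used by at least as many rows
    as fit in a batch, grow the batch to that row count plus 1; otherwise
    keep the configured batch size unchanged.
    """
    if rows_sharing_timestamp >= batch_size:
        return rows_sharing_timestamp + 1
    return batch_size


# A timestamp shared by 10 rows fits in a batch of 50, so nothing changes.
small = adjusted_batch_size(50, 10)
# A timestamp shared by 120 rows needs a batch of 121.
large = adjusted_batch_size(50, 120)
print(small, large)
```

Growing the batch only when a crowded timestamp is actually encountered keeps normal batches small while still bounding the worst case by the largest duplicate count observed.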