Truncate timestamps in comment IDs / names
Closed, Resolved (Public)

Description

Our timestamp format currently ends with seconds, milliseconds and a timezone marker. Because signatures never have seconds (or milliseconds) and we always store in UTC, this suffix is always the fixed string ":00.000Z". This adds an extra 8 or 16 bytes to every comment ID, which makes URLs unnecessarily long and adds to DB storage.
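As a rough illustration of that overhead, the sketch below counts how many bytes of an ID are spent on the fixed suffix. The IDs, the user names and the `redundant_bytes` helper are all hypothetical, and it assumes the 16-byte case is an ID that embeds two timestamps (the comment's own and its parent's).

```python
# Hypothetical helper: counts bytes spent on the constant ":00.000Z" suffix.
FIXED_SUFFIX = ":00.000Z"


def redundant_bytes(comment_id: str) -> int:
    """Return how many bytes of comment_id are occupied by the fixed suffix."""
    return comment_id.count(FIXED_SUFFIX) * len(FIXED_SUFFIX.encode("utf-8"))


# Made-up IDs in the old format (user names are placeholders).
one_timestamp = "c-Alice-2022-05-13T17:14:00.000Z"
two_timestamps = "c-Bob-2022-05-13T17:20:00.000Z-Alice-2022-05-13T17:14:00.000Z"

print(redundant_bytes(one_timestamp))   # 8
print(redundant_bytes(two_timestamps))  # 16
```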

If we were to change the ID/name format again, places where the old format could be stored include

  • In HTML caches for X days
  • In discussiontools_subscription table (sub_item) - could be updated via a script, or we could just add code to support both formats
  • In echo_event (type dt-subscribed-new-comment), event_extra stores comment name and ID which is used to render notification links
  • In echo email notifications, the old style links will exist forever
  • Some users have already started using our #comment-id format for "permalinking" via a gadget (although there is no promise yet that these are actual permalinks, e.g. they don't survive archiving)

Event Timeline

I had a better idea about how to change it: we should just pick a date in the near future, and use the new format only for comments posted after that date, but keep using the old format for comments posted before. This way we don't need to migrate existing data, and we don't need to support two formats forever: there is effectively just one format that looks different depending on the date, and each ID / name only ever exists in one form.

Change 806942 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@master] Truncate timestamps in comment IDs / names

https://gerrit.wikimedia.org/r/806942

Change 806942 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Truncate timestamps in comment IDs / names

https://gerrit.wikimedia.org/r/806942

matmarex added a project: Editing QA.

I've picked 2022-07-12 as the switchover date (to ensure that the format won't change back and forth if the train is reverted). The new behavior can be tested on new comments posted after that date (or by making up a timestamp).

Example ID: https://en.wikipedia.beta.wmflabs.org/wiki/Talk:Main_Page#c-ESanders_(WMF)-20220825092500-DLynch-2022-05-13T17:14:00.000Z

Note the first timestamp is after 12 July so uses the shorter format, but the parent comment has the longer format.
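The date-based rule can be sketched roughly as follows. This is a Python illustration, not the DiscussionTools code: the `format_id_timestamp` name is made up, and it assumes the cutoff is midnight UTC on 2022-07-12 with comments at or after that instant getting the short format.

```python
from datetime import datetime, timezone

# Assumed cutoff: midnight UTC on the switchover date mentioned above.
SWITCHOVER = datetime(2022, 7, 12, tzinfo=timezone.utc)


def format_id_timestamp(posted: datetime) -> str:
    """Format a comment timestamp for use in an ID, depending on its date."""
    posted = posted.astimezone(timezone.utc)
    if posted >= SWITCHOVER:
        # Short format: 14 digits, no separators or timezone marker.
        return posted.strftime("%Y%m%d%H%M%S")
    # Long format: milliseconds are always zero, so they are a literal here.
    return posted.strftime("%Y-%m-%dT%H:%M:%S") + ".000Z"


# The two timestamps from the example ID above.
print(format_id_timestamp(datetime(2022, 8, 25, 9, 25, tzinfo=timezone.utc)))
# 20220825092500
print(format_id_timestamp(datetime(2022, 5, 13, 17, 14, tzinfo=timezone.utc)))
# 2022-05-13T17:14:00.000Z
```

Because the format is a pure function of the comment's date, a given comment always gets the same ID no matter when the page is rendered.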

@matmarex in the task I suggested stripping seconds as they are always 00, but you have kept them in. Should we remove them?

I wish to keep them so that we can use standard MediaWiki methods for parsing these timestamps (as this is the same format that is used in URLs like https://en.wikipedia.org/w/index.php?title=Special:Contributions&offset=20220503215552&target=Matma+Rex).
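For illustration, here is how both timestamp shapes could be read back. This is a Python analogue under my own assumptions, not MediaWiki's parsing code: `parse_id_timestamp` is a hypothetical helper, and it treats any 14-digit string as the short format and everything else as the long one.

```python
from datetime import datetime, timezone


def parse_id_timestamp(text: str) -> datetime:
    """Parse a timestamp taken from a comment ID, in either format, as UTC."""
    if len(text) == 14 and text.isdigit():
        # Short format, e.g. "20220825092500" (same shape as URL offsets).
        parsed = datetime.strptime(text, "%Y%m%d%H%M%S")
    else:
        # Long format, e.g. "2022-05-13T17:14:00.000Z".
        parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    # Both formats are always stored in UTC.
    return parsed.replace(tzinfo=timezone.utc)


print(parse_id_timestamp("20220825092500").isoformat())
print(parse_id_timestamp("2022-05-13T17:14:00.000Z").isoformat())
```

Keeping the seconds means the short form stays a fixed 14 digits, which is what makes the simple length check above sufficient.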

EAkinloose subscribed.

The long format:

[Image: Screenshot 2022-07-14 at 12.01.55.png]

The short format:

[Image: Screenshot 2022-07-14 at 12.01.23.png]