mw.user.generateRandomSessionId should return a UUID
Open, Medium, Public

Description

Session IDs created by mw.user.generateRandomSessionId currently take the form of a random 80-bit hex integer. To better conform to industry standards and support cross-platform consistency, mw.user.generateRandomSessionId should instead return a v4 (random) UUID.

mw.user.generateRandomSessionId should be left in place to support its many existing consumers but should be updated to call out to a new mw.user.generateUUIDv4 method rather than employing a custom ID generation algorithm.
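The delegation described above could be sketched roughly as follows. This is a minimal illustration, not MediaWiki core code: the function bodies are assumptions, and it uses the Web Crypto API directly instead of the uuid library named in the migration plan, purely to keep the example self-contained.

```javascript
// Hypothetical sketch; not the actual MediaWiki implementation.
// Use the global Web Crypto object where available, falling back to Node's module.
const webCrypto = globalThis.crypto || require('crypto').webcrypto;

// Generate a version 4 (random) UUID as a lowercase, hyphenated string.
function generateUUIDv4() {
  const bytes = new Uint8Array(16);
  webCrypto.getRandomValues(bytes);

  // Set the version field (high nibble of byte 6) to 4.
  bytes[6] = (bytes[6] & 0x0f) | 0x40;
  // Set the variant field (top two bits of byte 8) to binary 10.
  bytes[8] = (bytes[8] & 0x3f) | 0x80;

  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20)
  ].join('-');
}

// The existing entry point stays in place for current consumers
// and simply delegates to the new generator.
function generateRandomSessionId() {
  return generateUUIDv4();
}
```

Callers of generateRandomSessionId would keep working unchanged, but would receive a 36-character hyphenated string rather than a 20-character hex string.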

Existing consumers shouldn't be relying on the ID format, but it's always possible that they are, so the change should be announced in advance.

Proposed migration plan:

  • Announce the planned change on wikitech-l
  • Create a new mw.user.generateUUIDv4 method leveraging the uuid library
  • Update affected schemas to accept UUIDs where the current format is expected
  • Update mw.user.generateRandomSessionId to call mw.user.generateUUIDv4 internally
  • Update impacted EP instruments to produce events according to the updated schema versions
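While the schemas are being updated, it may help to accept both ID formats. The sketch below shows what such a check could look like. The two patterns and the classifySessionId helper are hypothetical: they assume the legacy format is exactly 20 lowercase hex characters (80 bits) and that UUIDs arrive in lowercase hyphenated form, neither of which has been verified against the actual schemas.

```javascript
// Hypothetical patterns; not taken from any existing schema.
// Legacy format: 80 random bits rendered as 20 lowercase hex characters (assumed).
const LEGACY_SESSION_ID = /^[0-9a-f]{20}$/;
// Version 4 UUID: version nibble is 4, variant nibble is one of 8, 9, a, b.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;

// Classify a session ID as 'legacy', 'uuidv4', or 'invalid'.
function classifySessionId(value) {
  if (typeof value !== 'string') {
    return 'invalid';
  }
  if (LEGACY_SESSION_ID.test(value)) {
    return 'legacy';
  }
  if (UUID_V4.test(value)) {
    return 'uuidv4';
  }
  return 'invalid';
}
```

A schema that validates the field with a pattern could combine the two expressions with alternation during the transition, then drop the legacy branch once all instruments emit UUIDs.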

Open questions:

  • Are only Event Platform schemas affected? Legacy EL schemas probably don't support pattern-based field data validation, but we should double-check this.

Event Timeline

Restricted Application added a subscriber: Aklapper.
sdkim triaged this task as Medium priority.
sdkim moved this task from Inbox to Next on the Product-Data-Infrastructure board.
sdkim added a project: Better Use Of Data.
sdkim subscribed.

I think that's because it was on the Event Platform board, but it doesn't really have anything to do with Event Platform. Instead, it concerns MW-generated session IDs, which I believe are used in EventLogging instrumentation schemas.

Does Metrics Platform support logging the MW session ID?

Perhaps Data Products is not correct, please move accordingly! :)

VirginiaPoundstone added a subscriber: phuedx.

@phuedx

Does Metrics Platform support logging the MW session ID?

?

@VirginiaPoundstone: Yes. It's the performer.session_id contextual attribute (which is soon to be renamed to performer.browsing_session_token)

@Ottomata and @lbowmaker I think this is a library that Data Engineering owns? Should this get a maintenance plan or be killed?

I think this is a library that Data Engineering owns?

@VirginiaPoundstone I don't think so. I believe mw.user.generateRandomSessionId is part of MediaWiki core.