Change Details

Currently, EventLogging does not itself generate, store, or attach tokens such as session and pageview identifiers. That responsibility is largely delegated to `mw.user`, in the form of `mw.user.sessionId` and `mw.user.getPageviewToken` and, sometimes, `mw.user.generateRandomSessionId`. EventLogging performs sampling using particular defaults, but it also allows instrumentation or other calling code to override them with an explicit token. Such tokens are often generated by the same `mw.user` methods, but their persistence is managed differently, depending on the needs of the instrument.

What we've found is that this has led to many different strategies for managing tokens and their lifetimes, different formats for these tokens, different field names in event schemas to hold them, wide inconsistency in how sampling occurs, and a lack of understanding of the subtleties both of the browser's persistence model and of how the sampling strategy changes depending on how the token is refreshed.

For these reasons, in Product's rethink of our instrumentation client library, we opted to make the handling of pageview and session identifiers opaque to the caller: they are generated and managed internally by the library. That allows for:

* Uniform field names, since the library will always set the value on the same field
* Uniform ID formats, since the library will always control the format
* A uniform sampling strategy, since the library will always control the sole determinant of the sampling outcome: the pseudorandom identifier
* Freeing caller code from managing associative identifiers on a per-instrument basis, a practice that proliferates cookies, storage keys, etc.

Obviously there will be cases where it is advantageous for tokens to be managed using a different strategy, one that the library may not support.
In those cases, I would argue for close scrutiny of the requirements to determine whether a separate strategy is truly necessary. If it is, we should explore ways to meet those needs from within the library and update the API accordingly. The current level of delegation is causing a maintenance problem.

===== Proposal

1. Factor out the token-generating logic from `mw.user.generateRandomSessionId(void)` into a parameterized `mw.util.generateRandomToken(int num_bytes)`.
2. Make `mw.user.generateRandomSessionId(void)` call `mw.util.generateRandomToken(10)` (10 bytes, i.e. the same 80 bits of randomness it returns today). It is retained for compatibility with existing code that relies on it, but its deprecation should be noted in the documentation.

This enables the MEP Client Library (and others†) to call `mw.util.generateRandomToken` for its identifier needs without depending on `mw.user`.

†: For example, this function in [[ https://github.com/wikimedia/mediawiki-extensions-WikimediaEvents/blob/master/modules/ext.wikimediaEvents/searchSatisfaction.js | WikimediaEvents::searchSatisfaction.js ]] would be unnecessary if `mw.util.generateRandomToken(int num_bytes)` existed:

```lang=JS
/**
 * Generate a unique token. Appends timestamp in base 36 to increase
 * uniqueness of the token.
 *
 * @return {string}
 */
function randomToken() {
	return mw.user.generateRandomSessionId() + Date.now().toString( 36 );
}
```
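For illustration, here is a minimal sketch of what the proposed generator and its compatibility wrapper might look like. It is written as plain functions so it runs outside MediaWiki. The hex output format, the input check, and the `Math.random()` fallback are assumptions loosely modelled on how `mw.user.generateRandomSessionId` behaves, not an existing API. The `isTokenInSample` helper is hypothetical. It illustrates the point that a single pseudorandom identifier can be the sole input to the sampling decision. Its scheme (compare the first 32 bits against a rate) is one illustrative choice and is not taken from EventLogging.

```lang=JS
/**
 * Hypothetical sketch of the proposed mw.util.generateRandomToken( numBytes ).
 *
 * @param {number} numBytes Number of random bytes (positive integer)
 * @return {string} Lowercase hex string, numBytes * 2 characters long
 */
function generateRandomToken( numBytes ) {
	if ( !Number.isInteger( numBytes ) || numBytes <= 0 ) {
		throw new RangeError( 'numBytes must be a positive integer' );
	}
	const bytes = new Uint8Array( numBytes );
	const webCrypto = globalThis.crypto;
	if ( webCrypto && typeof webCrypto.getRandomValues === 'function' ) {
		// Cryptographically strong source, when the environment provides one
		// (getRandomValues() accepts at most 65,536 bytes per call).
		webCrypto.getRandomValues( bytes );
	} else {
		// Weaker fallback (assumption: mirrors the Math.random() fallback in mw.user).
		for ( let i = 0; i < numBytes; i++ ) {
			bytes[ i ] = Math.floor( Math.random() * 256 );
		}
	}
	// Encode each byte as two zero-padded hex digits.
	return Array.from( bytes, ( b ) => b.toString( 16 ).padStart( 2, '0' ) ).join( '' );
}

/**
 * Compatibility wrapper: 10 bytes = 80 bits = 20 hex characters,
 * the same length mw.user.generateRandomSessionId() returns today.
 */
function generateRandomSessionId() {
	return generateRandomToken( 10 );
}

/**
 * Hypothetical sampling helper: the token alone decides the outcome.
 * Reads the first 8 hex digits as an integer in [0, 2^32) and keeps the
 * event if it falls in the lowest `rate` fraction of that range.
 * A non-hex prefix parses to NaN, and NaN < x is false, so it is not sampled.
 *
 * @param {string} token Hex token, at least 8 characters
 * @param {number} rate Sampling rate between 0 and 1
 * @return {boolean}
 */
function isTokenInSample( token, rate ) {
	const bucket = parseInt( token.slice( 0, 8 ), 16 );
	return bucket < rate * 0x100000000;
}
```

With a parameterized generator, a caller that wants a longer, more collision-resistant token (the motivation for appending a timestamp in `randomToken()` above) could instead request more bytes, e.g. `generateRandomToken( 16 )`, rather than composing its own format. Because the sampling decision reads only the token, every event that carries the same token gets the same outcome.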