Page MenuHomePhabricator

Switch mw.user.sessionId back to session-cookie persistence
Closed, ResolvedPublic

Description

As part of an effort to improve web performance, in 2017 I migrated the persistence of the EventLogging-related "session ID" from being stored in an HTTP session-cookie, to being stored in HTML5 Session Storage.

This "session ID" is an anonymous token temporarily assigned to a user viewing a specific website (e.g. en.wikipedia.org), unrelated to their account (if they have one), and not coordinated between wikis.

When using an HTTP session cookie:

  • It expires at the end of the browsing session (e.g. when you close the browser and don't use the "restore session" feature some browsers have), or when browsers delete them to free up space by removing cookies from sites you haven't visited recently (e.g. after 7 days in case of Safari).
  • For as long as it has not expired, it is the same between different tabs to the same website.
  • It is sent to the server (and ignored there) because the Cookie transportation mechanism does not have a way to be "JS-only" (it does have HTTP-only, which we cannot use here).

When using HTML5 sessionStorage:

  • It expires with the lifetime of a single logical browsing "tab". They live for as long as the tab is alive in some form or another. Going from one page to another within the same website within the same tab, is still the same "tab". You can also go to a different website in that tab, and then use the "Back" and "Forward" mechanisms to revisit the old page in the same tab, and still associate the same HTML session. In most browsers, one can also re-open a closed tab within a limited time from the History menu. When closing a browser complete there is not a way to restore it usually. Not even when "restoring the browser session".
  • It is never shared between tabs to the same website. While this restriction might sound interesting from a privacy perspective, it seems to me not useful and not matching user expectation. When a user is logged-in (which uses cookies), they are logged in in all tabs for that website, not just the tab where they logged-in. They can open additional tabs via bookmarks or context menus and expect that to span the same session. This makes HTML5 session storage not useful for A/B testing and other EventLogging purposes.

The reason for migrating it in 2017 was to reduce network traffic (bandwidth cost for users) between browser and server for information only never needed on the browser. The session cookies, while having all the correct behaviours we want, are also sent to the server, which adds a cost that negatively impacts performance.

It was my (incorrect) understanding that HTML5 session storage is like HTTP cookie sessions, except without the cost of HTTP transfers.

Proposal

Migrate the storing mechanism for this token back from HTML5 session storage to HTTP session cookies. This is technically a simple task to do.

Stakeholder approval

Check the box if this change is desirable for your team, and if you have concerns about when to deploy it (e.g. not during a particular research campaign), also let us know in the comments below when you'd prefer this to (not) roll out.

  • Performance Team (implied).
  • Analytics.
  • Product Analytics

See also

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Krinkle triaged this task as Medium priority.May 20 2019, 7:22 PM
Krinkle updated the task description. (Show Details)
Krinkle added a subscriber: Nuria.

I think (if Performance-Team agrees) this work can be done by @jlinehan as part of the efforts we are doing for the Modern Event Platform work.

I think (if Performance-Team agrees) this work can be done by @jlinehan as part of the efforts we are doing for the Modern Event Platform work.

Happy to do it, if there's approval from Performance-Team.

@Krinkle it seems like HTML5 localStorage (link), which does not expire on tab closure, and is available to all tabs, would deal with the shortcomings of HTML5 sessionStorage while still keeping the data it's persisting off the wire. Is my understanding correct that the reason why localStorage is considered unsuitable is due to its lack of any TTL mechanism?

it seems like HTML5 localStorage (link), which does not expire on tab closure, and is available to all tabs, would deal with the shortcomings of HTML5

Problem is that localStorage does not work for browsers such UC browser for which we get many pageviews (millions a day). https://caniuse.com/#search=Localstorage

Some of the browsers that do not support localstorage also are below grade A in MW javascript support. See: https://github.com/wikimedia/mediawiki/blob/2aed14b686/resources/src/startup/startup.js#L39 but not all, so * I think* storing on a cookie is a requirement.

Problem is that localStorage does not work for browsers such UC browser for which we get many pageviews (millions a day). https://caniuse.com/#search=Localstorage

Aha. And KaiOS browser support is unknown too, I see. What a shame. I wonder if it's actually unsupported (I assume it is) or there really is just no data. I see that the caniuse numbers look the same for sessionStorage https://caniuse.com/#search=sessionstorage, I assume this was another reason to return to using cookies here.

Ottomata raised the priority of this task from Medium to High.
Ottomata added a project: Analytics-Kanban.
Ottomata moved this task from Radar to Data Quality on the Analytics board.

Uassigning @Milimetric and lowering priority a bit in the light of other (completely unrelated work) we need to do

Nuria removed a project: Analytics-Kanban.
Nuria added a subscriber: Milimetric.
mpopov signed these changes with MFA.Dec 5 2019, 9:31 PM
mpopov added a subscriber: mpopov.

Okay, if localStorage is out of the discussion (due to lack of expiration setting and full browser support) and our options are sessionStorage (current status quo) and session cookies (proposed), we should definitely switch back to using session cookies. Not having a cross-tab mw.user.sessionId is a major hindrance to analysis of sessions on the web.

Hey @Nuria, @Krinkle, I think it will be simple to patch this, my understanding from looking at the Mediawiki code is that we already have the code to work with cookies from the mw.cookie module. So the patch here is to the mw.user module, correct? (specifically mw.user.sessionId). Can I ask why this feature lives in mw.user? If there are other modules besides EventLogging that depend on it, then they should also be informed since session cookies will alter the persistence behavior vs sessionStorage. If there are no other modules depending on it, why is it in mw.user?

@jlinehan We have too many balls up in the air now that need code reviews, let's pick this patch up when https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventLogging/+/524575/ and https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/553376/ have been merged and are working in vagrant.

mpopov edited projects, added Product-Analytics; removed Product-Analytics (Kanban).
mpopov moved this task from Needs Investigation to Current Quarter on the Product-Analytics board.

Jason and I talked about picking this up this quarter

Change 572011 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[mediawiki/core@master] Switch mw.user.sessionId back to session-cookie persistence

https://gerrit.wikimedia.org/r/572011

@Nuria: Can you please sign-off in the ticket description as the Analytics stakeholder if you agree with Timo and the change that's been uploaded? Session ID should behave in the way we expect it to, which is to say the JS-based session cookie solution that allows it to be shared by tabs and is restricted to the lifetime of the browser session gets us there (and returns to how it used to be).

Are we using js-based cookies or non-js-based cookies?

JS-based. mw.user is a JavaScript module that runs client-side and generates these IDs by computing random tokens with the Web Crypto API. The computation and storing as cookie happens lazily when event instrumentation code first needs the ID and it isn't already set.

Change 572011 merged by jenkins-bot:
[mediawiki/core@master] mediawiki.user: Switch sessionId() back to session-cookie persistence

https://gerrit.wikimedia.org/r/572011

Thanks @jlinehan for announcing the change to product-all@, research-internal@lists, and analytics@lists

Shouldn't this cookie have been added to https://foundation.wikimedia.org/wiki/Cookie_statement ?

Maybe. I am not a laywer, but as far as I know that page and relevant legislation applies to cookies and other forms of client storage more or less equally (e.g. Local Storage, IndexedDB etc.).

This feature was not listed there, so perhaps it was already forgotten back when it did use a cookie. Either way, if it is listed there, I don't think it needs any change since the expiry would continue to be "When user exits browser". Some of the features that moved from Cookies to Local Storage were highlighted as such on that page, but only because they used a special kind of expiry, not because they used LocalStorage itself instead of a cookie. (And the SessionStorage variant of LocalStorage is basically equivalent to Session cookie, which is what this feature used until recently.)

kzimmerman added subscribers: DAbad, kzimmerman.

@DAbad @jlinehan this came across my team's radar again as we were going through our backlog, and I wanted to flag the recent comments above (https://phabricator.wikimedia.org/T223931#6976622, https://phabricator.wikimedia.org/T223931#6978412) for your attention

Need to reach out to Legal to figure out the next steps to determine if something needs to be done.

Basically do we need to document this in the cookie statement?

JL: probably yes

Who maintains the cookie statement page?

It would be helpful to close this task and open a separate one for the possible change to the cookie statement. I was just looking up the behavior of mw.user.sessionId and assumed it didn't have session-cookie persistence because this task was open. It was only the next day that I realized I was wrong, after I saw some conflicting documentation and looked at the source.

Aklapper added a subscriber: Aklapper.

Removing inactive assignee from this open task. (Please update assignees on open tasks after offboarding. Thanks.)

Krinkle claimed this task.