Page MenuHomePhabricator

Switch mw.user.sessionId back to session-cookie persistence
Open, HighPublic

Description

As part of an effort to improve web performance, in 2017 I migrated the persistence of the EventLogging-related "session ID" from being stored in an HTTP session-cookie, to being stored in HTML5 Session Storage.

This "session ID" is an anonymous token temporarily assigned to a user viewing a specific website (e.g. en.wikipedia.org), unrelated to their account (if they have one), and not coordinated between wikis.

When using an HTTP session cookie:

  • It expires at the end of the browsing session (e.g. when you close the browser and don't use the "restore session" feature some browsers have), or when browsers delete them to free up space by removing cookies from sites you haven't visited recently (e.g. after 7 days in case of Safari).
  • For as long as it has not expired, it is the same between different tabs to the same website.
  • It is sent to the server (and ignored there) because the Cookie transportation mechanism does not have a way to be "JS-only" (it does have HTTP-only, which we cannot use here).

When using HTML5 sessionStorage:

  • It expires with the lifetime of a single logical browsing "tab". They live for as long as the tab is alive in some form or another. Going from one page to another within the same website within the same tab, is still the same "tab". You can also go to a different website in that tab, and then use the "Back" and "Forward" mechanisms to revisit the old page in the same tab, and still associate the same HTML session. In most browsers, one can also re-open a closed tab within a limited time from the History menu. When closing a browser complete there is not a way to restore it usually. Not even when "restoring the browser session".
  • It is never shared between tabs to the same website. While this restriction might sound interesting from a privacy perspective, it seems to me not useful and not matching user expectation. When a user is logged-in (which uses cookies), they are logged in in all tabs for that website, not just the tab where they logged-in. They can open additional tabs via bookmarks or context menus and expect that to span the same session. This makes HTML5 session storage not useful for A/B testing and other EventLogging purposes.

The reason for migrating it in 2017 was to reduce network traffic (bandwidth cost for users) between browser and server for information only never needed on the browser. The session cookies, while having all the correct behaviours we want, are also sent to the server, which adds a cost that negatively impacts performance.

It was my (incorrect) understanding that HTML5 session storage is like HTTP cookie sessions, except without the cost of HTTP transfers.

Proposal

Migrate the storing mechanism for this token back from HTML5 session storage to HTTP session cookies. This is technically a simple task to do.

Stakeholder approval

Check the box if this change is desirable for your team, and if you have concerns about when to deploy it (e.g. not during a particular research campaign), also let us know in the comments below when you'd prefer this to (not) roll out.

  • Performance Team (implied).
  • Analytics.
  • Product Analytics

See also

Event Timeline

Krinkle triaged this task as Medium priority.May 20 2019, 7:22 PM
Krinkle created this task.
Krinkle updated the task description. (Show Details)
Krinkle moved this task from Inbox to Backlog: Future Goals on the Performance-Team board.
Krinkle added a subscriber: Nuria.
fdans moved this task from Incoming to Radar on the Analytics board.
Nirmos added a subscriber: Nirmos.Jun 6 2019, 10:26 AM
leila edited projects, added Research-Backlog; removed Research.Jul 11 2019, 3:50 PM
Krinkle updated the task description. (Show Details)Sep 10 2019, 6:50 PM
LGoto moved this task from Next Up to Blocked on the Product-Analytics (Kanban) board.
Nuria added a comment.Fri, Nov 15, 8:28 PM

I think (if Performance-Team agrees) this work can be done by @jlinehan as part of the efforts we are doing for the Modern Event Platform work.

jlinehan added a comment.EditedFri, Nov 15, 8:45 PM

I think (if Performance-Team agrees) this work can be done by @jlinehan as part of the efforts we are doing for the Modern Event Platform work.

Happy to do it, if there's approval from Performance-Team.

@Krinkle it seems like HTML5 localStorage (link), which does not expire on tab closure, and is available to all tabs, would deal with the shortcomings of HTML5 sessionStorage while still keeping the data it's persisting off the wire. Is my understanding correct that the reason why localStorage is considered unsuitable is due to its lack of any TTL mechanism?

Nuria added a comment.Fri, Nov 15, 9:19 PM

it seems like HTML5 localStorage (link), which does not expire on tab closure, and is available to all tabs, would deal with the shortcomings of HTML5

Problem is that localStorage does not work for browsers such UC browser for which we get many pageviews (millions a day). https://caniuse.com/#search=Localstorage

Some of the browsers that do not support localstorage also are below grade A in MW javascript support. See: https://github.com/wikimedia/mediawiki/blob/2aed14b686/resources/src/startup/startup.js#L39 but not all, so * I think* storing on a cookie is a requirement.

Problem is that localStorage does not work for browsers such UC browser for which we get many pageviews (millions a day). https://caniuse.com/#search=Localstorage

Aha. And KaiOS browser support is unknown too, I see. What a shame. I wonder if it's actually unsupported (I assume it is) or there really is just no data. I see that the caniuse numbers look the same for sessionStorage https://caniuse.com/#search=sessionstorage, I assume this was another reason to return to using cookies here.

Ottomata raised the priority of this task from Medium to High.
Ottomata added a project: Analytics-Kanban.
Ottomata moved this task from Radar to Data Quality on the Analytics board.
jlinehan moved this task from MEP to EPC on the Better Use Of Data board.Tue, Nov 19, 5:31 PM
mforns moved this task from In Progress to Paused on the Analytics-Kanban board.Mon, Nov 25, 5:26 PM
Nuria added a comment.Mon, Dec 2, 6:55 AM

Uassigning @Milimetric and lowering priority a bit in the light of other (completely unrelated work) we need to do

Nuria removed Milimetric as the assignee of this task.Mon, Dec 2, 6:56 AM
Nuria removed a project: Analytics-Kanban.
Nuria added a subscriber: Milimetric.
mpopov signed these changes with MFA.Thu, Dec 5, 9:31 PM
mpopov added a subscriber: mpopov.

Okay, if localStorage is out of the discussion (due to lack of expiration setting and full browser support) and our options are sessionStorage (current status quo) and session cookies (proposed), we should definitely switch back to using session cookies. Not having a cross-tab mw.user.sessionId is a major hindrance to analysis of sessions on the web.

jlinehan added a comment.EditedFri, Dec 6, 5:07 PM

Hey @Nuria, @Krinkle, I think it will be simple to patch this, my understanding from looking at the Mediawiki code is that we already have the code to work with cookies from the mw.cookie module. So the patch here is to the mw.user module, correct? (specifically mw.user.sessionId). Can I ask why this feature lives in mw.user? If there are other modules besides EventLogging that depend on it, then they should also be informed since session cookies will alter the persistence behavior vs sessionStorage. If there are no other modules depending on it, why is it in mw.user?

Nuria added a comment.Fri, Dec 6, 5:17 PM

@jlinehan We have too many balls up in the air now that need code reviews, let's pick this patch up when https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventLogging/+/524575/ and https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/553376/ have been merged and are working in vagrant.