
Clear site data on MediaWiki log out
Open, Medium, Public

Description

Right now when a user logs out of MediaWiki, a significant amount of state can stay behind spanning both the logged-in and logged-out browsing session, which is likely unexpected from a user perspective.

While we take care to expire the PHP session data and the PHP session cookie on the client, other cookies (session-bound or otherwise) and all browser storage (sessionStorage and localStorage) remain.

The session-bound cookies and sessionStorage values should be cleared if the user remembers to properly close all windows and quit the browser. But even then other storage remains.

More likely, though, a user closes the browser in its entirety, in which case most modern browsers helpfully save the session anyway and offer to restore it when the browser is reopened.

Logging out is the key user interaction here that we should use to clear everything else.

This could be taken care of by loading some JavaScript code on the page in response to the POST request after a successful log-out.
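A minimal sketch of what that post-logout script could do, assuming it runs once on the page returned after the logout POST. The function names and the `keep` predicate are hypothetical; the small structural interfaces exist only so the sketch does not depend on DOM typings (the browser's `localStorage`, `sessionStorage` and `document` satisfy them). Note that `document.cookie` cannot see HttpOnly cookies, and a cookie set with a `Domain` or non-root `Path` attribute is only removed by an expiry that repeats those attributes.

```typescript
// Structural stand-ins for the DOM Storage and Document types.
interface StorageLike {
  readonly length: number;
  key(index: number): string | null;
  removeItem(key: string): void;
}

interface CookieDocument {
  cookie: string;
}

// Remove every storage key that the `keep` predicate does not protect.
function clearStorage(store: StorageLike, keep: (key: string) => boolean): string[] {
  // Collect keys first: removing while iterating by index would skip entries.
  const keys: string[] = [];
  for (let i = 0; i < store.length; i++) {
    const k = store.key(i);
    if (k !== null) keys.push(k);
  }
  const removed = keys.filter((k) => !keep(k));
  removed.forEach((k) => store.removeItem(k));
  return removed;
}

// Expire every script-visible cookie whose name `keep` does not protect.
function clearCookies(doc: CookieDocument, keep: (name: string) => boolean): string[] {
  const names = doc.cookie
    .split(";")
    .map((pair) => pair.split("=")[0].trim())
    .filter((name) => name !== "");
  const removed = names.filter((name) => !keep(name));
  for (const name of removed) {
    // Only matches host-only cookies with Path=/ on the current host.
    doc.cookie = `${name}=; Max-Age=0; Path=/`;
  }
  return removed;
}

// In the browser, after a successful logout (illustrative):
// clearStorage(localStorage, () => false);
// clearStorage(sessionStorage, () => false);
// clearCookies(document, () => false);
```

Passing a predicate rather than hard-coding names leaves room for the "keep these after logout" cases raised in the comments below.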

In addition, we can use the Clear-Site-Data header which can help clear additional things in supported browsers (such as HTTP caches).

Clear Site Data (W3C specification)
https://www.w3.org/TR/clear-site-data/
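For reference, the header value is a comma-separated list of double-quoted directive names ("cache", "cookies", "storage", "executionContexts"). A small sketch of building it for the logout response follows; `logoutResponseHeaders` is a hypothetical helper, and leaving "cookies" out by default is only one possible choice, made because that directive is all-or-nothing for the origin.

```typescript
// Directive names defined by the W3C Clear Site Data draft.
type ClearSiteDataType = "cache" | "cookies" | "storage" | "executionContexts";

function clearSiteDataHeader(types: readonly ClearSiteDataType[]): string {
  // Each directive must be a double-quoted string; duplicates are dropped.
  return Array.from(new Set(types))
    .map((t) => `"${t}"`)
    .join(", ");
}

// Hypothetical helper for the logout response. "cookies" is opt-in because it
// would also remove cookies we may want to keep after logout.
function logoutResponseHeaders(includeCookies = false): Record<string, string> {
  const types: ClearSiteDataType[] = ["cache", "storage"];
  if (includeCookies) types.push("cookies");
  return { "Clear-Site-Data": clearSiteDataHeader(types) };
}
```

Browsers that do not support the header simply ignore it, so it can be sent alongside the JavaScript cleanup.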

Event Timeline

Based on https://www.chromestatus.com/feature/4713262029471744 this appears to be a Chromium/Opera-only feature right now? https://bugzilla.mozilla.org/show_bug.cgi?id=1268889 is the request to implement this in Firefox.

Yeah, this task is for doing it in general, which will require JS code for now, but we can and should still do that first.

The header just helps it happen sooner in supported browsers; there's no harm in adding it. Especially on mobile, the header lets the browser schedule the deletion natively even if the user closes the page before all the JS arrives and executes.

Note that we specifically want certain things to remain present after logout, including the user name cookie used to prefill the field on a subsequent login, the cookie for the "cookie block" feature I've heard some talk about, and the new anonymous session cookie (if any). There's also T142542 that wants to return to setting a LoggedOut cookie.

We'd also possibly want to preserve UI state cookies and local storage, e.g. things that remember whether some UI element is expanded or collapsed.

Users may or may not also expect gadget or user script data saved to cookies or local storage to remain across a logout and log-back-in; that should probably be investigated.

At a quick glance, it seems Clear Site Data may not be particularly suitable for WMF use, both in that it can only clear all or none of various things and in that it only works for the current origin (e.g. it'd clear en.wikipedia.org, but not de.wikipedia.org, fr.wikipedia.org, etc.). I may, of course, be mistaken.

fdans moved this task from Incoming to Radar on the Analytics board.

> Right now when a user logs out of MediaWiki, a significant amount of state can stay behind spanning both the logged-in and logged-out browsing session, which is likely unexpected from a user perspective.

This would be true of state that alters your interactions with the site. We certainly want analytics cookies to remain after logout, and they do not in any way affect the user's interactions with the site (e.g. WMF-Last-Access).
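The "keep after logout" cases raised so far (user name prefill, the cookie-block cookie, a new anonymous session, a LoggedOut marker, UI state, analytics such as WMF-Last-Access) could be expressed as a single predicate. In this sketch only `WMF-Last-Access` comes from the discussion; every other name, suffix and the `mw-ui-` prefix is an assumption to be checked against the real cookie inventory.

```typescript
// ASSUMPTION: illustrative allowlist, not a verified list of MediaWiki/WMF cookies.
const KEEP_EXACT = new Set<string>(["WMF-Last-Access", "centralauth_User"]);
// Assumed "<cookie prefix><Name>" cookies: user name prefill, LoggedOut marker,
// cookie-block ID, and the new anonymous session cookie.
const KEEP_SUFFIXES = ["UserName", "LoggedOut", "BlockID", "_session"];
// Hypothetical prefix for UI state (e.g. collapsed/expanded sections).
const KEEP_PREFIXES = ["mw-ui-"];

function keepAfterLogout(name: string): boolean {
  return (
    KEEP_EXACT.has(name) ||
    KEEP_SUFFIXES.some((s) => name.endsWith(s)) ||
    KEEP_PREFIXES.some((p) => name.startsWith(p))
  );
}
```

Whether gadget and user-script data belongs on this list is the open question raised above.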

Nuria subscribed.

Tagging Privacy-Engineering as FYI. This may be worth looking into and getting into our planning.

See Coding conventions, which recommend a workaround; I believe it is likely not consistently followed currently, and it is also hard to notice or enforce.

JFishback_WMF moved this task from Incoming to Backlog on the Privacy Engineering board.

Quoting here as a reminder for the future:

> […]
> Clear-Site-Data does clear subdomains so that would be a meaningful privacy improvement, at the cost of maybe clearing cookies that shouldn't be cleared (logged-out preferences, unique device counters etc). Maybe worth a task of its own. I'll close this one though as I don't think we really care about whether the icons show or not.

I intended this task to apply to MediaWiki core and CentralAuth. Regarding subdomains, for that aspect of CSD to manifest as a problem (or a solution), the wiki would need to be hosted on an apex domain with other wikis (or non-wiki services) at subdomains. AFAIK we don't have such a setup at WMF. The closest I can think of is Wikidata.org, which has query.wikidata.org, but alas the wiki is canonically at www.wikidata.org.

So, where cookies allow scopes to go inward (en.m.wikipedia.org can set en.m.w.o, *.m.w.o, and *.w.o), with CSD the scope only goes outward. This means the CentralAuth configuration variable that lists which domains we log in to by wildcard as fully MediaWiki-operated (e.g. *.wikipedia.org) and which we log in to individually (commons.wikimedia.org, meta.wikimedia.org, www.mediawiki.org, www.wikidata.org, etc.) does not help us with CSD.
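The cookie side of that comparison can be made concrete: a response from a host may set a cookie for the host itself or for any parent domain down to the registrable domain. The sketch below lists those scopes; it assumes the registrable domain is the last two labels, which holds for wikipedia.org but not in general (real code would consult the Public Suffix List).

```typescript
// Cookie Domain scopes a response from `host` may use.
// Simplification: treats the last two labels as the registrable domain.
function cookieScopes(host: string): string[] {
  const labels = host.toLowerCase().split(".");
  const scopes: string[] = [];
  for (let i = 0; i <= labels.length - 2; i++) {
    scopes.push(labels.slice(i).join("."));
  }
  return scopes;
}

// cookieScopes("en.m.wikipedia.org")
//   -> ["en.m.wikipedia.org", "m.wikipedia.org", "wikipedia.org"]
```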

On the other hand, since it isn't affected by cross-origin, CORS, or third-party cookie restrictions, it should be fairly easy to make a bunch of requests to each wiki where the user is logged in, e.g. based on the globaluser table (as shown by Special:CentralAuth). That should keep it fairly small in most cases. It could be done without JavaScript by embedding <img> requests to a transparent pixel served from something like Special:CentralClearSiteData or /w/rest.php (or some such).
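A sketch of generating that no-JavaScript markup server-side, assuming the host list has already been derived from the user's attached accounts. `Special:CentralClearSiteData` is the placeholder name from the comment above, not an existing page, and `clearPixelsHtml` is a hypothetical helper; each pixel response would carry the Clear-Site-Data header for its own wiki.

```typescript
// Minimal attribute escaping for values placed inside double quotes.
function escapeAttr(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// HYPOTHETICAL endpoint name; one 1x1 pixel per distinct wiki host.
function clearPixelsHtml(hosts: readonly string[]): string {
  return Array.from(new Set(hosts))
    .map((host) => {
      const src = `https://${host}/wiki/Special:CentralClearSiteData`;
      return `<img src="${escapeAttr(src)}" width="1" height="1" alt="">`;
    })
    .join("\n");
}
```

For most users this is a handful of tags; for accounts attached to hundreds of wikis the list grows accordingly.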

Per https://www.w3.org/TR/clear-site-data/#clear-cookies the cookies are cleared for the entire registrable domain. So on one hand, we could use the edge wiki set; on the other hand, we'd probably want to avoid doing it for *.wikimedia.org wikis (which might make it somewhat pointless, although I guess we could use plain Set-Cookie there instead).
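That split could be decided per host: send Clear-Site-Data where the whole registrable domain is wikis we are willing to wipe, and fall back to individual Set-Cookie expiries elsewhere. In this sketch `FULLY_WIKI_DOMAINS` is an illustrative assumption (wikimedia.org is deliberately absent), the registrable domain is approximated by the last two labels, and the expiry headers only match host-only cookies with `Path=/`.

```typescript
type ClearStrategy =
  | { kind: "clear-site-data"; header: string }
  | { kind: "set-cookie"; headers: string[] };

// ASSUMPTION: registrable domains where wiping every subdomain is acceptable.
const FULLY_WIKI_DOMAINS = new Set<string>(["wikipedia.org"]);

// Simplification: last two labels; real code needs the Public Suffix List.
function registrableDomain(host: string): string {
  return host.toLowerCase().split(".").slice(-2).join(".");
}

function clearStrategy(host: string, cookieNames: readonly string[]): ClearStrategy {
  if (FULLY_WIKI_DOMAINS.has(registrableDomain(host))) {
    // Clears cookies and storage for the whole registrable domain.
    return { kind: "clear-site-data", header: '"cookies", "storage"' };
  }
  // e.g. commons.wikimedia.org: expire only the named cookies.
  return {
    kind: "set-cookie",
    headers: cookieNames.map((n) => `${n}=; Max-Age=0; Path=/`),
  };
}
```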

> ... it should be fairly easy to make a bunch of requests to each wiki where the user is logged-in. E.g. based on globaluser table (as shown by Special:CentralAuth). That should keep it fairly small in most cases.

Per above this doesn't really matter, but it's worth keeping in mind that e.g. stewards tend to have hundreds of local accounts (and there is no way to tell which ones the user is currently logged into). A privacy feature which sometimes works isn't ideal.

> On the other hand, given it isn't affected by cross-origin, CORS, or third-party cookie restrictions ... It could be done without JavaScript by embedding <img> requests to a transparent pixel served from something like Special:CentralClearSiteData or /w/rest.php or some such.

I imagine we'd want to reimplement CORS checks on the server side, because otherwise this would be very abusable. But yes, triggering it would be easy.
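A minimal sketch of such a server-side guard, assuming an allowlist of wiki origins permitted to trigger the clearing. Because `<img>` requests are "no-cors" GETs, there is no preflight and browsers generally do not send an `Origin` header, so the sketch falls back to the `Referer`; if a referrer policy strips it, the request is refused. The function name is hypothetical, and header keys are assumed to be lowercased (as Node does).

```typescript
// Hypothetical CORS-like check for a clear-site-data pixel endpoint.
function isAllowedClearRequest(
  headers: Readonly<Record<string, string | undefined>>,
  allowedOrigins: ReadonlySet<string>,
): boolean {
  const source = headers["origin"] ?? headers["referer"];
  if (!source) return false;
  // Take scheme://host[:port]; "@" is excluded so userinfo cannot spoof a host.
  const m = /^(https?:\/\/[^/?#@]+)/i.exec(source);
  if (!m) return false; // e.g. Origin: null
  return allowedOrigins.has(m[1].toLowerCase());
}
```

Only when this returns true would the endpoint add the Clear-Site-Data header; otherwise it can still serve the pixel without it.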