Page MenuHomePhabricator

Document usage policy for LocalStorage in MediaWiki code
Closed, ResolvedPublic

Description

Core has a minimal api for LocalStorage. CentralNotice has a more elaborate system that suits its needs.

Current guidelines suggest using LocalStorage instead of cookies, and the majority of the browsers for which we support JS have LocalStorage.

Do we want something more elaborate in core? Should we re-use some of the CentralNotice code? Use an external library? Or maybe write something new?


In the meantime, jquery.jStorage has TTL (time to live) eviction. (Remember the eviction will only run at times the library is loaded.) (It is marked as deprecated, but mediawiki.storage does not have eviction.)

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Krinkle triaged this task as Medium priority.Mar 24 2016, 5:03 AM

Change 254326 merged by jenkins-bot:
kvStoreMaintenance: Refactor to use requestIdleCallback

https://gerrit.wikimedia.org/r/254326

Next steps here is integrating this back into MediaWiki core as part of the mediawiki.storage module. I propose to let the new implementation of kvStoreMaintenance CentralNotice simmer for a few deployment cycles and in a month or so pick this up again.

Krinkle renamed this task from Study options and libraries for a more featureful LocalStorage API in core to Provide featureful Local Storage abstraction in MediaWiki core.Apr 21 2016, 7:06 PM
Krinkle removed a project: Patch-For-Review.

Change 323344 had a related patch set uploaded (by Krinkle):
mediawiki.storage: Provide a wrapper for sessionStorage too

https://gerrit.wikimedia.org/r/323344

Change 323344 merged by jenkins-bot:
mediawiki.storage: Provide a wrapper for sessionStorage too

https://gerrit.wikimedia.org/r/323344

Krinkle removed a project: Patch-For-Review.
Krinkle added a project: MediaWiki-General.

Re-opening since this was merely a SessionStorage wrapper. An expiry/ttl mechanism (which is the minimal required feature) is still missing.

Change 340185 had a related patch set uploaded (by Krinkle; owner: Krinkle):
kvStore: Minor clean up and optimization of kvStoreMaintenance

https://gerrit.wikimedia.org/r/340185

Change 340185 merged by jenkins-bot:
kvStore: Minor clean up and optimization of kvStoreMaintenance

https://gerrit.wikimedia.org/r/340185

Have you considered bringing the expiry issue up at standards (specification) level? What are the arguments (if any) that prevents this from being in the standard?

@Nirmos Yep. I've brought this to various whatwg members a year or 2 ago.

https://wiki.whatwg.org/wiki/Storage details a summary of the various APIs. https://storage.spec.whatwg.org/ is a draft that would make these different storage APIs collaborate better together (e.g. have a common interface for quotas and persistence/non-persistence). However it unfortunately does not yet deal with time-based expiration or any other eviction mechanism for that matter.

https://github.com/whatwg/storage/issues/11 is an open issue asking for an expiry mechanism.

The most recent storage interface added to the web is Cache and CacheStorage interfaces part of theService Worker spec. https://github.com/w3c/ServiceWorker/issues/863 is an issue for adding a native eviction support, there.

In MediaWiki core, we currently don't have any expiry for the mw.storage interface. We set values without any expiry. And when keys get removed from the system, or if a feature is removed from the platform, or disabled by the user. We either forget about it, or if there's a reasonable place to put it, we'll add small chunk of code that, via requestIdleCallback, checks if the key exists, and removes it. - On every page view. This doesn't scale.

In CentralNotice, there's a custom kvStore interface that needed generic eviction logic because its keys contain identifiers for user-generated content (in this case, CentralNotice tracks which state about page banners that are in rotation, and these have unique IDs). The code is at ext.centralNotice.kvStoreMaintenance.js. Main change was https://gerrit.wikimedia.org/r/254326 (4a01c97129).

  • Starts via requestIdleCallback.
  • Loop through localStorage.length, using localStorage.key and collect relevant keys.
  • Yield, and take another requestIdleCallback to process the keys. This time a recursive callback that reschedules itself until all keys have been processed.

There's a TODO to trigger this logic only on a random sampling of page views, and/or add some locking mechanism via sessionStorage to avoid triggering it from multiple windows simultaneously.

While this seems to be working fine, I'm not happy with the performance numbers for this. Especially on mobile, where unlike desktop browsers, devices typically do not read the entire store into memory after the first key access. So it's repeated disk I/O. Not sure whether this is a browser thing or an OS thing, could be that desktop just has a more effective disk cache due to having more spare memory. Either way, the end result is the same.

Another problem is that, because we can only evict keys when a page is open, space is inefficiently used. I'm sure that when absolutely needed, browsers will evict all stores of an origin to save space, but it would be preferable for keys that are known to be expired to go first, thus leaving more room for higher priority data, and increasing the chances that on a repeat view performance-sensitive values are still cached. Or that that localStorage-saved draft comment the user forgot about didn't get randomly get deleted when some other site requires a lot of space.

This is also relevant to T66721, which is about our JS/CSS module cache in localStorage often not working well in Firefox. Firefox has a shared quota for all subdomains. This means that when our more active users that participate on multiple wikis (Wikimedia hosts over 800 wikis, including 284 active Wikipedia subdomains), you visit a wiki, even once, that module cache is populated, but until you visit that wiki again, we have no way to clear it. The modules may have changed many times since then, and would be pruned immediately when viewed, but still it's taking up precious space and preventing other wikis from claiming space instead (in Firefox). This is somewhat orthogonal, but useful to keep in mind.

Cookies, the oldest value store on the platform (as far as I know), has a key eviction mechanism. But localStorage has neither TTL nor LRU. IndexedDB, WebSQL and the new SW Cache API, also don't have either.

A few random thoughts: When a user logs out, we terminate their "session" from a server perspective. Authentication cookies are also cleared client-side. However, unlike session cookies, client-side sessionStorage relates to the larger browser session. Perhaps users are expected to be wise enough to close browsers on shared computers, but it might make sense to clear sessionStorage by default from our Log out page. Not just for users on public computers, but also to avoid our own "anonymous" event logging from spanning across a logged-in session and logged-out session.

In addition to sessionStorage, we may also want to think about localStorage. Is there a use case for storing personal information in localStorage, and if so, what is our eviction strategy? For example, other sites often store draft-like content in localStorage so that it may be persisted across browser crashes and other unexpected restarts (unlike sessionStorage). However when a user explicitly chooses to log out, they might not expect that to persist. (But then again, what if the log-out happened by accident?) On the other hand, localStorage is already cleared automatically by the same system that clears cookies (e.g. "Clear all cookies", or "Incognito" mode).

A few random thoughts: When a user logs out, we terminate their "session" from a server perspective. Authentication cookies are also cleared client-side. However, unlike session cookies, client-side sessionStorage relates to the larger browser session. Perhaps users are expected to be wise enough to close browsers on shared computers, but it might make sense to clear sessionStorage by default from our Log out page.

Completely agree with clearing sessionStorage on logout. A lot of users still don't know to close their browsers (or understand what happens when they do).

In the meantime, there is jquery.jStorage. This is marked as deprecated in favor of mediawiki.storage.

However, jStorage has TTL, which mediawiki.storage does not. In my opinion, jStorage should be removed until mediawiki.storage offers some kind of automatic eviction (e.g. TTL or LRU).

Of course, jStorage's eviction will only run if the library is loaded (since its TTL handler needs to run), but that's better than nothing.

Krinkle changed the task status from Open to Stalled.Jun 6 2019, 9:06 AM

Marking as stalled, blocked on browsers having a storage API that

  1. Has support for expiry.
  2. Doesn't have its values send over the network (like cookies).

Without this, our only options for expiring values are unacceptable from a performance perspective.

Change 588432 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] mediawiki.storage: Add warning docs about key and expiry caveats

https://gerrit.wikimedia.org/r/588432

Change 588432 merged by jenkins-bot:
[mediawiki/core@master] mediawiki.storage: Add warning docs about key and expiry caveats

https://gerrit.wikimedia.org/r/588432

I've been hoping for a decade that browsers will introduce anything aside from cookies that supports per-key eviction and expiry.

Unfortunately nothing of the sorts has appeared in the WHATWG and W3C standard landscape, and there doesn't seem to be significant interest in introducing this, either.

LocalStorage doesn't. IndexedDB doesn't. WebSQL doesn't. SW Cache API doesn't.

They are all specified and implemented as whole units. Meaning the browser will keep it in its entirety until/unless it is deciding to free up the entire storage bucket relating to a specific website (e.g. when disk space is low, or if you haven't visited a site in a while).

The one thing that is changing is that there is growing interest from browser vendors to shorten the time span for when a site's data is evicted. For example, Safari now evicts a site's storage after 7 days of no use.

That will reduce the impact of this problem to some extent, but for regularly active users it doesn't and we'll likely still see issues where some data builds up over many months of use (ref T253137).

I think the only advice we can state is to continue to be careful with using localStorage. To make sure you don't use "variable" keys. E.g. never put a page name, user id, or some other variable in the key. The reason being that if the surroundings change, you'll have a key your code can't rediscover to clean up after a while. Instead, use only a single key per feature and bundle arrays within it. These arrays themselves should of course also be limited. E.g. keep the array length limited and push out old stuff when adding new stuff before saving.

In general, the recommendation I think, while unfortunatem, is for most small things to go back to cookies. The cost of cookies has come down a lot over the years. With the rise of HTTP/2 we generally no longer see them sent out to the server repeatedly in full during a browsing session (due to better compression and re-use).

Run-time performance is also a lot better for cookies. localStorage is often backed by a SQLite database that browsers re-open and close into memory from disk for every get/set operation (on mobile; on desktop it is often kept in memory). But cookies, due to having a smaller maximum size overall are I believe often kept in memory during a browsing session which means repeat access and get/set isn't as costly. So for anything reasonably small and simple, cookies might be the best option.

This not in the least also because of cookies having support for "session" as the expiry, where there is no hard TTL but it is tied to a browsing session. Even that is not supported by LocalStorage. And remember that SessionStorage is not about browser sessions, they are about tab sessions (it should be called "tabStorage"), which means the values are not shared among tabs and thus unstuitable for most use cases (T223931).

Krinkle renamed this task from Provide featureful Local Storage abstraction in MediaWiki core to Document usage policy for LocalStorage in MediaWiki code.May 20 2020, 6:12 PM
Krinkle changed the task status from Stalled to Open.

Change 804591 had a related patch set uploaded (by Esanders; author: Esanders):

[mediawiki/core@master] Add an expiry time option to mw.storage

https://gerrit.wikimedia.org/r/804591

While this seems to be working fine, I'm not happy with the performance numbers for this. Especially on mobile, where unlike desktop browsers, devices typically do not read the entire store into memory after the first key access. So it's repeated disk I/O. Not sure whether this is a browser thing or an OS thing, could be that desktop just has a more effective disk cache due to having more spare memory. Either way, the end result is the same.

Quoting this here because I don't think performance is going to be an issue:

I created https://www.measurethat.net/Benchmarks/Show/19293/1/localstoragesessionstorage-set-and-remove

On the slowest browser we support IE11, it can write a 1k value at 5,000 ops/sec and delete at 280,000 ops/sec, so nothing to worry about there.

Other browsers tested:

  • Android 5 + Chrome 61: 15k ops/s write, 66k ops/s remove
  • iPhone X + Safari 11: 2.7M write, 3.7M remove
  • iPhone 8 + Safari 15: 2.1M write, 2.9M remove
  • Chrome latest: 108k write, 2M remove
  • FF latest: 680k write, 1.8M remove

Change 804591 merged by jenkins-bot:

[mediawiki/core@master] mediawiki.storage: Add support for expiry time to localStorage

https://gerrit.wikimedia.org/r/804591

Change 823620 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] mediawiki.storage: Update API guidance to mention new expiry feature

https://gerrit.wikimedia.org/r/823620

Change 823750 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] mediawiki.storage: Remove unused `isRawKey` arg

https://gerrit.wikimedia.org/r/823750

I guess there could be a bug in line 227 now.

  • In function isExpired( key ) the key is the user provided name (might be documented more precisely, has been called raw key recently).
  • Then I would expect that expiry = this.store.getItem( key ); needs to be prefixed by EXPIRY_PREFIX +.
  • .prototype.set() is using the same key for both payload and .isExpired(key) assignment.

Furthermore, I would be more defensive on comparison < of expiry and Math.floor() since the item may not contain a number, perhaps always returning strings, or partially written on browser crash, or manual editing of values by development tools, or changing implementation strategy in future, etc. If not a positive number then always remove.

  • Personally, I would expect .getItem() to return DOMString or null, therefore always parsing.

Change 825299 had a related patch set uploaded (by Esanders; author: Esanders):

[mediawiki/core@master] Follow-up Ifec82b5f: Always try/catch when accessing native storage object

https://gerrit.wikimedia.org/r/825299

I guess there could be a bug in line 227 now.

  • In function isExpired( key ) the key is the user provided name (might be documented more precisely, has been called raw key recently).
  • Then I would expect that expiry = this.store.getItem( key ); needs to be prefixed by EXPIRY_PREFIX +.
  • .prototype.set() is using the same key for both payload and .isExpired(key) assignment.

I'm not sure I see what bug you are describing here?

Furthermore, I would be more defensive on comparison < of expiry and Math.floor() since the item may not contain a number, perhaps always returning strings, or partially written on browser crash, or manual editing of values by development tools, or changing implementation strategy in future, etc. If not a positive number then always remove.

  • Personally, I would expect .getItem() to return DOMString or null, therefore always parsing.

I don't think we are guarding against corrupted data, nor is it obvious what to do if we find an invalid expiry time (delete or keep). We defend against setting data with a key that starts with EXPIRY_PREFIX.

Change 825299 merged by jenkins-bot:

[mediawiki/core@master] Follow-up Ifec82b5f: Always try/catch when accessing native storage object

https://gerrit.wikimedia.org/r/825299

Change 823750 merged by jenkins-bot:

[mediawiki/core@master] mediawiki.storage: Remove unused `isRawKey` arg

https://gerrit.wikimedia.org/r/823750

Furthermore, I would be more defensive on comparison < of expiry and Math.floor() since the item may not contain a number, […] If not a positive number then always remove.

  • Personally, I would expect .getItem() to return DOMString or null, therefore always parsing.

I don't think we are guarding against corrupted data, nor is it obvious what to do if we find an invalid expiry time (delete or keep). […]

In general I expect the browser to have safeguards in place such that its internal storage faccities do not corrupt in this way. For example, through I/O transactions that ensure incomplete writes are discarded on reboot. This isn't for client-side javascript to be concerned about. If there is evidence that browsers consistently don't guard against this and/or if the spec highlights this as developer responsibility, I'm open to changing my mind :)

I think so long as mw.storage doesn't produce an error, e.g. it either considers the value eternally fresh or it considers it as expired, then our contract is met. This is based on the understanding that such a corrupted value does not come from mw.storage, and thus whether we preserve or dispose of it doesn't matter very much. I'd slightly lean toward preserving in that case for improved interop, but as said, I don't consider either choice a bug in mw.storage there. Presumably the bug in that case is, is in the calling code, the bug being that one is reading non-mw values through the mw interface.

Change 823620 merged by jenkins-bot:

[mediawiki/core@master] mediawiki.storage: Update API guidance to mention new expiry feature

https://gerrit.wikimedia.org/r/823620