Page MenuHomePhabricator

Investigate unexpected flushWrites (mw.loader.store) before arrival of first module response
Closed, ResolvedPublic

Description

When looking at a timeline following the changes by T192623, I noticed that both before and after that update, there's an unexpected call to flushWrites executing before any of the requested modules arrive.

Screen Shot 2018-08-23 at 02.47.07.png (489×1 px, 52 KB)
Screen Shot 2018-08-23 at 02.47.39.png (715×1 px, 41 KB)

The flushWrites call is scheduled by mw.loader.store.update() as called from mw.loader.store.init(), and is executed pretty much right after (more or less the next tick).

Event Timeline

Krinkle triaged this task as Medium priority.Aug 23 2018, 2:02 AM
Krinkle created this task.

I was afraid this happens on all views (which could be costly, to serialise the whole store on repeat views right after reading/parsing it). However, that is not the case. It only happens on first views when the localStorage vary-key is not found.

But, even the "normal" update that happens after the modules arrive, is also fired immediately after, whereas update/flushWrites is meant to be debounced by a few seconds to prioritise module execution and allow for batching.

  1. The update call is not meant to happen that early.
  2. Any update should debounce, instead of flushing the next tick.

Regarding the first point, it seems the early update call has existed since the first commit in 2013 (c719401661e; https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/86867/). Looking at the code more closely, I think I understand its intended purpose may have been to initialise the localStorage key with an empty object, however that doesn't seem useful. If and when modules arrive, they will be added to the store, which naturally results in an update getting scheduled. I'll remove the early call.

Regarding the second point, this used to work. It was broken by me in 2015 with commit 4174b662f623 (https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/252338/). In that change I swapped setTimeout for requestIdleCallback, and re-implemented the debounce using a boolean flag. This changed behaviour two ways (both intended).

First, it started deduplicating in the wrong direction. We want to keep postponing the update until the specified interval has elapsed without another update request happening. Instead, it now deduplicated in a way that the earlier call is preferred. Causing it to start firing earlier, and (in theory) more often.

Second, it reduced the interval from a minimum of 2 seconds, to merely the next idle callback, when tends to be <50ms from now.

Change 454739 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] resourceloader: Restore mw.loader.store update postponing logic

https://gerrit.wikimedia.org/r/454739

Change 454739 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Restore mw.loader.store update postponing logic

https://gerrit.wikimedia.org/r/454739

Still not quite right. I should have better read my own commit message from 2015, and I should have better read Ori's documentation comment from 2013.

Specifically:

@Krinkle wrote in 2015

it is no longer needed to cancel and re-schedule an update every time update() is called.

And:

@ori wrote in 2013

When called multiple times in quick succession, the calls are coalesced into a single update operation.
This allows us to call #update without having to consider the module load
queue; the call to localStorage.setItem will be naturally deferred until the
page is quiescent.

I misread this information from Ori as being "inefficient". I thought that deferring the write to 2 seconds after the last update-request would be better than deferring it to 2 seconds after the first update-request. In terms of disk usage efficiency, this thought was true. If a batch comes in, and another batch arrives after it (< 2 seconds after the first ended, but >2 seconds after the first started). Then my change would now result them being in the same flush. My change would also avoid the flush from taking up time in between the two batches.

The problem with my change is that it requires clearing and re-creating a timer in the browser after every single module response being processed. Not after each batch, but after each individual module. Which happens hundreds of time in quick succession and has non-trivial overhead. This is was I was fixing in 2015, and I've just restored that - that wasn't what I planned.

What I planned to restore was the fact that it waits 2 seconds, which we lost.

If I had to choose, I'll pick the 2015 version version over the 2013 version because the space for batching is very narrow in practice. There's gonna be one big batch one very page, and it'll be processed in a single tick of the event loop, so whether 0s timeout, 2s timeout or requestIdleCallback, it's gonna be all in one flush.

But forcing the 2 seconds of delay is useful because these hundreds of modules are gonna have initialisation code of their own that should take precedence. So, let's try this one more time, by doing both. Wait 2 seconds from the first update request. Then, write via requestIdleCallback. No cancelling of timers. And reset the flush lock after it's all done.

Change 456781 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] resourceloader: Avoid clear/set timer overhead in mw.loader.store.add

https://gerrit.wikimedia.org/r/456781

Change 456781 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Avoid clear/set timer overhead in mw.loader.store.add

https://gerrit.wikimedia.org/r/456781

Change 457984 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@wmf/1.32.0-wmf.19] resourceloader: Avoid clear/set timer overhead in mw.loader.store.add

https://gerrit.wikimedia.org/r/457984

Change 457985 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@wmf/1.32.0-wmf.20] resourceloader: Avoid clear/set timer overhead in mw.loader.store.add

https://gerrit.wikimedia.org/r/457985

Change 457985 merged by jenkins-bot:
[mediawiki/core@wmf/1.32.0-wmf.20] resourceloader: Avoid clear/set timer overhead in mw.loader.store.add

https://gerrit.wikimedia.org/r/457985

Change 457984 merged by jenkins-bot:
[mediawiki/core@wmf/1.32.0-wmf.19] resourceloader: Avoid clear/set timer overhead in mw.loader.store.add

https://gerrit.wikimedia.org/r/457984

Krinkle moved this task from Blocked (old) to Doing (old) on the Performance-Team board.
Krinkle removed a project: Patch-For-Review.