
Use memcached (or something similar) to keep the latest chd_seen state, only flush to table every once in a while
Open, Low, Public

Description

Instead of updating chd_seen every time we dispatch to a wiki, we could store the ID of the last dispatched change in a memcached key and write it back to the table only occasionally (for example, on every 10th dispatch). This would save most of the writes to that table, but it would also complicate some things. For example, we would probably need a different algorithm for selecting the next wiki to dispatch to (a weighted list could work). It would also be harder, though not impossible, to get statistics about the current dispatch lag.
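A rough sketch of the buffering idea, in Python for brevity rather than in the dispatcher's own code. Everything here is hypothetical: the class name `BufferedSeenState`, the key prefix, and the two plain dicts standing in for memcached and for the dispatch table. It only illustrates "write to the cache every time, write to the table every Nth time".

```python
class BufferedSeenState:
    """Keeps the latest chd_seen per wiki in a cache, flushing to the table rarely.

    `cache` and `table` are plain dicts standing in for memcached and the
    dispatch table; both are assumptions made for this sketch.
    """

    def __init__(self, cache, table, flush_every=10):
        self.cache = cache
        self.table = table
        self.flush_every = flush_every
        self.unflushed = {}  # wiki -> number of updates not yet written back

    def _key(self, wiki):
        # Hypothetical key layout, not an existing convention.
        return f"chd_seen:{wiki}"

    def record_dispatch(self, wiki, change_id):
        # Every dispatch updates the cache ...
        self.cache[self._key(wiki)] = change_id
        self.unflushed[wiki] = self.unflushed.get(wiki, 0) + 1
        # ... but only every `flush_every`-th one touches the table.
        if self.unflushed[wiki] >= self.flush_every:
            self.flush(wiki)

    def flush(self, wiki):
        key = self._key(wiki)
        if key in self.cache:
            self.table[wiki] = self.cache[key]
        self.unflushed[wiki] = 0

    def current_seen(self, wiki):
        # Prefer the cache; fall back to the table if the key was evicted.
        return self.cache.get(self._key(wiki), self.table.get(wiki, 0))


table = {}
state = BufferedSeenState(cache={}, table=table, flush_every=10)
for change_id in range(1, 11):
    state.record_dispatch("enwiki", change_id)
```

With `flush_every=10`, the first nine calls leave `table` untouched and the tenth writes the latest ID back. The fallback in `current_seen` also shows where the complication comes from: if the cache key is lost, the table value may be up to nine changes stale, so those changes could be dispatched again.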

Dispatching changes twice very occasionally is not a problem since we have de-duplication for the RecentChanges entries being inserted (we would potentially purge some pages twice, though).

This only makes sense after T162556 ("Consider only updating wb_changes_dispatch after a successful run") has been implemented.

Event Timeline

This basically proposes a write buffer. We should try hard to always flush when a dispatch process terminates, since a process has no way to know whether it is the last one running. In a setting where the dispatch script will not be run again for several minutes, the buffered state could otherwise remain unflushed, and therefore vulnerable, until the next run.
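One way to "always flush on termination" could look like the following Python sketch. The function name `run_dispatch_pass`, the `dispatch` callable, and the dicts standing in for memcached and the table are all assumptions for illustration. Note that a `finally` block covers exceptions and normal exit, but not a hard kill of the process, so this narrows the window rather than closing it.

```python
def run_dispatch_pass(changes, dispatch, cache, table, flush_every=10):
    """Dispatch (wiki, change_id) pairs, buffering chd_seen in `cache`.

    `dispatch` is a hypothetical callable that may raise; `cache` and
    `table` are dicts standing in for memcached and the dispatch table.
    """
    unflushed = {}  # wiki -> number of updates not yet written to the table
    try:
        for wiki, change_id in changes:
            dispatch(wiki, change_id)
            cache[wiki] = change_id
            unflushed[wiki] = unflushed.get(wiki, 0) + 1
            if unflushed[wiki] >= flush_every:
                table[wiki] = cache[wiki]
                unflushed[wiki] = 0
    finally:
        # Runs on normal completion and when dispatch() raises:
        # write back whatever is still only in the cache.
        for wiki, count in unflushed.items():
            if count:
                table[wiki] = cache[wiki]
```

Because the cache is only updated after `dispatch` returns, a failing dispatch is not recorded, so the flushed value never points past a change that was not actually delivered.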

Addshore lowered the priority of this task from Medium to Low. Oct 18 2018, 7:49 AM