
Add a new MW_ constant for long-running scripts to avoid using stale data from static cache variables
Closed, Declined | Public

Description

rSVN52460 introduced a local cache which could -- in a long-running script -- pollute the information the script is using if a user happens to be renamed while it's running.

We may want to consider adding a new MW_ constant, similar to MW_COMPILED, that is set in the context of a long-running script whose data could be polluted by caches meant to make requests more efficient. Caching code of that kind would then disable itself whenever the constant is set.
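The proposal can be sketched roughly as follows. This is an illustrative Python sketch, not MediaWiki code: `SETTINGS["long_running"]` is a hypothetical stand-in for the proposed MW_ constant, and `lookup_user_name` and its `fetch` callback are invented names for a cached lookup.

```python
# Hypothetical process-wide setting, standing in for the proposed MW_ constant.
# A long-running script would set this to True before doing any work.
SETTINGS = {"long_running": False}

# Process-lifetime cache, comparable to a static cache variable.
_name_cache = {}


def lookup_user_name(user_id, fetch):
    """Return the user name for user_id, caching it unless caching is disabled.

    `fetch` is an assumed callback that reads the current value from the database.
    """
    if SETTINGS["long_running"]:
        # The cache disables itself: every call sees the current state.
        return fetch(user_id)
    if user_id not in _name_cache:
        _name_cache[user_id] = fetch(user_id)
    return _name_cache[user_id]
```

With the flag unset, a rename that happens after the first lookup is not seen for the rest of the process; with the flag set, each lookup goes back to the database.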

Details

Reference
bz31030
Title | Reference | Author | Source Branch | Dest Branch
fix no interaction | repos/releng/cli!350 | addshore | fix-no-interaction | main

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 11:54 PM
bzimport set Reference to bz31030.
bzimport added a subscriber: Unknown Object (MLST).

That sort of temporary cache (which cannot be cleared by other processes making updates) is probably best done in a very explicit way, such as how LinkBatch and friends are used to batch multiple title lookups.

By explicitly creating and discarding the cache / batch lookup, the calling code is able to declare that it's starting an operation, fit its lookups within a limited time/space, and explicitly disavow it when done so any future operations that do similar things can restart from the then-current database state.
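A minimal sketch of that explicit create-and-discard pattern, written in Python for illustration only. It does not reproduce LinkBatch; `BatchLookup`, `batch_lookup`, and the `user_names` dictionary standing in for the database are all hypothetical.

```python
from contextlib import contextmanager


class BatchLookup:
    """Hypothetical short-lived cache that belongs to exactly one operation."""

    def __init__(self, fetch):
        self._fetch = fetch  # assumed callback that reads from the database
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._fetch(key)
        return self._cache[key]


@contextmanager
def batch_lookup(fetch):
    """Declare the start of an operation and discard its cache at the end."""
    batch = BatchLookup(fetch)
    try:
        yield batch
    finally:
        batch._cache.clear()


# Stand-in for the database.
user_names = {1: "Alice"}


def fetch_name(user_id):
    return user_names[user_id]


def run_two_operations():
    seen = []
    with batch_lookup(fetch_name) as batch:
        seen.append(batch.get(1))
        user_names[1] = "Alicia"   # a rename happens mid-operation
        seen.append(batch.get(1))  # still served from this operation's cache
    with batch_lookup(fetch_name) as batch:
        seen.append(batch.get(1))  # a new operation starts from current state
    return seen
```

Staleness is bounded by the lifetime of one `with` block, so a long-running script that wraps each unit of work this way never carries a cached value from one unit into the next.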

TTO renamed this task from "Add a new MW_ constant for long-running scripts." to "Add a new MW_ constant for long-running scripts to avoid using stale data from static cache variables". Jan 23 2017, 1:49 PM
TTO updated the task description.
TTO removed a subscriber: wikibugs-l-list.
Krinkle subscribed.

I believe that, in general, these kinds of races are understood to always be possible at our scale. Slightly stale information is expected in regular GET requests, such as page views and other browsing around the site, if only because we fetch information from a replica database that may be slightly behind the very latest changes.

The key to avoiding persisting corrupt, stale or incompatible data back into the database is to not use any (unverified) replica or cache information when preparing a database write. In general, code paths that inform a write must avoid caches and any information received, directly or indirectly, from a database replica. I expect this to be working nowadays.
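As a rough illustration of that rule, here is a Python sketch under stated assumptions. `ReplicatedStore`, its `for_write` parameter, and `increment_edit_count` are hypothetical names and do not correspond to MediaWiki's database API.

```python
class ReplicatedStore:
    """Hypothetical store with a primary copy and a lagging replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, key, value):
        self.primary[key] = value

    def replicate(self):
        # Replication is triggered manually here so that lag is easy to simulate.
        self.replica = dict(self.primary)

    def read(self, key, for_write=False):
        # Reads that feed a write go to the primary; all others may be stale.
        source = self.primary if for_write else self.replica
        return source.get(key)


def increment_edit_count(store, user):
    """Prepare the write from primary data only, never from the replica."""
    current = store.read(user, for_write=True) or 0
    store.write(user, current + 1)
```

If the replica still holds 5 while the primary holds 6, `increment_edit_count` writes 7. Basing the write on the replica would have written 6 and silently lost an update.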

I also note that, more generally, we no longer store user names in as many places as we used to. With the Actor system now in place, renames are much less eventful.