
Deprecate and remove CACHE_ANYTHING; cache on by default
Open, Medium, Public

Description

The concept of CACHE_ANYTHING is primarily used in core where we want to cache something somewhere even if the "main" cache has not been configured by the local site administrator (e.g. it is still CACHE_NONE, per the default settings). Typically this means that something is expensive enough to compute that we don't mind using the database to store it; since all cache groups default to CACHE_NONE, ObjectCache::newAnything falls back to CACHE_DB.
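Reading this together with the later comment that CACHE_ANYTHING "considers MainCacheType before anything else", the resolution can be modelled roughly as below. This is a simplified sketch: the candidate list and the string values of the constants are hypothetical placeholders, and the real ObjectCache::newAnything() may check other settings too.

```python
from typing import Iterable

# Hypothetical placeholder values; MediaWiki's real constants differ.
CACHE_NONE = "none"
CACHE_ANYTHING = "anything"
CACHE_DB = "db"


def resolve_anything(candidates: Iterable[str]) -> str:
    """Return the first configured cache type, else fall back to the database.

    `candidates` stands in for the site's cache settings, checked in order
    (main cache type first).
    """
    for cache_type in candidates:
        # Skip unconfigured entries and self-references to CACHE_ANYTHING.
        if cache_type not in (CACHE_NONE, CACHE_ANYTHING):
            return cache_type
    # Nothing configured: this is why CACHE_ANYTHING ends up on the DB by default.
    return CACHE_DB
```

On a default install every candidate is CACHE_NONE, so the call lands on CACHE_DB; once an admin configures e.g. Memcached as the main cache, the same call silently moves to it.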

This status quo is based on the very old assumption that most keys are "as expensive or less expensive" to generate than a database query, which is why the default is CACHE_NONE, and not CACHE_DB.

I'd like to challenge this assumption and instead recommend that anything cheap enough to generate that it doesn't want to be stored in the database should probably use APC instead and fall back to nothing.
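A minimal sketch of "APC, falling back to nothing": when APCu is unavailable the cache simply recomputes the (cheap) value instead of escalating to the database. The class and method names are hypothetical, loosely echoing the getWithSetCallback style; this is not MediaWiki's actual API.

```python
from typing import Any, Callable, Dict, Optional


class LocalServerCache:
    """Hypothetical stand-in for an APCu-backed per-server cache.

    Without APCu it degrades to caching nothing (similar in spirit to
    EmptyBagOStuff) rather than falling back to the database.
    """

    def __init__(self, apcu_available: bool) -> None:
        self._store: Optional[Dict[str, Any]] = {} if apcu_available else None

    def get_with_set_callback(self, key: str, compute: Callable[[], Any]) -> Any:
        if self._store is None:
            # No local cache: recompute every time; acceptable because the value is cheap.
            return compute()
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key]
```

The design choice is that a miss costs one cheap recomputation, never a database round-trip.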

Another way to avoid being stored in the database is by using the new getQoS() interface to determine whether the cache backend is SQL or not. This is already in use in various places and represents a more scalable approach.
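The QoS approach can be sketched as: ask the backend whether it is merely an SQL emulation of a cache, and if so skip caching entirely. The check is modelled loosely on BagOStuff::getQoS(); the constant names, numeric scale and helper below are hypothetical Python stand-ins, not the real class constants.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical QoS scale: lower means a weaker emulation of a real cache.
QOS_EMULATION_SQL = 1     # backend is an SQL table pretending to be a cache
QOS_EMULATION_NATIVE = 2  # backend is a native key-value cache


@dataclass
class Backend:
    emulation: int
    data: Dict[str, Any] = field(default_factory=dict)

    def get_qos_emulation(self) -> int:
        return self.emulation


def get_cached(backend: Backend, key: str, compute: Callable[[], Any]) -> Any:
    # The value is cheap, so a DB round-trip would cost more than recomputing it.
    if backend.get_qos_emulation() <= QOS_EMULATION_SQL:
        return compute()
    if key not in backend.data:
        backend.data[key] = compute()
    return backend.data[key]
```

Unlike CACHE_ANYTHING, the caller keeps using whatever cache it was given and only opts out when that cache turns out to be SQL.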

Additionally, the problem with CACHE_ANYTHING is that it predates WANObjectCache, which means it isn't multi-DC aware.

| | Highly expensive, or non-volatile | More expensive, volatile | Expensive, volatile | Cheap, volatile |
| Current | MainStash (default: DB) | CACHE_ANYTHING | WAN cache (default: None, typically Memcached) | Local server (APC if available, fallback: None) |
| Proposed | MainStash (default: DB) | ……… | WAN cache (default: TBD, typically Memcached) | Local server (APC if available, fallback: None) |

Action items:

  • Deprecate the concept of CACHE_ANYTHING. Convert uses to MainStash, WANObjectCache, or local server cache.
  • Decide whether it would be beneficial for most things that use WAN cache to use a DB-backed store, or whether not having a cache is still generally a more performant default.
    • Option A) Make DB the default WAN cache, and move things that would no longer make sense for WAN cache (knowing it can be DB-based) down to the LocalServer cache. Downside: those things would then no longer be cached as well at WMF.
    • Option B) Keep NONE as the default WAN cache, and move things that should never go uncached, and for which falling back to a DB makes sense, to the MainStash (such as rate limiting). Downside: at WMF they would move from Memcached to MainStash, and they would lose the WAN cache optimisations such as stampede control and tombstones.
  • Remove ObjectCache::newAnything(), CACHE_ANYTHING etc.
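To make the Option B downside concrete, here is a toy model of one of the WAN cache optimisations mentioned there: a delete leaves a tombstone, and writes are refused for a hold-off period so that a lagged replica read cannot repopulate the cache with stale data. This is a single-process illustration loosely inspired by WANObjectCache; the class, method names and `holdoff` parameter are hypothetical, and the real class also broadcasts purges across data centers.

```python
import time
from typing import Any, Callable, Dict, Optional


class TombstoneCache:
    """Toy delete-with-tombstone cache (hypothetical, not WANObjectCache's API)."""

    def __init__(self, holdoff: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.holdoff = holdoff
        self.clock = clock
        self.values: Dict[str, Any] = {}
        self.tombstones: Dict[str, float] = {}  # key -> time the tombstone expires

    def delete(self, key: str) -> None:
        self.values.pop(key, None)
        self.tombstones[key] = self.clock() + self.holdoff

    def set(self, key: str, value: Any) -> bool:
        expiry = self.tombstones.get(key)
        if expiry is not None and self.clock() < expiry:
            return False  # still in hold-off: refuse a possibly stale write
        self.tombstones.pop(key, None)
        self.values[key] = value
        return True

    def get(self, key: str) -> Optional[Any]:
        return self.values.get(key)
```

A plain MainStash-style store has no such window, so moving keys there means each caller has to handle stale repopulation itself.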
Uses of CACHE_ANYTHING
  • mediawiki-core
    • Skin (getCachedNotice) – use WANObjectCache – https://gerrit.wikimedia.org/r/362608
    • SpecialVersion (getCreditsForExtension)
    • JobQueue (dupCache param)
    • CacheHelper (for "chunks")
    • ServiceWiring (for SiteStore if !HHVM) – use APC always, maybe with anything as fallback
  • EventLogging
    • RemoteSchema (default cache)
  • CentralAuth
    • SpecialGlobalUserMerge (rate limiting) - use what core uses?
    • SpecialCentralAutoLogin (if !HHVM) – use APC always, maybe with fallback?

Event Timeline

Krinkle triaged this task as Medium priority. Feb 7 2018, 1:37 AM
Krinkle created this task.
Krinkle renamed this task from Deprecate CACHE_ANYTHING to Deprecate and remove CACHE_ANYTHING. May 29 2019, 3:25 PM
Krinkle renamed this task from Deprecate and remove CACHE_ANYTHING to Deprecate and remove CACHE_ANYTHING; cache on by default. Jun 7 2020, 2:28 AM

Change 822119 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] Explicitly set wgMessageCacheType=mcrouter (avoid newAnything in prod)

https://gerrit.wikimedia.org/r/822119

Change 822119 merged by jenkins-bot:

[operations/mediawiki-config@master] Explicitly set wgMessageCacheType=mcrouter (avoid newAnything in prod)

https://gerrit.wikimedia.org/r/822119

Somewhat related:

Maybe just default everything to memcached -> DB? Those are both very reliable, DB is always available, and memcached should be available on any site where the owner cares about performance.
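A "memcached -> DB" default could look roughly like a read-through chain: read the fast tier first, fall back to the DB, and backfill the fast tier on a DB hit. This is a hypothetical sketch with dicts standing in for backends; it resembles MultiWriteBagOStuff in spirit but is not its implementation.

```python
from typing import Any, Dict, List, Optional


class TieredCache:
    """Hypothetical read-through chain: fastest tier first (e.g. Memcached), DB last."""

    def __init__(self, tiers: List[Dict[str, Any]]) -> None:
        self.tiers = tiers  # ordered fastest to slowest

    def get(self, key: str) -> Optional[Any]:
        for i, tier in enumerate(self.tiers):
            if key in tier:
                value = tier[key]
                # Backfill the faster tiers that missed, so the next read is cheap.
                for faster in self.tiers[:i]:
                    faster[key] = value
                return value
        return None

    def set(self, key: str, value: Any) -> None:
        # Write every tier so the durable DB tier survives fast-tier eviction.
        for tier in self.tiers:
            tier[key] = value
```

Note that set() writes every value to the DB tier, which is exactly where the database size and latency cost comes from.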

@Tgr Yeah, but this has consequences for DB size and latency. There's a difference in how we treat things that (may) be organised by LRU pressure and things that may not be. For example, the MainStash could in theory be backed by Memcached, but its documented guarantees and defaults allow a DB to be placed there, thus making it the developer's responsibility to avoid high-cardinality data. Similarly, for the local-server cache we place a responsibility on the developer not to fill up APCu, since that causes it to be reset.

On the other hand, the WAN cache has an empty default and should only be set to backends where space isn't a concern (i.e. a small wiki accepting the risk and using APCu or DB, or a larger wiki opting in to Redis/Memcached).

Defaulting the main cache to DB would place a lot of data there and create potential DOS vectors as well. Yet, I did propose it as "option A" in the task description, but we'll probably need to mitigate risk somehow.

In WordPress there's a distinction made between transient and persistent cache (one defaults to DB, one defaults to none and expects APCU or Memc). We currently have four distinctions per https://www.mediawiki.org/wiki/Object_cache.

  • LocalServer (APCU or none),
  • Main/WAN (Memc/Redis/other or none),
  • MicroStash (Memc or DB),
  • MainStash (DB).

I suppose that gives us enough pillars to attach meaning to, but I'm struggling to see a good way to have both DB as the default for WAN, and yet also continue the practice of assuming LRU. Or maybe we can embrace what we do with APCU where we say basically, yes, we'll use MySQL as default WAN cache, and if it gets too large, it's the site admin responsibility to replace it with something else.

My issue with that is that we currently assign such responsibilities fairly predictably, based on the size of the community and the traffic on a given wiki. Whereas I suspect that for many of these cache keys, it takes just one bad actor with a fairly low req/s to saturate a cache group, so it doesn't correlate well with the status quo of "use Memc/Redis if you're large", where "large" is (I believe) meant to convey the externally observed size of the wiki in terms of users, pages, and traffic, not the literal size of a (cache) database that you have very little control over or visibility into by default.

@Krinkle mainly I am wondering about $wgSessionCacheType. With the various CACHE_* keywords, either we default it to CACHE_DB (which is not great for performance) or it might end up either CACHE_NONE or a broken CACHE_ACCEL implementation, which would break the wiki. MicroStash would work, but that's an ObjectCache entry point, not a CACHE_* keyword, so it can't be used. (The whole CACHE_* / ObjectCache::get* split is pretty confusing IMO.)

@Tgr The CacheType configuration variables mostly exist, as I understand it, as a way to selectively move caches behind selected notable components elsewhere. For the most part, components use one of the four cache interfaces directly, which are configurable wholesale via a dedicated configuration variable (e.g. wgMainCacheType, wgMainStash, wgMicroStash).

As the (now four) cache service interfaces improve their documented requirements and defaults, I expect the number of needed (and supported) component-level overrides to go down.

The constants are, in my mind, not overlapping with or competing with this notion. The constants represent a very small subset of wgObjectCaches keys that can be assigned to either of these two kinds of configuration variables (per-component vars like ParserCache and SessionManager, and general services like MainCache/MainStash/MicroStash).

The odd one out is CACHE_ANYTHING which this task seeks to deprecate and remove.

The question of whether SessionManager could be satisfied by MainStash or MicroStash is an interesting one. At a glance, SessionManager expectations are not met by the requirements at https://www.mediawiki.org/wiki/Object_cache for MainStash (in particular "low latency"), nor for MicroStash (in particular "local writes" and "local reads"). However, historically MainStash and SessionManager were indeed co-located in Redis. If WMF settles on a different MainStash backend and is comfortable imposing that for large wikis as a documented requirement, then we could once again co-locate these two and use it as a default, perhaps even without needing a config option (or make it nullable, and use the MainStash service when null). Anyway, that's orthogonal to this task.

I think for SessionManager we'd want a default that:

  • implicitly defaults to CACHE_DB, just like CACHE_ANYTHING does today. Its latency is already accepted today and MW core has no runtime requirements besides a DB so there is realistically no other default that there could be.
  • automatically uses something better when available, somehow. That is, we don't want sysadmins to have to set each of these services to the same thing by hand. Informing MediaWiki about Memcached via MainCache should automatically promote stuff to it as appropriate, as it does today.

CACHE_ANYTHING considers MainCacheType before anything else. So in essence, once this task is resolved and we don't have "none" as the default for MainCacheType, letting SessionManager use MainCacheType as its unset default is equivalent to today's logic. In terms of implementation, that would probably mean a null default, in order to preserve both automatic promotion and the ability to override via config.
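The proposed null default can be sketched as a two-step resolution: an explicit session setting wins; otherwise follow the main cache, which (after this task) would itself default to the DB instead of "none". The function names and constant values are hypothetical placeholders, not MediaWiki's actual configuration code.

```python
from typing import Optional

CACHE_DB = "db"  # hypothetical placeholder value


def resolve_main_cache(main_cache_type: Optional[str]) -> str:
    # Assumed post-task behaviour: an unset main cache means DB, not "none".
    return main_cache_type if main_cache_type is not None else CACHE_DB


def resolve_session_cache(session_cache_type: Optional[str],
                          main_cache_type: Optional[str]) -> str:
    # null means "follow the main cache", so configuring Memcached once
    # promotes sessions as well, while an explicit override still wins.
    if session_cache_type is not None:
        return session_cache_type
    return resolve_main_cache(main_cache_type)
```

This keeps the two properties listed above: a working DB-backed default with no extra runtime requirements, and automatic promotion once a better main cache is configured.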