This ticket aims to resolve two conflicting desires:
- concurrency control for all parses using poolcounter (see T387478)
- avoiding flooding the parser cache with irrelevant entries (see T346765)
If we apply concurrency control using poolcounter, one process that needs parser output will wait while another process is parsing that output. When the other process releases the lock, the first process expects to find the parsed output in the ParserCache. If it doesn't, it needs to parse again (or fail).
This works fine as long as we always write the result of parsing into the parser cache. But we don't if ParserCacheFilter says we shouldn't (e.g. we don't cache any output for the File namespace on Commons). In these cases the waiting process never finds the output it waited for, so the poolcounter wait is pointless and potentially even harmful.
The solution would consist of two parts:
- Make ParserCacheFilter more nuanced: instead of preventing caching, it could reduce the TTL for the cache entry.
- Make ParserCache skip storage in the database backend if the TTL is too low. We could either implement that in a BagOStuff layer, or make ParserCache itself aware of a short-term cache to use for entries with low TTL. Or we could move the ParserCacheFilter logic into ParserOutputAccess. In any case, the short-term cache could be the one that is used by RevisionOutputCache (which is WANObjectCache).
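
The two parts can be illustrated with a minimal, language-neutral sketch. It is written in Python only for brevity and is not MediaWiki code: `Output`, `apply_filter`, `TwoTierCache`, the 24-hour default TTL and the one-second "fast parse" threshold are all hypothetical names and values. The 30-minute and 1-hour figures are the ones suggested in this ticket.

```python
from dataclasses import dataclass

# Hypothetical values, for illustration only.
DEFAULT_TTL = 24 * 3600       # assumed normal TTL in seconds
FAST_PARSE_TTL = 30 * 60      # reduced TTL suggested in this ticket
PERSISTENT_MIN_TTL = 3600     # below this, skip the database backend
FAST_PARSE_SECONDS = 1.0      # assumed cut-off for a "fast parse"


@dataclass
class Output:
    html: str
    parse_seconds: float
    ttl: int = DEFAULT_TTL


def apply_filter(output: Output) -> Output:
    """Part 1: lower the TTL of cheap output instead of refusing to cache it."""
    if output.parse_seconds < FAST_PARSE_SECONDS:
        output.ttl = min(output.ttl, FAST_PARSE_TTL)
    return output


class TwoTierCache:
    """Part 2: route entries to a persistent or a transient store by TTL."""

    def __init__(self) -> None:
        self.persistent = {}  # stands in for the database backend
        self.transient = {}   # stands in for a short-term cache

    def save(self, key: str, output: Output) -> None:
        if output.ttl >= PERSISTENT_MIN_TTL:
            self.persistent[key] = output
        else:
            self.transient[key] = output

    def get(self, key: str):
        # A process that waited on the lock finds the entry in either tier.
        if key in self.transient:
            return self.transient[key]
        return self.persistent.get(key)


cache = TwoTierCache()
fast = apply_filter(Output("<p>cheap</p>", parse_seconds=0.2))
slow = apply_filter(Output("<p>costly</p>", parse_seconds=5.0))
cache.save("fast", fast)
cache.save("slow", slow)
```

In this sketch every parse result is stored somewhere, so a waiter never comes back to an empty cache, while cheap output stays out of the persistent store.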
There are several time limits that would need to be tuned to match:
- the poolcounter lock timeout (currently 15 seconds in production, after which a stale cache entry is used if possible)
- the threshold used by ParserCacheFilter to decide whether the output's TTL should be reduced.
- the threshold ParserCache should use when deciding whether an entry should go into the database (perhaps the same as OldRevisionParserCacheExpireTime, which is 1 hour in production).
- the TTL to which output with "fast parse" is reduced by ParserCacheFilter (must be smaller than the threshold in ParserCache, so maybe 30 minutes).
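
How these limits relate can be written down as a small consistency check. This is a hypothetical sketch, not existing configuration code: the `Limits` class and `check` function are invented for illustration. The first rule is stated in this ticket. The second (the reduced TTL should outlive the lock timeout) is an assumption inferred from the waiting behaviour described above. The parse-time threshold used by ParserCacheFilter is left out because no value is given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    poolcounter_timeout: int  # seconds a waiter blocks on the lock
    persistent_min_ttl: int   # TTL below which the database is skipped
    fast_parse_ttl: int       # TTL assigned to "fast parse" output


def check(limits: Limits) -> list[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    # Stated in the ticket: the reduced TTL must fall below the threshold,
    # otherwise fast-parse output would still reach the database.
    if limits.fast_parse_ttl >= limits.persistent_min_ttl:
        problems.append("fast-parse TTL must be below the persistent threshold")
    # Assumption: an entry should still exist when a waiter gives up waiting.
    if limits.fast_parse_ttl <= limits.poolcounter_timeout:
        problems.append("fast-parse TTL should exceed the poolcounter timeout")
    return problems


# Values mentioned in this ticket: 15 seconds, 1 hour, 30 minutes.
proposed = Limits(poolcounter_timeout=15, persistent_min_ttl=3600, fast_parse_ttl=1800)
problems = check(proposed)
```

With the values proposed in this ticket, both rules hold and `problems` is empty.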
To keep things simple, perhaps ParserCacheFilter should be the one that decides whether the entry should go to the persistent cache or a transient cache.