I don't think so.
Thu, Aug 15
Does this still occur?
Mon, Aug 12
Per my comment above, this is the expected behavior.
It's an optional table, not installed by update.php.
Fri, Aug 9
They were obsoleted by flaggedrevs_statistics.
Thu, Aug 8
The remaining vary-revision instances are basic self-transclusions (https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/526157/ should handle those).
Mon, Aug 5
Fri, Aug 2
Thu, Aug 1
Wed, Jul 31
Is https://phabricator.wikimedia.org/T212881#5195101 the error that still happens, or is it the read-only one too?
Jobs are fine...though this case is complicated since people want their "latest views" to be immediately reflected...so it would have to do something like WatchedItemStore does.
How much of this is distinct from T205936?
Sat, Jul 27
Thu, Jul 25
Tue, Jul 23
I wonder if this is fixed in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/519565/
The logs for doSelectDomain() look quiet for the last 7 days.
959daa2ca44c039e72c8a9a5199d4c74dd05caba added the << $status->value = [ 'warnings' => $upload->checkWarnings() ]; >> line. It seems like the checkWarnings() result can potentially have all kinds of File objects inside of it. Some callback could easily slip in, given that.
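To make the hazard concrete, here's a minimal sketch in plain PHP (not the actual UploadBase/Status code; the array shape is invented): a Closure buried anywhere inside the stored value makes serialization throw.

```php
<?php
// Invented array shape, standing in for a checkWarnings() result whose
// File objects happen to hold a callback somewhere.
$warnings = [
	'duplicate' => [
		'thumbCallback' => function () {
			return 'thumb.png';
		},
	],
];
$value = [ 'warnings' => $warnings ];

try {
	serialize( $value );
} catch ( Throwable $e ) {
	// "Serialization of 'Closure' is not allowed"
	echo get_class( $e ) . ': ' . $e->getMessage() . "\n";
}
```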
Mon, Jul 22
ObjectCache always described getMainStashInstance() as "Ephemeral global storage". It was just supposed to *try harder* to be persistent than memcached (RDB snapshots, the expectation that stuff can *probably* still be there a week or so later). It was well known, at the time redis was picked as the original "stash", that evictions and consistent re-hashing on host failure could make data disappear or go stale.
Fri, Jul 19
JobQueueException should be thrown from push(), with nothing catching it other than MWExceptionHandler or a site-specific caller. Typically, push() should be used pre-send, before preOutputCommit, so everything would just roll back anyway. Jobs pushed after that are enqueued during DeferrableUpdates (directly, or indirectly via lazyPush()); in that case, DeferredUpdates should (already) catch any exceptions (not just job queue ones) and roll back on an update-by-update basis. The exceptions are logged in the DeferredUpdates channel (previously the Exception channel).
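A rough sketch of that split, using the JobQueueGroup API (NullJob is just a placeholder job here):

```php
$job = new NullJob( Title::newMainPage(), [] );

// Pre-send: push() directly. An exception here propagates up to
// MWExceptionHandler (or a site-specific caller), and since this runs
// before preOutputCommit, the DB transactions just roll back.
JobQueueGroup::singleton()->push( $job );

// lazyPush() defers the enqueue to a DeferrableUpdate at the end of the
// request; DeferredUpdates catches any exception there, rolls back that
// update alone, and logs it to the DeferredUpdates channel.
JobQueueGroup::singleton()->lazyPush( $job );
```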
Also, the timeout exceptions themselves were from redis, not LBFactory. The latter seemed to just have errors related to the improper shutdown.
Jul 19 2019
Jul 18 2019
Dropping the field doesn't make sense, but dropping the whole table does. We do not use that class in production (and it is optional within MW core).
The redis bug is at T228303
The timeouts correspond with the redis problems:
The timeout aspect seems strange. The huge "idle" time increase at https://grafana.wikimedia.org/d/000000273/mysql sounds like PageEditStash::parseAndCache() has an infinite lock timeout instead of 0 seconds (a bug; it should be 0, as in non-blocking) and the parsing may have been slowed down for some reason, making more threads wait on the lock. Maybe the concurrent nutcracker issues were also affecting mcrouter (since the same hosts are used). It could also be something adding memcached write load: https://grafana.wikimedia.org/d/000000316/memcache?orgId=1&from=1563458818482&to=1563464680644 looks a little unusual, though not unlike the result of key version changes that happen from release to release (including the slow return to the normal set() rate).
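For reference, a hedged sketch of the intended non-blocking behavior, using the generic BagOStuff::lock( $key, $timeout, $expiry ) signature (the key name is invented):

```php
$cache = ObjectCache::getLocalClusterInstance();
$key = $cache->makeKey( 'prepared-edit', 'some-hash' ); // invented key

// $timeout = 0 means "give up immediately if the lock is held" rather
// than stalling the thread in an idle wait.
if ( $cache->lock( $key, 0, 30 ) ) {
	try {
		// ... parse the edit and cache the output ...
	} finally {
		$cache->unlock( $key );
	}
} else {
	// Another request is already parsing this edit; skip the work.
}
```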
OK, replication for SET/DELETE seems fine on mw1261/mw2224 for me, and the STORED/NOT_STORED and FOUND/NOT_FOUND replies are what I expect when using no prefix, /otherdc/mw-wan, and /thisdc/mw-wan.
Err, more PEBCAK. I put the * in the wrong spot...
So, I've noticed that on mw1261/mw2224, as *well* as on plain old mwmaint1002/mwmaint2001, broadcasting keys doesn't seem to work, e.g.:
I guess it can go on our backlog.
Jul 16 2019
Is there a codfw host with the patch applied?
In terms of what MediaWiki actually queries, you have the following cases from both eqiad and codfw:
a) Local getWithSetCallback() requests for (regular) value keys:
   "get WANCache:v:elukey-test"
   "add WANCache:v:elukey-test"
   "cas WANCache:v:elukey-test"
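For context, a hedged sketch of the sort of call behind that sequence (the callback body is invented; 'elukey-test' is the key from the log lines above). On a miss, the value is fetched with "get", the callback computes it, and the write-back shows up as "add"/"cas" on the WANCache:v: value key:

```php
$wanCache = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache();

$value = $wanCache->getWithSetCallback(
	'elukey-test',
	$wanCache::TTL_MINUTE,
	function () {
		return 'some computed value'; // invented
	}
);
```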
Jul 15 2019
The relevant getWithSetCallback() call uses pcTTL, so there still shouldn't be many of these queries, unless a large number of distinct connections were acquired, and not just that, but connections to different load balancer clusters.
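A sketch of what such a pcTTL call looks like (key and callback invented); with the option set, repeat lookups within the same request are served from the in-process cache instead of re-querying:

```php
$wanCache = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache();

$value = $wanCache->getWithSetCallback(
	$wanCache->makeKey( 'example-entity', 42 ), // invented key
	$wanCache::TTL_DAY,
	function () {
		// Stand-in for the expensive lookup that would otherwise repeat.
		return 'loaded value';
	},
	// Also hold the value in the in-process cache, so repeated calls in
	// this request don't even hit memcached.
	[ 'pcTTL' => $wanCache::TTL_PROC_LONG ]
);
```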
Jul 12 2019
Is $wgMainCacheType set to CACHE_NONE?
For generic key testing, there is always:
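(The snippet that followed wasn't preserved. As a stand-in only, not the original, generic key testing from a maintenance shell might look something like this in eval.php:)

```php
// Hypothetical eval.php session; the key name is made up.
$wan = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache();
$key = $wan->makeGlobalKey( 'test', wfRandomString( 8 ) );
$wan->set( $key, 'hello', 300 );
var_dump( $wan->get( $key ) );
```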
Jul 11 2019
Not seeing this in the logs lately.
Jul 10 2019
I think (1) is more useful and fills a needed gap: writes on GET/HEAD requests. What I've been doing in T227376 is trying to move things off the Stash that can easily enough use some other store. This narrows down the "problem space".
I strongly prefer 0-based.
Jul 6 2019
Jul 5 2019
Jul 4 2019
Can this be closed now?
Jul 1 2019
The current code will never trigger this path if $wgMiserMode is set, which it is in production.
Jun 28 2019
Is this still a problem?
Jun 27 2019
Not sure what to do with this. It seems like some kind of connectivity problem, not just the server being read-only.
Jun 26 2019
Jun 25 2019
Jun 24 2019
Jun 22 2019
Jun 19 2019
Jun 18 2019
Jun 17 2019
I assume a mixture of opportunistic purging (with a limited row count) during existing watchlist row changes, along with WHERE clause filtering on SELECT to ignore expired rows, would work (e.g. similar to blocks and page protections).
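A rough sketch of that combination, assuming a hypothetical we_expiry column on the watchlist table (the column name and batch size are invented for illustration):

```php
$dbr = wfGetDB( DB_REPLICA );
$now = $dbr->addQuotes( $dbr->timestamp() );

// Reads ignore expired rows via the WHERE clause:
$res = $dbr->select(
	'watchlist',
	[ 'wl_namespace', 'wl_title' ],
	[
		'wl_user' => $userId, // the viewing user's ID
		"(we_expiry IS NULL OR we_expiry > $now)", // hypothetical column
	],
	__METHOD__
);

// Existing watchlist writes opportunistically purge a bounded batch:
$dbw = wfGetDB( DB_MASTER );
$ids = $dbw->selectFieldValues(
	'watchlist',
	'wl_id',
	[ 'we_expiry < ' . $dbw->addQuotes( $dbw->timestamp() ) ],
	__METHOD__,
	[ 'LIMIT' => 50 ]
);
if ( $ids ) {
	$dbw->delete( 'watchlist', [ 'wl_id' => $ids ], __METHOD__ );
}
```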
This was likely due to an APC change. Filing a separate task for the 4/20 group 2 regression (which seems out of band for deployments).
Some sort of meeting sounds reasonable.