So is anything other than EventRelayer blocking this?
Fri, Feb 15
I'd prefer we didn't tack preferences and bits of data like this onto sessions (we used to many years ago and that was lame), especially given the cross-DC latency. I worry people will go overboard with such features, causing slow writes.
I think Timo backported the change, so it should be live:
$wgDBname and $wgDBprefix are used by methods like wfWikiID() and the "DB domain" methods from WikiMap. The former is a very old method. They are assumed to contain the DB name/prefix of the current wiki (wiki farms will set these depending on the vhost or URL or something).
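For illustration, here is a rough Python sketch of how I understand the wiki ID composition (the real logic lives in PHP; this just shows the shape: the DB name, with the table prefix appended after a dash when one is set):

```python
def wiki_id(db_name: str, db_prefix: str = "") -> str:
    """Sketch of wfWikiID()-style composition: DB name, plus
    the table prefix joined with a dash when one is set."""
    return f"{db_name}-{db_prefix}" if db_prefix else db_name

# wiki_id("enwiki")          -> "enwiki"
# wiki_id("testwiki", "mw_") -> "testwiki-mw_"
```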
Thu, Feb 14
Wed, Feb 13
I updated the "Database transactions" and "Performance guidelines" pages, which I'd hope most people that work on MediaWiki will encounter when looking for generic guidelines. Things can be better linked, though that's a broader matter...
Mon, Feb 11
Sun, Feb 10
I recall something like this when experimenting with the old python-memcached-relay daemon (before we decided on mcrouter instead). That used non-blocking checks and conditional sleep (polling), but since there is only one server here, the message() timeout could work. Something like https://github.com/andymccurdy/redis-py/issues/631 with try/catch for redis.TimeoutError and resubscribe logic should be doable.
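A self-contained sketch of that catch-timeout-and-resubscribe shape (the redis calls are stubbed here so it runs standalone; with the real library you would catch redis.exceptions.TimeoutError around pubsub.get_message() and recreate the subscription):

```python
class PubSubTimeout(Exception):
    """Stand-in for redis.exceptions.TimeoutError."""

class FakePubSub:
    """Minimal stand-in for redis-py's PubSub: the shared event list
    drives whether get_message() times out or delivers a message."""
    def __init__(self, events):
        self.events = events
    def subscribe(self, channel):
        pass  # a real client would register the channel here
    def get_message(self, timeout=None):
        event = self.events.pop(0)
        if event == "timeout":
            raise PubSubTimeout()
        return {"type": "message", "data": event}

def relay_once(pubsub_factory, channel="updates"):
    """Poll until one message arrives, reconnecting and resubscribing
    whenever the blocking read times out (the redis-py #631 pattern)."""
    pubsub = pubsub_factory()
    pubsub.subscribe(channel)
    while True:
        try:
            message = pubsub.get_message(timeout=10)
        except PubSubTimeout:
            pubsub = pubsub_factory()  # reconnect + resubscribe
            pubsub.subscribe(channel)
            continue
        if message and message["type"] == "message":
            return message["data"]
```

The key point is that a timeout is routine, not fatal: the loop rebuilds the subscription and keeps polling.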
Thu, Feb 7
Wed, Feb 6
Mon, Feb 4
Sun, Feb 3
Also note that even the old LRU algorithm avoided bumping things in a slab more than once a minute (https://memcached.org/blog/modern-lru/), so that perhaps makes it more likely that some other keys are flooding out the slab, since being "hot" does not mean being at the top of the list so much (if there are other things like prepared edit parse blobs and parsoid serialization blobs coming in on every edit).
Wed, Jan 30
Tue, Jan 29
Is this still reproducible in master?
Sat, Jan 26
Fri, Jan 25
Things needed here:
- Use only mcrouter in deployment-prep (no multiwrite) from MW
- Remove puppet code for deployment-prep
- Install mcrouter on the memcached servers used by labswiki
- Make MW use mcrouter on labswiki
- Remove "memcached-pecl" cache entry from config
- Remove labswiki nutcracker code from puppet
I don't think ChronologyProtector is involved.
Wed, Jan 23
From the perspective of layered architecture and separation of concerns, I'm not sure I like the idea of a MediaWiki script. But some script that reads from etcd and does the updates to the table seems reasonable.
To be useful, tables.sql and the other bits also need to be idempotent, or have some script parameter to skip them, though...
Error: 1050 Table 'blobs_cluster24' already exists (10.64.32.184)
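The kind of idempotence meant here can be as simple as tolerating "table already exists" errors on rerun (or using IF NOT EXISTS in the DDL). A sketch with sqlite3 standing in for MySQL (table/column names are illustrative):

```python
import sqlite3

def apply_schema(conn, statements):
    """Apply CREATE TABLE statements, skipping any whose table
    already exists, so rerunning the script is harmless."""
    applied = []
    for sql in statements:
        try:
            conn.execute(sql)
            applied.append(sql)
        except sqlite3.OperationalError as e:
            if "already exists" not in str(e):
                raise  # a genuine error, not mere rerun noise

    return applied

conn = sqlite3.connect(":memory:")
ddl = ["CREATE TABLE blobs_cluster24 (blob_id INTEGER PRIMARY KEY, blob_text BLOB)"]
apply_schema(conn, ddl)  # first run creates the table
apply_schema(conn, ddl)  # rerun is a no-op instead of erroring out
```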
Fri, Jan 18
Did it finish?
Jan 16 2019
Another option is using the existing 'ChronologyClientId' header that MediaWiki supports for acting on behalf of a client (e.g. no need to forward the agent/IP).
Jan 14 2019
Jan 12 2019
Jan 9 2019
Jan 7 2019
This was probably fixed alongside T207979.
I don't recall an overriding reason it has to be there. Seems reasonable to experiment with changing it as long as the common desktop case (fuller window) looks the same.
Dec 21 2018
That sounds right.
Dec 20 2018
Dec 19 2018
I see EntityRevisionCache and CacheAwarePropertyInfoStore seem to use set() on invalidation. Also, EntityRevisionCache, CachingEntityRevisionLookup, and PopulateInterwiki seem to call delete() on a non-WAN cache instance.
The current callers don't assume the same level of durability as with MySQL, just that the data will likely not be randomly removed (e.g. high eviction rates, power outages, network blips). The WAN cache callers, on the other hand, can handle a fair amount of that.
Dec 18 2018
We need persistence and replication. The plan is to use the same store as sessions for the rest of the object stash usage (probably Cassandra). Flags like WRITE_SYNC might be used in a few callers, and should map to appropriate backend requests (e.g. QUORUM_* settings in Cassandra). The callers of the main object stash all need persistence and replication though (callers have already been migrated to stash vs WAN cache and such).
Dec 13 2018
Dec 11 2018
I'm not sure why the recache() calls would cause many CAS commands though. The only threads doing the regeneration (and CAS) would be those of requests doing updates...which should not be that frequent.
Dec 10 2018
Dec 8 2018
Dec 2 2018
To clarify, the $useMutex logic in WAN cache never triggers due to minAsOf=INF, resulting in stampedes when someone invalidates the cache. Instead, this should be treated like a regular TTL expiration and have one thread at a time doing regeneration.
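A minimal single-process sketch of that "one regenerator at a time" behavior (not WANObjectCache itself; a non-blocking mutex decides who recomputes after expiry, while everyone else keeps serving the last value instead of stampeding):

```python
import threading
import time

class StampedeGuard:
    """On expiry, only the thread that wins a non-blocking mutex
    regenerates the value; losers of the race serve the stale copy."""
    def __init__(self, regenerate, ttl=60):
        self.regenerate = regenerate
        self.ttl = ttl
        self.value = None
        self.expires_at = 0.0
        self.lock = threading.Lock()

    def get(self):
        if time.monotonic() < self.expires_at:
            return self.value  # still fresh
        if self.lock.acquire(blocking=False):
            try:
                self.value = self.regenerate()
                self.expires_at = time.monotonic() + self.ttl
            finally:
                self.lock.release()
        # Threads that lost the race fall through with the old value
        # rather than piling onto the backend.
        return self.value
```

Treating invalidation as a normal TTL expiry, as suggested above, lets this path apply instead of every reader recomputing at once.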
Nov 30 2018
Nov 26 2018
Nov 22 2018
@Gilles: Comcast only has cable infrastructure in terms of what the ISP provides itself. Customers with cable can also get XFinity Mobile (https://www.tomsguide.com/us/xfinity-mobile-faq,news-25223.html). That's basically just a bunch of Wi-Fi hotspots built off of Verizon. I don't know how many people are using that and it seems new-ish. Also, the latency figures are quite low, which makes me doubt that it is XFinity Mobile; more likely it's regular wireless/xfinity.
It looks sane, though I wonder why Comcast is so high in usage for mobile. Is that mostly from touchpad devices instead of smartphones?
Nov 20 2018
It definitely seems like something worth doing. Having high-use cache keys become unusable for undefined periods of time is too much of a stability concern.
Nov 19 2018
Nov 14 2018
Since CategoryMembershipChangeJob runs via the job queue, wouldn't that have little effect on save timing itself? I guess it wouldn't hurt to optimize.
Nov 9 2018
Nov 8 2018
wl_notificationtimestamp is not meant to store the time the article was watched, but rather the last revision the user saw on the page (NULL if they saw the latest revision). This would require a new column. Ideally, if watchlist sizes were limited, this wouldn't need an index, but they are not.
Nov 7 2018
Keys are set by add/cas normally, so it seems like some key that takes a long time to regenerate might have expired (there are two data points at the elevated value over more than just a few seconds) or a class of many keys expired. The other possibility is some sudden change in access patterns for keys, which seems less likely, especially the more periodic this is.
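For reference, the add/cas semantics being described, as a toy store (not the real memcached protocol or client; names are illustrative):

```python
class MiniCache:
    """Toy memcached-style store illustrating add vs. cas semantics."""
    def __init__(self):
        self.data = {}    # key -> (cas_token, value)
        self.counter = 0  # monotonically increasing cas token source

    def add(self, key, value):
        """Succeeds only if the key is absent (first writer wins)."""
        if key in self.data:
            return False
        self.counter += 1
        self.data[key] = (self.counter, value)
        return True

    def gets(self, key):
        """Return (cas_token, value), or None if missing."""
        return self.data.get(key)

    def cas(self, key, token, value):
        """Succeeds only if nobody has written since our gets()."""
        current = self.data.get(key)
        if current is None or current[0] != token:
            return False
        self.counter += 1
        self.data[key] = (self.counter, value)
        return True
```

Under this scheme a burst of cas traffic means many writers are racing to update existing keys, whereas fresh regeneration after expiry shows up as add traffic.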
Nov 6 2018
Nov 5 2018
Fixed in bf30fcb71427d673f7c83a067b3241040d3470b6. Rollback is used instead and uses $ignoreErrors so as not to trigger the exception in reportQueryError().
Cleaned up in 633eb437a3b808518469c6eaf4e86a436941d837
Nov 2 2018
Nov 1 2018
openConnection() is badly named and still reuses connections. You'd probably want getConnection() with CONN_TRX_AUTO.
Oct 29 2018
What about our use of register_postsend_function? Is there anything equivalent?
Oct 28 2018
Oct 27 2018
Closing, per "The Error Occurs if the memcache is too slow".
This will be better with a3d6c1411dad3e057b if there are many message pages that exist for extension use.
4b1db1190bb8f2a115c6a81a5ee487b7d18cd303 seems more likely.
Note that git master (19dd28798163) installs fine with postgres, which has the same DB domain patches as 1.32.
Oct 26 2018
It looks like the errors come from some tool (JS?) that fires a bunch of API requests from a Special:Search tab to edit numerous pages in parallel. Each burst is always for a certain user ID with a single referrer URL.
Does this really need to call commitAndWaitForReplication() when there is only one batch? Is it ever called thousands of times in a row?
Oct 25 2018
Oct 24 2018
Oct 22 2018
In the getMasterDatabase() method posted above, I noticed that the database domain (e.g. DB/schema/prefix) is missing from getConnection(). Instead, that should be: