Tue, Apr 16
If it's not actually part of a job, it should be DB_REPLICA. Also, even when it should use DB_MASTER it should not be using getCachedGroupDefinitions() since it should skip the cache.
I think you can just add a method, similar to LBFactory::getChronologyProtectorTouched, that exposes the client ID, maybe call it LBFactory::getChronologyProtectorClientId();
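For illustration, a minimal sketch of what that accessor might look like, assuming LBFactory keeps its ChronologyProtector in a property (here called $chronProt) and that the client ID is already tracked by that class:

```
// Hedged sketch only, mirroring getChronologyProtectorTouched(); the property
// name and the getClientId() accessor on ChronologyProtector are assumptions.
public function getChronologyProtectorClientId() {
	return $this->chronProt->getClientId();
}
```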
Can someone please prioritize this? This basically means that very few people are using edit stashing on enwiki, which has this editor as the default. That seems like a pretty bad regression from the regular editor being the default.
Fri, Apr 12
Given all the PHP memory/speed enhancements, some benchmarking is probably in order. Maybe BenchmarkParse could use a preprocess-only option. I'm curious how PHP7 compares to HHVM for large pages.
So, I don't think loading the parsed HTML into a DOM object like this is acceptable performance-wise, and I don't see any trick to make it fast.
Thu, Apr 11
I made a version of the page that extracts fewer DjVu pages (but still many) and tried to profile again:
Mon, Apr 8
I can't reproduce this on cf8f1fe9b0b5647d1ed3955f6ea6bce7d76fed44.
Have you compared LCStore with sqlite (as defined by the current installer) vs cdb?
Fri, Mar 29
Some cases still do table scans, but those are pages or API modules that aggregate everything, so it's not avoidable there.
Thu, Mar 28
LCStore_DB has gone through numerous changes since this report. For example, with mysql/postgres a separate auto-commit mode connection is used to avoid staleness and contention. The sqlite installer makes a dedicated LCStore_DB database file to avoid contention and failed writes.
Wed, Mar 27
Table not used anymore.
Does this still occur?
FYI, I was playing around with my PHP interpreter last night using:
They don't seem related, nor are they actually thrown exceptions (just warnings caught by MWExceptionHandler and logged using Exception objects).
Mar 24 2019
You can even just omit the dbname from $wgDBservers. That is also a cleaner config for wiki farms. The servers (masters + X replicas) themselves don't really have a concept of "the table prefix", or even "the database" if it's mysql (rather than sqlite/postgres). Each wiki uses some db/prefix (might be the same DB or per-wiki or grouped or whatever). Calls to LoadBalancer::getConnection() or wfGetDB( DB_REPLICA, [], $wikiDbDomain ) can just use the db/prefix contained in $wikiDbDomain, which is $wgDBname and $wgDBprefix if it's the current wiki.
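For illustration, a hedged sketch of that style of configuration and call site (host names and the 'otherwikidb' domain value are placeholders, and the user/password keys are omitted for brevity):

```
// Wiki-agnostic server entries: no 'dbname' key, so each wiki's own
// $wgDBname/$wgDBprefix (its database domain) is applied at connection time.
$wgDBservers = [
	[ 'host' => 'db-master.example', 'type' => 'mysql', 'load' => 0 ],
	[ 'host' => 'db-replica1.example', 'type' => 'mysql', 'load' => 100 ],
];

// Cross-wiki callers pass an explicit domain instead of relying on the
// connection's current database; 'otherwikidb' is a made-up domain ID.
$dbr = wfGetDB( DB_REPLICA, [], 'otherwikidb' );
```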
Code should always handle the current-wiki (e.g. wfWikiID()/$wgDBname) case and not require it to be set in $wgLocalDatabases. The job queue should be fine now (at least in master, 1.32, and 1.31). If anything somewhere still has that problem, then it's a bug.
Mar 22 2019
I don't see a way to avoid having to set $wgDBname. It was already used to determine things like wfWikiID(), which is used by some things to get DB connections. Having them both set and mismatched has long been a use case that is not well supported. The correspondence needs better documentation though.
Mar 21 2019
Why does the job itself carry all of the transformed text rather than just a revision/page ID from which it could derive the transformed text? I get that some metadata is not stored elsewhere and would have to go in the job.
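For illustration, a hedged sketch of the leaner shape being suggested; the job type, parameter names, and call site are hypothetical:

```
// Enqueue only identifiers plus whatever metadata cannot be re-derived; the
// job's run() would then load the revision and redo the transformation itself
// instead of shipping the transformed text around in the job parameters.
JobQueueGroup::singleton()->push( new JobSpecification(
	'transformContent', // made-up job type name
	[
		'pageId' => $pageId,
		'revId' => $revId,
		// ...only the metadata that is not stored anywhere else...
	],
	[],
	$title
) );
```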
Mar 20 2019
The wetede_rand index seems like it will need to be changed as I mentioned, given the size. Otherwise, there are too many cases that could involve massive scanning.
The $wiki !== wfWikiID() check was meant for the local wiki case (as $wgLocalDatabases is fully set for some farms but often empty for smaller sites).
Mar 19 2019
@Tgr might know better.
Maybe the User::saveSettings() call has its clearSharedCache() call deferred in the old code, due to hasOrMadeRecentMasterChanges(), but is immediate in the new code. If the tests relied on the old cached value being there, then they could fail.
Actually, given the nature of tests all using DB_MASTER, I don't yet see what would be significantly affected here. It can't be hasMasterChanges() since that wouldn't have changed; doneWrites() is not used; lastDoneWrites() is only used by lastMasterChangeTimestamp(). That in turn is used by hasOrMadeRecentMasterChanges() and waitForReplication().
I'd imagine --use-normal-tables would pass and not having that would fail. In production, there is no reason to treat temporary table operations as writes (e.g. for triggering commitMasterChanges and so on). They would just be used for complex reads and accumulated results. For tests, without --use-normal-tables, everything is temporary (though it's meant to represent permanent stuff). Even so, this rarely matters, though things like User do check hasOrMadeRecentMasterChanges(), which would hit the edge case of everything being temporary (no longer returning true).
Mar 17 2019
Seems to rarely occur (usually 0 or 1 times per 30-minute window).
Mar 16 2019
Seems like it might be T217649 (which should be re-titled itself).
Mar 13 2019
Yes, and CacheAwarePropertyInfoStore should use delete() or such for purges rather than set(). Using set() would only affect one DC.
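A minimal sketch of the difference, assuming the store wraps the main WANObjectCache (the key name is made up):

```
use MediaWiki\MediaWikiServices;

// delete() writes a tombstone whose purge is relayed to every datacenter,
// while set() only updates the cache in the local DC.
$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
$key = $cache->makeKey( 'property-info' ); // hypothetical key

$cache->delete( $key );           // broadcasted purge; all DCs recompute
// $cache->set( $key, $newInfo ); // local-only write; other DCs keep stale data
```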
Can the (extra) space be dedicated more towards the larger slabs, where we have more problems AFAIK?
How will wikimedia_editor_tasks_entity_description_exists be populated, and about how large will it be? Can wetede_language actually be up to 255 characters long? If the table can be huge, then I also wonder if the wetede_rand index should be on (wetede_language, wetede_description_exists, wetede_rand) instead.
Mar 12 2019
It's also useful as a tunable option for spreading I/O across cache nodes rather than only being used to get around hard limits.
Which SQL layer? We have SQLBagOStuff using pcxxxx servers, which I suppose could be co-opted for this. As for ES, I wouldn't want to spam a bunch of one-off blobs in there since it is meant to be append-only. Depending on how much code it is, I don't mind a bit of complexity. I'm still experimenting around with different ways to do it, but I don't think it has to be that complex.
Mar 9 2019
Does that still happen? I can't see how that would happen in master without files from two versions.
Mar 8 2019
Two pieces, one of them still huge. I'm working on a generic segmentation wrapper for BagOStuff atm, but that will take longer to do (there are lots of ways to arrange the classes, and it's hard to choose between them).
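To illustrate the general idea only (not the eventual design; the class name and key scheme are made up, and it assumes MediaWiki's BagOStuff base class):

```
// Rough sketch: values above a size threshold get split across segment keys,
// with the main key holding just the list of segment keys to reassemble.
class SegmentedValueSketch {
	/** @var BagOStuff */
	private $cache;
	/** @var int */
	private $segmentSize;

	public function __construct( BagOStuff $cache, $segmentSize = 524288 ) {
		$this->cache = $cache;
		$this->segmentSize = $segmentSize;
	}

	public function set( $key, $value, $ttl = 0 ) {
		$blob = serialize( $value );
		if ( strlen( $blob ) <= $this->segmentSize ) {
			return $this->cache->set( $key, $value, $ttl );
		}
		$segmentKeys = [];
		foreach ( str_split( $blob, $this->segmentSize ) as $i => $chunk ) {
			$segmentKey = "$key:segment:$i";
			$segmentKeys[] = $segmentKey;
			$this->cache->set( $segmentKey, $chunk, $ttl );
		}
		// The main key only stores the index of segment keys
		return $this->cache->set( $key, [ '__segments' => $segmentKeys ], $ttl );
	}

	public function get( $key ) {
		$value = $this->cache->get( $key );
		if ( !is_array( $value ) || !isset( $value['__segments'] ) ) {
			return $value; // small, unsegmented value
		}
		$parts = $this->cache->getMulti( $value['__segments'] );
		$blob = '';
		foreach ( $value['__segments'] as $segmentKey ) {
			if ( !isset( $parts[$segmentKey] ) ) {
				return false; // a segment expired or was evicted
			}
			$blob .= $parts[$segmentKey];
		}
		return unserialize( $blob );
	}
}
```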
Mar 7 2019
The SET metric for redis is very slow, so I wouldn't use 10x that figure.
The job queue has long since been migrated to changeprop.
Mar 5 2019
It reduces set()/setInterimKey() calls to volatile (recently invalidated) keys. Each of those calls translates into a GET/CAS since they use merge(). I don't think it will prevent the huge GET spikes we saw (since most did not correspond to CAS and were probably not merge() calls).
Mar 4 2019
Here is a form of the query without filesorts: