I just realized that the proposed dump format is still using numeric text IDs. That cannot be guaranteed to work: text blobs are now identified by URL-like blob addresses. "tt:12345" is the address of text row 12345, and we may start using "ext:DB:..." for ExternalStore soon.
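For illustration, the address format is roughly "<schema>:<location>" (a sketch; the actual BlobStore parsing code may differ):

    // Illustrative only, not the real BlobStore implementation:
    // "tt:12345"   -> schema "tt", location "12345" (text table row)
    // "ext:DB:..." -> schema "ext", ExternalStore locator
    list( $schema, $location ) = explode( ':', $address, 2 );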
Re-opening, since issues were found. To be concrete: entities can be stored in extra slots, but Wikibase gets confused when it finds a page that exists and could have an entity in an auxiliary slot, but doesn't have one. Adam is working on it.
That debug info isn't really helpful...
Thu, Sep 20
For the record, a log line showing the equivalent issue for the parser cache:
Memcached error for key "plwiki:pcache:idhash:4336033-0!canonical" on server "/var/run/nutcracker/nutcracker.sock:0": ITEM TOO BIG
@Kelson I suppose this has been obsoleted by https://commons.wikimedia.org/wiki/Commons:Structured_data. Or maybe the plan is to pick it up again once that project has progressed a bit.
I see a bunch of ITEM TOO BIG errors from the blob store cache, too. If big pages don't fit into the edit stash, the parser cache, or the blob store cache, they'll be really slow...
Other URLs I found in the log:
...also, does the wiki have content? Was the main page created properly? Perhaps we are seeing T203982: update.php fails for wikis with zero revisions.
@PlavorSeol if you run update.php without AbuseFilter being enabled, does it finish, or does it also hang?
Wed, Sep 19
I'd really like to move this forward. Ideally, we'd get the new dump format into the 1.32 release. @ArielGlenn is there anything holding this back? Can we have an IRC discussion on this soon?
Ok, I recovered a dump of the data that is failing to be stashed using the method above. It's 4379975 bytes of print_r output, and 4298008 bytes serialized. What's the memcached limit? 4MB doesn't seem too terrible...
I note that in ParserCache, we have the line $this->mMemc->set( $parserOutputKey, $parserOutput, $expire ); with no check of the return value. Perhaps we should also log when that returns false, to see if we also fail to write to the ParserCache, not just the edit stash.
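Something along these lines (untested sketch; the log group name is made up):

    $ok = $this->mMemc->set( $parserOutputKey, $parserOutput, $expire );
    if ( !$ok ) {
        // Log failed writes, so we can see whether big pages also fail
        // to make it into the ParserCache, not just the edit stash.
        wfDebugLog( 'ParserCache', "Failed to cache parser output for key $parserOutputKey" );
    }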
To find out what is getting so big, I suggest to add something like the following to includes/api/ApiStashEdit.php on a debug host:
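    // Untested sketch: $editInfo stands for whatever object is about to be
    // written to the stash at this point; adjust to the actual variable.
    file_put_contents( '/tmp/stashedit.print_r.txt', print_r( $editInfo, true ) );
    file_put_contents( '/tmp/stashedit.ser', serialize( $editInfo ) );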
They should not?
Tue, Sep 18
I don't see anything recent in https://phabricator.wikimedia.org/source/mediawiki/history/master/includes/parser/ParserOutput.php that strikes me as relevant. I added the $mWrapperDivClasses field, which should be tiny. The same patch changed the parser to emit slightly less HTML. If anything, that should have made ParserOutput objects smaller.
@PlavorSeol Please provide more details. What exact version of MediaWiki are you running? Do you have a custom setting for $wgMultiContentRevisionSchemaMigrationStage? What happens if you run update.php again? Can you manually verify that the abuse_filter_action table exists? Were you able to use AbuseFilter before this?
Mon, Sep 17
@Jdforrester-WMF You are right. I was mostly thinking of spam links, but dick jokes in file captions also suck (no pun intended). Changing.
Sun, Sep 16
@Tbayer the problem here seems to be that rev_parent_id is set to the wrong revision for some reason.
Fri, Sep 14
Thu, Sep 13
I see no use case for triggeringUser that is not covered by $revision->getUser() or causeAgent. We either want to know who triggered an event (the causeAgent), or who made the edit ($revision->getUser()).
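To illustrate (a hypothetical sketch; getCauseAgent() is a made-up accessor standing in for however causeAgent is exposed):

    // Who made the edit:
    $editor = $revision->getUser();
    // Who or what triggered the event (e.g. a bot or a maintenance script):
    $agent = $event->getCauseAgent();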
Wed, Sep 12
@Milimetric what's your take on this? Can this go to last call, or should it see more discussion?
let's hope it stays closed this time :)
This use case seems similar to caching Parsoid HTML, which is done in RESTBase and backed by Cassandra. It's similar because the HTML is re-generated upon edit and accessed by clients upon view, via an API.
[10:55:36] <legoktm> AaronSchulz: Do you have ideas on where else we can hook in to determine when new links get added and by whom?
Another option would be to force this to go into APC instead of memcached.
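A rough sketch of what that could look like (assuming ObjectCache::getLocalServerInstance() is the right entry point for APCu; $stashKey, $stashValue and $ttl are placeholders):

    // Use the local server cache (APCu) instead of the shared memcached
    // cluster. This avoids the item size limit, but the stashed data is
    // then only visible to the web server that wrote it.
    $cache = ObjectCache::getLocalServerInstance( CACHE_NONE );
    $cache->set( $stashKey, $stashValue, $ttl );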