Using the script from T173774 (Create script to dump recently changed categories), generate daily dumps of the categories that were changed. This will allow loading only the daily updates instead of reloading the whole category set, which for commonswiki can take significant time and currently stalls updates for up to an hour.
Event Timeline
Change 378355 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Generate daily diffs for categories RDF
On testwiki and test2wiki I get the following when running over the last seven-day interval (all other wikis run ok):
Wikimedia\Rdbms\DBQueryError from line 1443 of /srv/mediawiki/php-1.32.0-wmf.10/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? Query: SELECT rc_timestamp,page_title,page_namespace,rc_title,rc_cur_id,pp_propname,cat_pages,cat_subcats,cat_files FROM `recentchanges` FORCE INDEX (new_name_timestamp) LEFT JOIN `page_props` ON (pp_propname = 'hiddencat' AND (pp_page = rc_cur_id)) LEFT JOIN `category` ON ((cat_title = rc_title)) WHERE (rc_timestamp >= '20180625095244') AND (rc_timestamp < '20180702095244') AND rc_namespace = '14' AND rc_new = '0' AND rc_log_type = 'move' AND rc_type = '3' ORDER BY rc_timestamp ASC LIMIT 200 Function: BatchRowIterator::next Error: 1054 Unknown column 'page_title' in 'field list'
Can we take care of that before this goes live? Even just removing those two wikis from the categories-rdf dblist would be ok.
See T198629. It turns out this happens across many of the wikis, not just those two; probably the config settings for testwiki and test2wiki are simply such that the error output is displayed on the console. So could you have a look? Thanks!
@ArielGlenn found the bug:
In https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/categoryChangesAsRdf.php$223 we have:
$tables += $extra_tables;
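But += does not work as expected on indexed arrays in PHP: it is a key-based union, so any entry in $extra_tables whose numeric index already exists in $tables is silently dropped. Using array_merge instead fixes the issue, since it appends the values and renumbers them: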
$tables = array_merge( $tables, $extra_tables );
This never worked. A local test run immediately failed with an exception.
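To make the difference concrete, here is a minimal standalone sketch. The array contents are illustrative assumptions chosen to resemble the failing query (where the table providing page_title is missing from the FROM clause); they are not copied from categoryChangesAsRdf.php.

```php
<?php
// Hypothetical values for illustration only, not taken from the script.
$tables = [ 'recentchanges', 'page_props', 'category' ];
$extra_tables = [ 'page' ];

// Array union: keys are compared, and the left operand wins on a collision.
// $extra_tables only has key 0, which already exists in $tables,
// so 'page' is silently dropped.
$union = $tables + $extra_tables;

// array_merge: values with numeric keys are appended and renumbered,
// so 'page' ends up at index 3.
$merged = array_merge( $tables, $extra_tables );

var_export( $union );
echo "\n";
var_export( $merged );
echo "\n";
```

With the union, a query built from the result would not include the extra table at all, which is consistent with an "Unknown column" error like the one reported above.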
Change 443449 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Use array_merge to merge indexed arrays in categoryChangesAsRdf.php.
Change 443449 merged by jenkins-bot:
[mediawiki/core@master] Use array_merge to merge indexed arrays in categoryChangesAsRdf.php.
Change 445720 had a related patch set uploaded (by Zfilipin; owner: Smalyshev):
[operations/mediawiki-config@master] Remove labs wikis from the categories-rdf list, don't need them
Change 445720 merged by jenkins-bot:
[operations/mediawiki-config@master] Remove labs wikis from the categories-rdf list, don't need them
Mentioned in SAL (#wikimedia-operations) [2018-07-19T11:46:15Z] <zfilipin@deploy1001> Synchronized dblists/categories-rdf.dblist: SWAT: [[gerrit:445720|Remove labs wikis from the categories-rdf list, dont need them (T198356)]] (duration: 00m 55s)
The dblist fix has been deployed, off to test the actual bash script now. Until now it's all been manual runs across the dblist with direct calls to the maintenance script.
Change 378355 merged by ArielGlenn:
[operations/puppet@production] Generate daily diffs for categories RDF
This is now deployed; I'll check tomorrow that the dailies ran ok, and we'll know about the fulls over the weekend.
Change 449994 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] category diffs: full ts dir is different than dailies ts dir
Change 449994 merged by ArielGlenn:
[operations/puppet@production] category diffs: full ts dir is different than dailies ts dir
I've run the script manually with the above change applied; results are available in the expected location.
@ArielGlenn thanks for your help! I'll now watch how it works over the next week and then try to switch dump loading to the dailies.