
compressOld.php and live now support arbitrary conditions
Closed, DeclinedPublic



If suitable, please merge the live compressOld.php from /home/wikipedia/common/php-
1.4/maintenance with the 1.4 and 1.5 CVS versions.

I received a request to exclude categories and their talk
pages, which are currently in considerable flux, from the
concatenated compression to make it easier to delete them. I
implemented that by adding support for arbitrary SQL
restrictions in the query which selects which articles to
compress. There are no safety checks - it's a raw SQL
inclusion into the query, which seems acceptable for a
maintenance script.
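To illustrate the idea, here is a minimal sketch in Python (the script itself is PHP) of how an extra raw condition might be spliced into the article selection query. `build_selection_query` and its parameters are hypothetical names, not the actual code:

```python
def build_selection_query(start_title=None, extra_condition=None):
    """Build the article selection query, optionally appending a raw
    SQL condition supplied by the operator (no safety checks)."""
    conds = []
    if start_title is not None:
        # minimal quote-escaping, sufficient for a trusted operator
        conds.append("cur_title >= '%s'" % start_title.replace("'", "''"))
    if extra_condition:
        # raw inclusion into the WHERE clause - acceptable only because
        # the condition comes from the command line of a maintenance run
        conds.append(extra_condition.strip())
    where = (" WHERE " + " AND ".join(conds)) if conds else ""
    return ("SELECT cur_namespace,cur_title FROM cur" + where +
            " ORDER BY cur_namespace,cur_title")
```

With `build_selection_query("Burke", "cur_namespace not in (10,11,14,15)")` this produces the same shape of query as the log output shown below.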

It's currently running live on the site, most recently
started like this:

nice php compressOld.php en wikipedia -e 20050108000000 \
  -q " cur_namespace not in (10,11,14,15) " -a Burke \
  | tee -a /home/wikipedia/logs/compressOld/20050108enwiki

The script now shows the query when starting, partly because
it can take 700 seconds to run and partly so the query is
visible in case there's a problem with it:

Starting article selection query cur_title >= 'Burke' AND
cur_namespace not in (10,11,14,15) ...

This one excludes the template and category namespaces and their talk namespaces:

EXPLAIN /* compressWithConcat */ SELECT cur_namespace,cur_title FROM cur
WHERE cur_title >= 'Burke' AND cur_namespace not in (10,11,14,15)
ORDER BY ...

*************************** 1. row ***************************
        table: cur
         type: index
possible_keys: cur_title
          key: cur_title
      key_len: 255
          ref: NULL
         rows: 1420880
        Extra: Using where

No problems with the explain result.

Priority set to high because someone will hit a merge
conflict when pushing CVS to the live site if this isn't
merged first.

Version: 1.4.x
Severity: normal



Event Timeline

bzimport raised the priority of this task from to Lowest.Nov 21 2014, 8:13 PM
bzimport set Reference to bz1518.
bzimport added a subscriber: Unknown Object (MLST).

wrote:

Bug fix for the change in the live version - it included an AND for the
extra condition when one wasn't necessary.

wrote:

The script now includes a partial fix for the case where the
concatenated version would stop with a "disconnected from database
server" error after processing a large number of old record updates
(15,000+ seen in one case): slaves are now checked for lag/pinged after
every 500 old record examinations/updates. The check is also made
before starting on any article with many (currently 200) old records
to consider.

It's still possible for the script to be disconnected from the master
when the script gets a large number of old records and takes many
minutes loading the results.
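The periodic check described above can be sketched as follows, again in Python rather than PHP; `compress_article`, `process`, and `wait_for_slaves` are hypothetical stand-ins for the real routines:

```python
CHECK_INTERVAL = 500   # re-check replication after this many old records
LONG_ARTICLE = 200     # check up front when an article has this many

def compress_article(old_records, process, wait_for_slaves):
    """Process one article's old records, pausing periodically so the
    slaves can catch up (which also keeps the connections alive)."""
    if len(old_records) >= LONG_ARTICLE:
        wait_for_slaves()          # long article: check before starting
    for i, record in enumerate(old_records, start=1):
        process(record)            # examine/update one old record
        if i % CHECK_INTERVAL == 0:
            wait_for_slaves()      # re-check every 500 records
```

This pattern bounds how long the script runs without touching the replication state, though as noted it does not help when a single result set takes many minutes to load.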

Seems fixed, reducing priority.

Never merged, but the patch is no longer required since the deletion bug is fixed.