
Maintenance.php function purgeRedundantText() not able to deal with big data set
Open, NormalPublic

Description

deleteOldRevisions.php calls Maintenance.php function purgeRedundantText().

Running it on a big DB (more than 200,000 pages), the script dies as follows:

The database returned the following error: "1153: Got a packet bigger than
'max_allowed_packet' bytes (localhost)".

I think the SQL requests including "NOT IN ($set)" or "IN ($set)" are responsible for this, because such clauses can only hold a few hundred or a few thousand ids before the query text exceeds the server's packet limit.
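The fix suggested by the report is to cap how many ids go into any one IN (...) clause. A minimal sketch of that batching idea, in Python for illustration (the table and column names `text`/`old_id` follow MediaWiki's schema; the `run_query` callback is a hypothetical stand-in for the database layer):

```python
def chunked(ids, size):
    """Yield successive fixed-size slices of `ids`."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def delete_in_batches(run_query, ids, batch_size=1000):
    """Issue one bounded DELETE per chunk instead of a single giant query.

    Each query's text stays small, so it cannot exceed the server's
    max_allowed_packet no matter how many ids there are in total.
    """
    for batch in chunked(ids, batch_size):
        placeholders = ",".join(str(i) for i in batch)
        run_query("DELETE FROM text WHERE old_id IN (%s)" % placeholders)
```

With `batch_size=1000`, a list of five million ids becomes five thousand small queries rather than one unbounded string, at the cost of extra round trips.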

Please confirm if I'm right.


Version: 1.16.x
Severity: normal

Details

Reference
bz20651

Event Timeline

bzimport raised the priority of this task from to Normal.
bzimport set Reference to bz20651.
bzimport added a subscriber: Unknown Object (MLST).
Kelson created this task. Sep 15 2009, 12:31 PM

Chad confirmed this. Basically, the function needs to be refactored so that it handles smaller chunks of data.

Ciencia_Al_Poder added a subscriber: Ciencia_Al_Poder.

Ugh, just found this problem. It wasn't breaking, but nukePage was taking a long time to delete a single page, so I looked at the source code to see why.

Jesus Christ. What I found is horrible. This is Row By Agonizing Row programming: it copies everything into memory, then constructs an old_id NOT IN ( <insert several million comma-separated integers here> ) clause, and expects the server not to choke on that enormous query.
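A chunked alternative to the pattern described above would keep every query bounded: scan old_id in fixed-size ranges and delete only the unreferenced ids found in each range. This is an illustrative Python sketch, not MediaWiki's code; `fetch_ids_in_range` and `delete_ids` are hypothetical helpers standing in for bounded SELECT and DELETE statements:

```python
def purge_unreferenced(fetch_ids_in_range, delete_ids, keep_ids, max_id, step=1000):
    """Walk old_id in ranges of `step`, deleting ids absent from `keep_ids`.

    The membership test happens client-side against a set, so no query
    ever embeds more than `step` ids and the NOT IN clause disappears.
    """
    keep = set(keep_ids)
    removed = 0
    for lo in range(0, max_id + 1, step):
        hi = lo + step - 1
        candidates = fetch_ids_in_range(lo, hi)   # bounded SELECT per range
        doomed = [i for i in candidates if i not in keep]
        if doomed:
            delete_ids(doomed)                    # bounded DELETE per range
            removed += len(doomed)
    return removed
```

Each round trip touches at most `step` rows, so memory use and query size stay flat even on a wiki the size of English Wikipedia.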

I'd like to see what happens if someone runs nukePage.php on the English Wikipedia database... Still current as of MediaWiki 1.31: https://phabricator.wikimedia.org/source/mediawiki/browse/REL1_31/maintenance/Maintenance.php$1268