
Add maintenance script to wipe term_search_key and term_weight columns
Closed, Resolved · Public

Description

To efficiently empty those columns (which we no longer use), let’s write a dedicated maintenance script. rebuildTermSqlIndex should effectively do the same thing, but it would probably be much slower, since it would have to load each entity from the database – that shouldn’t be necessary for us.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Apr 6 2018, 1:37 PM

Does the script need an option to stop after a certain amount of work has been done? So far I’m following rebuildTermSqlIndex and adding --from-id and --batch-size options, but no --limit options or anything like that.

Change 425294 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Add maintenance script to clear term_search_key+term_weight

https://gerrit.wikimedia.org/r/425294

hoo added a subscriber: hoo. · Apr 11 2018, 6:41 PM

Does the script need an option to stop after a certain amount of work has been done? So far I’m following rebuildTermSqlIndex and adding --from-id and --batch-size options, but no --limit options or anything like that.

We probably don't want one script to just run forever, so it will need some way to stop and continue. Stopping can usually just be done with timeout… but it then needs to be able to pick up the work again.

Also, I wonder why we're even bothering to clear out term_weight… having 0.0 probably has little to no benefit compared to just having something in there.

Stopping can usually just be done with timeout… but it then needs to be able to pick up the work again.

The script prints the current term_row_id after each batch, so you should be able to resume (--from-id) from that.
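For illustration, the batch-and-resume behaviour described here could look roughly like the following Python sketch. It is not the Wikibase script itself. It assumes the columns live in a table named wb_terms, that clearing term_search_key means setting it to an empty string, and that the resume id is inclusive; clear_in_batches and max_batches are hypothetical names used only for this example.

```python
import sqlite3


def clear_in_batches(conn, from_id=1, batch_size=100, max_batches=None):
    """Blank term_search_key batch by batch and return the id to resume from.

    Assumptions (not taken from the real script): table name wb_terms,
    empty string as the cleared value, inclusive from_id.
    """
    next_id = from_id
    batches = 0
    while max_batches is None or batches < max_batches:
        # Select the next batch by primary key only; no entity is loaded.
        ids = [row[0] for row in conn.execute(
            "SELECT term_row_id FROM wb_terms WHERE term_row_id >= ? "
            "ORDER BY term_row_id LIMIT ?",
            (next_id, batch_size),
        )]
        if not ids:
            break  # nothing left to clear
        conn.execute(
            "UPDATE wb_terms SET term_search_key = '' "
            "WHERE term_row_id BETWEEN ? AND ?",
            (ids[0], ids[-1]),
        )
        conn.commit()
        next_id = ids[-1] + 1
        batches += 1
        # Progress line: an interrupted run can be restarted from next_id.
        print(f"cleared up to term_row_id {ids[-1]}, resume from {next_id}")
    return next_id
```

If such a run is interrupted (for example by timeout), passing the last printed resume id back in as from_id continues where the previous run stopped, and rows that were already cleared are not touched again.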

Also, I wonder why we're even bothering to clear out term_weight… having 0.0 probably has little to no benefit compared to just having something in there.

Hm, good point… but perhaps it could be confusing for replica db users if some rows still have the term_weight even if they’re not supposed to rely on it?

If you want, I can remove it from the script and clear only the term_search_key.

hoo added a comment. · Apr 12 2018, 9:11 AM

Also, I wonder why we're even bothering to clear out term_weight… having 0.0 probably has little to no benefit compared to just having something in there.

Hm, good point… but perhaps it could be confusing for replica db users if some rows still have the term_weight even if they’re not supposed to rely on it?
If you want, I can remove it from the script and clear only the term_search_key.

IMO the tables are primarily an internal interface, so there's no need for such niceness… not updating the field will also mean less UPDATE load when doing this.

I’ve made it an option so I wouldn’t have to adjust the existing tests (and also because I prefer --clear-term-weight=true to be the default behavior, even though we’ll use --clear-term-weight=false).
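As a rough sketch of how such an option could work (in Python, not the script's actual code): the flag defaults to true and, when set to false, term_weight is simply left out of the UPDATE. The option names --from-id, --batch-size and --clear-term-weight come from this task; the default values, the table name wb_terms, the empty string for term_search_key, and the helpers parse_bool, build_parser and build_update_sql are assumptions made for the example.

```python
import argparse


def parse_bool(value):
    """Turn 'true'/'false' (case-insensitive) into a bool."""
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="Clear unused term columns")
    # Default values below are illustrative assumptions.
    parser.add_argument("--from-id", type=int, default=1)
    parser.add_argument("--batch-size", type=int, default=1000)
    # Clearing term_weight is the default; pass =false to skip it.
    parser.add_argument("--clear-term-weight", type=parse_bool, default=True)
    return parser


def build_update_sql(clear_term_weight):
    """Build the per-batch UPDATE; term_weight is only written when requested."""
    assignments = ["term_search_key = ''"]
    if clear_term_weight:
        assignments.append("term_weight = 0.0")
    return (
        "UPDATE wb_terms SET " + ", ".join(assignments)
        + " WHERE term_row_id BETWEEN ? AND ?"
    )
```

With --clear-term-weight=false the statement only assigns term_search_key, which matches the point above about reducing UPDATE load.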

Change 425294 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add maintenance script to clear term_search_key+term_weight

https://gerrit.wikimedia.org/r/425294