To efficiently empty those columns (which we no longer use), let’s write a dedicated maintenance script. rebuildTermSqlIndex should effectively do the same thing, but it would probably be much slower, since it would have to load each entity from the database – that shouldn’t be necessary for us.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Add maintenance script to clear term_search_key+term_weight | mediawiki/extensions/Wikibase | master | +513 -0
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Addshore | T188992 Make wb_terms table fancy
Resolved | | Lucas_Werkmeister_WMDE | T188993 Replace term_search_key and term_weight with empty values when wb_terms is not used for search
Declined | | Ladsgroup | T189779 Run clearTermSqlIndexSearchFields on Wikidata
Resolved | | Lucas_Werkmeister_WMDE | T191631 Add maintenance script to wipe term_search_key and term_weight columns
Event Timeline
Does the script need an option to stop after a certain amount of work has been done? So far I’m following rebuildTermSqlIndex and adding --from-id and --batch-size options, but no --limit option or anything like that.
Change 425294 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Add maintenance script to clear term_search_key+term_weight
We probably don't want one script to just run forever, so it will need some way to stop and continue. Stopping can usually just be done with timeout… but it then needs to be able to pick up the work again.
Also, I wonder why we are even bothering to clear out term_weight… having 0.0 probably has very little or no benefit compared to just leaving something in there.
> Stopping can usually just be done with timeout… but it then needs to be able to pick up the work again.
The script prints the current term_row_id after each batch, so you should be able to resume (--from-id) from that.
> Also, I wonder why we are even bothering to clear out term_weight… having 0.0 probably has very little or no benefit compared to just leaving something in there.
Hm, good point… but perhaps it could be confusing for replica db users if some rows still have the term_weight even if they’re not supposed to rely on it?
If you want, I can remove it from the script and clear only the term_search_key.
IMO the tables are primarily an internal interface, so there's no need for such niceness… not updating the field will also mean less UPDATE load when doing this.
I’ve made it an option so I wouldn’t have to adjust the existing tests (and also because I prefer --clear-term-weight=true to be the default behavior, even though we’ll use --clear-term-weight=false).
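For illustration only, the batching, resume, and optional term_weight clearing discussed above could be sketched roughly as follows. This is Python against an in-memory SQLite stand-in, not the actual PHP maintenance script. The function names, the reduced three-column table, the inclusive treatment of from_id, and the log message format are all assumptions made for the sketch.

```python
import sqlite3


def make_demo_db(n_rows=5):
    """Create an in-memory stand-in for wb_terms (hypothetical, reduced schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wb_terms ("
        "term_row_id INTEGER PRIMARY KEY, "
        "term_search_key TEXT NOT NULL, "
        "term_weight REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO wb_terms VALUES (?, ?, ?)",
        [(i, f"key{i}", 0.5) for i in range(1, n_rows + 1)],
    )
    conn.commit()
    return conn


def clear_search_fields(conn, from_id=0, batch_size=2,
                        clear_term_weight=True, log=print):
    """Blank term_search_key (and optionally zero term_weight) batch by batch.

    Returns the term_row_id from which a later run could resume.
    """
    # Only touch term_weight when asked to; skipping it means less UPDATE load.
    assignments = "term_search_key = ''"
    if clear_term_weight:
        assignments += ", term_weight = 0.0"

    next_id = from_id  # assumption: from_id is inclusive
    while True:
        # Look up the next batch of primary keys; no entity is loaded.
        rows = conn.execute(
            "SELECT term_row_id FROM wb_terms WHERE term_row_id >= ? "
            "ORDER BY term_row_id LIMIT ?",
            (next_id, batch_size),
        ).fetchall()
        if not rows:
            return next_id
        first, last = rows[0][0], rows[-1][0]
        conn.execute(
            f"UPDATE wb_terms SET {assignments} "
            "WHERE term_row_id BETWEEN ? AND ?",
            (first, last),
        )
        conn.commit()  # one transaction per batch
        next_id = last + 1
        # Report progress so an interrupted run can be resumed from here.
        log(f"cleared up to term_row_id {last}; resume with from_id={next_id}")


if __name__ == "__main__":
    demo = make_demo_db()
    clear_search_fields(demo, clear_term_weight=False)
```

Because each batch is committed separately and the resume point is logged after every batch, a run stopped from outside (for example by timeout) loses at most the batch that was in flight, and the next run can start from the last logged ID.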
Change 425294 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add maintenance script to clear term_search_key+term_weight