
Maintenance script for setting field data
Open, Needs Triage, Public

Description

In T265894 @Tgr suggested the idea of a maintenance script for CirrusSearch to allow setting arbitrary field data in the ES index, for local development.

Our suggested edits feature relies on ORES topics, which are not populated on our local wikis; @Tgr wrote a script (P10461) to populate this data. We would also like to set the hasrecommendation field (T269493) for articles. Having a maintenance script provided by the extension would be convenient.

Event Timeline

Restricted Application added a subscriber: Aklapper.

The bash scripts could go into the scripts/ directory of CirrusSearch. This is where we keep scripts that are not strictly about running CirrusSearch but are in some way related to development or operation.

Personally, I rarely use local pages in CirrusSearch. You can fetch real pages from the production APIs: using search in generator mode with the cirrusdoc prop returns the raw documents. A little jq magic can turn that output into something you can pipe into the Elasticsearch bulk indexing API. After that, setting $wgCirrusSearchDevelOptions['ignore_missing_rev'] = true lets the search engine return pages from Elasticsearch that don't actually exist on the local wiki.
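For anyone who would rather not write the jq part, the conversion step could look roughly like the Python sketch below. It is an illustration only, not the script used in practice. It assumes the API response carries a `cirrusdoc` list per page, each entry having an `id` and a `source` key; the function name `cirrusdoc_to_bulk` and the target index name are made up for this example. Older Elasticsearch versions may also require a `_type` in the action line, which this sketch omits.

```python
import json


def cirrusdoc_to_bulk(api_response, target_index):
    """Convert a prop=cirrusdoc API response into an Elasticsearch bulk body.

    Hypothetical helper: assumes each page has a "cirrusdoc" list whose
    entries contain "id" and "source".
    """
    pages = api_response.get("query", {}).get("pages", [])
    # formatversion=1 returns a dict keyed by page id, formatversion=2 a list.
    if isinstance(pages, dict):
        pages = list(pages.values())

    lines = []
    for page in pages:
        for doc in page.get("cirrusdoc", []):
            # The bulk API expects an action line followed by the document.
            action = {"index": {"_index": target_index, "_id": doc["id"]}}
            lines.append(json.dumps(action))
            lines.append(json.dumps(doc["source"]))

    # Every line of a bulk body, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)
```

The input would come from a request such as `action=query&generator=search&gsrsearch=...&prop=cirrusdoc&format=json&formatversion=2` against a production wiki, and the returned string can be POSTed to the local `/_bulk` endpoint with `Content-Type: application/x-ndjson`.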

I was thinking it could be a proper MediaWiki maintenance script; a bash script would need a lot of configuring.

We would mainly care about articletopic and hasrecommendation though, and those need to have fairly specific field names and values, so I wonder if it would make more sense to put the script in GrowthExperiments.

Change 664411 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/CirrusSearch@master] [WIP] Add functionality to update weighted tags

https://gerrit.wikimedia.org/r/664411

Change 664411 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Add helper method and script for updating weighted tags

https://gerrit.wikimedia.org/r/664411

As far as the Growth use cases are concerned, this is done. I can see the value in a more generic script, but I'm not sure how easy that would be without knowing anything about the specific field that needs to be updated.

For Vagrant I took another approach: https://gerrit.wikimedia.org/r/c/mediawiki/vagrant/+/664554 adds an admin panel, which covers this functionality.