
Refactor slow global analysis components
Closed, Resolved | Public | 13 Estimated Story Points

Description

T342444 was halted because the reindexing was too slow.

  • Update config with a more efficient interim analysis chain in case any reindexing needs to be done.
  • Refactor recent analysis upgrades (acronyms and camelCase) to be acceptably efficient as custom filters in a new plugin (originally planned for the extra plugin)
    • Enable plugin version-checking in analysis config (so we know we have the new extra plugin)
    • Enable less expensive fallback versions of camelCase and acronym processing for 3rd party users without the new plugin
  • Possibly investigate other slow points in global configs (implement immediately or open new tickets)
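
A minimal sketch of the fallback idea in the list above, written in Python for illustration only (the actual config builder lives in CirrusSearch and is not shown here). The plugin name, filter names, and the `choose_filters` helper are all hypothetical. It assumes the config builder can obtain a list of installed plugin names, and it picks the plugin-backed filter when the plugin is present and a less expensive fallback otherwise.

```python
# Hypothetical names throughout; none of these are CirrusSearch identifiers.
FILTER_OPTIONS = {
    "camelcase": {
        "plugin": "example-plugin",
        "preferred": "plugin_camelcase_filter",
        "fallback": "cheap_camelcase_filter",
    },
    "acronym": {
        "plugin": "example-plugin",
        "preferred": "plugin_acronym_filter",
        "fallback": "cheap_acronym_filter",
    },
}


def choose_filters(installed_plugins):
    """Map each feature to the plugin filter if available, else its fallback."""
    installed = set(installed_plugins)
    chosen = {}
    for feature, option in FILTER_OPTIONS.items():
        if option["plugin"] in installed:
            chosen[feature] = option["preferred"]
        else:
            chosen[feature] = option["fallback"]
    return chosen
```

With this shape, a 3rd party installation without the plugin still gets some form of camelCase and acronym handling, just through the cheaper filters.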

New dependency: We can/should link this with T332337, which also needs a new filter, and put everything in one new plugin.

Event Timeline

TJones updated the task description.
TJones updated the task description.
TJones renamed this task from Refactor slow analysis components to Refactor slow global analysis components. Sep 11 2023, 3:21 PM
Gehel set the point value for this task to 13. Sep 11 2023, 3:39 PM

Change 957806 had a related patch set uploaded (by Tjones; author: Tjones):

[mediawiki/extensions/CirrusSearch@master] Refactor and Revert Analysis Harmonization

https://gerrit.wikimedia.org/r/957806

Change 957806 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Refactor and Revert Analysis Harmonization

https://gerrit.wikimedia.org/r/957806

We previously discussed how to bundle the new filters, but talked about it again today.

Since acronym and camelCase processing aren't language-specific, creating a separate plugin isn't an obvious requirement or even desirable. Moving them into the extra plugin made sense from an architectural point of view, but the added complexity for our own deployment, 3rd party users, and even developers is undesirable. Trying to resolve everything in the config builder by testing for specific WMF versions of plugins (e.g., "v7.10.2-wmf5 or newer") is possibly more complexity than it is worth at the moment.
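
To illustrate the kind of complexity a version test involves, here is a rough Python sketch of a "v7.10.2-wmf5 or newer" check. The version format is inferred only from that one example string, and the `parse_version` and `at_least` helpers are hypothetical. Treating a version with no `-wmf` suffix as older than any WMF build of the same release is also an assumption.

```python
import re

# Assumed format: optional "v", three numeric parts, optional "-wmfN" suffix.
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-wmf(\d+))?$")


def parse_version(text):
    """Turn a version string such as 'v7.10.2-wmf5' into a comparable tuple."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch, wmf = match.groups()
    # Assumption: a missing -wmf suffix sorts before any WMF build.
    return (int(major), int(minor), int(patch), int(wmf) if wmf else 0)


def at_least(installed, required):
    """Return True if the installed version is the required one or newer."""
    return parse_version(installed) >= parse_version(required)
```

Even this small sketch has to settle questions such as numeric versus lexical ordering of the `wmf` counter and how unsuffixed versions compare, which is the sort of detail a simple presence check avoids.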

OTOH, creating and checking for the presence of a new plugin is easy. It is possible that the overhead of many plugins will become a problem in the future, but at the moment there is no evidence of that. For now, our standard operating procedure will be to create a new plugin when we have a batch of new filters to create.

As a big-picture compromise, it makes sense to work on T332337 (ICU tokenizer repair) before returning to T332342 (folding), and bundle the new filter there with the two here, so that all three new filters can be in one plugin.

Change 965602 had a related patch set uploaded (by Tjones; author: Tjones):

[search/extra@master] Refactor Acronym Fixer Analysis into New Textify Plugin

https://gerrit.wikimedia.org/r/965602

Change 965603 had a related patch set uploaded (by Tjones; author: Tjones):

[search/extra@master] Refactor CamelCase Analysis into Textify Plugin

https://gerrit.wikimedia.org/r/965603

Change 965793 had a related patch set uploaded (by Tjones; author: Tjones):

[search/extra@master] Add limited_mapping to Textify Plugin

https://gerrit.wikimedia.org/r/965793

Change 965575 had a related patch set uploaded (by Tjones; author: Tjones):

[mediawiki/extensions/CirrusSearch@master] Allow Fallback Filters, Config CamelCase Plugin

https://gerrit.wikimedia.org/r/965575

Change 965576 had a related patch set uploaded (by Tjones; author: Tjones):

[mediawiki/extensions/CirrusSearch@master] Config Acronym Fixer Plugin

https://gerrit.wikimedia.org/r/965576

Change 967912 had a related patch set uploaded (by Tjones; author: Tjones):

[mediawiki/extensions/CirrusSearch@master] Allow limited_mapping when textify plugin is present

https://gerrit.wikimedia.org/r/967912

Change 965575 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Allow Fallback Filters, Config CamelCase Plugin

https://gerrit.wikimedia.org/r/965575

Change 965576 merged by Tjones:

[mediawiki/extensions/CirrusSearch@master] Config Acronym Fixer Plugin

https://gerrit.wikimedia.org/r/965576

Dev notes and details on Mediawiki.

Highlights:

Change 967912 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Allow limited_mapping when textify plugin is present

https://gerrit.wikimedia.org/r/967912

Change 965602 merged by jenkins-bot:

[search/extra@master] Refactor Acronym Fixer Analysis into New Textify Plugin

https://gerrit.wikimedia.org/r/965602

Change 965603 merged by jenkins-bot:

[search/extra@master] Refactor CamelCase Analysis into Textify Plugin

https://gerrit.wikimedia.org/r/965603

Change 965793 merged by jenkins-bot:

[search/extra@master] Add limited_mapping to Textify Plugin

https://gerrit.wikimedia.org/r/965793