Reindex all wikis to enable dotted I fix, Yiddish ligatures, maybe Arabic normalization
Closed, Resolved · Public

Description

There are a few merged patches that require re-indexing to take effect:

  • T358495: enabling dotted I
  • T362501: Yiddish ligatures (this is not yet deployed, but is in 1.43.0-wmf.3)
  • T72899: Arabic script normalization... if this one gets merged while Trey is out, it can be reindexed, too!

One other ticket, T180387 (disabling hiragana/katakana mapping for English), only needs English to be reindexed, but will be taken care of by the larger reindexing task.

Event Timeline

TJones renamed this task from "Reindex all wikis to enable dotted I fix, yiddish ligatures" to "Reindex all wikis to enable dotted I fix, Yiddish ligatures, maybe Arabic normalization". Apr 30 2024, 9:25 PM
TJones updated the task description.

Mentioned in SAL (#wikimedia-operations) [2024-05-21T18:44:28Z] <ebernhardson> T363734: start reindex of cloudelastic

I've been working out a new reindexing orchestration for this, found in https://gitlab.wikimedia.org/repos/search-platform/cirrus-reindex-orchestrator/. It has now run to completion on cloudelastic, finishing in under a week (compared to ~3 weeks last time). A review of the logs and the set of live indices in cloudelastic suggests the run was successful. I'm making a few more cleanups to the codebase, and will then start reindexing eqiad and codfw.

The process has completed for eqiad and codfw. Total runtime was 3.5 days in codfw and 5.5 days in eqiad. The difference is mostly accounted for by commonswiki failing after more than a day and then retrying. Based on a review of this run, I'm going to update the repository to work on a per-index basis instead of a per-wiki basis. This should reduce the effect of retries on large wikis, and also avoid a problem in the current process where commonswiki_content finishes reindexing but then isn't backfilled for a day while it waits for commonswiki_file to reindex.
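
To illustrate the per-wiki versus per-index trade-off described above, here is a minimal Python sketch of a toy cost model. It is not code from cirrus-reindex-orchestrator. The `IndexJob` class, the function names, the hour figures, and the assumption that a per-wiki retry reruns every index of the wiki are all hypothetical and chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass
class IndexJob:
    name: str                      # index name, e.g. "commonswiki_file"
    hours: float                   # hypothetical reindex duration
    fails_first_try: bool = False  # hypothetical: first attempt fails at its very end


def total_hours_per_wiki(jobs):
    """Per-wiki unit of work: a failure discards everything run so far
    for that wiki, then all of its indices are rerun (assumed behaviour)."""
    full_pass = sum(j.hours for j in jobs)
    elapsed = 0.0
    wasted = 0.0
    for j in jobs:
        elapsed += j.hours
        if j.fails_first_try:
            wasted = elapsed  # the whole first attempt is thrown away
            break
    return wasted + full_pass


def total_hours_per_index(jobs):
    """Per-index unit of work: only the failing index is rerun."""
    return sum(j.hours * (2 if j.fails_first_try else 1) for j in jobs)


def backfill_wait_per_wiki(jobs, name):
    """Hours a finished index waits for backfill when backfill only
    starts once every index of the wiki is done (clean pass assumed)."""
    elapsed = 0.0
    finished_at = {}
    for j in jobs:
        elapsed += j.hours
        finished_at[j.name] = elapsed
    return elapsed - finished_at[name]


# Hypothetical durations; only the index names come from the comment above.
jobs = [
    IndexJob("commonswiki_content", hours=6.0),
    IndexJob("commonswiki_file", hours=30.0, fails_first_try=True),
]

per_wiki = total_hours_per_wiki(jobs)      # 36 wasted + 36 rerun = 72.0
per_index = total_hours_per_index(jobs)    # 6 + 30 * 2 = 66.0
content_wait = backfill_wait_per_wiki(jobs, "commonswiki_content")  # 30.0
```

Under these assumptions the per-index model saves the hours spent on indices that had already finished before the failure, and a finished index no longer has to wait on its sibling before backfill can start. The real savings depend on where in the run a failure happens and on how the orchestrator actually schedules retries and backfill.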