
Reindex Commons and Wikidata on eqiad and cloudelastic
Closed, Resolved · Public · 3 Estimated Story Points

Description

All of the Italian-language wikis and most of the numerous English-language wikis from the parent task (T274200) have finished reindexing.

The following still need to be reindexed:

  • file index for commons on eqiad
  • file index for commons on cloudelastic
  • all indexes for wikidata on eqiad
  • all indexes for wikidata on cloudelastic

It may (or may not) make sense to wait until some of the other recent reindexing-related tasks are complete before working on this task.

Event Timeline

https://phabricator.wikimedia.org/T280184 will likely block the reindexing effort (at least for cloudelastic) until we fix it.

It looks like previous reindexing attempts caused T279636 by creating excessive load on cloudelastic and stressing the garbage collector. Two things we can do to reduce the load of reindexing:

  • By default CirrusSearch asks Elasticsearch to reindex using one task per shard. That would be 32 parallel reindexing tasks running on the 6-node cloudelastic cluster (vs. the 35-node prod clusters). We can override this by passing --reindexSlices to the UpdateSearchIndexConfig.php script. If memory serves, the number of shards needs to be divisible by the chosen number of slices for efficiency reasons. While this can be done from the command line, I suppose Cirrus could be updated to check the number of nodes in the cluster and make a better decision when auto-detecting the number of slices.
  • While I didn't look too deeply, and we've already deleted the previous failures so can't check, it looks like we set the replica count when first creating the index. A more efficient method, which we already use for the completion suggester, is to create the index with 0 replicas, push all the data in, and then set the replica count to its final value. This lets the system copy the final state between instances instead of repeating the indexing work for each replica. This will require some light verification, and then code changes if it's not already doing this in some indirect form.
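The two ideas above can be sketched in a few lines of Python. This is an illustrative sketch, not CirrusSearch code: `pick_slices` and the settings payloads are hypothetical names, and the divisibility heuristic is just the rule of thumb described in the first bullet.

```python
# Sketch of the two load-reduction ideas (hypothetical helper names,
# not actual CirrusSearch code).

def pick_slices(shard_count: int, node_count: int) -> int:
    """Largest divisor of shard_count that does not exceed node_count,
    so slices divide the shards evenly and never outnumber the nodes."""
    for slices in range(min(shard_count, node_count), 0, -1):
        if shard_count % slices == 0:
            return slices
    return 1

# 32 shards on the 6-node cloudelastic: 4 slices instead of 32 tasks.
print(pick_slices(32, 6))    # -> 4
# 32 shards on a 35-node prod cluster: one slice per shard is fine.
print(pick_slices(32, 35))   # -> 32

# Replica toggling: index with 0 replicas, restore the count afterwards.
# These are standard Elasticsearch index-settings payloads; the final
# replica count of 2 is an assumed example value.
disable_replicas = {"index": {"number_of_replicas": 0}}
restore_replicas = {"index": {"number_of_replicas": 2}}
```

The heuristic falls back to 1 slice when no divisor fits (e.g. a prime shard count larger than the node count), which just means an unsliced reindex.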

Change 682768 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] Limit load generated by Reindexer auto-slicing

https://gerrit.wikimedia.org/r/682768

Change 683117 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] Disable replicas while reindexing

https://gerrit.wikimedia.org/r/683117

Change 682768 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Limit load generated by Reindexer auto-slicing

https://gerrit.wikimedia.org/r/682768

This will be ready for us to try reindexing again after this week's train.

Change 683117 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Disable replicas while reindexing

https://gerrit.wikimedia.org/r/683117

Removing Erik as the assignee because he worked on the code to improve reindexing (thanks!), but we still need to do the reindexing for these specific wikis, and anyone can pick up the task.

MPhamWMF set the point value for this task to 3.Jun 14 2021, 3:18 PM

The Cloudelastic reindex of commonswiki finished without an explicit error, but died when it tried to create an archive index—so it didn't get to the file index.

Hmm, that is "correct" operation with respect to the archive index: the archive index contains private data that can't be exposed to cloud. We should make it fail more gracefully. To reindex just the file index, we can change UpdateSearchIndexConfig.php to UpdateOneSearchIndexConfig.php and add --indexType file to the args.
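As a concrete sketch of that invocation (only --indexType file comes from the comment above; the mwscript wrapper, the --wiki value, and the --cluster flag are assumptions about the usual WMF maintenance-script setup):

```shell
# Hypothetical invocation: reindex only the file index of commonswiki
# against the cloudelastic cluster.
mwscript extensions/CirrusSearch/maintenance/UpdateOneSearchIndexConfig.php \
    --wiki=commonswiki \
    --cluster=cloudelastic \
    --indexType file
```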

Yeah... I just wasn't thinking about it. I have a tiny patch for T280184 that turns that fatal error into an output message, so it can continue on to the File index under normal operation.