
Reindex all wikis to fix nnbsp regression
Closed, Resolved | Public | 5 Estimated Story Points

Description

After the patch on T301131: Test Elastic 7.10 language analyzers is deployed, we need to reindex (almost*) all wikis to enable the change to fix the regression for "narrow no-break spaces".

This is a smaller problem, so it can also wait until after the 7.10 rollout is done. (And it'll be a fun test of the reindex process with 7.10!)

———
* Monolithic analyzers are not affected, but it would probably be easier to reindex everything than trying to carve out the few that this doesn't apply to.

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2022-09-19T22:59:04Z] <ebernhardson> T317200 start cirrussearch in-place reindex process for eqiad, codfw and cloudelastic

This is the first reindex since we switched to elastic 7 and it's running into a couple of issues:

enwiki.eqiad.reindex.log:

[{"index":"enwiki_content_1664293791","type":"_doc","id":"AVQXnGmF62ewIKYZMTMQ","cause":{"type":"mapper_parsing_exception","reason":"failed to parse field [_source] of type [_source] in document with id 'AVQXnGmF62ewIKYZMTMQ'. Preview of field's value: 'id'","caused_by":{"type":"mapper_parsing_exception","reason":"Field [_source] is a metadata field and cannot be added inside a document. Use the index API request parameters."}},"status":400},{"index":"enwiki_content_1664293791","type":"_doc","id":"AVQXnGH_62ewIKYZMTMP","cause":{"type":"mapper_parsing_exception","reason":"failed to parse field [_source] of type [_source] in document with id 'AVQXnGH_62ewIKYZMTMP'. Preview of field's value: 'id'","caused_by":{"type":"mapper_parsing_exception","reason":"Field [_source] is a metadata field and cannot be added inside a document. Use the index API request parameters."}},"status":400}]

enwiki.cloudelastic.reindex.log:

Failed: [{"index":"enwiki_content_1664214849","type":"_doc","id":"AVQXnGH_62ewIKYZMTMP","cause":{"type":"mapper_parsing_exception","reason":"failed to parse field [_source] of type [_source] in document with id 'AVQXnGH_62ewIKYZMTMP'. Preview of field's value: 'id'","caused_by":{"type":"mapper_parsing_exception","reason":"Field [_source] is a metadata field and cannot be added inside a document. Use the index API request parameters."}},"status":400}]

viwikibooks.eqiad.reindex.log: Threw circuit breaking exception, P34965

Looking at the enwiki_content IDs that gave errors, it seems that at some point (we could probably check dumps to see whether it was fairly recent) a couple of search queries were somehow indexed as documents. For now I'm willing to call this a one-off error that happened sometime in the past, delete the docs, and then re-run the reindexing process. If it comes up again in the future, though, we will need to dig into this and understand how it happened.
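As a rough illustration of that cleanup, the failing document IDs can be pulled out of the failure JSON and turned into delete requests. This is only a sketch, not the procedure actually used here: `source_index` is a hypothetical placeholder (the logs name only the new target index), and the paths assume Elasticsearch's standard document delete endpoint, `DELETE /<index>/_doc/<id>`.

```python
from __future__ import annotations

import json
from urllib.parse import quote


def failed_doc_ids(failure_json: str) -> list[str]:
    """Return the unique document IDs from a reindex failure list, in order."""
    seen: list[str] = []
    for failure in json.loads(failure_json):
        doc_id = failure["id"]
        if doc_id not in seen:
            seen.append(doc_id)
    return seen


def delete_paths(source_index: str, doc_ids: list[str]) -> list[str]:
    """Build one DELETE path per document; source_index is an assumption."""
    index = quote(source_index, safe="")
    return [f"/{index}/_doc/{quote(doc_id, safe='')}" for doc_id in doc_ids]


# Abbreviated entries shaped like the failures in the logs above.
sample = json.dumps([
    {"index": "enwiki_content_1664293791", "id": "AVQXnGmF62ewIKYZMTMQ", "status": 400},
    {"index": "enwiki_content_1664293791", "id": "AVQXnGH_62ewIKYZMTMP", "status": 400},
])
```

Calling `delete_paths("<source index>", failed_doc_ids(sample))` yields the request paths to review before anything is actually deleted.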

The viwikibooks error is curious. We can rerun the reindexing process and it will likely work, but this means that Elasticsearch's built-in reindexer can fail under some conditions. It's not clear whether the reindexer received the 429 error and retried a few times with delays, or whether it failed on the first 429 it received.
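The difference between those two behaviours can be sketched as follows. This is not Elasticsearch's reindexer code, and it makes no claim about which behaviour the reindexer has; `TooManyRequests`, `with_backoff`, and the parameter defaults are all hypothetical names chosen for illustration.

```python
import time


class TooManyRequests(Exception):
    """Stand-in for an HTTP 429 / circuit-breaking response."""


def with_backoff(action, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call action(), retrying on TooManyRequests with exponential delays.

    With max_attempts=1 this models "fail on the first 429"; larger values
    model "retry a few times with delays" before giving up.
    """
    for attempt in range(max_attempts):
        try:
            return action()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

The `sleep` parameter is injectable so that the delay schedule can be checked without actually waiting.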

Change 836272 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] IndexCreator: Wait for green after creation

https://gerrit.wikimedia.org/r/836272

All wikis have been reindexed except wikidatawiki, testwikidatawiki, and commonswiki, on varying clusters. These are hitting an issue where Elasticsearch reports the index as created, but the settings validation that comes immediately afterwards receives a 404. The patch above (Change 836272) should make that less error-prone, and we can finish the reindex once it's deployed.
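The general idea of waiting for green before touching a freshly created index can be sketched with Elasticsearch's cluster health API (`GET /_cluster/health/<index>?wait_for_status=green`). This is an illustration only and not the code in the CirrusSearch patch; the function names, the base URL, and the `30s` timeout are assumptions.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def health_url(base_url, index, timeout="30s"):
    """Build the cluster health URL that blocks until the index is green."""
    query = urlencode({"wait_for_status": "green", "timeout": timeout})
    return f"{base_url.rstrip('/')}/_cluster/health/{quote(index, safe='')}?{query}"


def http_get_json(url):
    """Fetch a URL and decode its JSON body, including on non-2xx replies."""
    try:
        with urlopen(url) as response:
            return json.load(response)
    except HTTPError as err:
        # A timed-out or failed health wait may arrive as a non-2xx status;
        # the JSON body is still readable from the error object.
        return json.load(err)


def wait_for_green(base_url, index, timeout="30s", fetch=http_get_json):
    """Return True only if the index reached green before the timeout."""
    health = fetch(health_url(base_url, index, timeout))
    return health.get("status") == "green" and not health.get("timed_out", False)
```

A caller would run the settings validation only when `wait_for_green(...)` returns `True`, and treat `False` as a failed creation to retry or clean up.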

Change 836272 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] IndexCreator: Wait for green after creation

https://gerrit.wikimedia.org/r/836272

Change 837191 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] UpdateOneSearchIndexConfig: Cleanup failed index creations

https://gerrit.wikimedia.org/r/837191

Change 837191 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] UpdateOneSearchIndexConfig: Cleanup failed index creations

https://gerrit.wikimedia.org/r/837191

All indices have been reindexed. Additionally, I ran the check_indices.py script and cleaned up the extra indices that were left behind.