
Create archive indices and delete archive docs from general indices
Closed, Resolved (Public)

Description

As part of the migration to Elasticsearch 6 we separated archive out into its own index instead of keeping it as a second mapping type in general. We are currently in a mixed state: most general indices still contain archive documents because the indices were created prior to the ES6 upgrade. Use the appropriate scripts to create the new archive index, index all the archives into it, and then issue _delete_by_query with match_all against the general/archive type.
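For reference, the final cleanup step could look roughly like the following. This is a minimal sketch and not the script actually used: the host `http://localhost:9200`, the index name `testwiki_general`, and both function names are assumptions for illustration. It only builds the request; `send()` is defined but deliberately not called, since it would delete documents.

```python
import json
import urllib.request


def delete_by_query_request(base_url, index, doc_type):
    """Build (url, body) for a match_all _delete_by_query against one type."""
    url = f"{base_url.rstrip('/')}/{index}/{doc_type}/_delete_by_query"
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return url, body


def send(url, body):
    """POST the request and return the decoded JSON response.

    Not called in this sketch: it removes every document of the given type.
    """
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical host and index name; adjust per cluster and wiki.
url, body = delete_by_query_request(
    "http://localhost:9200", "testwiki_general", "archive"
)
```

Running this per general index, after the new archive index has been populated, would match the order of steps described above.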

Event Timeline

Tested creating the archive index for testwiki, which seems to have gone well. Unfortunately, running the forceSearchIndex.php script is causing job runners to emit:

Search backend error during sending {numBulk} documents to the {index} index(s) after 195: action_request_validation_exception: Validation Failed: 1: id is missing;2: id is missing;3: id is missing;4: id is missing;5: id is missing;6: id is missing;7: id is missing;8: id is missing;9: id is missing;10: id is missing;

Will need to figure out what is happening here to move forward. This may suggest that archive indexing in general is broken, as forceSearchIndex.php issues the individual updates in the same way the indexing pipeline does. It could also be that some support code that sources the updates to run is providing data in the wrong format.
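One way to narrow this down is to inspect the bulk payload before it reaches the cluster. The sketch below is hypothetical and not part of CirrusSearch: it assumes the action metadata shape of the Elasticsearch bulk API (`{"update": {"_index": ..., "_type": ..., "_id": ...}}`), and the function name, the index name `testwiki_archive`, and the sample actions are made up for illustration. It reports 1-based positions so they line up with the numbering in the validation error.

```python
def positions_missing_id(actions):
    """Return 1-based positions of bulk actions whose metadata has no usable _id."""
    missing = []
    for position, action in enumerate(actions, start=1):
        # Each bulk action is a single-key dict: {"<operation>": {metadata}}.
        meta = next(iter(action.values()))
        if meta.get("_id") in (None, ""):
            missing.append(position)
    return missing


# Hypothetical sample: the second and third actions lack an id.
sample_actions = [
    {"update": {"_index": "testwiki_archive", "_type": "archive", "_id": "42"}},
    {"update": {"_index": "testwiki_archive", "_type": "archive", "_id": None}},
    {"update": {"_index": "testwiki_archive", "_type": "archive"}},
]
missing_positions = positions_missing_id(sample_actions)
```

If every action in a batch is flagged, the problem is more likely in the code that sources the rows than in the update path itself.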

Change 508369 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] Ignore ancient logging rows with log_page = null

https://gerrit.wikimedia.org/r/508369

Change 508369 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Ignore ancient logging rows with log_page = null

https://gerrit.wikimedia.org/r/508369

Change 508591 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@wmf/1.34.0-wmf.3] Ignore ancient logging rows with log_page = null

https://gerrit.wikimedia.org/r/508591

Change 508591 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@wmf/1.34.0-wmf.3] Ignore ancient logging rows with log_page = null

https://gerrit.wikimedia.org/r/508591

Mentioned in SAL (#wikimedia-operations) [2019-05-07T15:21:51Z] <ebernhardson@deploy1001> Synchronized php-1.34.0-wmf.3/extensions/CirrusSearch/maintenance/forceSearchIndex.php: T222641: Cirrus maint script handle ancient logging rows (duration: 00m 52s)

Maintenance scripts have completed. Verified with the _count API that *_general/archive/_count returns 0 on all production clusters.
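The verification can be scripted along these lines. This is a minimal sketch, assuming the standard `_count` response shape (a `count` field plus a `_shards` summary); the function name and the sample response are hypothetical, and fetching the response from each cluster is left out.

```python
import json


def archive_is_empty(count_response_text):
    """Return True if a _count response reports zero documents and no failed shards."""
    response = json.loads(count_response_text)
    # A zero count is only trustworthy if every shard answered.
    no_failures = response["_shards"]["failed"] == 0
    return response["count"] == 0 and no_failures


# Hypothetical response body for GET *_general/archive/_count
sample_response = (
    '{"count": 0, "_shards": '
    '{"total": 5, "successful": 5, "skipped": 0, "failed": 0}}'
)
is_empty = archive_is_empty(sample_response)
```

Checking the shard summary alongside the count guards against a partial response being read as an empty index.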