
Move bulk content out of the ElasticaWrite job
Closed, Resolved (Public)

Description

ElasticaWrite jobs can be multiple megabytes, which is an order of magnitude larger than is reasonable. The problem is that we put the entire Elasticsearch update into the ElasticaWrite job, and this includes two copies of the page text content, among other things.

Adjust CirrusSearch so that bulk content is loaded from MediaWiki as part of the ElasticaWrite job, rather than before it. Each retry will then need to recalculate the document, which should be acceptable. Care must be taken not to move every piece of document building behind ElasticaWrite. In particular, link counting should happen once, with the result written to all clusters, because that query is quite expensive.
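To illustrate the intended split, here is a minimal Python sketch. It is not CirrusSearch code: `WriteJob`, `enqueue_update`, `run_job`, `count_incoming_links`, and the `PAGE_TEXT` store are all hypothetical stand-ins. It assumes one job per cluster, each carrying only the page ID and the precomputed link count, with the page text loaded when the job runs.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical stand-in for MediaWiki page storage (about 12 kB of text).
PAGE_TEXT = {1: "lorem ipsum " * 1000}


def count_incoming_links(page_id: int) -> int:
    """Hypothetical expensive query; it should run once per update."""
    count_incoming_links.calls += 1
    return 42


count_incoming_links.calls = 0


@dataclass
class WriteJob:
    """Small job payload: identifiers and cheap-to-carry values only."""
    page_id: int
    incoming_links: int
    cluster: str


def enqueue_update(page_id: int, clusters: list[str]) -> list[WriteJob]:
    # Count links once, then share the result across every cluster's job.
    links = count_incoming_links(page_id)
    return [WriteJob(page_id, links, cluster) for cluster in clusters]


def run_job(job: WriteJob) -> dict:
    # Bulk content is loaded here, inside the job, so a retry rebuilds
    # the document instead of carrying the text in the serialized payload.
    return {
        "id": job.page_id,
        "text": PAGE_TEXT[job.page_id],
        "incoming_links": job.incoming_links,
        "cluster": job.cluster,
    }


def payload_size(job: WriteJob) -> int:
    # Size of the job as it would be serialized onto a queue.
    return len(json.dumps(asdict(job)))
```

In this sketch the serialized job stays well under a kilobyte regardless of page size, while the link count is computed a single time in `enqueue_update` and reused for every cluster.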

Event Timeline

Change 546285 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] Move bulk content for update after ElasticaWrite

https://gerrit.wikimedia.org/r/546285

Change 546285 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Move bulk content for update after ElasticaWrite

https://gerrit.wikimedia.org/r/546285