
CirrusSearch: investigate how to improve re-indexing
Closed, Duplicate, Public

Description

The recent re-indexing ran into a number of issues, and we'd like to take some time to figure out a better way to re-index.

Event Timeline

debt triaged this task as Medium priority. Oct 27 2016, 8:34 PM
debt moved this task from needs triage to Up Next on the Discovery-Search board.
Deskana raised the priority of this task from Medium to High. Nov 15 2016, 6:21 PM
Deskana added a subscriber: Deskana.

The recent reindex we attempted on Commons failed twice, which delayed the rollout of features to users. One of those failures was partial, which is a total pain to fix properly. This is infrastructure worth investing in.

@dcausse will work on this and break it down a bit. He already has a few ideas how the process could be improved to be more reliable.

dcausse changed the task status from Open to Stalled. Nov 21 2016, 4:47 PM

I wanted to use the reindex API written by Nik. It would greatly reduce code complexity in Cirrus and hopefully give us a more stable reindex process.
The reindex API supports a script param that would allow us to update the reindexed docs (e.g. to populate the new wiki field).
Sadly, with Elasticsearch 2.x the scripting language is Groovy, and inline Groovy scripts are disabled by default for security reasons.
We could add

script.engine.groovy.inline.update: true

to elasticsearch.yml to allow it.

With Elasticsearch 5 we have access to Painless, a new scripting language very similar to Groovy that is enabled by default.
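To make the idea concrete, here is a minimal Python sketch of the body a Painless-based reindex could send to `POST /_reindex`, copying every document and stamping it with the new `wiki` field through the script param. It assumes Elasticsearch 5.x, where the reindex API takes the script text under `inline` and Painless reads parameters from the `params` map; the helper name, the index names, and the `commonswiki` value are hypothetical.

```python
import json


def build_reindex_body(source_index: str, dest_index: str, wiki: str) -> dict:
    """Hypothetical helper: build a _reindex request body that copies all docs
    from source_index to dest_index and sets a `wiki` field on each one.

    Assumes Elasticsearch 5.x with Painless (later versions rename the
    "inline" key to "source").
    """
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {
            "lang": "painless",
            # Painless exposes script parameters through the params map.
            "inline": "ctx._source.wiki = params.wiki",
            "params": {"wiki": wiki},
        },
    }


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    body = build_reindex_body("commonswiki_file_old", "commonswiki_file_new", "commonswiki")
    # This JSON would be the body of: POST /_reindex
    print(json.dumps(body, indent=2))
```

Passing the value through `params` rather than splicing it into the script text keeps the script string constant, so the same script can be reused for every wiki.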

If we want to use the reindex API our options are:

  1. Enable groovy
  2. Wait for elasticsearch 5 and use painless

I'm still undecided. I'm not too keen on enabling Groovy, since we've spent some effort removing all Groovy scripts from Cirrus; on the other hand, reindexing is painful today...

We did have an upgrade to Elasticsearch 5 planned for Q3 (Jan - March 2017), which would seem to solve the problem permanently.

We may not save any time by enabling Groovy compared to dealing with the pain of reindexing for a couple of months and then upgrading to Elasticsearch 5 for the more permanent solution. My instincts say we should wait for the Elasticsearch 5 upgrade in a few months.

Thoughts?

@Deskana sounds good to me, I think I can suffer a few more months waiting for elastic 5.
I'll write some more bash scripts to help me deal with recalcitrant reindexes.
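Part of what makes a partial failure painful is noticing it before the new index goes live. Below is a minimal Python sketch of one check such a helper script could run: compare the document ids of the old and new index. It assumes both id lists have already been fetched (for example with a scroll query); `compare_indices` and `ReindexReport` are hypothetical names, not part of CirrusSearch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ReindexReport:
    missing: FrozenSet[str]  # ids in the old index but absent from the new one
    extra: FrozenSet[str]    # ids in the new index but absent from the old one

    @property
    def complete(self) -> bool:
        # Treat the reindex as complete only when no document was dropped.
        return not self.missing


def compare_indices(old_ids: Iterable[str], new_ids: Iterable[str]) -> ReindexReport:
    """Hypothetical helper: diff the id sets of the old and new index."""
    old, new = frozenset(old_ids), frozenset(new_ids)
    return ReindexReport(missing=old - new, extra=new - old)


if __name__ == "__main__":
    report = compare_indices(["1", "2", "3"], ["1", "3"])
    print(sorted(report.missing), report.complete)  # prints: ['2'] False
```

The `missing` set doubles as the list of documents to re-send, which is usually cheaper than restarting the whole reindex.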