With the recent re-indexing, we encountered a number of issues, and we'd like to take some time to figure out a better way to do it.
Event Timeline
The recent reindex that we tried to do on Commons failed twice, which delayed the rollout of features to users. One of those was a partial failure which is a total pain to fix properly. This is infrastructure worth investing in.
@dcausse will work on this and break it down a bit. He already has a few ideas for how the process could be made more reliable.
I wanted to use the reindex API written by Nik. It would greatly help to reduce code complexity in cirrus and hopefully give us a more stable reindex process.
The reindex API supports a script param that would allow us to update docs as they are reindexed (e.g. to populate the new wiki field).
Sadly, with Elasticsearch 2.x the script language used is Groovy, and it is disabled by default for security reasons.
We could add
script.engine.groovy.inline.update: true
to elasticsearch.yml to allow it.
With Elasticsearch 5 we have access to Painless, a new script engine very similar to Groovy that is enabled by default.
If we want to use the reindex API our options are:
- Enable groovy
- Wait for elasticsearch 5 and use painless
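To make the two options concrete, here is a rough sketch of what the request body for the reindex API could look like in each case. It only builds the JSON and does not send anything. The index names, the wiki value and the helper name `build_reindex_body` are made up for illustration, and the script text is an assumption about how we would populate the wiki field, not something that has been tested against our cluster.

```python
import json


def build_reindex_body(source_index, dest_index, wiki, lang="painless"):
    """Build a body for POST /_reindex that sets a `wiki` field on every doc.

    Hypothetical helper: names and script text are illustrative assumptions.
    """
    if lang == "painless":
        # Painless exposes script parameters through the `params` map.
        inline = "ctx._source.wiki = params.wiki"
    elif lang == "groovy":
        # Groovy binds script parameters directly as variables.
        inline = "ctx._source.wiki = wiki"
    else:
        raise ValueError("unsupported script language: %s" % lang)

    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {
            "lang": lang,
            "inline": inline,
            "params": {"wiki": wiki},
        },
    }


if __name__ == "__main__":
    # Placeholder index names, not the real ones.
    body = build_reindex_body("commonswiki_file_old", "commonswiki_file_new", "commonswiki")
    print(json.dumps(body, indent=2))
```

Passing the wiki name as a script param rather than pasting it into the script text keeps the script itself identical across wikis, so the only thing that changes between the Groovy and Painless variants is how the param is referenced.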
I'm still undecided. I'm not too keen on enabling Groovy, since we've spent some effort removing all Groovy scripts from cirrus. On the other hand, reindexing is painful today...
We did have an upgrade to Elasticsearch 5 planned for Q3 (Jan - March 2017), which would seem to solve the problem permanently.
We may not save any time enabling Groovy compared to dealing with the pain of reindexing for a couple of months then upgrading to Elasticsearch 5 for the more permanent solution. My instincts say we should wait for the Elasticsearch 5 upgrade in a few months.
Thoughts?
@Deskana sounds good to me, I think I can suffer a few more months waiting for Elasticsearch 5.
I'll write some more bash scripts to help me deal with recalcitrant reindexes.