Change Details

Now that cloudelastic has elasticsearch installed (T214921), we need to define how this cluster should be used. That will influence how we set up access to it, how we import data, and the other steps that need to happen before it is fully usable. The list below is not exhaustive and is meant as the start of a discussion, not a definitive answer.

**Update lag:** Updates are asynchronous. In the best case, changes are searchable after a few minutes, but it is perfectly normal to have a lag of up to a few hours, in particular during regular maintenance. We need to ensure that no workflow based on cloudelastic relies on low update lag. One way to do that is to introduce artificial lag and keep the lag higher than a few hours at all times. An extreme version of this would be to refresh cloudelastic only weekly.

**Data structure changes:** Over time, we will refine the way we index documents into elasticsearch. This can include adding or removing fields, changing analyzer configuration, or a number of other changes. Tools using cloudelastic should be aware that there will be breaking changes. We might want to define and document a subset of fields that we consider stable.

**Elasticsearch upgrades:** Elastic is known for breaking backward compatibility fairly often. Tools will break in unexpected ways during upgrades. We will need to communicate the upgrade schedule ahead of time. We will probably not be able to provide an environment to test version N+1, so tool owners will bear the burden of testing the compatibility of their tools. We need a way to forward deprecation warnings from the elasticsearch logs to each tool owner.

**Multi cluster:** We currently have 3 elasticsearch clusters, and each wiki is mapped to one of them. This mapping should not be considered stable. We need to provide a way for clients to be routed to the correct cluster, or to discover this mapping in an automated way. We should probably restrict cross-index searches, since they would only work inside a single cluster.

**Quota / rate limiting:** At some point, we will probably need to introduce some form of per-user quota or rate limiting. This will require some form of authentication. We should start early on authentication ({T220069}); the actual rate limiting can come later.

**Read only:** We want updates to come only from production. This cluster should not be used as a generic elasticsearch cluster where anyone can index their own dataset.
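
To make the multi-cluster point more concrete, client-side routing could look roughly like the sketch below. This is only an illustration in Python, not an existing cloudelastic API: `ClusterRouter`, the cluster names, the wiki assignments and the URLs are all hypothetical, and a real client would load the mapping from whatever discovery mechanism we end up providing instead of hard-coding it.

```python
from typing import Dict, Iterable


class CrossClusterSearchError(ValueError):
    """Raised when a search would span more than one cluster."""


class ClusterRouter:
    """Resolve wikis to the cluster that hosts their indices.

    Both mappings are passed in rather than hard-coded, because the
    wiki-to-cluster assignment should not be considered stable.
    """

    def __init__(self, wiki_to_cluster: Dict[str, str],
                 cluster_urls: Dict[str, str]) -> None:
        self._wiki_to_cluster = dict(wiki_to_cluster)
        self._cluster_urls = dict(cluster_urls)

    def cluster_for(self, wiki: str) -> str:
        """Return the name of the cluster hosting the given wiki."""
        try:
            return self._wiki_to_cluster[wiki]
        except KeyError:
            raise KeyError(f"no cluster known for wiki {wiki!r}") from None

    def url_for(self, wikis: Iterable[str]) -> str:
        """Return the base URL to query for one or more wikis.

        Searches across several wikis are only accepted when all of
        them live on the same cluster.
        """
        clusters = {self.cluster_for(wiki) for wiki in wikis}
        if not clusters:
            raise ValueError("at least one wiki is required")
        if len(clusters) > 1:
            raise CrossClusterSearchError(
                f"wikis are spread over several clusters: {sorted(clusters)}"
            )
        return self._cluster_urls[clusters.pop()]


# Hypothetical data, for illustration only.
EXAMPLE_WIKI_TO_CLUSTER = {
    "enwiki": "cluster-a",
    "frwiki": "cluster-a",
    "dewiki": "cluster-b",
}
EXAMPLE_CLUSTER_URLS = {
    "cluster-a": "https://cluster-a.example.org",
    "cluster-b": "https://cluster-b.example.org",
}
```

With this shape, a tool asks the router for a URL before every query and never stores a cluster address itself, so a remapped wiki only requires refreshing the mapping. Rejecting mixed-cluster requests on the client is one possible way to surface the cross-index restriction early; enforcing it on the server side would still be needed.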