User Story:
- As a searcher on Commons or Wikidata, I want to be able to take advantage of the latest language analysis improvements for all languages in use on those projects.
- As a Search Platform engineer, I don't want to have to reindex Commons and Wikidata every time any language is updated (it takes too long—multiple days), but I do want non-English language analysis improvements to make it to those projects in a reasonably timely manner.
Most wikis only use language analysis for one language, and that language can be recovered from the wiki config. Commons and Wikidata support many languages and use many analyzers, but their config indicates that their language is English. By default, then, Commons and Wikidata only get reindexed when the English analysis chain is updated, or when we have a reason to reindex everything, neither of which is very often.
A reasonable compromise would be to reindex Commons and Wikidata on a regular schedule. We discussed it in our weekly meeting, and every 3–6 months (2–4 times a year) seems reasonable. Every 4 months (3 times a year) seems like a good place to start.
Elastic index names include a Unix timestamp, so determining how old an index is should be straightforward. Using the index timestamp also means the clock is automatically reset if a project gets reindexed sooner for some other reason, so there will be less unnecessary reindexing than if we were to follow a strict schedule like "the first week of every third month".
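As a rough illustration of the age check described above, here is a minimal Python sketch. It assumes the timestamp is the last underscore-separated part of the index name (for example `commonswiki_content_<timestamp>`). That layout and the function name `index_age_days` are assumptions made for illustration and should be checked against the real index names.

```python
from datetime import datetime, timezone


def index_age_days(index_name: str, now: datetime) -> float:
    """Return the age, in days, of an index whose name ends in a Unix timestamp."""
    # Assumption: the name looks like "<wiki>_<index type>_<unix timestamp>".
    suffix = index_name.rsplit("_", 1)[-1]
    created = datetime.fromtimestamp(int(suffix), tz=timezone.utc)
    return (now - created).total_seconds() / 86400
```

Because the age is derived from the name alone, an index rebuilt early for any other reason automatically reports a small age, which is the "clock reset" behavior described above.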
There are actually several indexes for each wiki project, and we may want to be able to make simple variations on this kind of alert (like reindexing every wiki at least once a year). All projects have a content index, and normally all related indexes are refreshed one right after another, so checking only the content index makes sense, though counterarguments are welcome.
Acceptance Criteria:
- Any concerns about the schedule (every 4 months / 120 days / ~3 times a year) or about letting the content index be the benchmark are discussed and resolved.
- An alert is set up to send email if the timestamp for the Commons content index is more than 120 days old.
- An alert is set up to send email if the timestamp for the Wikidata content index is more than 120 days old.
- Ideally, a Phabricator ticket is also automatically created and assigned to the Discovery-Search backlog.
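
The two alert criteria above could be sketched roughly as follows. This is only an illustration. The index name prefixes `commonswiki_content_` and `wikidatawiki_content_`, the function name `stale_content_indexes`, and the idea of passing in a plain list of index names are all assumptions. Sending the email and creating the ticket are left out.

```python
import time

MAX_AGE_DAYS = 120  # threshold from the acceptance criteria
SECONDS_PER_DAY = 86400

# Assumption: content index names start with these prefixes and end in a Unix timestamp.
WATCHED_PREFIXES = ("commonswiki_content_", "wikidatawiki_content_")


def stale_content_indexes(index_names, now_ts=None):
    """Return (name, age_in_days) for watched content indexes older than MAX_AGE_DAYS."""
    now_ts = time.time() if now_ts is None else now_ts
    stale = []
    for name in index_names:
        if not name.startswith(WATCHED_PREFIXES):
            continue  # not a Commons or Wikidata content index
        suffix = name.rsplit("_", 1)[-1]
        if not suffix.isdigit():
            continue  # no timestamp to compare against
        age_days = (now_ts - int(suffix)) / SECONDS_PER_DAY
        if age_days > MAX_AGE_DAYS:  # strictly "more than 120 days old"
            stale.append((name, age_days))
    return stale
```

Keeping the prefixes and the threshold as module-level constants should make variations, such as a yearly check across every wiki, a matter of changing two values.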