CirrusSearch does not index redirects as individual index documents; instead, it maintains a list of redirects on the document of the page they redirect to.
This data is not correlated to a particular revision of the target page and won't be handled by revision-based updates. Thus the update will have to update only the redirect field:
{
  "wiki": "testwiki",
  "page_id": 123,
  "namespace": 10,
  "title": "Target title",
  "redirect": [
    { "namespace": 10, "title": "Redirect to target" },
    { "namespace": 2, "title": "Another Redirect to target" },
    { "namespace": 0, "title": "Yet Another Redirect to target" }
  ]
}
Note the absence of the version field here: we cannot attach this data to a particular revision of the page. It is the state of the redirects to the page Target title after one of its redirects is updated (added/removed).
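As an illustration only, the shape of such a redirect-only update could be assembled as in the Python sketch below. The function name build_redirect_update and its parameters are hypothetical and not part of CirrusSearch or the update pipeline; the field names are the ones from the example above.

```python
def build_redirect_update(wiki, page_id, namespace, title, redirects):
    """Build a partial update that carries only the redirect field.

    `redirects` is an iterable of (namespace, title) pairs describing
    every redirect currently pointing at the target page.
    """
    return {
        "wiki": wiki,
        "page_id": page_id,
        "namespace": namespace,
        "title": title,
        # Deliberately no version field: the redirect list is not tied
        # to any revision of the target page.
        "redirect": [{"namespace": ns, "title": t} for ns, t in redirects],
    }


fragment = build_redirect_update(
    "testwiki",
    123,
    10,
    "Target title",
    [(10, "Redirect to target"), (2, "Another Redirect to target")],
)
```

The fragment always carries the full list of redirects for the target, so a consumer can overwrite the field rather than merge individual additions and removals.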
There could be two approaches to do this:
Use the page-state stream but with a different enrichment API
The cirrus doc api could be extended to support a builder and construct the update fragment with all the redirects targeting this page.
- When given the revision of a redirect, the target page has to be identified and then all redirects to this page must be collected
- List all the redirects like what's done in RedirectsAndIncomingLinks.php
- Produce the update fragment
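The steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the cirrus doc API: the REDIRECTS mapping is a hypothetical in-memory stand-in for the wiki's redirect data, and resolve_target, list_redirects and redirect_fragment_for are made-up names.

```python
# Hypothetical stand-in for the wiki's redirect data:
# (namespace, title) of a redirect -> (namespace, title) of its target.
REDIRECTS = {
    (10, "Redirect to target"): (10, "Target title"),
    (2, "Another Redirect to target"): (10, "Target title"),
    (0, "Unrelated redirect"): (0, "Other page"),
}


def resolve_target(redirect_page):
    """Identify the target of a redirect, or None if it is not a redirect."""
    return REDIRECTS.get(redirect_page)


def list_redirects(target_page):
    """Collect every redirect that currently points at the target page."""
    return sorted(src for src, dst in REDIRECTS.items() if dst == target_page)


def redirect_fragment_for(redirect_page):
    """Produce the update fragment for the target of the given redirect."""
    target = resolve_target(redirect_page)
    if target is None:
        # Not a redirect (or no longer one): nothing can be built from here.
        return None
    namespace, title = target
    return {
        "namespace": namespace,
        "title": title,
        "redirect": [
            {"namespace": ns, "title": t} for ns, t in list_redirects(target)
        ],
    }


fragment = redirect_fragment_for((10, "Redirect to target"))
```

Note that the fragment is keyed on the target page, not on the redirect whose revision triggered the update.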
While this technique works for adding a redirect, changing or removing a redirect might be tricky to do right:
- changing a redirect from one target to another might require 2 updates (one per target), and the page-state stream might not know what the previous target page was
- deleting a redirect might cause the call to the cirrus doc API to fail, as the content of a deleted page cannot be extracted
- it is unclear what to do when transforming a page into a redirect and vice versa
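To make the first point concrete, the small Python sketch below lists which target pages would need their redirect field rebuilt for a given change. The function targets_to_refresh is hypothetical; it assumes the producer knows both the previous and the new target, which is precisely what the page-state stream may be unable to provide.

```python
def targets_to_refresh(previous_target, new_target):
    """Return the target pages whose redirect list must be rebuilt.

    Each argument is a (namespace, title) pair, or None when the page
    was not a redirect before (previous) or is not one anymore (new).
    """
    refreshed = []
    for target in (previous_target, new_target):
        if target is not None and target not in refreshed:
            refreshed.append(target)
    return refreshed


old = (0, "Old target")
new = (0, "New target")

added = targets_to_refresh(None, new)      # redirect created
retargeted = targets_to_refresh(old, new)  # redirect changed
removed = targets_to_refresh(old, None)    # redirect deleted
```

If the previous target is unknown, the retargeted case collapses into the created case and the old target keeps a stale entry in its redirect list.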
So while this approach is tempting because it resembles the rest of the pipeline in many respects, these few unknowns might make it not the best one to implement.
Create a dedicated stream from CirrusSearch
Another approach might be to simply ignore redirects in the page-state stream and instead create a new stream directly from CirrusSearch.
CirrusSearch might have better access to all the components required to sort out the various edge cases mentioned above and produce the right events.
AC:
- changes to redirect pages are properly reflected in the Elasticsearch index