Domain renames (redirects, really) do happen from time to time (the latest example being {T31919}). In such instances, the new domain is set up on the ops/MW side (DB, Apache, etc.) and the old domain is then redirected to the new one in Apache. We need to define the set of changes needed on the RESTBase side in order to fully comply with the rename/redirect.
The obvious thing that is needed on the RESTBase side is adding the new domain to the configuration. The real question, though, is what to do with the old domain's config stanza and how to keep the historical data accessible.
=== Option 1: Remove the old domain ===
The (somewhat) naive approach would be to simply remove the entry. Alas, the decision to route traffic to RESTBase is made in Varnish, i.e. before the request even reaches Apache, which means that removing the domain effectively hides the data present in RESTBase whenever a direct call is made to `https://old-domain.wp.org/api/rest_v1/...`. Moreover, requests to `https://new-domain.wp.org/api/rest_v1/...` would not return the data already present in storage; the content would have to be recomputed, voiding RESTBase's promise of keeping historical data accessible. To mitigate the issue, we could (manually) update the records in storage to reflect the change, but that still does not give clients stable, reliable URIs, as removing a domain causes all RESTBase requests for that domain to return 404s.
=== Option 2: Leave both domains ===
A direct alternative to option 1 is having both domains in RESTBase. This would allow us to keep serving the old data while collecting and storing updates for the new domain and its data as they happen. Moreover, apart from a small config change adding the new domain, no other intervention is needed on our side (which is a Good Thing). However, this approach does not scale, as it may leave us with two exact copies of the same data in storage. As an example, consider the following sequence of events:
1. (A lot of) data being stored for domain `a.wp.org`
2. `a.wp.org` is renamed to `b.wp.org` (so from now on updates will be coming for `b.wp.org`)
3. A client accesses the latest revision of an article (for which there was a new revision after the rename) via `https://a.wp.org/api/rest_v1/`
4. RESTBase will realise it hasn't got the latest revision and will fetch it from the MW API asking for `a.wp.org`
5. Apache will happily rewrite `a.wp.org` as `b.wp.org` and serve the revision information back to RESTBase
6. RESTBase will store that info and ask Parsoid (and other related back-end services) to render the content
7. Because of the Apache rewrite, everybody will be happy to do so and deliver the content to RESTBase, which stores it
Now, imagine a dump script or something similar doing this for every article for both `a.wp.org` and `b.wp.org` - we end up with two exact copies without ever realising they are, in fact, copies of each other.
Additionally, a request to `https://b.wp.org/api/rest_v1/...` for data that exists in storage only under `a.wp.org` would not return the stored version; the content would be re-rendered instead.
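The sequence above can be illustrated with a toy simulation. This is only a sketch under loud assumptions: `apacheRewrite`, `renderFromBackend`, `duplicateStore` and `getContent` are hypothetical stand-ins for the Apache redirect, the MW API plus Parsoid, and RESTBase storage, not actual RESTBase code.

```javascript
// Hypothetical stand-in for the Apache redirect: a.wp.org is served by b.wp.org.
const apacheRewrite = { 'a.wp.org': 'b.wp.org' };

// Hypothetical stand-in for the MW API + Parsoid. The rendered content
// depends only on the wiki that actually serves the request.
function renderFromBackend(domain, title, rev) {
  const servedBy = apacheRewrite[domain] || domain;
  return `<html>${servedBy}/${title}@${rev}</html>`;
}

// Hypothetical stand-in for RESTBase storage, keyed by the domain
// the client asked for (as in option 2, with both domains configured).
const duplicateStore = new Map();

function getContent(domain, title, rev) {
  const key = `${domain}|${title}|${rev}`;
  if (!duplicateStore.has(key)) {
    // Storage miss: fetch and render via the back-end, then store.
    duplicateStore.set(key, renderFromBackend(domain, title, rev));
  }
  return duplicateStore.get(key);
}

// A dump script walking both domains requests the same revision twice.
const viaOld = getContent('a.wp.org', 'Foo', 42);
const viaNew = getContent('b.wp.org', 'Foo', 42);

// Two storage entries now exist, holding byte-identical content.
console.log(duplicateStore.size, viaOld === viaNew);
```

Nothing in the storage layer of this sketch can tell that the two entries are copies of each other, which is the core of the scalability concern.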
=== Option 3: Handle rewrites ===
Another option would be to make RESTBase //redirect-aware//, so that it performs (internally) more or less the same domain-name manipulation as Apache does. This could be indicated in the configuration file with a simple redirect stanza:
```
/{domain:a.wp.org}:
  x-redirect-to: b.wp.org
/{domain:b.wp.org}: *wp/default/1.0.0
```
This stanza would instruct RESTBase to rewrite the domain to `b.wp.org` whenever `request.params.domain === 'a.wp.org'`, which mitigates the double-storage problem of option 2 while providing stable URIs. In order to be able to expose old data as well, an additional check should be made during start-up that looks for data associated with `a.wp.org` and updates it to `b.wp.org` if such data is found.
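A minimal sketch of what the rewrite could look like, assuming the stanza is available as a plain object keyed by domain. `domainConfig`, `resolveDomain` and `rewriteRequest` are hypothetical names used for illustration only; the loop guard is an extra precaution not discussed above.

```javascript
// Hypothetical in-memory form of the redirect stanza.
const domainConfig = {
  'a.wp.org': { 'x-redirect-to': 'b.wp.org' },
  'b.wp.org': { spec: 'wp/default/1.0.0' },
};

// Follow x-redirect-to entries until a non-redirected domain is reached.
function resolveDomain(config, domain) {
  const seen = new Set();
  let current = domain;
  while (config[current] && config[current]['x-redirect-to']) {
    if (seen.has(current)) {
      // Guard against misconfigured chains such as a -> b -> a.
      throw new Error(`Redirect loop detected at ${current}`);
    }
    seen.add(current);
    current = config[current]['x-redirect-to'];
  }
  return current;
}

// Return a copy of the request with the domain rewritten, so that
// everything downstream (storage, back-end calls) sees only the new domain.
function rewriteRequest(config, request) {
  const domain = resolveDomain(config, request.params.domain);
  return { ...request, params: { ...request.params, domain } };
}

const rewritten = rewriteRequest(domainConfig, {
  params: { domain: 'a.wp.org', title: 'Foo' },
});
console.log(rewritten.params.domain);
```

Because the rewrite happens before storage is consulted, requests for either domain end up reading and writing the same records.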
=== Discussion ===
Let's discuss! What do you think?
Option 1 shouldn't be considered, as it actually lacks correctness wrt RESTBase's goals. Option 2 is correct, but might lead to unnecessary storage issues in the long run. I personally feel option 3 is the way to go. A concern there might be start-up performance when there are redirected domains. We might want to expand `x-redirect-to` and perhaps include a flag indicating whether to check the storage on start-up:
```
/{domain:a.wp.org}:
  x-redirect:
    to-domain: b.wp.org
    migrate_data: false
```
To be on the safe side, `migrate_data` should default to `true`, but this can be changed to `false` as soon as the first worker finishes its start-up process.
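The start-up check could look roughly like the sketch below. It assumes a toy key-value store with `domain|title|revision` keys, which is not how RESTBase actually lays out its storage; `redirectConfig`, `store` and `migrateRedirectedDomains` are hypothetical names. It also assumes that data already stored under the new domain takes precedence over migrated data.

```javascript
// Hypothetical parsed form of the expanded redirect stanza.
const redirectConfig = {
  'a.wp.org': { 'x-redirect': { 'to-domain': 'b.wp.org' } },
  'b.wp.org': {},
};

// Toy storage: keys are "domain|title|revision", values are rendered content.
const store = new Map([
  ['a.wp.org|Foo|1', '<p>old Foo</p>'],
  ['a.wp.org|Bar|7', '<p>old Bar</p>'],
  ['b.wp.org|Foo|2', '<p>new Foo</p>'],
]);

// Move records of redirected domains to their target domain.
// Returns the number of records found under old domains.
function migrateRedirectedDomains(config, storage) {
  let moved = 0;
  for (const [oldDomain, entry] of Object.entries(config)) {
    const redirect = entry['x-redirect'];
    // migrate_data defaults to true when the flag is absent.
    if (!redirect || redirect.migrate_data === false) continue;
    const prefix = `${oldDomain}|`;
    // Snapshot the keys so the map can be modified while looping.
    for (const key of [...storage.keys()]) {
      if (!key.startsWith(prefix)) continue;
      const newKey = `${redirect['to-domain']}|${key.slice(prefix.length)}`;
      // Assumption: a record already present under the new domain wins.
      if (!storage.has(newKey)) storage.set(newKey, storage.get(key));
      storage.delete(key);
      moved += 1;
    }
  }
  return moved;
}

const movedCount = migrateRedirectedDomains(redirectConfig, store);
console.log(movedCount);
```

A second run finds nothing left under the old domain, which is why the flag can safely be flipped to `false` once the first worker has finished starting up.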
Open question: do we need to think about situations where `a.wp.org` stops being redirected to `b.wp.org` and becomes a (new) wiki of its own? Option 3 allows that (after the data has been migrated to the new domain), but with that we lose URI stability. To be fair, though, so does MW at the same time.