There's a fair amount of traffic that crosses between the two, via redirects plus login/meta fetches to desktop from mobile, etc. With the two caches merged, the only blocker is making sure this is OK with Zero. I don't think they'll take issue, since I believe partners only pay attention to the whitelist block differences between text and multimedia, not between desktop and mobile, but it's best to check and coordinate first.
| Status | Assignee | Task |
|---|---|---|
| Stalled | None | T116132 Consider allowing H2 coalesce for upload.wikimedia.org for images used in wiki articles |
| Resolved | BBlack | T124482 Use Text IP for Mobile hostnames to gain SPDY/H2 coalesce between the two |
| Resolved | BBlack | T109286 Merge mobile cache into text cache |
| Resolved | ori | T120151 Improve handling of mobile variants in Varnish |
| Resolved | Ottomata | T122650 Disable legacy tsv mobile, zero and 5xx-mobile jobs |
| Resolved | BBlack | T124165 Fix mobile purging |
| Resolved | BBlack | T124166 Fix varnish handling of mobile hostname rewriting |
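Why moving the mobile hostnames onto the text IP matters for coalescing: HTTP/2 (RFC 7540 §9.1.1) lets a client reuse an existing connection for another hostname when the server certificate covers that name, and in practice browsers also check that the new hostname resolves to an address the existing connection is already using (the exact rule varies by browser). The sketch below models that check under those assumptions. The IPs (documentation ranges) and certificate names are hypothetical, not our real service addresses or SAN list.

```python
"""Sketch of an HTTP/2 connection-coalescing check (RFC 7540 s9.1.1).

All IPs (documentation ranges) and certificate names are hypothetical.
"""


def san_matches(pattern: str, host: str) -> bool:
    """True if a certificate name (optionally a leftmost '*.' wildcard) covers host."""
    pattern = pattern.lower().rstrip(".")
    host = host.lower().rstrip(".")
    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. ".m.wikipedia.org"
        if not host.endswith(suffix):
            return False
        label = host[: -len(suffix)]
        # A wildcard stands in for exactly one non-empty leftmost label.
        return label != "" and "." not in label
    return pattern == host


def can_coalesce(conn_ip: str, cert_names: list, host: str, host_ips: set) -> bool:
    """Reuse the connection for host only if the cert covers host AND
    host resolves to the IP the existing connection is using."""
    covered = any(san_matches(name, host) for name in cert_names)
    return covered and conn_ip in host_ips


TEXT_IP = "198.51.100.10"    # hypothetical text-cache service IP
MOBILE_IP = "198.51.100.20"  # hypothetical separate mobile service IP
CERT = ["*.wikipedia.org", "*.m.wikipedia.org"]  # hypothetical SAN list

# An existing connection to en.wikipedia.org lives on TEXT_IP.
before = can_coalesce(TEXT_IP, CERT, "en.m.wikipedia.org", {MOBILE_IP})
after = can_coalesce(TEXT_IP, CERT, "en.m.wikipedia.org", {TEXT_IP})
print(before, after)  # False True
```

Note that `*.wikipedia.org` alone would not cover `en.m.wikipedia.org`, because a wildcard matches a single label; the certificate check passes here only through the separate `*.m.wikipedia.org` name, so the IP is the deciding factor.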
- We're still trying to get to the bottom of historical and present mysteries about the Zero-rated whitelist subnets, which is holding up a decision on whether it's OK to move the m-dot and/or zero-dot hostnames to the text IP.
- Over in T125979 we're experimenting with disabling SPDY altogether on the text caches for the foreseeable future, because the heavy performance loss on slow devices/networks may outweigh the modest gains on fast ones. This is all interrelated with the fact that our primary HTML output is heavy (article content is not split into a separate fetch from the page/UI bits) and our main CSS isn't inlined, as discussed in T125208. If we end up keeping SPDY disabled on cache_text, there's less reason to worry about this IP change in the first place (although it would still be nice to get it done, to clean up unnecessary IPs and LVS services and to prepare for future SPDY and/or HTTP/2).
We didn't end up keeping SPDY disabled, and HTTP/2 is coming. On our end this is now a relatively simple change, but there are still open questions about its effect on Zero that we need help resolving. Past email threads petered out without a common understanding between Ops and Zero of how the IP blocks work today...
https://gerrit.wikimedia.org/r/283364 above makes the functional, user-facing change. If it goes through without issues, a number of followup commits will eventually clean up the leftover bits of the mobile addresses and, once we've confirmed that traffic on the old IPs has dropped to an acceptable level, decommission them completely at the DNS/LVS/etc. levels.
This was merged around 2016-04-25 18:40 UTC, and legitimate DNS caches that honor TTLs correctly should all have stopped handing out the old IPs by ~19:00.
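The ~19:00 estimate follows from simple TTL arithmetic: a TTL-honoring resolver that cached the old answer just before the change can keep serving it for at most one full TTL. This sketch uses the 10-minute TTL on these records; the optional `deploy_slack` (how long the change took to reach every authoritative DNS server) is an assumption, not a measured figure.

```python
from datetime import datetime, timedelta, timezone

# Approximate merge time, as stated in the comment.
MERGE_TIME = datetime(2016, 4, 25, 18, 40, tzinfo=timezone.utc)
# TTL on the public records for the mobile hostnames.
DNS_TTL = timedelta(minutes=10)


def stale_until(change_time: datetime, ttl: timedelta,
                deploy_slack: timedelta = timedelta(0)) -> datetime:
    """Latest time a TTL-honoring cache could still hand out the old answer.

    deploy_slack is an assumed delay before all authoritative servers
    actually serve the new record; a cache may then hold the old answer
    for one more full TTL.
    """
    return change_time + deploy_slack + ttl


print(stale_until(MERGE_TIME, DNS_TTL).isoformat())
# 2016-04-25T18:50:00+00:00
print(stale_until(MERGE_TIME, DNS_TTL, timedelta(minutes=10)).isoformat())
# 2016-04-25T19:00:00+00:00
```

Any clients still hitting the old IPs well after this point are ignoring TTLs or bypassing DNS entirely, which is what the trailing-traffic audit looks for.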
The next step is auditing the trailing traffic, to check whether these old IPs or LB hostnames are hardcoded somewhere, or whether there are significantly broken DNS caches/clients, before we can finish decommissioning the IPs.
It's been 8.8 days since the 10-minute TTL expired, and the rates are low enough that we definitely don't have any systemic issue such as hardcoded IPs in our own apps or server-side code.
The LVS servers still show a tiny handful of connections to the mobile IPs. These are expected, from sources such as:
- One-off instances of 3rd parties hardcoding our IPs for debugging or something
- Broken DNS caches
- Random HTTP[S] hits to random IPs (scanning and probing)
- Probably our own health checks in e.g. Catchpoint/Watchmouse-type services?
I don't expect the rate will ever reach zero, but it's close enough to decommission the IPs, IMHO. I'll look into Catchpoint/Watchmouse first and see whether I can eliminate anything there before it alerts on us.