
WP Zero workarounds for eqsin
Closed, Resolved, Public

Description

Problem:

  • There is currently a set of problematic Zero carriers for eqsin which are defined by:
    • Being a Zero carrier with an active contract
    • Being located in one of the countries on eqsin's target country list
    • Not yet having confirmed to WMF that they've updated their IP whitelist for the new eqsin IPs
  • If we ignore these problematic carriers and simply turn up eqsin service across all of the Asia/Oceania regions, their users might expect Zero-rated traffic to WMF, and might even see a WMF-injected banner saying the traffic is Zero-rated, yet still be charged for that traffic by the carrier.

Background Info for various terms used above and below:

  • The IP Whitelist: Part of the functional requirements for Zero working at all is that partner mobile carriers configure (in their own infrastructure) a whitelist of the public IPs the WMF provides service over. The carrier zero-rates (does not charge for) traffic by matching the destination IPs of its clients' traffic against this whitelist. From our perspective, adding a new edge site like eqsin means adding some new IPs to the whitelist. Partner carriers can be very slow to react and to acknowledge that they've updated such lists on their end, and we'll break zero-rating for a carrier's users if we send them to the new IPs before the carrier has updated.
  • The Carrier Network Lists: Each carrier also provides to the WMF a list of the source IPs (or whole networks) their clients' traffic will use when talking to our servers. The functional purpose of these lists in our own (WMF) infrastructure is to tell our software when it should inject the carrier-specific Zero banners at the top of the page output, informing the user that the data is free courtesy of the agreement with their carrier.
  • EQSIN Target List: This is the initial list of target countries for the eqsin turn-up from T189252. This list is based on rough estimations done months ago of the countries that could potentially see benefits. In the long term, we'll of course make these routing decisions on latency alone and may find other cases outside the initial set. During the initial turn-up process, we may also opt to not route some of the target countries to eqsin based on performance. However, the critical bit here is that during the initial turn-up process, we're specifically limiting ourselves to not go outside the pre-determined target list here, so that we don't have to go back and re-evaluate Zero-related concerns from scratch in the middle of the process.
  • GeoDNS Routing: We target the countries above for eqsin usage via GeoDNS. GeoDNS is relatively imprecise, but usually fairly reliable at the per-country level. In the many common cases where client DNS caches lack a functioning implementation of edns-client-subnet, we end up matching on the DNS cache's exit IP, which can be in a completely different network than the source IPs of the client's HTTPS traffic. Because of this discrepancy, the carrier-provided network lists above can't be used reliably in our GeoDNS configuration to keep specific carriers from routing to eqsin.
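
To make the two lists concrete, here is a minimal Python sketch of the two independent checks described above: the carrier-side match on destination IP (the whitelist) and the WMF-side match on source IP (the carrier network list). All addresses are documentation-only ranges from RFC 5737, and the names `WMF_WHITELIST`, `NEW_SITE_RANGE`, `CARRIER_NETWORKS`, `carrier_zero_rates`, and `carrier_lookup` are hypothetical, chosen for illustration. Keying carriers by MCC-MNC is also an assumption.

```python
import ipaddress

# Hypothetical example data using documentation-only ranges (RFC 5737),
# not real WMF or carrier networks.
WMF_WHITELIST = [ipaddress.ip_network("198.51.100.0/24")]  # what the carrier zero-rates today
NEW_SITE_RANGE = ipaddress.ip_network("203.0.113.0/24")    # new edge site, not yet whitelisted
CARRIER_NETWORKS = {"520-18": [ipaddress.ip_network("192.0.2.0/24")]}


def carrier_zero_rates(dest_ip, whitelist):
    """Carrier side: traffic is free only if the destination IP is whitelisted."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in whitelist)


def carrier_lookup(src_ip, carrier_networks):
    """WMF side: map a client source IP to a carrier ID, or None if unknown."""
    addr = ipaddress.ip_address(src_ip)
    for carrier, nets in carrier_networks.items():
        if any(addr in net for net in nets):
            return carrier
    return None
```

The failure mode this ticket guards against is the case where `carrier_lookup` still identifies the carrier (so a banner would be shown) while `carrier_zero_rates` is false for the new site's address (so the user is charged).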

Solution:

  1. We'll maintain a list of the problematic carriers and their containing countries here in this ticket. At this point the list can only possibly shrink, not grow. It will shrink as carriers' contracts expire or they update their whitelists. We anticipate that the list will become empty over the course of a hopefully-short period of months from now.
  2. So long as a country remains on this list, even though it's in the overall intended target list for eqsin, we won't actually turn up eqsin service for that country. This will unfortunately delay eqsin benefits for the entire target country, but it's necessitated by the other constraints here. This is deemed to be a sufficient level of effort to prevent most of the possible problems. We'll mark the affected countries directly with commentary in our GeoDNS configuration to ensure there are no mistakes when doing other related work.
  3. As a fail-safe for the above, we'll add some VCL code to our frontend Varnishes that implements something like the pseudo-code below. It ensures that if GeoDNS fails us and a customer of a problematic carrier ends up using eqsin, we don't show that user a Zero-rated banner at the top of the page:
foreach client_request:
    if wmf_site == eqsin:
        if carrier_lookup(client_request) in set_of_problematic_carriers:
            suppress_carrier_lookup(); // which disables the Zero banner injection
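
The same decision can be sketched in Python for anyone who wants to reason about it outside of VCL. This is only an illustration of the pseudo-code's logic, not the deployed implementation. The function name `zero_banner_carrier` is hypothetical, and identifying carriers by the MCC-MNC values from the list in this ticket is an assumption.

```python
# MCC-MNC values taken from the problematic list in this ticket.
PROBLEMATIC_CARRIERS = {"520-18", "414-06", "514-02", "542-02", "541-05", "539-88"}


def zero_banner_carrier(wmf_site, carrier, problematic=PROBLEMATIC_CARRIERS):
    """Return the carrier whose banner should be injected, or None to suppress it.

    `carrier` is the result of the normal carrier lookup (None if the client
    does not belong to any Zero carrier).
    """
    if wmf_site == "eqsin" and carrier in problematic:
        # The carrier has not confirmed whitelisting the eqsin IPs, so the
        # traffic may be billed: do not claim it is Zero-rated.
        return None
    return carrier
```

Requests served from any other site, and requests from carriers not in the set, keep their normal lookup result.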

Problematic Country+Carrier List:
(to be updated here as conditions change!)

CC  Country     Carrier MCC-MNC
TH  Thailand    520-18
MM  Myanmar     414-06
TL  East Timor  514-02
FJ  Fiji        542-02
NR  Nauru       542-02
VU  Vanuatu     541-05
TO  Tonga       539-88
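
As a rough sketch of how this list interacts with the eqsin target list (solution steps 1 and 2), the Python below derives which target countries may be routed to eqsin and shows a country becoming eligible once its last entry is cleared. The helper names `eqsin_countries` and `resolve` are hypothetical, and the target list passed in is whatever the caller supplies. The real target list lives in T189252 and is not reproduced here.

```python
# (country code, carrier MCC-MNC) pairs from the list in this ticket.
PROBLEMATIC = [
    ("TH", "520-18"), ("MM", "414-06"), ("TL", "514-02"), ("FJ", "542-02"),
    ("NR", "542-02"), ("VU", "541-05"), ("TO", "539-88"),
]


def eqsin_countries(target_countries, problematic):
    """Target countries with no remaining problematic carrier entry."""
    blocked = {cc for cc, _ in problematic}
    return sorted(set(target_countries) - blocked)


def resolve(problematic, cc, mcc_mnc):
    """Drop one entry after its contract expires or its whitelist update is confirmed."""
    return [entry for entry in problematic if entry != (cc, mcc_mnc)]
```

Because the list can only shrink, repeatedly applying `resolve` eventually leaves every target country eligible, which is the condition for closing this ticket.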

TODO:

  • Mark these countries in GeoDNS config commentary
  • Initial implementation of VCL logic for carrier banner suppression
  • Maintain GeoDNS commentary and VCL carrier set until list size reaches zero and this ticket can be closed

Event Timeline

BBlack triaged this task as Medium priority. Mar 8 2018, 9:31 PM
BBlack created this task.
BBlack updated the task description. Mar 9 2018, 2:58 PM
BBlack added a subscriber: DFoy.
BBlack moved this task from Triage to Asia Cache DC on the Traffic board. Mar 12 2018, 4:26 PM

Change 421088 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] eqsin+zero fallback

https://gerrit.wikimedia.org/r/421088

Change 421089 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] geo-maps: mark eqsin+zero issues, split out OC

https://gerrit.wikimedia.org/r/421089

Change 421089 merged by BBlack:
[operations/dns@master] geo-maps: mark eqsin+zero issues, split out OC

https://gerrit.wikimedia.org/r/421089

Change 421088 merged by BBlack:
[operations/puppet@production] eqsin+zero fallback

https://gerrit.wikimedia.org/r/421088

Change 446992 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] Revert "eqsin+zero fallback"

https://gerrit.wikimedia.org/r/446992

Change 446996 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] geo-maps: unblock eqsin routing for Zero-affected countries

https://gerrit.wikimedia.org/r/446996

Thanks @DFoy for chasing down the contracts and issues here. We're clear to remove these workarounds and close up this ticket. \o/

Change 446992 merged by BBlack:
[operations/puppet@production] Revert "eqsin+zero fallback"

https://gerrit.wikimedia.org/r/446992

Change 446996 merged by BBlack:
[operations/dns@master] geo-maps: unblock eqsin routing for Zero-affected countries

https://gerrit.wikimedia.org/r/446996

BBlack closed this task as Resolved. Jul 19 2018, 11:46 PM