
GeoDNS: consider sending CN to eqsin
Open, Needs Triage, Public

Description

Follow up from {T377844} and especially T377844#10281548

I'm wondering if it would make sense to have GeoDNS send China to eqsin instead of ulsfo.

First, would our users benefit from it? It's geographically closer, but the graph is quite spread out.

image.png (489×488 px, 39 KB)

Secondly, would our infra benefit from it?
It would offload traffic from ulsfo and, depending on how it flows into eqsin, could be better balanced.

If we agree to give it a try, we would need to do it in a supervised way, as we can't precisely predict the traffic balance between the various transit links. Note that most of the traffic will still land on NTT/Arelion, but those links are in a different billing bucket than the US bandwidth, and less saturated. Some prefixes are also learned through Tata and Singtel.

If we decide to keep CN in ulsfo, I think it would make sense to have eqsin as option 2, so that when we depool ulsfo, CN goes to eqsin.
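
The "option 2" idea can be pictured as an ordered preference list per country, where the answer is the first datacenter that is still pooled. The following is only a minimal Python sketch of that logic, not the gdnsd implementation or the real geo-maps syntax; `PREFERENCES`, `pick_dc`, and the third entry in the list are assumptions made for illustration.

```python
# Hypothetical preference list: ulsfo first, eqsin as option 2.
# The third entry is a placeholder assumption, not taken from the real map.
PREFERENCES = {"CN": ["ulsfo", "eqsin", "codfw"]}


def pick_dc(country, pooled):
    """Return the first preferred datacenter that is currently pooled."""
    for dc in PREFERENCES[country]:
        if dc in pooled:
            return dc
    return None  # none of the preferred sites is pooled


all_sites = {"ulsfo", "eqsin", "codfw"}
print(pick_dc("CN", all_sites))              # ulsfo
print(pick_dc("CN", all_sites - {"ulsfo"}))  # eqsin
```

With ulsfo depooled, the lookup falls through to eqsin, which is the behaviour described above.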

Event Timeline

ayounsi added a parent task: Restricted Task. Oct 31 2024, 4:42 PM

Hi, thanks for the task. There are no objections from Traffic since the need for moving it is clear and so is the data around latency.

(Hopefully we become aware of such improvements automatically with the Probenet ingestion... that we will work on soon!)

Change #1085456 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/dns@master] geo-maps: switch CN to to eqsin (from ulsfo)

https://gerrit.wikimedia.org/r/1085456

Change #1085456 merged by Ssingh:

[operations/dns@master] geo-maps: switch CN to to eqsin (from ulsfo)

https://gerrit.wikimedia.org/r/1085456

Mentioned in SAL (#wikimedia-operations) [2024-11-04T15:29:14Z] <sukhe> running authdns-update to move CN traffic to eqsin from ulsfo: T378744

ssingh claimed this task.

Better to revert this. The data provided by RIPE Atlas probes in China doesn't make sense to me.

Current connected probes in China: https://atlas.ripe.net/probes/public?sort=asn_v4&country_code__in=CN&status=1&toggle=all&page_size=100&page=1

Metrics data provided by them: https://grafana.wikimedia.org/goto/X2DZI_-Hg?orgId=1

Wikimedia datacenters: https://zh.wikipedia.org/wiki/Help:如何访问维基百科#维基媒体服务器列表

According to :zh:中国互联网骨干网 (the zhwiki article on China's Internet backbone), there are three major ISPs in China. If you enter their ASNs into the Grafana dashboard linked above, you will find that:

  • China Mobile: AS9808, AS24059 <- have data on eqsin and codfw
  • China Telecom: AS4134, AS4809 <- have magru IPv4 and eqsin
  • China Unicom: AS4837, AS9929 <- only have eqsin IPv4

I entered the ASNs of all currently connected probes and found that the three major ISPs, which ordinary users rely on, contribute little of the overall data. The significant data sources are:

  • AS23961 Misaka Network, Inc.
  • AS138997 Eons Data Communications Limited

Neither of them is a Chinese ISP; both are cloud providers based in Hong Kong. China Mobile has an independent network in Hong Kong and uses it to carry most of China's international outbound traffic, but the other two ISPs still rely on NTT.
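
One way to make this check repeatable is to bucket the per-ASN sample counts by operator and look at each bucket's share. The sketch below is only an illustration in Python: the ASN-to-operator mapping is taken from the list above, while `share_by_operator` and the sample counts in `example` are hypothetical and are not real dashboard numbers.

```python
# ASNs of the three major ISPs, as listed in this comment.
OPERATORS = {
    9808: "China Mobile", 24059: "China Mobile",
    4134: "China Telecom", 4809: "China Telecom",
    4837: "China Unicom", 9929: "China Unicom",
}


def share_by_operator(samples):
    """samples: iterable of (asn, count). Returns {operator: fraction}."""
    totals = {}
    for asn, count in samples:
        name = OPERATORS.get(asn, "other")
        totals[name] = totals.get(name, 0) + count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: count / grand_total for name, count in totals.items()}


# Hypothetical counts, for illustration only.
example = [(23961, 60), (138997, 30), (4837, 5), (9808, 5)]
print(share_by_operator(example))
```

If the "other" bucket dominates, the latency figures mostly describe paths from networks outside the three major ISPs.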

It is also nearly impossible to get metrics data from within China: the Data Security Law and Cybersecurity Law of China all but prohibit foreign actors from obtaining data generated in China without a permit.

From my network (China Unicom), ulsfo IPv6 has the lowest ping latency; ulsfo IPv4, however, is blocked by the Great Firewall. What was the original reason for sending China to ulsfo?

And, why was the former task restricted?

I believe that this task needs further discussion.

We have a pipeline (currently a Jupyter Notebook) that ranks each DC for a given country/region by median latency. This is the result for China:

|continent|country|code|dc   |probenet_mean|probenet_median|probenet_std_dev|probenet_variance|probenet_sample_size|
+---------+-------+----+-----+-------------+---------------+----------------+-----------------+--------------------+
|Asia     |China  |CN  |ulsfo|228.86       |215.5          |45.76           |2093.98          |14                  |
|Asia     |China  |CN  |codfw|284.64       |228.0          |158.91          |25250.85         |11                  |
|Asia     |China  |CN  |esams|230.94       |249.0          |64.11           |4109.56          |17                  |
|Asia     |China  |CN  |eqiad|360.26       |266.0          |353.39          |124883.32        |19                  |
|Asia     |China  |CN  |drmrs|275.45       |272.0          |41.33           |1708.07          |11                  |
|Asia     |China  |CN  |eqsin|742.83       |334.0          |1067.6          |1139776.33       |12                  |
|Asia     |China  |CN  |magru|355.17       |350.5          |30.9            |954.74           |18                  |

Time-wise, this covers the maximum window (90 days). The notebook was run today.
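
For readers who want to re-sort the table, here is a minimal Python sketch that ranks the datacenters by a chosen column. It is not the notebook's code; the rows are copied from the table above, and `ROWS` and `rank` are names made up for this example.

```python
# (dc, probenet_mean, probenet_median, probenet_sample_size) from the table.
ROWS = [
    ("ulsfo", 228.86, 215.5, 14),
    ("codfw", 284.64, 228.0, 11),
    ("esams", 230.94, 249.0, 17),
    ("eqiad", 360.26, 266.0, 19),
    ("drmrs", 275.45, 272.0, 11),
    ("eqsin", 742.83, 334.0, 12),
    ("magru", 355.17, 350.5, 18),
]


def rank(rows, column):
    """Return datacenter names sorted ascending by the given column index."""
    return [row[0] for row in sorted(rows, key=lambda row: row[column])]


by_median = rank(ROWS, 2)
by_mean = rank(ROWS, 1)
print(by_median)
print(by_mean)
```

ulsfo ranks first by either measure, while eqsin is sixth by median and last by mean. With only 11 to 19 samples per site, the order of the middle entries should be read with caution.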

I think it's mostly pointless to tune geodns for China. As documented by the zhwiki community (link in the comment above by Naruse_shiroha), *.wikipedia.org is currently blocked via TLS SNI filtering (as well as traditional DNS poisoning). Additionally, certain IPv4 addresses for text-lb or upload-lb at specific DCs are blocked outright by dropping packets (unreachable with ping). Users who want to connect directly have to resort to elaborate techniques, in which the IP is usually specified explicitly in the tool (some examples are mentioned in preceding sections in the link).

Therefore, the DNS configuration only affects access to a few non-poisoned domains (such as mediawiki.org, Meta-Wiki, and this site). It's probably not worth the effort. It might make sense to route traffic to a non-blocked DC so users can access those websites directly. However, the GFW can blackhole the new IP at any time.


Generally the connection quality to the US west coast is acceptable for all three Chinese telecom operators. Connectivity to nearby regions is often congested, though the specifics vary depending on the pairing of domestic telecom and transit carrier. As the geodns policy here only considers geographical location without regard to telecom network (BGP ASN), using ulsfo would be the logical choice. However, the situation is complicated by GFW's blockage.