
Merge Wikipedia subdomains into one, to discourage censorship
Open, Low · Public

Description

According to the article Censorship of Wikipedia, one effect of the switch to HTTPS is that it is no longer possible to censor individual articles.

Based on that, would there be any benefit in merging the different subdomains of the current language versions of Wikipedia into one?

That way, it would not be possible to censor an individual language version, and a full ban would be necessary.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Well, you wouldn't be able to distinguish e.g. English Wikipedia from French Wikipedia traffic by looking at the DNS lookup or TLS SNI anymore. Encrypting SNI is already covered in T205378: Support ECH on Wikimedia servers, though. DNS-over-TLS/HTTPS should help with the DNS problem, though AFAIK that's something to sort out on your client rather than on the Wikimedia servers.
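For illustration, here is a minimal sketch of the client-side DNS half (it assumes Cloudflare's public DoH JSON resolver; nothing Wikimedia-specific). The lookup travels inside HTTPS, so an on-path observer can't see which hostname was queried; the hostname would still leak via SNI on the subsequent TLS connection until ECH is available.

```python
# A minimal DNS-over-HTTPS sketch using Cloudflare's public JSON resolver
# (chosen for illustration; any DoH resolver with a JSON API would work).
import json
import urllib.request

req = urllib.request.Request(
    "https://cloudflare-dns.com/dns-query?name=en.wikipedia.org&type=A",
    headers={"Accept": "application/dns-json"},  # request the JSON answer format
)
with urllib.request.urlopen(req) as resp:
    answer = json.load(resp)

# Print the resolved A records; the query itself was encrypted in transit.
for record in answer.get("Answer", []):
    print(record["name"], record["data"])
```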

Vpab15 claimed this task.

Thanks Krenair

I will mark this as resolved then

I wouldn't say it's resolved, I wouldn't say it's invalid, and I don't think it would get outright declined either. I feel like this must have been discussed before somewhere, so there should be a related (duplicate?) task hanging around.
There are some considerations that would go into this, though:
  • Is it worth it? Is there any real strategic advantage here, or will the people who would block one language version instead just block all language versions?
  • How easy is it to change our setup (and redirect all existing URLs) to support this?
  • Are there any significantly negative SEO implications that would follow such a URL change?

Vpab15 reopened this task as Open (edited). Feb 1 2019, 4:37 PM

I misunderstood then. I took a look at the ESNI task you mentioned, but couldn't really tell whether implementing it would prevent individual language versions from being censored.

Obviously, if there is an easier way to prevent censorship of individual versions without updating the URL, that would be ideal.

Otherwise, I understand it might not be worth the hassle, but in any case I think it is worth discussing.

I will reopen then.

Vpab15 removed Vpab15 as the assignee of this task. Feb 1 2019, 4:52 PM
Aklapper renamed this task from "Request to merge wikipedia subdomains into one to discourage censorship" to "Merge Wikipedia subdomains into one, to discourage censorship". Feb 1 2019, 10:05 PM
jijiki triaged this task as Medium priority. Feb 7 2019, 12:48 PM

The linked ESNI ticket is really a user-question ticket rather than one created to track the work (which is still off in the future, but obviously we'll implement ESNI as soon as we realistically can, as part of planned work).

The language codes leaked by our subdomain scheme are annoying from a censorship perspective. Our eventual ESNI rollout will incidentally close that up for modern ESNI-implementing clients, but not for legacy ones as long as they're still around. Getting the language codes out of the domains would help the legacy cases as well. Either way, it's a relatively small step in the grand scheme of such things (it ups the ante from per-language blocks to per-project blocks, which many blockers would likely be perfectly comfortable with anyway).

Regardless of the censorship angle, there are a whole host of other reasons we dislike the language subdomain scheme at the technical level (it makes things a PITA down in DNS, in TLS certificate issuance, etc.). I think it would be great to get rid of them and e.g. replace en.wikipedia.org/wiki/Foo with wikipedia.org/en/wiki/Foo; see the sketch below. However, that would be a very deep and complex rabbit hole of a project for anyone to start on. There are doubtless hundreds of edge cases to be tripped here and there throughout our stack while making such a change, and even if we successfully move all internal references and all public canonicalization over to the new scheme, we'll likely still have to support the old scheme for a very long time as well. It could be years before the volume to the old naming scheme dies down enough that we can move it to 301 redirect service instead of direct rewrites in the traffic layer, and then another period of many more years (virtually forever) before we could realistically dump the certificates and DNS entries and really call them dead, which negates a whole lot of the perceived technical benefits :/
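Purely as a hypothetical sketch of the redirect half of such a migration (the helper and its rules are illustrative, not an actual Wikimedia traffic-layer config), the canonical URL mapping might look like:

```python
# Map the old per-language hostnames onto a single path-based scheme,
# suitable for serving as a 301 redirect at the traffic layer.
from urllib.parse import urlsplit, urlunsplit

def legacy_to_merged(url: str) -> str:
    """Map https://en.wikipedia.org/wiki/Foo to https://wikipedia.org/en/wiki/Foo."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.endswith(".wikipedia.org") and host != "www.wikipedia.org":
        lang = host[: -len(".wikipedia.org")]  # subdomain label becomes a path prefix
        return urlunsplit((parts.scheme, "wikipedia.org",
                           f"/{lang}{parts.path}", parts.query, parts.fragment))
    return url  # already merged, or not a language subdomain

assert legacy_to_merged("https://en.wikipedia.org/wiki/Foo") == \
       "https://wikipedia.org/en/wiki/Foo"
```

The hard part, as noted above, isn't this mapping itself but chasing its implications through internal references, canonicalization, certificates, and DNS for years afterwards.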

BBlack lowered the priority of this task from Medium to Low. Feb 7 2019, 1:41 PM

Expounding on the lamentations above in a more realistic triage sort of sense:

  • It's a very complex project which will likely take longer to execute than it will take for SNI/DoH encryption solutions to become available to us and/or The World.
  • It would require buy-in and significant work commitments from multiple departments across the organization.
  • There are real breakage/disruption risks in all of the work that would be done.
  • The potential short-term benefits are minimal at best (slightly broader censorship fate-sharing), and could even be negative (some countries would choose to block whole projects instead of selected languages).
  • The timeline for tech-level benefits (e.g. DNS/TLS simplifications, etc.) is so far out (many, many years) that nobody can realistically make them part of a rational cost/benefit tradeoff.

However, it's still a Good Idea, and it's a path I wish we were on; a decade or more in the future, someone will probably be very thankful we started the effort, so I'm disinclined to just close this off outright. Setting to Low for now and letting the discussion simmer.

According to the article Censorship of Wikipedia, one effect of the switch to HTTPS is that it is no longer possible to censor individual articles.

Conversely, now when China decides to censor articles about Tiananmen Square, its citizens also lose access to basic health information, STEM education background material, and all other content that would probably have a far more positive impact on their lives than articles about politics... it's a double-edged sword. Arguably, strictly from the censorship/access angle, HTTPS was a bad trade-off. (There are a number of other reasons why it was absolutely necessary; but for merging subdomains that's probably not the case.)

then another period of many more years (virtually forever) before we could realistically dump the certificates and DNS entries and really call them dead

Literally forever, I'd hope. Cool URIs don't change. Cool domains even less so. At some point maybe we could downgrade their security and just letsencrypt them, but they would still have to resolve to the new domain.


Anyway, web browsers rely on domains for their security model, so collapsing domains would have all kinds of problematic side effects. Any wiki could easily manipulate cookies for every other wiki, which is probably the opposite direction from what we want (it's already too easy to take over a privileged account on some tiny Wikinews and escalate it to an attack against English Wikipedia); various client-side resource limits are applied per domain, so we'd run into all kinds of limitations with cookies and localStorage; the amount of useless cookies attached to every request would grow vastly; and so on.
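To make the cookie point concrete, here is a toy model of cookie attachment (heavily simplified from RFC 6265, not browser code): under one merged domain, the Path attribute offers no real isolation between wikis, because path is not an origin boundary.

```python
# A toy model of when a cookie is attached to a request, to illustrate why
# path-based wikis under one domain would share a cookie namespace.
def cookie_sent_to(cookie_domain: str, cookie_path: str,
                   request_host: str, request_path: str) -> bool:
    # Simplified matching: the request host must equal the cookie's domain,
    # and the request path must fall under the cookie's path.
    return request_host == cookie_domain and request_path.startswith(cookie_path)

# Today: a cookie set by de.wikipedia.org is never sent to en.wikipedia.org.
print(cookie_sent_to("de.wikipedia.org", "/", "en.wikipedia.org", "/wiki/Foo"))  # False

# Merged: a cookie set under wikipedia.org reaches every wiki's requests, and
# even a Path=/de cookie can be overwritten by script running on /en, since
# Path is not a security boundary in the browser's same-origin model.
print(cookie_sent_to("wikipedia.org", "/", "wikipedia.org", "/en/wiki/Foo"))     # True
```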

At some point maybe we could downgrade their security and just letsencrypt them

How exactly would certs from LetsEncrypt be a downgrade in security?

According to the article Censorship of Wikipedia, one effect of the switch to HTTPS is that it is no longer possible to censor individual articles.

Conversely, now when China decides to censor articles about Tiananmen Square, its citizens also lose access to basic health information, STEM education background material, and all other content that would probably have a far more positive impact on their lives than articles about politics... it's a double-edged sword. Arguably, strictly from the censorship/access angle, HTTPS was a bad trade-off. (There are a number of other reasons why it was absolutely necessary; but for merging subdomains that's probably not the case.)

This is a tricky, subjective area, but my view is more positive on this angle (obviously). Using your example, when China can block a single article like Tiananmen Square without affecting access to basic health information, the blockage is stealthy and insidious: the users don't even realize the subtle, narrowly-focused bits of information that are being elided from their view. When you raise the collateral blockage requirements, it makes it harder to hide the censorship, and knowing that you're being censored at all is important. This argument extends to broader and broader scopes, and if you take it to its logical extreme, we really want to force censors to take or leave the whole public Internet and all free knowledge, so that it's very obvious and undeniable when censorship is happening (because literally everything is blocked). In many ways that's all the meta-information you need at that point to understand the world you inhabit and what needs changing.

How exactly would certs from LetsEncrypt be a downgrade in security?

I'm not an HTTPS expert but I imagine we wouldn't pay GlobalSign if we considered free LetsEncrypt certificates of equal value.

How exactly would certs from LetsEncrypt be a downgrade in security?

I'm not an HTTPS expert but I imagine we wouldn't pay GlobalSign if we considered free LetsEncrypt certificates of equal value.

We can't get some types of cert (Extended Validation, among others) from LE, so that is one reason for some of our certs still being purchased elsewhere.

Previously they didn't have wildcard cert support either.

Also, a lack of tooling previously limited us (though AFAIK those limitations aren't an issue now), meaning we couldn't get all of the certs we needed. For example, we could only get 100 certs on the beta cluster, leaving some wikis without SSL certs (see T202564: https://sv.wikipedia.beta.wmflabs.org/ has invalid certificate), which made it a pointless exercise at the time.

See also Acme-chief and T213705: Deploy managed LetsEncrypt certs for all public use-cases for a plan to roll out the use of LE certs more widely.

How exactly would certs from LetsEncrypt be a downgrade in security?

I'm not an HTTPS expert but I imagine we wouldn't pay GlobalSign if we considered free LetsEncrypt certificates of equal value.

We can't get some types of cert (Extended Validation, among others) from LE, so that is one reason for some of our certs still being purchased elsewhere.

Yeah, I don't think Wikimedia has any non-DV certs outside of frack.

@Krenair

How exactly would certs from LetsEncrypt be a downgrade in security?

Because, as I once tried on my localhost, their certs cannot provide the anti-downgrade protection of TLS v1.3. So if your browser doesn't support TLS v1.3, you're still using TLS v1.2 for the entire connection, and the GFW can still inspect the SNI information in your ClientHello and block your TCP access, even though you can still successfully ping the site (which, as we all know, is ICMP stuff and mostly meaningless for nginx fans like me). See more of the reasons at https://github.com/googlehosts/hosts/issues/87

I don't think any certificate could. The SNI is transferred before the certificate is presented by the server. The server can of course be configured not to negotiate any TLS version less than 1.3. However note that in the event that your browser can't support TLS v1.3, you won't be able to view the page at all (regardless of whether the GFW blocks it or not).
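A small sketch of that point (standard-library Python, connecting to a public endpoint purely for illustration): downgrade resistance is a property of the negotiated protocol version, not of who issued the certificate.

```python
import socket
import ssl

# Refuse anything below TLS 1.3. This is protocol negotiation, and it works
# the same whether the server's certificate came from LetsEncrypt or GlobalSign.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_3

with socket.create_connection(("en.wikipedia.org", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="en.wikipedia.org") as tls:
        # Prints "TLSv1.3" on success; a server limited to 1.2 would fail the
        # handshake here instead of silently downgrading the connection.
        print(tls.version())
```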

PS: I'm afraid those other reasons are Chinese to me :P

@Platonides

However note that in the event that your browser can't support TLS v1.3, you won't be able to view the page at all

Hey, my browser supports it, because I did these steps:

  1. visit chrome://flags/#tls13-variant
  2. select one of the "Enabled" options

China and Turkey seem to be the only countries blocking Wikipedia at the moment, and both block all languages.

The earlier situation of China blocking only zh.wikipedia.org may have been an anomaly: if an authoritarian regime is prepared to block the version in its official language, there is no benefit for it in leaving the rest of Wikipedia accessible. If there is only one domain to consider, its decision to block actually becomes easier.

@BBlack raises some compelling technical reasons for compacting the 3LDs other than countering censorship. I suggest the ticket be renamed to emphasise those.

@Platonides

However note that in the event that your browser can't support TLS v1.3, you won't be able to view the page at all

Hey, my browser supports it, because I did these steps:

  1. visit chrome://flags/#tls13-variant
  2. select one of the "Enabled" options

Yeah, but that's behind a flag, and flags aren't 100% stable or guaranteed to stay around forever.

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!

Diskdance subscribed.

Closing this task, since Wikipedia has been blocked in all languages since March 2019, which leaves it meaningless.

ssingh subscribed.

Please don't close this task pending further discussion. Thank you.