Move bits traffic to text/mobile clusters
Closed, Resolved, Public

Description

This helps with SPDY (see T94896). It would also make more efficient use of our cache hardware and reduce the complexity of our production varnish config (puppet manifests, templates).

Basically, this needs a few VCL-level fixups so that the text/mobile clusters support everything the bits cluster does, followed by wmf-config changes to stop using the bits hostname. Once the bits cluster hardware can be decommed from this role, it will become the new global misc cluster.

  • VCL changes in support of basic /static-* resources
  • VCL changes for gzip of icons/svg
  • VCL changes for beacon/statsv
  • Deploy bits-like VCL to text-cluster
  • Move bits.wm.o hostname to text-addrs
  • De-puppetize the bits cluster and all related things
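As a rough illustration of the request classes the VCL items above have to cover, here is a minimal Python sketch. The regexes, the `/beacon/` and `/statsv` path shapes, and the `classify` helper are assumptions made for illustration; the actual VCL is not shown in this task.

```python
import re

# Hypothetical patterns, not the real VCL regexes.
STATIC_RE = re.compile(r"^/static(-[^/]+)?/")          # /static/ or /static-<version>/
BEACON_RE = re.compile(r"^/(beacon|statsv)(/|\?|$)")   # assumed beacon/statsv paths
COMPRESSIBLE_RE = re.compile(r"\.(svg|ico)$")          # icons/svg that should be gzipped


def classify(path):
    """Return (handler, gzip) for a request path, roughly as bits-like VCL might."""
    bare = path.split("?", 1)[0]  # ignore the query string when matching file paths
    if BEACON_RE.match(path):
        return ("beacon", False)
    if STATIC_RE.match(bare):
        return ("static", bool(COMPRESSIBLE_RE.search(bare)))
    return ("default", False)
```

Anything that falls through to `"default"` would be handled by the existing text-cluster logic unchanged.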

Event Timeline

BBlack raised the priority of this task from to Medium.
BBlack updated the task description.
BBlack added projects: acl*sre-team, Varnish.
BBlack subscribed.

MediaWiki's default expectation is that static assets and text are served from the same host. It actually takes quite a bit of hackery to convince it to use bits. So (happily) this change is straightforward to implement -- it requires not much more than the removal of code.

See https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/CommonSettings.php#L227-244

Note also a couple of related varnish patches from ori, which suggest a potential path towards various solutions:
https://gerrit.wikimedia.org/r/#/c/206351/
https://gerrit.wikimedia.org/r/#/c/206348/

faidon noted that naively getting rid of the bits hostname carries some risk: it could have a detrimental impact on non-SPDY clients, which currently get to use some $domain-vs-bits.wm.o connection parallelism. We need to think that through, and if it's an issue, maybe there are ways we can still get the best of both worlds with e.g. same-IP + certs, etc.

To elaborate: I'm worried about the repercussions this would have for non-SPDY clients, as they wouldn't be able to use a different set of connections to fetch those resources (i.e. traditional domain sharding). "Non-SPDY" clients right now are the vast majority of anons, as we're not HTTPS by default, so this isn't something to take lightly.

I don't think we reach the connection limits for anything but upload, but still, this probably needs some further investigation (if it hasn't happened already?) and measurement of its effect on page load time post-deployment.

This also means that the crazy humongous cookies that we hand out to users would be sent for bits resources as well, but considering the low number of requests due to RL bundling, this may not have a huge effect overall. Hopefully :)
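To put the cookie concern in back-of-the-envelope terms, a minimal sketch follows. The byte and request counts are purely illustrative, not measurements, and `cookie_overhead` is a hypothetical helper.

```python
def cookie_overhead(cookie_bytes, asset_requests_per_page):
    """Extra request bytes per page view if every asset request carries the cookies."""
    return cookie_bytes * asset_requests_per_page


# Illustrative numbers only: a 2 KiB cookie header and 4 bundled asset requests.
print(cookie_overhead(cookie_bytes=2048, asset_requests_per_page=4))  # 8192
```

The overhead scales linearly with the number of asset requests, which is why RL bundling keeps it small.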

I think @ori's contention when we last discussed this on IRC was that the current count of $domain + bits connections in a typical page load (combined) is under the threshold for common client-side connection parallelism limits to a single domain name. Thus we'd see the same opportunities for parallelism in the non-SPDY case regardless of whether the bits domain was separate or not.
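The argument can be sketched numerically. This is a minimal illustration, assuming a per-hostname limit of 6 connections (a common browser default for HTTP/1.1) and made-up connection counts; neither number comes from this task.

```python
def effective_parallelism(conns_by_host, per_host_limit):
    """Connections a client can actually open, capping each hostname at the limit."""
    return sum(min(n, per_host_limit) for n in conns_by_host.values())


PER_HOST_LIMIT = 6  # assumption: typical browser per-host limit

# Hypothetical page load: 3 connections wanted for the wiki domain, 2 for bits.
sharded = {"en.wikipedia.org": 3, "bits.wikimedia.org": 2}
merged = {"en.wikipedia.org": 5}

# Same parallelism either way while the combined count stays under the limit.
assert effective_parallelism(sharded, PER_HOST_LIMIT) == 5
assert effective_parallelism(merged, PER_HOST_LIMIT) == 5

# Sharding only helps once the combined demand exceeds the per-host limit.
assert effective_parallelism({"a": 4, "b": 4}, PER_HOST_LIMIT) == 8
assert effective_parallelism({"a": 8}, PER_HOST_LIMIT) == 6
```

So the open question is an empirical one: whether real page loads ever want more connections in total than a single hostname allows.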

BBlack updated the task description.

Looking at the mediawiki-config repo, it's not immediately obvious to me how we'd test this on one wiki or anything like that. We could obviously gut the $wmfHostname['bits'] references globally in beta and roll out by groups, of course.

Nah, it's easier than that. Done in I188bbb829 and applied to sewikibooks (a closed wiki) as a test-case.
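For readers unfamiliar with per-wiki configuration, the general shape of such a test rollout is a default value plus per-wiki overrides. The sketch below is only an illustration of that pattern: the setting name `use_bits_host` and the `get_setting` helper are hypothetical and are not taken from I188bbb829.

```python
# Hypothetical setting name and layout; see I188bbb829 for the real change.
SETTINGS = {
    "use_bits_host": {
        "default": True,       # every wiki keeps using bits unless overridden
        "sewikibooks": False,  # closed wiki used as the first test case
    },
}


def get_setting(name, wiki):
    """Resolve a setting for one wiki, falling back to the default value."""
    per_wiki = SETTINGS[name]
    return per_wiki.get(wiki, per_wiki["default"])
```

Rolling out further would then mean adding more wikis to the override map, and finally flipping the default.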

This is live on mediawikiwiki now as well ( https://www.mediawiki.org/ )!

Change 208315 had a related patch set uploaded (by Yuvipanda):
Prepare for death of bits.wikimedia.org

https://gerrit.wikimedia.org/r/208315

Change 208315 merged by jenkins-bot:
Prepare for death of bits.wikimedia.org

https://gerrit.wikimedia.org/r/208315

Change 214822 had a related patch set uploaded (by Ori.livneh):
Move media beacon off of bits

https://gerrit.wikimedia.org/r/214822

Change 214822 merged by Ori.livneh:
Move media beacon off of bits

https://gerrit.wikimedia.org/r/214822

Change 215624 had a related patch set uploaded (by BBlack):
Add legacy bits.wm.o support to text-lb VCL

https://gerrit.wikimedia.org/r/215624

Change 219107 had a related patch set uploaded (by BBlack):
static URLs now seem to all be /static/, update regex

https://gerrit.wikimedia.org/r/219107

Change 219107 merged by BBlack:
static URLs now seem to all be /static/, update regex

https://gerrit.wikimedia.org/r/219107

Should this be closed?

/w/static and wikiname/load.php URLs still work on bits, e.g. https://bits.wikimedia.org/static/1.26wmf12/resources/assets/poweredby_mediawiki_88x31.png , but I don't see anything generating them.

Don't forget to update https://wikitech.wikimedia.org/wiki/Bits.wikimedia.org.

Yeah, to the degree possible, we've left everything we can still working on bits during the transition. The next major step we're coming up on is physically folding bits into the text cluster (moving the DNS resolution and VCL code there), while still leaving it as logically-functional as it is today. At that point, we can begin planning for its eventual long-term decom/demise.

Are we basically done with all of the bits.wm.o traffic removals we can accomplish quickly and easily? I'd like to merge the cluster over into text-lb at this point ( https://gerrit.wikimedia.org/r/#/c/215624/ + a followup DNS patch afterwards to move the hostname), so we can unblock the bits cluster decom/cleanup and misc-cluster plans.

Change 215624 merged by BBlack:
Add legacy bits.wm.o support to text-lb VCL

https://gerrit.wikimedia.org/r/215624

Change 228021 had a related patch set uploaded (by BBlack):
bits.wm.o -> text-cluster

https://gerrit.wikimedia.org/r/228021

Change 228029 had a related patch set uploaded (by BBlack):
decom bits service IPs

https://gerrit.wikimedia.org/r/228029

Change 228032 had a related patch set uploaded (by BBlack):
switch bits to meta refs in dataset/snapshot html

https://gerrit.wikimedia.org/r/228032

Change 228033 had a related patch set uploaded (by BBlack):
Remove cache::bits roles from bits-cluster hosts

https://gerrit.wikimedia.org/r/228033

Change 228034 had a related patch set uploaded (by BBlack):
Decom bits cluster varnish/lvs configuration

https://gerrit.wikimedia.org/r/228034

Change 228032 merged by BBlack:
switch bits to meta refs in dataset/snapshot html

https://gerrit.wikimedia.org/r/228032

We probably shouldn't move forward on the bits->text switch at the DNS level (and patches beyond that) until T106966 is resolved; otherwise zero-whitelisters would see mobile page loads fetching load.php traffic from the non-mobile cluster.

Change 228021 merged by BBlack:
bits.wm.o -> text-cluster

https://gerrit.wikimedia.org/r/228021

Change 228029 merged by BBlack:
decom bits service IPs

https://gerrit.wikimedia.org/r/228029

Change 228033 merged by BBlack:
Remove cache::bits role from bits-cluster hosts

https://gerrit.wikimedia.org/r/228033

Change 228034 merged by BBlack:
Decom bits cluster varnish/lvs configuration

https://gerrit.wikimedia.org/r/228034

BBlack updated the task description.

Change 231778 had a related patch set uploaded (by BBlack):
bits-legacy: remove special https://bits redirects for secure wikis

https://gerrit.wikimedia.org/r/231778

Change 231777 had a related patch set uploaded (by BBlack):
bits-legacy: remove beacon/statsv support

https://gerrit.wikimedia.org/r/231777

Change 231777 merged by BBlack:
bits-legacy: remove beacon/statsv support

https://gerrit.wikimedia.org/r/231777

Change 231778 merged by BBlack:
bits-legacy: remove special https://bits redirects for secure wikis

https://gerrit.wikimedia.org/r/231778