
Upgrade beta-cluster caches to jessie
Closed, Resolved, Public

Description

Beta cluster varnish endpoints are running Ubuntu Precise, but should be running Debian Jessie to match production.

This is indirectly causing intermittent 503s: prod+beta VCL changes related to gzip are tripping varnish bugs. Those bugs are present in the last varnish versions we built for precise (3.0.5plus~x-wm7), but fixed in the newer versions we're running on jessie (3.0.6plus-wm6).

Actions to be conducted:

  • create Jessie instances
  • prepare puppet.git patches to update Varnish config hash and DNS alias IP addresses
  • prepare mediawiki-config.git patch to change the HTCP purging entries
  • switch labs public/private IP binding in OpenStack manager (breaking change)

Then:

  • make sure purges are properly emitted
  • verify a labs instance can reach the various FQDN entries (such as en.wikipedia.beta.wmflabs.org)
  • run a few browser test jobs that hit beta (to be determined)
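
The FQDN reachability check above could be scripted roughly as follows. This is only a minimal sketch: it checks DNS resolution alone, the `BETA_HOSTNAMES` list and all function names are hypothetical, and only en.wikipedia.beta.wmflabs.org is taken from this task.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical list: only this one hostname is named in the task.
BETA_HOSTNAMES = ["en.wikipedia.beta.wmflabs.org"]


def resolve(hostname: str) -> Optional[str]:
    """Return an IPv4 address for hostname, or None if resolution fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def check_hosts(
    hostnames: Iterable[str],
    resolver: Callable[[str], Optional[str]] = resolve,
) -> Dict[str, Optional[str]]:
    """Map each hostname to its resolved address (None when unresolved)."""
    return {name: resolver(name) for name in hostnames}


def unreachable(results: Dict[str, Optional[str]]) -> List[str]:
    """List the hostnames that did not resolve, sorted for stable output."""
    return sorted(name for name, addr in results.items() if addr is None)
```

Run from a labs instance, `unreachable(check_hosts(BETA_HOSTNAMES))` would return an empty list if every entry resolves. The resolver is injectable so the logic can be exercised without network access.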

Event Timeline

BBlack raised the priority of this task from to Needs Triage.
BBlack updated the task description.
BBlack subscribed.

Paste of induced 503s related to gzip: https://phabricator.wikimedia.org/P633 , where the fetching fails with:

13 FetchError   c Junk after gzip data
13 Gzip         c u F - 20 0 80 80 90
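
To spot these failures in a larger capture, something like the sketch below could filter saved varnishlog output for gzip-related fetch errors. It assumes each line has the same four-column layout as the two lines above (numeric identifier, tag, side, payload); the function and field names are made up for illustration.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    ident: int    # numeric first column, as in the excerpt above
    tag: str      # e.g. "FetchError" or "Gzip"
    side: str     # single-letter third column ("c" in the excerpt)
    payload: str  # remainder of the line


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one log line; return None if it does not match the layout."""
    parts = line.split(None, 3)
    if len(parts) < 3 or not parts[0].isdigit():
        return None
    payload = parts[3].strip() if len(parts) == 4 else ""
    return LogRecord(int(parts[0]), parts[1], parts[2], payload)


def gzip_fetch_errors(lines: Iterable[str]) -> List[LogRecord]:
    """Return FetchError records whose payload mentions gzip."""
    records = (parse_line(line) for line in lines)
    return [
        r for r in records
        if r is not None and r.tag == "FetchError" and "gzip" in r.payload.lower()
    ]
```

Feeding it the two lines above yields a single record with the payload "Junk after gzip data".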
BBlack triaged this task as Medium priority. May 20 2015, 8:17 AM

Talked about this during the weekly Beta-Cluster-Infrastructure triage; we will talk about it further during our Release-Engineering-Team meeting. In short, this seems like a good opportunity for some teamwork and a lot of 1:1 pairing and reviewing.

thcipriani raised the priority of this task from Medium to High. Jul 27 2015, 7:45 PM
thcipriani subscribed.

Sounds like Varnish packages won't be getting built for Trusty any longer, upping priority.

deployment-cache-text03 has been created with a Jessie system, in preparation for migrating the Trusty cache deployment-cache-text02.

Change 227743 had a related patch set uploaded (by Chad):
beta: Swap text caches to -text04, which is jessie

https://gerrit.wikimedia.org/r/227743

Change 227744 had a related patch set uploaded (by Chad):
beta: swap text caches to text04, which is jessie

https://gerrit.wikimedia.org/r/227744

Got instances of deployment-cache-*04 running jessie, all succeeding with puppet (minus TLS stuff, which I'm skipping for now to prevent the eternal failures).

deployment-cache-mobile04 is still failing on its zerofetch.py check, but that seems more like beta's general problem...

I think we can move on to the puppet/mw-config bits now.

Change 227744 merged by BBlack:
beta: Swap caches to deployment-cache-*04, which is jessie

https://gerrit.wikimedia.org/r/227744

Change 227743 merged by BBlack:
beta: Swap caches to deployment-cache-*04, which is jessie

https://gerrit.wikimedia.org/r/227743

All seems to be working; we just need to verify purges, make sure browser tests are still ok, and then decom the old instances.

No complaints from the browser testing folks, so I'm assuming that's fine. Decom'd the old instances now.

I guess technically we need to check purges are working, but I assume they are at this point.

The Parsoid cache deployment-parsoidcache02 is still on Trusty :( T103660: Migrate Parsoid cache from Trusty to Jessie

Working on this. Failing on the usual TLS madness.

@demon any progress on this? I guess you had more important duties. Should we pair on it, Release-Engineering-Team?

T103660: Migrate Parsoid cache from Trusty to Jessie has finally been solved. That was the last Varnish cache still using Trusty.