
Upgrade beta-cluster caches to jessie
Closed, Resolved · Public

Description

Beta cluster Varnish endpoints are running Ubuntu Precise; they should be running Debian Jessie to match production.

This is indirectly causing intermittent 503s: prod+beta VCL changes related to gzip are tripping Varnish bugs that are present in the last Varnish version we built for precise (3.0.5plus~x-wm7) but fixed in the newer version we're running on jessie (3.0.6plus-wm6).

Actions to be conducted:

  • create Jessie instances
  • prepare puppet.git patches to update Varnish config hash and DNS alias IP addresses
  • prepare mediawiki-config.git patch to change the HTCP purging entries
  • switch labs public/private IP binding in OpenStack manager (breaking change)

Then:

  • make sure purges are properly emitted
  • verify a labs instance can reach the various FQDN entries (such as en.wikipedia.beta.wmflabs.org)
  • run a few browser test jobs that hit beta (to be determined)
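
The FQDN reachability check in the list above could be scripted roughly as follows. This is only a sketch, not something used in this task: `check_hosts` and `system_resolver` are made-up helper names, it only tests DNS resolution (not HTTP reachability), and it would need to be run from a labs instance to be meaningful.

```python
import socket
from typing import Callable, Dict, Iterable, List

# A resolver maps a hostname to a list of IP address strings.
Resolver = Callable[[str], List[str]]


def system_resolver(hostname: str) -> List[str]:
    """Resolve a hostname with the OS resolver; return sorted unique addresses."""
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def check_hosts(
    hostnames: Iterable[str], resolver: Resolver = system_resolver
) -> Dict[str, List[str]]:
    """Return {hostname: addresses}; an empty list means resolution failed."""
    results: Dict[str, List[str]] = {}
    for name in hostnames:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = []
    return results
```

Calling `check_hosts(["en.wikipedia.beta.wmflabs.org"])` from an instance shows which address the name resolves to there, which is the part affected by the public/private IP binding switch. The resolver is injectable so the logic can be exercised without network access.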

Event Timeline

BBlack created this task.May 11 2015, 5:23 PM
BBlack raised the priority of this task from to Needs Triage.
BBlack updated the task description. (Show Details)
BBlack added a subscriber: BBlack.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 11 2015, 5:23 PM

Paste of induced 503s related to gzip: https://phabricator.wikimedia.org/P633, where the fetch fails with:

13 FetchError   c Junk after gzip data
13 Gzip         c u F - 20 0 80 80 90
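
For anyone wanting to spot these failures in a captured log, a minimal sketch follows. It assumes each line has the shape shown above (a numeric id, a tag, a one-letter side marker, then free-form payload); `LogRecord`, `parse_line` and `fetch_errors` are hypothetical names, not part of any Varnish tooling.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    ident: int    # leading numeric column (13 in the excerpt above)
    tag: str      # e.g. "FetchError" or "Gzip"
    side: str     # one-letter marker, "c" in the excerpt above
    payload: str  # rest of the line, may be empty


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one log line; return None if it does not match the assumed shape."""
    parts = line.split(None, 3)
    if len(parts) < 3 or not parts[0].isdigit():
        return None
    payload = parts[3].strip() if len(parts) == 4 else ""
    return LogRecord(int(parts[0]), parts[1], parts[2], payload)


def fetch_errors(lines: Iterable[str]) -> List[LogRecord]:
    """Return only the records whose tag is FetchError."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and r.tag == "FetchError"]
```

Feeding the two lines above into `fetch_errors` yields a single record whose payload is `Junk after gzip data`.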
BBlack triaged this task as Medium priority.May 20 2015, 8:17 AM
thcipriani set Security to None.
hashar added a subscriber: hashar.Jun 15 2015, 7:43 PM

Talked about this during the weekly Beta-Cluster-Infrastructure triage; we will discuss it further during our Release-Engineering-Team meeting. In short, it seems like a good opportunity for some teamwork and a lot of 1:1 pairing and reviewing.

hashar updated the task description. (Show Details)Jun 16 2015, 8:47 AM
thcipriani raised the priority of this task from Medium to High.Jul 27 2015, 7:45 PM
thcipriani added a subscriber: thcipriani.

Sounds like Varnish packages won't be getting built for Trusty any longer, so upping priority.

Restricted Application added subscribers: Luke081515, Matanya. · View Herald TranscriptJul 27 2015, 7:45 PM

deployment-cache-text03 has been created running Jessie, in preparation for migrating the Trusty cache deployment-cache-text02.

Krenair updated the task description. (Show Details)Jul 28 2015, 1:50 PM
demon claimed this task.Jul 28 2015, 4:47 PM

Change 227743 had a related patch set uploaded (by Chad):
beta: Swap text caches to -text04, which is jessie

https://gerrit.wikimedia.org/r/227743

Change 227744 had a related patch set uploaded (by Chad):
beta: swap text caches to text04, which is jessie

https://gerrit.wikimedia.org/r/227744

demon added a comment.Jul 30 2015, 3:05 PM

Got instances of deployment-cache-*04 running jessie, all succeeding with puppet (minus TLS stuff, which I'm skipping for now to prevent the eternal failures).

deployment-cache-mobile04 is still failing on its zerofetch.py check, but that seems more like beta's general problem...

I think we can move onto the puppet/mw-config bits now.

Change 227744 merged by BBlack:
beta: Swap caches to deployment-cache-*04, which is jessie

https://gerrit.wikimedia.org/r/227744

Change 227743 merged by BBlack:
beta: Swap caches to deployment-cache-*04, which is jessie

https://gerrit.wikimedia.org/r/227743

demon updated the task description. (Show Details)Aug 5 2015, 5:47 PM

Everything seems to be working; we just need to verify purges, make sure browser tests are still OK, and then decom the old instances.

demon updated the task description. (Show Details)Aug 13 2015, 4:47 PM

No complaints from the browser testing folks, so I'm assuming that's fine. Decom'd the old instances now.

I guess technically we need to check purges are working, but I assume they are at this point.

demon closed this task as Resolved.Aug 19 2015, 3:34 PM
hashar reopened this task as Open.Sep 8 2015, 10:28 AM

The Parsoid cache deployment-parsoidcache02 is still on Trusty :( T103660: Migrate Parsoid cache from Trusty to Jessie

demon added a comment.Sep 15 2015, 6:56 PM

Working on this. Failing on the usual TLS madness.

@demon any progress on this? I guess you had more important duties. Should we pair on it? Release-Engineering-Team

hashar closed this task as Resolved.Oct 28 2015, 3:40 PM

T103660: Migrate Parsoid cache from Trusty to Jessie has finally been solved. That was the last Varnish cache still using Trusty.

BBlack moved this task from Triage to Done on the Traffic board.Nov 30 2015, 6:02 PM