Update / maintain Beta Cluster restbase cluster: Up & working with VE
Closed, ResolvedPublic

Description

We should update the RESTBase test cluster in Beta Cluster so that it can be used for VE and other testing. We should also set up a mechanism to automatically keep this cluster updated with the latest restbase master.

Done:

  • Set up the VM instances in deployment-prep and hook them up with puppet/salt
  • Figure out a way to add the beta cluster wiki domains into config.yaml
  • Modify ops/puppet so that the Cassandra and RESTBase roles work in both production and beta cluster
  • Configure and activate the RestBaseUpdateJobs extension and integrate RB with VE
  • Figure out a way to update code in the beta cluster so that we can test stuff before releasing it into production

Event Timeline

GWicke claimed this task.
GWicke raised the priority of this task from to High.
GWicke updated the task description.
GWicke added subscribers: mobrovac, greg, GWicke.
greg renamed this task from Update / maintain beta labs restbase cluster to Update / maintain Beta Cluster restbase cluster. Mar 17 2015, 3:35 PM
greg updated the task description.
greg set Security to None.

What's the status of this? This was considered a blocker for the initial rollout.

@Eevans is looking into setting up a cluster in beta labs. In the meantime, labs testing can directly use the prod cluster.

Just to be pedantic: "Beta Cluster". "Labs" is too generic of a word in our world :)

Beta Cluster is lagging production (which shouldn't happen too often, if ever) since VE in Beta Cluster isn't using restbase, right?

@greg, beta labs VE can use restbase if configured to do so. Prod is mostly not using RB yet. We realized pretty late that the VE wmf20 code in prod doesn't have the restbase integration yet, so only group0 wikis actually use restbase so far. We'll re-start the roll-out after the train deploy tomorrow.

@greg, beta labs VE can use restbase if configured to do so. Prod is mostly not using RB yet. We realized pretty late that the VE wmf20 code in prod doesn't have the restbase integration yet, so only group0 wikis actually use restbase so far. We'll re-start the roll-out after the train deploy tomorrow.

  1. "Beta Cluster"
  2. So "it isn't behind prod now, but will be tomorrow". This was considered a blocker so we can actually test things before they hit production users. What is the timeline here?

I see two distinct things here:

  1. Testing that VE doesn't break itself when talking to RB (ie: testing VE master on prod RB)
  2. Testing that RB doesn't break VE (ie: testing RB master with VE master, probably, since that's easy)

We can do 1 now by pointing VE to production RB.

We need an RB install in Beta Cluster (puppetized so it can be set up in Staging cluster as well) to do 2.

@Eevans is looking into setting up a cluster in beta [cluster]. In the meantime, labs testing can directly use the prod cluster.

resetting assignee then :)

RB and cassandra are both fully puppetized. The difficult bits are dealing with trebuchet, and possibly updating things from master.

These seem to be the steps to success:

  • set up the instances in deployment-prep and their puppet/salt
  • tweak the ops/puppet beta config bits for RB and Cassandra (needs assistance from Yuvi)
  • add the master update job (assistance from Antoine)
  • tweak the mw-config labs settings to enable updates and the VRS

Change 197662 had a related patch set uploaded (by Mobrovac):
Adjust RESTBase / Cassandra settings for deployment-prep

https://gerrit.wikimedia.org/r/197662

Once the patch is merged we can test it and make sure RB and Cassandra work in the VMs. RB's config has been changed so that it uses the Parsoid instance available in the beta cluster. We will have to make sure the communication is set up properly. The relevant interwiki map can be found in Parsoid's beta settings.

For hooking up the extension, the pertinent files are InitialiseSettings-labs.php and CommonSettings-labs.php. They should be modelled after the config settings for production in InitialiseSettings.php and CommonSettings.php.

Note that there is a patch already in motion to disable RESTBase completely in the beta cluster.

Figure out a way to update code in the beta cluster so that we can test stuff before releasing it into production

For parsoid we made the parsoid instance a Jenkins slave and have a job that pulls the merged commit and reloads the parsoid service. We have reused the same system for the content translation server (cxserver). It would probably be better to rely on git-deploy instead, run from the main work machine deployment-bastion.eqiad.wmflabs. A single Jenkins job can be made that will cd to the appropriate /srv/deployment sub directory, git deploy start, git checkout the merged patch, then git deploy sync. But I am not sure to what extent git-deploy can be made to run entirely automatically and wait for all minions to have completed their tasks.

Maybe worth a subtask?
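A minimal sketch of the sequence proposed above, for discussion only. It just builds the commands in the order described and does not execute anything; the function name `deploy_commands` and the example sub directory are hypothetical, and whether `git deploy sync` can run unattended and wait for all minions is exactly the open question raised above.

```python
import shlex
from typing import List


def deploy_commands(repo_subdir: str, commit: str) -> List[str]:
    """Build the shell commands a Jenkins job would run, in order.

    repo_subdir is the directory below /srv/deployment (hypothetical
    value in the example below); commit is the merged patch to check out.
    """
    workdir = "/srv/deployment/" + repo_subdir
    return [
        "cd " + shlex.quote(workdir),
        "git deploy start",                    # open the deployment
        "git checkout " + shlex.quote(commit), # move to the merged patch
        "git deploy sync",                     # push it out to the minions
    ]


# Example with assumed values: print the plan instead of running it.
for command in deploy_commands("restbase/deploy", "abc123"):
    print(command)
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere; a real job would also need error handling between the steps.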

For parsoid we made the parsoid instance a Jenkins slave and have a job that pulls the merged commit and reloads the parsoid service. We have reused the same system for the content translation server (cxserver). It would probably be better to rely on git-deploy instead, run from the main work machine deployment-bastion.eqiad.wmflabs. A single Jenkins job can be made that will cd to the appropriate /srv/deployment sub directory, git deploy start, git checkout the merged patch, then git deploy sync. But I am not sure to what extent git-deploy can be made to run entirely automatically and wait for all minions to have completed their tasks.

I am rather sceptical about this. Trebuchet seems to be having real problems with submodules lately: in the last week I had to manually intervene in every git deploy I did. Since this is a particular case, I am OK with just manually SSHing into the boxes and updating (or having a script do that). This is not a solution, ofc, but it gets the job done for now. Alternatively, I'm also thinking of setting up cron jobs to git pull periodically and restart RESTBase if any changes happened.

Maybe worth a subtask?

Heh, not only a task but a whole small team working on the problem. The deployment (and update) process for WMF software in general needs to be rethought IMHO.

Figure out a way to add the beta cluster wiki domains into config.yaml
Modify ops/puppet so that the Cassandra and RESTBase roles work in both production and beta cluster

These are covered by the patch in ops/puppet, which has been tested in the beta cluster and confirmed to work.

Since the domains differ in prod and beta, the solution is to have two config.yaml templates in the puppet module: one for production and another for beta. The alternative could have been to list the domains in hiera (one array for prod, one for beta) and then iterate through them while generating the configuration, but that approach has been discarded, as it requires every domain to use a uniform spec.
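To illustrate why the discarded alternative implies uniform domain specs, here is a rough sketch of iterating over per-environment domain lists. Everything in it is an assumption made for illustration: the `DOMAINS` mapping stands in for the hiera arrays, the domain names are examples, and the rendered layout is not RESTBase's actual config.yaml schema.

```python
# Hypothetical per-environment domain lists, standing in for hiera arrays.
DOMAINS = {
    "production": ["en.wikipedia.org", "de.wikipedia.org"],
    "beta": [
        "en.wikipedia.beta.wmflabs.org",
        "de.wikipedia.beta.wmflabs.org",
    ],
}


def render_domains(env: str) -> str:
    """Render one identical stanza per domain of the given environment.

    The key names are illustrative only. Because the loop body is the
    same for every domain, no domain can carry its own special settings.
    """
    lines = ["domains:"]
    for domain in DOMAINS[env]:
        lines.append("  " + domain + ":")
        lines.append("    spec: default")  # same spec for every domain
    return "\n".join(lines)


print(render_domains("beta"))
```

With two separate templates, each environment can instead deviate per domain without adding conditionals to the loop.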

Change 198221 had a related patch set uploaded (by Mobrovac):
Activate RESTBase in the Beta Cluster

https://gerrit.wikimedia.org/r/198221

Configure and activate the RestBaseUpdateJobs extension and integrate RB with VE

Patch pending for this.

Change 197662 merged by Yuvipanda:
Adjust RESTBase / Cassandra settings for deployment-prep

https://gerrit.wikimedia.org/r/197662

Change 198221 merged by jenkins-bot:
Activate RESTBase in the Beta Cluster

https://gerrit.wikimedia.org/r/198221

VE is working (via RESTBase) in deployment-prep now.

The only remaining thing is figuring out automatic updates from master. A simple option could be to simply perform a periodic checkout & restart from cron locally. Another would be to do the same via Ansible, again locally or from Jenkins.

GWicke renamed this task from Update / maintain Beta Cluster restbase cluster to Update / maintain Beta Cluster restbase cluster: Up & working with VE, only automatic updates missing. Mar 20 2015, 8:32 PM
GWicke renamed this task from Update / maintain Beta Cluster restbase cluster: Up & working with VE, only automatic updates missing to Update / maintain Beta Cluster restbase cluster: Up & working with VE, only automatic code updates missing.
GWicke lowered the priority of this task from High to Medium.
mobrovac renamed this task from Update / maintain Beta Cluster restbase cluster: Up & working with VE, only automatic code updates missing to Update / maintain Beta Cluster restbase cluster: Up & working with VE. Mar 24 2015, 2:25 PM
mobrovac closed this task as Resolved.
mobrovac updated the task description.

As a temporary solution, I have installed cron scripts on both VMs which check for changes every 3 minutes, rebuild the dependencies if package.json has changed and restart RESTBase.
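For reference, the logic of such a periodic update could look roughly like the sketch below. This is not the script installed on the VMs; the checkout path `REPO_DIR`, the service name `SERVICE`, and the use of `npm install` and `service ... restart` are all assumptions.

```python
import subprocess

REPO_DIR = "/srv/restbase"  # hypothetical checkout path
SERVICE = "restbase"        # hypothetical service name


def plan_update(old_head, new_head, changed_files):
    """Decide what to do after a pull, given the commit ids before and after."""
    if old_head == new_head:
        return []                     # nothing new: leave the service alone
    steps = []
    if "package.json" in changed_files:
        steps.append("install_deps")  # dependencies may have changed
    steps.append("restart")
    return steps


def git(*args):
    """Run a git command inside REPO_DIR and return its trimmed output."""
    result = subprocess.run(["git", "-C", REPO_DIR] + list(args),
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


def main():
    old = git("rev-parse", "HEAD")
    git("pull", "--ff-only")
    new = git("rev-parse", "HEAD")
    changed = []
    if old != new:
        changed = git("diff", "--name-only", old, new).splitlines()
    for step in plan_update(old, new, changed):
        if step == "install_deps":
            subprocess.run(["npm", "install"], cwd=REPO_DIR, check=True)
        else:
            subprocess.run(["service", SERVICE, "restart"], check=True)
```

A crontab entry with a `*/3 * * * *` schedule that invokes `main()` would match the 3-minute interval mentioned above. Keeping the decision in `plan_update` separate from the side effects makes it easy to check without touching a real checkout.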