Migrate Beta cluster services to use Kubernetes
Closed, ResolvedPublic

Description

Once T198901 is done, it will be hard to keep Beta cluster services up to date with production. This task tracks a possible migration of the Beta cluster services to Kubernetes as well.

Event Timeline

I wonder if we can have an automatic CI job that pushes the new code upon merge. Or, even better, make that part of the pipeline.

It's not just going to become a problem once T198901 is done, it's already a problem - due to the roles that have been removed in favour of k8s in T200832 and T213194, puppet is already failing on deployment-mathoid, deployment-sca01, and deployment-sca02, meaning these servers/services are already going to be out of date and will eventually break.

The next thing I expect to break on instances with broken puppet is their ability to resolve domain names (which will probably break LDAP integration, so people will not be able to log in), but more changes will likely be expected of those instances in future which they will not apply, and so they will break even more.

Edit: For the record, it appears that access to these VMs did break as I suspected; it also appears the citoid service on one of them is down for some reason:

alex@alex-laptop:~$ curl 'https://en.wikipedia.beta.wmflabs.org/api/rest_v1/data/citation/mediawiki/http%3A%2F%2Fbbc.co.uk%2F'
{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/internal_http_error","method":"get","detail":"Error: connect ECONNREFUSED 172.16.5.112:1970","uri":"http://deployment-sca02.deployment-prep.eqiad.wmflabs:1970/api?format=mediawiki&search=http%3A%2F%2Fbbc.co.uk%2F"}

I haven't checked all the other services running on those sca instances.

mobrovac raised the priority of this task from Medium to High. Apr 30 2019, 12:41 AM

Raising the prio and moving to next since we'll have to attack this problem sooner rather than later.

There is a simple solution to run services that are now packaged as containers on deployment-prep:

  • Create a puppet prefix in Horizon for the instances you want to use.
  • Assign the class role::beta::docker_services to that prefix.
  • Configure Hiera for this prefix as follows:
profile::docker::runner::service_defs:
  mathoid:
    port: 100044
    version: build-42
    override_cmd: nodejs server.js -c /etc/mathoid/config.yaml
    config:
      num_workers: 1
      worker_heap_limit_mb: 300
      # ... [all the config should go here]
  • Create as many VMs as your service needs

Whenever you change the version parameter in this hiera file, the container running will be updated to the desired built version.

At the moment we don't allow using meta tags like latest, but I don't think having to change one value in hiera really creates a problem in terms of updating the running container.

The container will run as a normal systemd service on your VM, so it can be managed mostly as you're used to.
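
Concretely, this means the container can be inspected and managed with the usual systemd and Docker tooling. A minimal sketch, assuming the systemd unit is named after the hiera stanza (e.g. mathoid):

# Inspect and manage the container like any other systemd service
sudo systemctl status mathoid
sudo journalctl -u mathoid -f        # follow the service logs
sudo systemctl restart mathoid       # e.g. after puppet pulled a new image version
sudo docker ps                       # the underlying container is still visible to docker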

Why do I think this is "good enough"? Because the difference from production won't be larger than what we have currently, where no service lives behind a load balancer, and the cost and complexity are much lower while acceptable reliability is much easier to attain.

Please also note you can run multiple services on the same VM if you really want to; it's enough to add a second stanza to the hiera definition (see the sketch below).
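
For example, a second stanza can simply be added next to the first; the service names, ports and versions below are placeholders for illustration, not real values:

profile::docker::runner::service_defs:
  mathoid:
    port: 10044
    version: build-42
    config:
      num_workers: 1
  citoid:
    port: 1970
    version: build-7
    config:
      num_workers: 1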

Will the service run into any differences in its environment due to being run with role::beta::docker_services instead of k8s? I'm not 100% thrilled with the idea of introducing another infrastructure difference from production, but it might not be a big deal if the service behaves the same.
When @Ottomata mentioned that role to me he thought that role::beta::docker_services does not work on stretch, is that correct?

I believe the VM has to be Jessie atm, unfortunately. Can't remember exactly why.

Since the docker container will be the same as the one running in production, I don't think the environmental differences will be more than they already are in beta. The config for the service will be manually specified in hiera (either in Horizon or in operations/puppet), and here you can get as close to or as far from prod as you like.

I think the main problem will come from the fact that in production the service configs live in Helm charts now, rather than in Puppet. This means we can't easily reuse config defaults between beta and production. If we eventually end up somehow rendering Helm release values files from Puppet (@akosiaris mentioned this might happen), then it might be easier to build some puppet abstraction on top of role::beta::docker_services that makes it a little easier to share configs with beta.

Since the docker container will be the same as the one running in production, I don't think the environmental differences will be more than they already are in beta. The config for the service will be manually specified in hiera (either in Horizon or in operations/puppet), and here you can get as close to or as far from prod as you like.

I think the main problem will come from the fact that in production the service configs live in Helm charts now, rather than in Puppet. This means we can't easily reuse config defaults between beta and production. If we eventually end up somehow rendering Helm release values files from Puppet (@akosiaris mentioned this might happen), then it might be easier to build some puppet abstraction on top of role::beta::docker_services that makes it a little easier to share configs with beta.

I was referring to secrets in that conversation, although that could be expanded to also cover some infrastructural things services might need to use, like url-downloader endpoints. We are going to be brainstorming/investigating this in the next couple of weeks as the helm release values pieces start falling into place. I'll update this task accordingly. But I have doubts it will be reusable in this context.

Will the service run into any differences in its environment due to being run with role::beta::docker_services instead of k8s? I'm not 100% thrilled with the idea of introducing another infrastructure difference from production, but it might not be a big deal if the service behaves the same.
When @Ottomata mentioned that role to me he thought that role::beta::docker_services does not work on stretch, is that correct?

No, it's not, as I just proved (see https://phabricator.wikimedia.org/T218609#5170364).

Also the whole point of running in containers is that the deployment is as reproducible as possible. The only real difference with production of some relevance is networking, but that's already vastly different in Cloud VPS so it's really a non-argument IMHO.

An example of environmental differences: service-runner uses statsd. In prod we use prometheus-statsd-exporter in a k8s container with service specific metric mappings to get those metrics into prometheus.

Is there anything preventing us from running that same container in beta on docker? We have a prometheus instance IIRC

The status quo is that services always run their code in beta before it reaches production. For manually deployed code (MW) and most services (before k8s) this is/was automated by Jenkins using basically just git-pull, or cron (puppet). Services not interacting with MW, that use scap (such as webperf services), require individual teams to do their routine deployments in both beta and prod manually (and sometimes skip this in beta).

I think it should be a requirement that we do not regress from this status quo. Updating a key in Horizon isn't difficult, but it's certainly a much higher bar compared to automatically happening by Jenkins (and only doing something manually if that fails).

Are the maintainers of these services okay with having to manually deploy them via Horizon on a daily basis? If they are, that could work. A more automated workflow is likely preferred, but I can see how that would not be worth investing in at this point with the proper pipeline coming up "soon".

The bottom line is, it would be a huge step backwards for deployment confidence if we lose this status quo. It does not just affect the services themselves; it affects all other teams' ability to have confidence in their deployments as well. QA testing for most product features focuses heavily on the Beta Cluster. We would be significantly devaluing and slowing down their work if these node services are usually out of date.

An example of environmental differences: service-runner uses statsd. In prod we use prometheus-statsd-exporter in a k8s container with service specific metric mappings to get those metrics into prometheus.

It's not like you can't do the same here by declaring the container if you want. I am not aware of beta being used to gather metrics and performance data.
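
For illustration, declaring such a sidecar container could look roughly like the following; the stanza name, port and version are assumptions for this sketch, not values from an actual deployment:

profile::docker::runner::service_defs:
  prometheus-statsd-exporter:
    port: 9102                       # default web/metrics port of the exporter
    version: some-build-tag          # whatever tag the registry actually provides
    config: {}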

The status quo is that services always run their code in beta before it reaches production. For manually deployed code (MW) and most services (before k8s) this is/was automated by Jenkins using basically just git-pull, or cron (puppet). Services not interacting with MW, that use scap (such as webperf services), require individual teams to do their routine deployments in both beta and prod manually (and sometimes skip this in beta).

I am not aware of mathoid and other services having been deployed by Jenkins in beta. And given how things are deployed in beta too (via scap3), they would only get deployed when a new release is prepared.

Having said this, it would be relatively easy to add support for deployment via jenkins to this installation.

I think it should be a requirement that we do not regress from this status quo. Updating a key in Horizon isn't difficult, but it's certainly a much higher bar compared to automatically happening by Jenkins (and only doing something manually if that fails).

Are the maintainers of these services okay with having to manually deploy them via Horizon on a daily basis? If they are, that could work. A more automated workflow is likely preferred, but I can see how that would not be worth investing in at this point with the proper pipeline coming up "soon".

Again, I'm not sure this is true, and given I don't see people preparing daily releases in their deploy repository today, I don't think this is the burden we're talking about. This is btw completely tangential to the solution used to deploy the containers, and it's actually way easier to automate this procedure with the solution of running a docker container in a VM rather than with a full-blown kubernetes cluster.

The bottom line is, it would be a huge step backwards for deployment confidence if we lose this status quo. It does not just affect the services themselves; it affects all other teams' ability to have confidence in their deployments as well. QA testing for most product features focuses heavily on the Beta Cluster. We would be significantly devaluing and slowing down their work if these node services are usually out of date.

In the deployment-pipeline meeting we talked about the need to automate the deployments, so I agree it's important; I am just not aware of this being the status quo - and, from what I can see on the VMs, I'm rather convinced it's not.

The status quo is that services always run their code in beta before it reaches production. [..]

I am not aware of the fact mathoid and other services were deployed by jenkins in beta. [..]

Looks like this was changed in 2016. See fb7c34e450b8, e6b3960571f20c, and:

[..] how are you going to keep it up-to-date on beta?

The Parsoid folks are ok with manually updating beta for now [..]

On beta, we had Parsoid and some other services deployed via Jenkins whenever a change got merged. Then each service/team had different needs: deploy automatically or manually, from the source repository or from the /deploy repository. I think in the end we have settled on manual deployment and dropped the Jenkins jobs.

I have not been active on the beta cluster front in ages and really have no idea how the MediaWiki services are maintained on it nowadays.

So I guess it is up to the services to define how the containers get updated?

Could we use image version: latest in beta hiera? And somehow pull down the new latest and restart the image whenever a new version is created and uploaded to the registry?

Could we use image version: latest in beta hiera? And somehow pull down the new latest and restart the image whenever a new version is created and uploaded to the registry?

Sure, you just have to add a more complex script for service::docker to exec, I guess, so that you can properly check that 'latest' or any other similar meta tag is correctly respected.

Be my guest, I'll happily review the change!

Let's put together a list of all the services we need to set up as containers within beta (either because stuff is already broken without it or because it will be soon), and figure out how best to arrange the VMs for it. We might not want to be making a new VM for each service if they all run inside containers?

Let's put together a list of all the services we need to set up as containers within beta (either because stuff is already broken without it or because it will be soon), and figure out how best to arrange the VMs for it. We might not want to be making a new VM for each service if they all run inside containers?

Looking at what's on kubernetes already, we have:

  • mathoid
  • eventgate-analytics
  • eventgate-main (?)
  • zotero / citoid
  • cxserver

and upcoming

  • wikidata-termbox
  • sessionstore/kask

As for stuffing more than one service onto one VM: it's surely possible, but I'm not sure we want to unless we add some resource constraints to our manifests, so that systemd will set up cgroups for the containers and run Docker accordingly as well.

Alright, I guess in that case I'll create a VM for citoid later and try getting the container for that running there? If it works out then maybe cxserver too, I don't know the status of eventgate-main or wikidata-termbox.

Could we use image version: latest in beta hiera? And somehow pull down the new latest and restart the image whenever a new version is created and uploaded to the registry?

Sure, you just have to add a more complex script for service::docker to exec, I guess, so that you can properly check that 'latest' or any other similar meta tag is correctly respected.

Be my guest, I'll happily review the change!

latest is NOT a tag that gets added by the pipeline FWIW.

eventgate-main is up both in beta via service::docker and in prod on k8s. Both eventgate-analytics and eventgate-main are running in beta on the deployment-eventgate-1 VM.

latest is NOT a tag that gets added by the pipeline FWIW.

Aye, right. I guess we could just list the tags and grab the one with the latest date, e.g. 2019-05-08-162821-production or whatever.
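
A sketch of how that could be scripted against the standard Docker Registry v2 tags API (the repository name and tag pattern are taken from the examples in this task; the script itself is hypothetical):

# List the repository's tags and pick the newest date-stamped production build
REPO=wikimedia/mediawiki-services-citoid
curl -s "https://docker-registry.wikimedia.org/v2/${REPO}/tags/list" \
  | jq -r '.tags[]' \
  | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{6}-production$' \
  | sort \
  | tail -n 1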

krenair@deployment-docker-citoid01:~$ curl 'http://localhost:1970/api?search=10.1038%2Fng.590&format=mediawiki'
[{"url":"https://www.nature.com/articles/ng.590","itemType":"journalArticle","issue":"6","DOI":"10.1038/ng.590","pages":"498–503","title":"Human T cell epitopes of Mycobacterium tuberculosis are evolutionarily hyperconserved","volume":"42","publicationTitle":"Nature Genetics","date":"2010-05-23","ISSN":["1061-4036","1546-1718"],"abstractNote":"Sebastien Gagneux and colleagues report the genome sequences of 21 phylogeographically diverse Mycobacterium tuberculosis complex strains. Analysis of the global genetic diversity of these strains showed that most of the known human T cell epitopes were highly conserved.","language":"En","accessDate":"2019-05-14","author":[["Iñaki","Comas"],["Jaidip","Chakravartti"],["Peter M","Small"],["James","Galagan"],["Stefan","Niemann"],["Kristin","Kremer"],["Joel D","Ernst"],["Sebastien","Gagneux"]],"PMID":"20495566","PMCID":"PMC2883744","source":["PubMed","Crossref","citoid"]}]

it still relies on external zotero, but this is with:

profile::docker::engine::declare_service: true
profile::docker::engine::settings: {}
profile::docker::engine::version: 1.12.6-0~debian-jessie
profile::docker::runner::service_defs:
  mediawiki-services-citoid:
    config:
      allowPrivateAddresses: false
      cors: '*'
      mailto: services@lists.wikimedia.org
      maxRedirects: 10
      port: 1970
      proxy: null
      pubmed: false
      userAgent: Citoid (Wikimedia tool; learn more at https://www.mediawiki.org/wiki/Citoid)
      wskey: ''
      xisbn: false
      zotero: false
      zoteroInterface: deployment-zotero01.deployment-prep.eqiad.wmflabs
      zoteroPort: 1969
      zoteroUseProxy: false
    namespace: wikimedia
    port: 1970
    version: 2019-04-01-104952-production

Config data comes from deployment-sca01:/etc/citoid/config.yaml; the version is listed at https://tools.wmflabs.org/dockerregistry/wikimedia/mediawiki-services-citoid/tags/ and, according to fsero, is the one prod currently uses (for the moment). The profile::docker::engine settings come from Joe's docker-mathoid instance.

Hold on a minute, that Zotero instance does not exist; why does this work?

I've opened a couple of subtasks and T223346. Considering the Citoid service on sca01 is probably already broken in terms of Zotero interaction due to it pointing at a nonexistent Zotero instance, I'm just going to move stuff to use this new Citoid instance, so we can clean up the broken deployment-sca* instances.

Change 510290 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] deployment-prep: Use new Citoid service running inside Docker

https://gerrit.wikimedia.org/r/510290

Cherry-picked the above commit on deployment-puppetmaster and ran puppet on deployment-restbase0[12]; VE-Citoid integration appears to work again (though, as mentioned above, any functionality relying on Zotero may still be broken).
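
For anyone repeating this, the sequence is roughly the following; the repository path on the puppetmaster and the exact patchset ref are assumptions, not copied from the task:

# On deployment-puppetmaster (path and patchset number assumed):
cd /var/lib/git/operations/puppet
sudo git fetch https://gerrit.wikimedia.org/r/operations/puppet refs/changes/90/510290/1
sudo git cherry-pick FETCH_HEAD
# Then on each affected instance (deployment-restbase01/02):
sudo puppet agent --test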

krenair@deployment-docker-cxserver01:~$ sudo /usr/bin/docker run -p 8080:8080 -v /etc/mediawiki-services-cxserver/:/etc/mediawiki-services-cxserver --name alex-test docker-registry.wikimedia.org/wikimedia/mediawiki-services-cxserver:2019-05-08-064536-production -c /etc/mediawiki-services-cxserver/config.yaml
Error during DHT setup undefined
krenair@deployment-docker-cxserver01:~$

Anyone know what that means?

Change 510290 merged by Giuseppe Lavagetto:
[operations/puppet@production] deployment-prep: Use new Citoid service running inside Docker

https://gerrit.wikimedia.org/r/510290

krenair@deployment-docker-cxserver01:~$ sudo /usr/bin/docker run -p 8080:8080 -v /etc/mediawiki-services-cxserver/:/etc/mediawiki-services-cxserver --name alex-test docker-registry.wikimedia.org/wikimedia/mediawiki-services-cxserver:2019-05-08-064536-production -c /etc/mediawiki-services-cxserver/config.yaml
Error during DHT setup undefined
krenair@deployment-docker-cxserver01:~$

Anyone know what that means?

That's the rate limiter. It's failing to initialize it. The config in the repo[1] has some defaults, probably something missing in the config.yaml above?

[1] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/cxserver/+/refs/heads/master/config.prod.yaml#25

I fiddled around with that and now the container exits after 15s without printing or logging anything :|

Change 510586 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/mediawiki-config@master] deployment-prep: Use new cxserver running in Docker

https://gerrit.wikimedia.org/r/510586

Thank you to whoever has been following my progress and updating the *-beta.wmflabs.org proxies. I haven't yet figured out what the purpose of exposing the mathoid and citoid APIs is; mind shedding some light on that?

Change 510588 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] deployment-prep: Change restbase to talk to new working cxserver Docker container

https://gerrit.wikimedia.org/r/510588

cxserver runs but does not seem to know about the basics:

krenair@deployment-docker-cxserver01:~$ curl localhost:8080/v2/page/en/es/Main_Page
Page en:Main_Page could not be found. HTTPError: 404: https://mediawiki.org/wiki/HyperSwitch/errors/not_found#route

Niklas found the solution to that one: set mw_host: en.wikipedia.beta.wmflabs.org in the config; otherwise it defaults to en.wikipedia.org, which will cause RESTBase to generate 404s.
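
In the hiera stanza for the cxserver container that would look roughly like this; only the relevant key is shown, and the surrounding structure is assumed to match the citoid example above:

profile::docker::runner::service_defs:
  mediawiki-services-cxserver:
    config:
      # Point cxserver at the beta wikis rather than production
      mw_host: en.wikipedia.beta.wmflabs.org
      # ... rest of the cxserver config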

Change 510588 merged by Andrew Bogott:
[operations/puppet@production] deployment-prep: Change restbase to talk to new cxserver Docker container

https://gerrit.wikimedia.org/r/510588

Change 510586 merged by jenkins-bot:
[operations/mediawiki-config@master] deployment-prep: Use new cxserver running in Docker

https://gerrit.wikimedia.org/r/510586

If we're content to stick with simple Docker instances long-term due to beta's relatively small scale, then I suggest we close this and have individual tasks for services needing to be migrated in future?

akosiaris claimed this task.

Agreed with @Krenair, closing for now.