
Automate deployment of heritage on Gerrit post-merge
Open, Needs Triage, Public

Description

After merging a patch on Gerrit, I need to

  • deploy the changes / run build
  • send a notification to SAL

On heritage, we automated the process as much as we could. The bin/deploy-to-toollabs.sh script does the following (a rough sketch of such a script appears after the list):

  • SSH into Labs
  • Become heritage
  • git pull
  • (could run a build step (installing requirements / running composer), but there is just no need for it right now)
  • generate the log message for SAL.
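
For illustration, here is a minimal sketch of that kind of script; the bastion hostname, checkout path, and SAL message format are placeholders, not the actual contents of bin/deploy-to-toollabs.sh:

  #!/bin/bash
  # Hypothetical sketch of a deploy script along the lines described above.
  # Hostname, tool name, paths and message format are assumptions.
  set -eu

  BASTION="login.tools.wmflabs.org"   # assumed Tool Labs bastion
  TOOL="heritage"

  # SSH into Labs, become the tool, and pull the merged changes.
  # (Assumes `become` forwards a trailing command to run as the tool.)
  ssh -t "$BASTION" "become $TOOL bash -c 'cd ~/heritage && git pull --ff-only'"

  # A build step (pip install / composer install) could be added here if needed.

  # Generate the log message to copy/paste into IRC for SAL.
  REV=$(ssh "$BASTION" "become $TOOL bash -c 'git -C ~/heritage rev-parse --short HEAD'")
  echo "!log tools.$TOOL Deployed $REV of heritage"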

The manual steps are:

  • Running this bash script locally
  • Copy/pasting the generated message, logging into IRC, and posting it

We would like these tasks to be done automatically by Jenkins post-merge.

Poking @bd808, @Krinkle, @hashar for their CI/Labs thoughts :)

Event Timeline

The current approach (ssh-ing into the Tool Labs bastion as the tool user and running a shell command) is unlikely to work from a post-merge job, as that would require Jenkins slaves to have access to the Tool Labs bastion, as well as credentials to become said user. Since we don't differentiate between types of Jenkins jobs, and since we cannot trust what code executes within a Jenkins job, we wouldn't want to grant Jenkins ssh access to Tool Labs, especially as a specific user.

There are, however, three alternative solutions like this in use, some of which may help inspire a solution for you.

  1. Commits to the integration/docroot.git repository result in a post-merge Jenkins job assigned to a slave that is in fact the integration.wikimedia.org web server. Since we can trust Jenkins configuration (just not the jobs) we made that server a slave, and only assign this post-merge job to it. The job then simply runs git-pull, which means https://integration.wikimedia.org and https://doc.wikimedia.org are then updated. Note that this only applies to the base content for those domains. The generated documentation on doc.wikimedia.org comes from elsewhere (see point 2).
  2. Many repos have post-merge jobs that generate code documentation and/or code coverage reports. There is a dedicated server within the integration project in Wikimedia Labs that all CI slaves have rsync permissions for. The slave that ran the job rsyncs the output to a uniquely-identified directory (identified by a nonce token) on the rsync server. Jenkins then spawns another job after this one, run by the web server's Jenkins slave, which rsyncs from that unique directory (if it exists) to the web server's document root (this pattern is sketched after the list).
  3. After merging in operations/mediawiki-config.git, a post-merge job is spawned and assigned to the beta cluster's bastion. Jenkins has a dedicated user on that bastion, and stores SSH credentials for it in a secure location on the Jenkins master (not accessible by Jenkins slaves). The job in question does a scap deployment of the patch and runs the MediaWiki update script for the beta cluster.
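
For reference, the pattern in point 2 boils down to something like the sketch below; the rsync host, module, and destination paths are made up for illustration, not the actual integration project configuration:

  #!/bin/bash
  # Hypothetical sketch of the publish-via-rsync pattern from point 2.
  # The rsync daemon host/module and destination paths are illustrative only.
  set -eu

  NONCE=$(openssl rand -hex 16)   # unique token identifying this publish run

  # Step 1, run by whichever slave built the report: push the output to a
  # uniquely named directory on the shared rsync server.
  rsync -a --delete ./build/coverage/ "rsync-publish.integration.wmflabs::publish/$NONCE/"

  # Step 2, run by a follow-up job on the web server's slave: pull from that
  # nonce directory (if it exists) into the document root.
  rsync -a "rsync-publish.integration.wmflabs::publish/$NONCE/" /srv/docroot/cover/heritage/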

Aside from the Jenkins-based solutions, another option might be to integrate with Striker (https://wikitech.wikimedia.org/wiki/toolsadmin.wikimedia.org) somehow, given that the repository is associated with the Tool Labs account and is hosted on Phabricator. Perhaps some kind of auto-deploy for a subset of repositories to a chosen destination in the tool's home directory. This could be run from a trusted account within Tool Labs that can write to other tool users' home directories.

@Legoktm has set up an interesting way to do this for wikibugs that might interest you :)

For wikibugs we have a Jenkins post-merge job that curls https://tools.wmflabs.org/wikibugs/pull.php, which basically runs exec("cd /data/project/wikibugs/wikibugs2 && git pull --log") with some basic rate limiting. wikibugs checks the mtime of the config file every 60 seconds and reloads it if out of date, and since it's an IRC bot it can also send a message for SAL.

If you're worried about the endpoint being DDoS'd or something, we can probably do some kind of authenticated request using a jenkins secret. And for IRC, you could use the ircnotifier tool that Yuvi wrote that exposes an HTTP API for IRC messages.
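
To make that concrete, the Jenkins-side shell step could be as small as the sketch below; the pull endpoint for heritage, the secret header, and the ircnotifier URL and parameters are assumptions rather than documented APIs:

  #!/bin/bash
  # Hypothetical post-merge shell step: ping a pull endpoint, then announce on IRC.
  # Endpoint URLs, the secret header, and the relay parameters are all assumptions.
  set -eu

  # Trigger the tool's pull endpoint, authenticating with a shared secret that
  # Jenkins injects into the job environment (here as $DEPLOY_SECRET).
  curl -sf -H "X-Deploy-Secret: ${DEPLOY_SECRET}" \
      https://tools.wmflabs.org/heritage/pull.php

  # Post the SAL message through an HTTP-to-IRC relay such as ircnotifier
  # (hypothetical endpoint and parameter names).
  curl -sf --data-urlencode "channel=#wikimedia-cloud" \
      --data-urlencode "message=!log tools.heritage Updated from Gerrit (post-merge)" \
      https://tools.wmflabs.org/ircnotifier/send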

@Krinkle Actually we are already using post-merge hooks on heritage, indeed to generate code coverage and upload it to doc.wikimedia.org. :)

Having an endpoint that execs git pull sounds horrible, to be honest ^__^"

I understand the issues raised here, but I do find it quite frustrating. On the one hand, my day job involves developing continuous delivery pipelines which, on post-merge, notify half a dozen different services and run Terraform to recreate entire clusters of containers serving live clients; on the other hand, I have to manually run a script here and copy/paste a message on IRC :-) I feel like the Toolforge environment should be making these kinds of operations easy (or even possible).

I feel like the Toolforge environment should be making these kinds of operations easy (or even possible).

It is possible, assuming that you trust Jenkins to store an ssh key attached to an LDAP user account which can become heritage and run your commands. There is nothing specifically in Toolforge that would block this. The main question becomes the matter of trusting Jenkins to hold the account's ssh private key. If we replaced ssh in this loop with any other authenticated transport mechanism you would still face the question of trusting Jenkins to hold the privileged credentials, but the exposure could possibly be limited to less than full shell access as the tool account.

My more ideal future world would be a full-featured PaaS system in Toolforge that overlays Kubernetes and provides push-to-deploy semantics like Heroku uses. See T136265: Develop evaluation criteria for comparing Platform as a Service (PaaS) solutions. This PaaS would still have the same authentication trust issues with any automation tool as the current Jenkins possibility, however. I think secure credential storage and use is the weak link in any complete deployment automation setup for our shared hosting environments.

@Legoktm on T157893#3054609 described how the CI Jenkins updates wikibugs by running a short job that simply does a curl, which triggers the update over an RPC endpoint. That is probably easy to reproduce for heritage.

We could have a Jenkins slave on tools, which would let us run jobs on tools. Jenkins would ssh to the instance as the labs user jenkins-deploy and run whatever job we want there. Provided the user is added to the proper groups, it should be able to become the tool and run the commands. We did that for the beta cluster, running jobs directly on deployment-tin (the deployment server).

Another possibility is to set up a standalone Jenkins in tools which would listen for events from Gerrit and trigger jobs as needed, possibly via the Gerrit Trigger plugin (https://wiki.jenkins.io/display/JENKINS/Gerrit+Trigger), which, for this use case, is an order of magnitude simpler than Jenkins Job Builder templating + the Zuul layout file. That Jenkins would listen for Gerrit events (via gerrit stream-events) and trigger the appropriate job whenever a patch is merged. The opportunity there is that tools people would be able to manage the Jenkins and jobs however they want, without having to channel everything via the CI team / the CI Jenkins. That might be easier in the long run.
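
As a rough illustration of the stream-events approach, a small listener could look like the sketch below; the ssh account, the project name, and the deploy script are assumptions, and the account would need Gerrit's stream-events capability:

  #!/bin/bash
  # Sketch of a gerrit stream-events listener that reacts to merges.
  # Account name, project name, and the deploy script are illustrative only.
  set -eu

  ssh -p 29418 deploy-bot@gerrit.wikimedia.org gerrit stream-events |
  while read -r event; do
      type=$(jq -r '.type' <<<"$event")
      project=$(jq -r '.change.project // empty' <<<"$event")
      if [[ "$type" == "change-merged" && "$project" == "labs/tools/heritage" ]]; then
          # Kick off the actual deployment (ssh/become/git pull, or a Jenkins job).
          ./deploy-heritage.sh
      fi
  done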

We need the labs user jenkins-deploy to be added to the tools project on WMCS and then add the user to the tool user group. From there a Jenkins job running on the Jenkins master (on contint1001) will be able to (a rough sketch follows the list):

  • ssh to tools-login.wmflabs.org
  • ssh to tools-dev.wmflabs.org
  • become <toolname>
  • clone the repo and run the commands to build & deploy
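
For illustration, the shell step of such a job might boil down to the sketch below; the bastion hosts and tool name come from the list above, while the key wiring and the assumption that become forwards a trailing command are mine:

  #!/bin/bash
  # Hypothetical shell step for a post-merge job running from the Jenkins master,
  # connecting as the jenkins-deploy user. The `become <tool> <command>` form is
  # assumed to forward the command to run as the tool.
  set -eu

  TOOL="heritage"
  BASTION="tools-login.wmflabs.org"   # or tools-dev.wmflabs.org

  # The jenkins-deploy ssh key would be provided to the job via Jenkins credentials.
  ssh -o BatchMode=yes "jenkins-deploy@$BASTION" \
      "become $TOOL bash -c 'cd ~/heritage && git pull --ff-only'"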

We need the labs user jenkins-deploy to be added to the tools project on WMCS and then add the user to the tool user group.

@hashar if you log into https://toolsadmin.wikimedia.org/ as the jenkins-deploy user and submit a membership request for Toolforge I would be happy to approve it.

@hashar and I discussed this on IRC a few days ago. Summary (please add/clarify if I missed anything Hashar!):

  • CI team doesn't have access to the jenkins-deploy LDAP user. However it should be straightforward to create a toolforge-jenkins account and hook it up into Jenkins.
  • We're looking at 2-3 jobs to start with, e.g. toolforge-deploy-python-webservice (with variants for nodejs, php, etc.) would do something like the following for a basic implementation (roughly as sketched at the end of this comment):
    • git -C www/python/src pull
    • webservice restart
    • dologmsg "Updated $tool ...."
  • The scope of this extends beyond just the heritage tool. Personally I would use this for all of my Gerrit-hosted tools.
  • Hashar is concerned that adding a new workflow/system to Jenkins, when it is supposed to be phased out (though no replacement actually exists yet), would add burden to the CI team. He suggested either having Toolforge set up its own Jenkins, or having something listen to stream-events to trigger the automation.
    • I didn't think either of those alternatives made sense given the resources that Toolforge admins have right now. From my POV, if this auto-deploy system is successful, I expect that Toolforge admins (especially myself) would assist with migration to the new CI system. Given that this is just a glorified ssh mechanism, I don't expect many difficulties.

In the end we didn't reach a resolution on the final point, i.e. whether and how to move forward.
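
For reference, once the job has reached the bastion and become the tool (plumbing omitted), the per-tool deploy step would be roughly the commands from the list above; the SAL message text and the job parameter are assumptions:

  #!/bin/bash
  # Deploy step run as the tool account; the ssh + become plumbing that gets us
  # here is omitted. TOOL is a hypothetical job parameter.
  set -eu

  TOOL=${TOOL:-heritage}

  git -C "$HOME/www/python/src" pull --ff-only   # update the deployed source
  webservice restart                             # restart the (python) webservice
  dologmsg "Updated $TOOL to latest master (automated post-merge deploy)"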

  • CI team doesn't have access to the jenkins-deploy LDAP user. However it should be straightforward to create a toolforge-jenkins account and hook it up into Jenkins.

How about we fix access to the jenkins-deploy user instead of making a new one? The account exists in LDAP as a proper Developer account and is already a member of the puppet-diffs, integration, and deployment-prep projects exactly so that Jenkins jobs can use ssh in those projects.

The registered email address for the Developer account is "jenkins-bot@wikimedia.org". Does anyone know where that routes to? We can always force a password change using a maintenance script, but just using https://wikitech.wikimedia.org/wiki/Special:PasswordReset is easiest if the email actually goes somewhere. And honestly if it doesn't go anywhere we should change the address to something that does.

The registered email address for the Developer account is "jenkins-bot@wikimedia.org". Does anyone know where that routes to? We can always force a password change using a maintenance script, but just using https://wikitech.wikimedia.org/wiki/Special:PasswordReset is easiest if the email actually goes somewhere. And honestly if it doesn't go anywhere we should change the address to something that does.

It doesn't appear to be in exim, so it's probably in GSuite. I sent an email to it with a pointer here, so we'll see who (if anyone) replies :) If not, we can ask ITS to see who it's supposed to go to and find more active/relevant people if necessary.

@hashar Do you have any objections to me manually updating the email address for the uid=jenkins-deploy,ou=people,dc=wikimedia,dc=org Developer account to be releng@lists.wikimedia.org?

  • CI team doesn't have access to the jenkins-deploy LDAP user. However it should be straightforward to create a toolforge-jenkins account and hook it up into Jenkins.

How about we fix access to the jenkins-deploy user instead of making a new one? The account exists in LDAP as a proper Developer account and is already a member of the puppet-diffs, integration, and deployment-prep projects exactly so that Jenkins jobs can use ssh in those projects.

The registered email address for the Developer account is "jenkins-bot@wikimedia.org". Does anyone know where that routes to? We can always force a password change using a maintenance script, but just using https://wikitech.wikimedia.org/wiki/Special:PasswordReset is easiest if the email actually goes somewhere. And honestly if it doesn't go anywhere we should change the address to something that does.

Missed this question when it was asked, but noticed an email to jenkins-bot pointing here :)

jenkins-bot is managed as a Google group currently and as far as I can see goes only to @hashar and me.

jenkins-bot is managed as a Google group currently and as far as I can see goes only to @hashar and me.

Yes, the migration to GSuite was done via T220664 to phase out the custom aliases we had on our own mail servers. https://groups.google.com/a/wikimedia.org/g/jenkins-bot/

The sole usage is as the sender address for Jenkins email notifications. Some jobs email mailing lists, and that address is then whitelisted/allowed so the emails pass through moderation.

@hashar Do you have any objections to me manually updating the email address for the uid=jenkins-deploy,ou=people,dc=wikimedia,dc=org Developer account to be releng@lists.wikimedia.org?

The jenkins-bot account must not have an email. The account is used by Zuul to interact with Gerrit, which means it is added as a reviewer to almost every change in Gerrit. If the account is attached to an email, that will result in a large flow of emails. I can't find the task right now, but we tried it once because it looked odd to have an email-less account, and that resulted in heavy spam in our personal mailboxes :]

@hashar Do you have any objections to me manually updating the email address for the uid=jenkins-deploy,ou=people,dc=wikimedia,dc=org Developer account to be releng@lists.wikimedia.org?

The jenkins-bot account must not have an email. The account is used by Zuul to interact with Gerrit, which means it is added as a reviewer to almost every change in Gerrit. If the account is attached to an email, that will result in a large flow of emails. I can't find the task right now, but we tried it once because it looked odd to have an email-less account, and that resulted in heavy spam in our personal mailboxes :]

Aren't jenkins-bot and jenkins-deploy (which this task is about) different users? The jenkins-deploy user has an email address currently pointing to jenkins-bot@wikimedia.org, see https://ldap.toolforge.org/user/jenkins-deploy

What we need is someone to login as jenkins-deploy and go through the https://toolsadmin.wikimedia.org/ workflow to submit a request to join Toolforge.

AH YEAH, sorry, I confused everything. So the recap is:

User           | Email                     | Purpose
jenkins-bot    | NONE                      | Zuul connection to Gerrit; added as reviewer everywhere, so no email address.
jenkins-deploy | jenkins-bot@wikimedia.org | SSH from Jenkins to WMCS instances, with some sudo privileges

So yes, indeed we should replace the jenkins-deploy email in LDAP to point to releng@lists.wikimedia.org. Though I don't think that email is of much use, since we do not use that account outside of sshing to WMCS. Sorry @bd808!

OK. Let's regroup here. @Legoktm and I would like to poke at the idea of this ticket: making it possible to have the Wikimedia Foundation Jenkins server perform actions as a tool maintainer within the Toolforge environment. These actions would involve establishing an SSH connection to a Toolforge bastion (probably dev.toolforge.org), running become <toolname> within that ssh session, and then doing some deploy action as the tool account.

There is an existing uid=jenkins-deploy,ou=people,dc=wikimedia,dc=org Developer account which is used to perform similar activities currently. The account already has ssh credentials in the LDAP directory that controls access to Cloud VPS instances and is used in at least the deployment-prep project to automate actions via ssh.

@Legoktm and I would like this account to become a Toolforge member so that it can then be added as a tool maintainer to tools which opt-in to the wacky new deployment automation idea. For this membership to happen, we need someone with access to the jenkins-deploy account's password (the password for the https://wikitech.wikimedia.org/wiki/User:Jenkins-deploy Developer account) to login to https://toolsadmin.wikimedia.org/ and fill out the membership form at https://toolsadmin.wikimedia.org/tools/membership/apply.

Once that has been done, we can approve the account as a Toolforge maintainer which will unblock the rest of the work.

OK. Let's regroup here. @Legoktm and I would like to poke at the idea of this ticket: making it possible to have the Wikimedia Foundation Jenkins server perform actions as a tool maintainer within the Toolforge environment. These actions would involve establishing an SSH connection to a Toolforge bastion (probably dev.toolforge.org), running become <toolname> within that ssh session, and then doing some deploy action as the tool account.

As @hashar mentioned, I don't think we have any objections to this if it's useful, as long as we're able to build it in a way that doesn't add too much overhead for folks who +2 in integration/config or for Toolforge maintainers.

There is an existing uid=jenkins-deploy,ou=people,dc=wikimedia,dc=org Developer account which is used to perform similar activities currently. The account already has ssh credentials in the LDAP directory that controls access to Cloud VPS instances and is used in at least the deployment-prep project to automate actions via ssh.

I added a separate key for this use-case. I'm not sure if there's a good reason to keep the keys separate, but it probably won't introduce too much overhead and may be useful at some point. This is stored in Jenkins as jenkins-deploy-toolforge.

@Legoktm and I would like this account to become a Toolforge member so that it can then be added as a tool maintainer to tools which opt-in to the wacky new deployment automation idea. For this membership to happen, we need someone with access to the jenkins-deploy account's password (the password for the https://wikitech.wikimedia.org/wiki/User:Jenkins-deploy Developer account) to login to https://toolsadmin.wikimedia.org/ and fill out the membership form at https://toolsadmin.wikimedia.org/tools/membership/apply.

Once that has been done, we can approve the account as a Toolforge maintainer which will unblock the rest of the work.

{{done}}. For future reference I've stored the credentials in the releng secrets repository so anyone in releng should be able to access them in future.

There is an existing uid=jenkins-deploy,ou=people,dc=wikimedia,dc=org Developer account which is used to perform similar activities currently. The account already has ssh credentials in the LDAP directory that controls access to Cloud VPS instances and is used in at least the deployment-prep project to automate actions via ssh.

I added a separate key for this use-case. I'm not sure if there's a good reason to keep the keys separate, but it probably won't introduce too much overhead and may be useful at some point. This is stored in Jenkins as jenkins-deploy-toolforge.

This seems like a reasonable idea, but be aware that on the Cloud VPS instance side of things any and all ssh keys attached to the Developer account can access any and all instances where the user is an authorized project member. Which is mostly to say that both keys will work equally to unlock any door into Cloud VPS.

{{done}}. For future reference I've stored the credentials in the releng secrets repository so anyone in releng should be able to access them in future.

Awesome! @Legoktm beat me to approving the Toolforge membership. :)