
Automate deployment of heritage on Gerrit post-merge
Open, Needs TriagePublic

Description

After merging a patch on Gerrit, I need to

  • deploy the changes / run build
  • send a notification to SAL

On heritage, we automated the process as much as we could. The bin/deploy-to-toollabs.sh script does the following:

  • SSH into Labs
  • Become heritage
  • git pull
  • (could run a build step (installing requirements / running composer); there is just no need for it right now)
  • generate the log message for SAL.
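The steps above could be sketched roughly like this; the bastion host, repo path, and SAL message format are assumptions for illustration, not the actual heritage script:

```shell
#!/bin/bash
# Hypothetical sketch of a deploy-to-toollabs.sh-style script.
# Host, paths, and message format are assumptions, not the real script.
set -eu

TOOL="heritage"
BASTION="tools-login.wmflabs.org"

# Pure helper: format the SAL log line from a commit sha and subject.
sal_message() {
    echo "Deployed $1 ($2) on tools.${TOOL}"
}

# SSH into Labs, become the tool user, and pull the latest code.
# ("become" is Toolforge's wrapper for switching to the tool account.)
deploy() {
    ssh "$BASTION" "become $TOOL git -C ~/heritage pull --ff-only"
}

# A build step (pip install / composer install) could be inserted in
# deploy(); the task notes there is currently no need for one.
```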

The manual steps are:

  • Running this bash script locally
  • Copy/Pasting the generated message, log into IRC, posting

We would want these tasks to be done automatically by Jenkins post-merge.

Poking @bd808, @Krinkle, @hashar for their CI/Labs thoughts :)

Event Timeline

JeanFred created this task.Feb 12 2017, 9:52 AM
Restricted Application added a subscriber: Aklapper.Feb 12 2017, 9:52 AM
JeanFred moved this task from Backlog to Watching on the User-JeanFred board.
Restricted Application added a project: Cloud-Services.Feb 25 2017, 12:20 AM

The current approach (ssh-ing into the Tool Labs bastion as the tool user and running a shell command) is unlikely to work from a post-merge job, as that would require Jenkins slaves to have access to the Tool Labs bastion, as well as credentials to become said user. Since we don't differentiate between types of Jenkins jobs, and since we cannot trust the code that executes within a Jenkins job, we wouldn't want to grant Jenkins ssh access to Tool Labs, especially not as a specific user.

There are, however, three alternative solutions in use, some of which may help inspire a solution for you.

  1. Commits to the integration/docroot.git repository result in a post-merge Jenkins job assigned to a slave that is in fact the integration.wikimedia.org web server. Since we can trust Jenkins configuration (just not the jobs) we made that server a slave, and only assign this post-merge job to it. The job then simply runs git-pull, which means https://integration.wikimedia.org and https://doc.wikimedia.org are then updated. Note that this only applies to the base content for those domains. The generated documentation on doc.wikimedia.org comes from elsewhere (see point 2).
  2. Many repos have post-merge jobs that generate code documentation and/or code coverage reports. There is a dedicated server within the integration project in Wikimedia Labs that all CI slaves have rsync permissions for. The slave that ran the job will rsync it to a uniquely-identified directory (identified by a nonce token) on the rsync server. Jenkins then also spawns another job after this one that is run by the web server's Jenkins slave, which will rsync from that unique directory (if it exists) to the web server's document root.
  3. After merging in operations/mediawiki-config.git, a post-merge job is spawned and assigned to the beta cluster's bastion. Jenkins has a dedicated user on that bastion, and stores SSH credentials for it in a secure location on the Jenkins master (not accessible by Jenkins slaves). The job in question does a scap deployment of the patch and runs the MediaWiki update script for the beta cluster.
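The untrusted-build to trusted-publish hand-off in point 2 could be sketched like this; the host names, rsync paths, and nonce format are illustrative, not the real CI configuration:

```shell
#!/bin/bash
# Sketch of the two-job hand-off from point 2: an untrusted slave uploads
# to a unique directory, then a trusted slave fetches from it.
set -eu

# Generate a hard-to-guess directory name for the hand-off (the nonce).
publish_dir() {
    echo "$(date +%s)-$RANDOM-$RANDOM"
}

# Job 1, run on the untrusted slave that built the docs (host is assumed):
upload_docs() {
    local nonce="$1"
    rsync -a docs/ "rsync-server.integration:publish/${nonce}/"
}

# Job 2, triggered afterwards on the trusted web-server slave:
fetch_docs() {
    local nonce="$1"
    rsync -a --remove-source-files \
        "rsync-server.integration:publish/${nonce}/" /srv/docroot/project/
}
```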

Aside from the Jenkins-based solutions, another option might be to integrate with Striker (https://wikitech.wikimedia.org/wiki/toolsadmin.wikimedia.org) somehow, given that the repository is associated with the Tool Labs account and is hosted on Phabricator. Perhaps some kind of auto-deploy for a subset of repositories to a chosen destination in the tool's home directory. This could be run from a trusted account within Tool Labs that can write to other tool users' home directories.

@Legoktm has set up an interesting way to do this for wikibugs that might interest you :)

For wikibugs we have a Jenkins post-merge job that curls https://tools.wmflabs.org/wikibugs/pull.php, which basically runs exec("cd /data/project/wikibugs/wikibugs2 && git pull --log") with some basic rate limiting. wikibugs checks the mtime of the config file every 60 seconds and reloads it if out of date, and since it's an IRC bot it can also send a message to SAL.

If you're worried about the endpoint being DDoS'd or something, we can probably do some kind of authenticated request using a jenkins secret. And for IRC, you could use the ircnotifier tool that Yuvi wrote that exposes an HTTP API for IRC messages.
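An authenticated trigger could look something like the sketch below; the header name, endpoint, and the idea of a DEPLOY_TOKEN Jenkins secret are assumptions, and the receiving script would need to check the token server-side:

```shell
#!/bin/bash
# Sketch of an authenticated deploy trigger using a shared secret.
# DEPLOY_TOKEN would live in a Jenkins credential, not in the job config.
set -eu

ENDPOINT="https://tools.wmflabs.org/wikibugs/pull.php"

trigger_deploy() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # For testing: show the request without sending it (token redacted).
        echo "POST ${ENDPOINT} X-Deploy-Token:<redacted>"
        return 0
    fi
    curl --fail --silent --show-error \
        -H "X-Deploy-Token: ${DEPLOY_TOKEN}" \
        -X POST "${ENDPOINT}"
}
```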

ircnotifier no longer exists.

@Krinkle Actually we are already using post-merge hooks on heritage, indeed to generate code coverage and upload it to doc.wikimedia.org. :)

Having an endpoint that execs git pull sounds horrible to be honest ^__^"

I understand the issues raised here, but I do find it quite frustrating. On the one hand, my day job involves developing continuous delivery pipelines which, post-merge, notify half a dozen different services and run Terraform to recreate entire clusters of containers serving live clients; on the other hand, I have to manually run a script here and copy/paste a message on IRC :-) I feel like the Toolforge environment should be making these kinds of operations easy (or even possible).

bd808 added a comment.Aug 10 2017, 3:37 AM

I feel like the Toolforge environment should be making these kinds of operations easy (or even possible).

It is possible, assuming that you trust Jenkins to store an ssh key attached to an LDAP user account which can become heritage and run your commands. There is nothing in Toolforge specifically that would block this. The main question becomes one of trusting Jenkins to hold the account's ssh private key. If we replaced ssh in this loop with any other authenticated transport mechanism you would still face the question of trusting Jenkins to hold the privileged credentials, but the exposure could possibly be limited to less than full shell access as the tool account.

My more ideal future world would be a full-featured PaaS system in Toolforge that overlays Kubernetes and provides push-to-deploy semantics like Heroku uses. See T136265: Develop evaluation criteria for comparing Platform as a Service (PaaS) solutions. This PaaS would still have the same authentication trust issues with any automation tool as the current Jenkins possibility, however. I think secure credential storage and use is the weak link in any complete deployment automation setup for our shared hosting environments.

@Legoktm on T157893#3054609 described how the CI Jenkins updates wikibugs by running a short job that simply does a curl, which triggers the update over an RPC call. That is probably easy to reproduce for heritage.

We could have a Jenkins slave on tools, which would let us run jobs on tools. Jenkins would ssh to the instance as the Labs user jenkins-deploy and run whatever job we want there. Provided the user is added to the proper groups, it should be able to become the tool and run the commands. We did that for the beta cluster, running jobs directly on deployment-tin (the deployment server).

Another possibility is to set up a standalone Jenkins in tools which would listen for events in Gerrit and trigger jobs as needed, possibly via the Gerrit Trigger plugin (https://wiki.jenkins.io/display/JENKINS/Gerrit+Trigger) which, for this use case, is an order of magnitude simpler than Jenkins Job Builder templating + Zuul layout file. That Jenkins would listen for Gerrit events (via gerrit stream-events) and trigger the appropriate job whenever a patch is merged. The opportunity there is that tools people would be able to manage the Jenkins instance and its jobs however they want, without having to channel everything via the CI team / the CI Jenkins. That might be easier in the long run.
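The stream-events idea could be sketched as below; the project-name filter and the grep-based matching are simplifications (a real listener would parse the JSON events properly), and the repo name is assumed for illustration:

```shell
#!/bin/bash
# Sketch of a standalone listener reacting to merges via gerrit stream-events.

# Pure helper: does this event line describe a merge in our repository?
# (Real code would parse the JSON instead of grepping substrings.)
is_our_merge() {
    echo "$1" | grep -q '"type":"change-merged"' \
        && echo "$1" | grep -q '"project":"labs/tools/heritage"'
}

# Long-running loop: requires an ssh account on the Gerrit server.
watch_merges() {
    ssh -p 29418 gerrit.wikimedia.org gerrit stream-events \
        | while read -r event; do
              if is_our_merge "$event"; then
                  ./bin/deploy-to-toollabs.sh
              fi
          done
}
```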

Krinkle removed a subscriber: Krinkle.
hashar added a comment.EditedMay 18 2018, 1:58 PM

We need the labs user jenkins-deploy to be added to the tools project on WMCS and then add the user to the tool user group. From there a Jenkins job running on the jenkins master (on contint1001) will be able to:

  • ssh to tools-login.wmflabs.org
  • ssh to tools-dev.wmflabs.org
  • become <toolname>
  • clone the repo and run the commands to build & deploy
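Those steps could be condensed into a single Jenkins shell build step; the `become <tool> <command>` form, the bastion host, and the repo layout are assumptions:

```shell
#!/bin/bash
# Sketch of a Jenkins build step performing the listed steps over ssh.
# jenkins-deploy's ssh key would be held on the Jenkins master.
set -eu

TOOL="heritage"
BASTION="tools-login.wmflabs.org"

# Pure helper: the command to run on the bastion as the tool user.
remote_command() {
    printf 'become %s sh -c "cd ~/%s && git pull --ff-only"' "$1" "$1"
}

run_deploy() {
    ssh "$BASTION" "$(remote_command "$TOOL")"
}
```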
bd808 added a comment.May 18 2018, 8:30 PM

We need the labs user jenkins-deploy to be added to the tools project on WMCS and then add the user to the tool user group.

@hashar if you log into https://toolsadmin.wikimedia.org/ as the jenkins-deploy user and submit a membership request for Toolforge I would be happy to approve it.

@hashar and I discussed this on IRC a few days ago. Summary (please add/clarify if I missed anything Hashar!):

  • CI team doesn't have access to the jenkins-deploy LDAP user. However it should be straightforward to create a toolforge-jenkins account and hook it up into Jenkins.
  • We're looking at 2-3 jobs to start with, e.g. toolforge-deploy-python-webservice (with variants for nodejs, php, etc.) would do something like the following for a basic implementation:
    • git -C www/python/src pull
    • webservice restart
    • dologmsg "Updated $tool ...."
  • The scope of this extends beyond just the heritage tool. Personally I would use this for all of my Gerrit-hosted tools.
  • Hashar is concerned that adding a new workflow/system to Jenkins, when it is supposed to be phased out (though no replacement actually exists yet), would add burden to the CI team. He suggested either having Toolforge set up its own Jenkins, or having something listen to stream-events to trigger the automation.
    • I didn't think either of those alternatives made sense given the resources that Toolforge admins have right now. From my POV, if this auto-deploy system is successful, I expect that Toolforge admins (especially myself) would assist with migration to the new CI system. Given that this is just a glorified ssh mechanism, I don't expect many difficulties.
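The proposed job body, run as the tool user, might look like the sketch below; the TOOL parameterization and the dologmsg message format are assumptions:

```shell
#!/bin/bash
# Sketch of a toolforge-deploy-python-webservice job body, run as the tool.
set -eu

TOOL="${TOOL:-heritage}"        # per-tool job parameter (assumption)
SRC="${HOME}/www/python/src"

# Pure helper: format the SAL line.
update_message() {
    echo "Updated $1 to $2"
}

deploy() {
    git -C "$SRC" pull --ff-only
    webservice restart
    dologmsg "$(update_message "$TOOL" "$(git -C "$SRC" rev-parse --short HEAD)")"
}
```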

In the end we didn't reach a resolution on the final point on if/how to move forward.

bd808 added a comment.Jul 2 2020, 12:20 AM
  • CI team doesn't have access to the jenkins-deploy LDAP user. However it should be straightforward to create a toolforge-jenkins account and hook it up into Jenkins.

How about we fix access to the jenkins-deploy user instead of making a new one? The account exists in LDAP as a proper Developer account and is already a member of the puppet-diffs, integration, and deployment-prep projects exactly so that Jenkins jobs can use ssh in those projects.

The registered email address for the Developer account is "jenkins-bot@wikimedia.org". Does anyone know where that routes to? We can always force a password change using a maintenance script, but just using https://wikitech.wikimedia.org/wiki/Special:PasswordReset is easiest if the email actually goes somewhere. And honestly if it doesn't go anywhere we should change the address to something that does.

Krinkle updated the task description.Jul 2 2020, 12:29 AM
Krinkle added a subscriber: Krinkle.