Application Security Pipeline in Gitlab: A Journey!

By: @mmartorana and @sbassett

Some history

For about a decade now, the combination of gerrit, zuul and jenkins has been used as the primary means of code review and continuous integration for most Wikimedia codebases. While these systems have been used successfully and are customized to support various workflows and developer needs, they have not helped facilitate the development of a robust application security pipeline within CI. While efforts have been made within the security space - with phan and the phan-taint-check plugin, libraryupgrader, and an occasional custom eslint rule - Wikimedia codebases have not taken full advantage of the current suite of open-source application security tooling that drives modern security automation. Given these deficits and the announcement of Wikimedia migrating to Gitlab as a git front-end and CI/CD system, the Wikimedia Security-Team decided to explore what a modern application security pipeline within Gitlab could look like.

Our development path and roadmap

When the Gitlab migration was announced, the Wikimedia Security-Team saw great potential in the development of a robust application security pipeline to further improve application security testing and to make a concerted effort to shift left (wikipedia, snyk, Accelerate). Gitlab, with its modern CI/CD functionality, was a great candidate to help us explore the architecture and implementation of an application security pipeline for Wikimedia codebases, as it satisfied a number of desired outcomes including user-friendliness, convenience and impact.

Over the past couple of quarters, members of the Wikimedia Security-Team have created a number of security includes which employ Gitlab’s intuitive CI/CD functionality, particularly its means of including various yaml configuration files as components within different CI/CD stages. We initially focused this work upon several common languages used within Wikimedia projects: PHP, JavaScript, Python and Golang. It should be noted, though, that the Gitlab security includes project is open to all contributors. Given Gitlab’s flexibility and simplicity, we hope this will encourage both improvements to existing include files and the creation of new include files to support additional languages.

A basic example

During the aforementioned development cycle, the Wikimedia Security-Team compiled some basic mediawiki.org documentation to help developers get started with the configuration of their Gitlab repositories to run various security-related tests during CI. One specific example we explored was that of the function-schemata codebase, as used for the Abstract Wikipedia project. We migrated a test version of the repository over to Gitlab and set up a simple, security-focused .gitlab-ci.yml. This obviously would not be a complete .gitlab-ci.yml file for most codebases, but let’s focus upon the security-relevant pieces for now.

The file first defines several environment variables under the variables yaml key. These serve to configure various docker images, tool cli options, etc. and are documented within the application security pipeline documentation. Next comes a list of included CI files, referenced via raw file URLs and indicating a specific tagged release. These correspond to specific tools to run during the default test phase of a repository’s CI pipeline: npm audit, npm outdated, semgrep (with certain JavaScript-specific rule sets) and OSV’s scanner cli will all be run. In addition to these included files, we are also including Gitlab’s built-in SAST functionality (currently blocked on T312961) which, while limited in certain ways, can provide additional security analysis.

The resulting pipeline output displays the output of each tool which was run and indicates passing and failing tests.
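As a rough illustration of the structure described above, a security-focused .gitlab-ci.yml might look something like the following sketch. The variable names, the example.org URLs, the include file names and the v0.0.0 tag are hypothetical placeholders and not the actual values used by the security includes project. Only the variables and include keys, the remote include type and Gitlab’s Security/SAST.gitlab-ci.yml template are standard Gitlab CI features.

```yaml
# Sketch only: every variable name, URL, file name and tag below is a
# hypothetical placeholder. Consult the application security pipeline
# documentation for the real values.
variables:
  # Hypothetical variable selecting the docker image used by the npm-based jobs
  EXAMPLE_NODE_IMAGE: "node:lts"
  # Hypothetical variable passing cli options through to semgrep
  EXAMPLE_SEMGREP_OPTIONS: "--config=p/javascript"

include:
  # Security includes, referenced via raw file URLs pinned to a tagged release
  - remote: "https://example.org/security-includes/-/raw/v0.0.0/npm-audit.yml"
  - remote: "https://example.org/security-includes/-/raw/v0.0.0/npm-outdated.yml"
  - remote: "https://example.org/security-includes/-/raw/v0.0.0/semgrep.yml"
  - remote: "https://example.org/security-includes/-/raw/v0.0.0/osv-scanner.yml"
  # Gitlab's built-in SAST template
  - template: "Security/SAST.gitlab-ci.yml"
```

Pinning each remote include to a tagged release, rather than to a branch, means a repository’s pipeline only changes when its maintainers deliberately move to a newer tag.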

Some opinionated decisions and current caveats include:

  1. Only being able to run the tools within the security include files under Gitlab’s test CI stage.
  2. Having the security include files run for every branch which triggers the default CI pipeline (we’d definitely like to support custom branch and tag configurations at some point).
  3. Only utilizing OSI- and free-culture-compliant tools and databases (likely perceived as a positive by many).
  4. Presenting all results publicly, as is the default configuration for repositories and pipelines within Wikimedia’s installation of Gitlab. This matches current practice within gerrit and jenkins and reflects a value of most FOSS projects.

It should be noted that, for the last two items, some discussion did occur within various Phabricator tasks (T304737, T301018) and the current state of the CI includes was determined to be the best path forward at this time.
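To make the first two caveats more concrete, a job defined inside a security include file might look roughly like the sketch below. The job name and docker image are hypothetical, and the actual include files may well be structured differently. The point is only that a job fixed to the test stage, with no rules of its own, cannot be moved to another stage or limited to particular branches by simply including it.

```yaml
# Hypothetical job as it might appear in an include file (sketch only)
example-npm-audit:
  stage: test          # caveat 1: the job is fixed to the test stage
  image: "node:lts"    # hypothetical image, not the one actually used
  script:
    - npm audit        # exits non-zero when known vulnerabilities are reported
  # caveat 2: no rules are declared here, so the job is not restricted
  # to particular branches or tags
```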

The future we would like to embrace

The Wikimedia Security-Team is obviously very enthusiastic about our work thus far in developing an application security pipeline for Wikimedia codebases migrating to Gitlab. In the coming development cycles, we plan to address bugs, evaluate and improve current CI include offerings as well as develop (and strongly encourage others to develop) new and useful CI includes. Finally - we welcome any and all constructive feedback on how to best improve upon this initial offering of security-focused CI includes.


Written by sbassett on Wed, Jul 20, 4:55 PM.
Staff Security Architect