
Evaluate and confirm potential licensing issues for gitlab appsec pipeline tools
Closed, Declined (Public)

Description

On their face, the gitlab appsec ci templates are just a collection of yaml include files that control security-related tests within a given pipeline. Each include file allows for the specification of a wikimedia docker image and then a specific security tool, e.g. audit-ci. This repo of yaml include files is planned to be licensed under the Apache 2.0 license.

For the most part, all of the security tools we've been using within these yaml files are FOSS with OSI-compliant licenses. But there is a bit of a grey area for python's safety-db (as used by the MIT-licensed safety cli) which is licensed under CC-BY-NC 4.0, none of the CC licenses being OSI-compliant as they aren't really targeted towards code. And some of the semgrep rules we'd like to use are apparently licensed under an OSI-non-compliant Commons Clause license.

Is this a potential problem? The yaml files aren't a traditional piece of software, per se, and we aren't bundling the security tools with the code, as one might with various packages and dependencies for more traditional applications. But these yaml include files heavily imply a required usage of gitlab, wikimedia docker images and the installation and running of various security tools, in the opinionated way that they are written. Would this imply a need for compliance in the licensing chain similar to a more traditional application?

If an opportunity presented itself to, say, purchase commercial SAST/DAST software to be run within wikimedia CI, would that be a non-starter as things currently stand?

Event Timeline

Tagging @Legoktm and @greg as subject matter experts here to see if they have any input on this before I try to confirm with WMF-Legal.

Sorry about the delay.

tl;dr: By letter of policy, anything that exists in Cloud Services, whether on disk or in memory, should be under an OSI-approved license, but talking to or interacting with non-free network services is fine. In spirit, non-free network services should be avoided as much as possible.

I haven't looked to see how the safety-cli works. If it's hitting some external endpoint that processes the safety-db, that would be fine. If it's downloading the safety-db and then reading it, that wouldn't be fine. I recently discovered https://github.com/pypa/advisory-database, which is another advisory database, this one by PyPA and licensed under CC-BY 4.0, which is a free license. Is it possible to use that instead?

Re the non-free semgrep rules, where are those stored? My guess is that either way, semgrep has to load those rules into memory and process them, so it wouldn't be OK regardless.
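To make the "letter of policy" distinction above concrete, here is a rough sketch of the decision rule, not an official check. The `Component` type, the `allowed_by_letter` name and the tiny `ALLOWED_LICENSES` set are all hypothetical and only there for illustration; a real allowlist would be much longer and would have to be agreed on separately.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical, deliberately incomplete allowlist of SPDX identifiers.
# This is an assumption for the example, not the actual policy list.
ALLOWED_LICENSES = {"Apache-2.0", "MIT"}


@dataclass(frozen=True)
class Component:
    name: str
    license_id: str       # SPDX-style identifier, e.g. "MIT"
    runs_locally: bool    # True if it ends up on disk or in memory on our side


def allowed_by_letter(component: Component) -> bool:
    """Apply the rule as stated: local things need an approved license,
    remote network services are permitted by the letter of the policy."""
    if not component.runs_locally:
        # Allowed by letter, though discouraged in spirit.
        return True
    return component.license_id in ALLOWED_LICENSES


safety_cli = Component("safety cli", "MIT", runs_locally=True)
safety_db_downloaded = Component("safety-db", "CC-BY-NC-4.0", runs_locally=True)
safety_db_remote = Component("safety-db", "CC-BY-NC-4.0", runs_locally=False)
```

Under these assumptions, a downloaded copy of the safety-db would be rejected, while the same data behind a remote endpoint would pass by letter only.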

If an opportunity presented itself to, say, purchase commercial SAST/DAST software to be run within wikimedia CI, would that be a non-starter as things currently stand?

Well, if they let us buy it under a free license, that would be OK :) But assuming that doesn't happen, according to current policy it would not be allowed. (Personally I agree with that policy, and would not support an exemption.)

I hope this helps clarify things.

Personal commentary:

  • I would think that semgrep rules are something we could do a clean room reimplementation of.
  • Proprietary advisory databases suck (and IMO have questionable legal status in the US where database rights don't exist) and we really should be pushing for and contributing to free ones, like the FriendsOfPHP one we use, as well as the new Python one.

Sorry about the delay.

No problem, thanks for the reply!

tl;dr: By letter of policy, anything that exists in Cloud Services, whether on disk or in memory, should be under an OSI-approved license, but talking to or interacting with non-free network services is fine. In spirit, non-free network services should be avoided as much as possible.

Ok, that's fair.

I haven't looked to see how the safety-cli works. If it's hitting some external endpoint that processes the safety-db, that would be fine. If it's downloading the safety-db and then reading it, that wouldn't be fine. I recently discovered https://github.com/pypa/advisory-database, which is another advisory database, this one by PyPA and licensed under CC-BY 4.0, which is a free license. Is it possible to use that instead?

So if I'm reading the safety code correctly, it looks like the cli, by default, fetches data from the open mirror (if there's no license key) and then stores it in memory. The cli can also create a cache file on-disk, but does not appear to do so by default.

Anyhow, I'm still not certain if various CC licenses are technically OSI-compliant. Yes, they are definitely similar to many proper OSI-compliant licenses for software, but being as they are typically used for "content", I'm not sure they are actually compliant. For example, they are not listed under the big list of approved licenses. And it seems like there has been discussion around CC0 (and similar CC licenses) being able to be approved as OSI-compliant. And the "free" version of the python safety db is technically licensed under CC-BY-NC 4.0. So I guess it comes down to the question of whether certain CC licenses are "good enough" (like ShareAlike) by virtue of being respectable, free licenses but not necessarily OSI-compliant. And whether a "database" or collection of data files should be under a software license or a content license.

Regardless, it looks like the data from the pypa/advisory-database gets pulled into osv.dev, which appears to be fully-licensed under Apache 2.0. And there's already a cli for that, so that's probably a good-enough, workable alternative.
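As a rough idea of what checking a pinned Python package against OSV could look like without any dedicated cli, here is a minimal sketch using only the standard library. It assumes the public OSV query endpoint at `https://api.osv.dev/v1/query`; the helper names (`build_query`, `vuln_ids`, `query_osv`) are made up for this example and are not part of any existing tool.

```python
from __future__ import annotations

import json
import urllib.request

# Assumed endpoint: the public OSV "query" API.
OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(name: str, version: str) -> dict:
    """Build the JSON body for a single PyPI package/version lookup."""
    return {"version": version, "package": {"name": name, "ecosystem": "PyPI"}}


def vuln_ids(response: dict) -> list[str]:
    """Extract advisory IDs; the "vulns" key is absent when nothing matches."""
    return [vuln["id"] for vuln in response.get("vulns", [])]


def query_osv(name: str, version: str, timeout: float = 10.0) -> list[str]:
    """POST the query to OSV and return the IDs of matching advisories."""
    body = json.dumps(build_query(name, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return vuln_ids(json.load(resp))
```

Something like `query_osv("jinja2", "2.4.1")` would then return a list of advisory IDs. Note that this talks to a network service rather than loading a database locally, which is a different situation from downloading the safety-db.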

Re the non-free semgrep rules, where are those stored? My guess is that either way, semgrep has to load those rules into memory and process them, so it wouldn't be OK regardless.
...
I would think that semgrep rules are something we could do a clean room reimplementation of.

Given that their rules repo is Commons Clause-licensed, I assume that means it can be forked and re-licensed under an OSI-compliant license? AIUI, Commons Clause only forbids "re-selling" of code, etc. and doesn't interfere with any other features of the base license, which in this case, is LGPL. I'm not sure return2corp would love this idea, but I don't see anything that would technically forbid it from a legal perspective. And then we can either set up our own policy registry or just build all of the yaml rules into one giant config and have our semgrep clis import those.
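The "one giant config" idea could look roughly like the sketch below. It assumes each rule file has already been parsed (e.g. with a YAML library) into a dict with a top-level `rules` list whose entries each carry an `id`, and that we only feed it rules we are actually permitted to use. `merge_rule_files` is a hypothetical helper name, and writing the result back out as yaml is left out.

```python
from __future__ import annotations


def merge_rule_files(parsed_files: list[dict]) -> dict:
    """Combine several parsed rule files into one config with a single
    `rules` list, refusing to continue if two rules share an id."""
    merged: list[dict] = []
    seen_ids: set[str] = set()
    for doc in parsed_files:
        for rule in doc.get("rules", []):
            rule_id = rule["id"]
            if rule_id in seen_ids:
                raise ValueError(f"duplicate rule id: {rule_id}")
            seen_ids.add(rule_id)
            merged.append(rule)
    return {"rules": merged}


# Hypothetical rule stubs; real rules carry patterns, messages, languages, etc.
php_rules = {"rules": [{"id": "example-php-rule"}]}
python_rules = {"rules": [{"id": "example-python-rule"}]}
combined = merge_rule_files([php_rules, python_rules])
```

The combined dict could then be dumped to a single file and handed to the semgrep clis via their config option, with the duplicate-id check catching accidental collisions between rule sets.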

Update: I've emailed WMF-Legal requesting guidance on some of these specific licensing issues.

sbassett moved this task from Back Orders to In Progress on the Security-Team board.
sbassett added a project: user-sbassett.
sbassett moved this task from Backlog to In Progress on the user-sbassett board.

Given that their rules repo is Commons Clause-licensed, I assume that means it can be forked and re-licensed under an OSI-compliant license? AIUI, Commons Clause only forbids "re-selling" of code, etc. and doesn't interfere with any other features of the base license, which in this case, is LGPL.

The GPLv3 family of licenses tried to make that the case with their §7 "Additional Terms" which permits a licensee to remove any "further restriction", such as the non-commercial restriction of the Commons Clause, added to the base license. There has been a recent ruling in a case directly related to this where the US Court of Appeals for the Ninth Circuit held that such removals are only possible when the restriction has been added by a downstream and not when it was added by the original licensor. See https://blog.opensource.org/modified-agplv3-removes-freedoms-adds-legal-headaches/ and https://sfconservancy.org/blog/2022/mar/30/neo4j-v-purethink-open-source-affero-gpl/ for additional analysis of this ruling.

The semgrep-rules license is based on LGPLv2.1 which does not have the "Additional Terms" section, so even without that recent ruling there would not be a way to both comply with the license and re-license in an OSI-approved way.

sbassett changed the task status from Open to In Progress. (May 3 2022, 8:06 PM)
sbassett triaged this task as Medium priority.

Thanks for the info on the recent case law, @bd808. I still have not heard back from WMF-Legal on these issues, but I noticed something about Gitlab while playing around with their included sast scanning options, which include semgrep. The FOSS components of Gitlab appear to be licensed under MIT/Expat, which is fine. But their usage of semgrep within their sast functionality appears to reference and use rules directly from semgrep.dev/r, which, as we know, are Commons Clause-licensed on top of the LGPL 2.1. So I'm wondering if Gitlab is violating their own MIT/Expat license by including an implementation of semgrep which makes use of Commons Clause-licensed semgrep rules and policies? Or would this be a grey area where, technically, the license would only be violated if one runs the semgrep tool within Gitlab's included sast CI template within an environment like wmcs that demands specific FOSS/Free Culture licenses?

Declining this for now as we never heard back from WMF-Legal. I think we can use the basic WMF principle of "FOSS until we can't" going forward, with maybe an exceptional case here or there.