Default license for operations/puppet
Closed, Declined; Public

Description

operations/puppet uses a few different licenses that are explicitly stated in parts (e.g. different modules). However, there does not seem to be a default license in the root that applies unless otherwise stated.

So if code doesn't specify a license explicitly (which some code in the repo does not), no particular license clearly applies.


Version: wmf-deployment
Severity: normal
URL: https://gerrit.wikimedia.org/g/operations/puppet

Event Timeline

This kind of stalled out here.

Am I right in thinking that this would only apply to code that does not have a more specific license and that where these cases exist now the current ambiguity is worse than a default CC0 at top level?

In regards to

This kind of stalled out here.

Am I right in thinking that this would only apply to code that does not have a more specific license and that where these cases exist now the current ambiguity is worse than a default CC0 at top level?

and

Yes and yes.

Is the consensus to oppose https://gerrit.wikimedia.org/r/#/c/183862/ ?

If CC0 is opposed, does anyone oppose Apache2?

Puppet itself uses the Apache license, so that seems good: a reasonable assumption is that the Puppet manifests will be under it as well. That is why I said on https://gerrit.wikimedia.org/r/#/c/183862/2 that it should be Apache instead of CC.

https://puppetlabs.com/apache

but also:

http://www.apache.org/licenses/GPL-compatibility.html

`Apache 2 software can therefore be included in GPLv3 projects, because the GPLv3 license accepts our software into GPLv3 works. However, GPLv3 software cannot be included in Apache projects. The licenses are incompatible in one direction only, and it is a result of ASF's licensing philosophy and the GPLv3 authors' interpretation of copyright law.

This licensing incompatibility applies only when some Apache project software becomes a derivative work of some GPLv3 software, because then the Apache software would have to be distributed under GPLv3. This would be incompatible with ASF's requirement that all Apache software must be distributed under the Apache License 2.0.`

Note that there are also files in the repository that explicitly retain copyright, e.g. modules/wmflib/lib/puppet/parser/functions/ipresolve.rb:

# == Function: ipresolve( string $name_to_resolve, bool $ipv6 = false)
#
# Copyright (c) 2015 Wikimedia Foundation Inc.
#
# Performs a name resolution (for A AND AAAA records only) and returns
# an hash of arrays.

but maybe I should create a subtask for fixing the license situation for those files?

In this specific case, the code has been written by Jeff Gerard, a WMF employee. I guess most of the code comes from WMF and thus applying a free license should be straightforward.

Looking at puppet, I found code I wrote under a joint copyright agreement with WMF. I have added both a copyright notice and a license header (Apache 2.0 in this case). See modules/zuul/files/zuul-gearman.py.

Change 282405 had a related patch set uploaded (by BryanDavis):
Add .mailmap to cleanup duplicate authors

https://gerrit.wikimedia.org/r/282405

Change 282405 merged by Alexandros Kosiaris:
Add .mailmap to cleanup duplicate authors

https://gerrit.wikimedia.org/r/282405

Bump.

I think that we should either

  1. merge @chasemp's patch, or
  2. amend it to say Apache 2.0 and merge that

Either one would be better than ambiguity.

I have BOLDly amended @chasemp's patch to:

  • Use Apache 2.0 as the default license
  • Add a NOTICE file that includes the per-file/module disclaimer suggested by @LuisVilla in T67270#964695
  • Add a CONTRIBUTORS file generated via git log --format='%aN <%aE>' | sort -f | uniq > CONTRIBUTORS
  • Update the README to be a bit prettier and to include a license section
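
For anyone who wants to reproduce the CONTRIBUTORS step without the shell pipeline, here is a minimal Python sketch of what `sort -f | uniq` does to the `git log` output. It assumes the author lines are already available as strings; the function name and the sample names/addresses are hypothetical, and locale-specific collation may order some lines differently from `sort -f`.

```python
def build_contributors(author_lines):
    """Return unique 'Name <email>' lines, sorted case-insensitively."""
    # Drop blank lines and exact duplicates (what `uniq` does after sorting).
    unique = {line.strip() for line in author_lines if line.strip()}
    # Fold case for ordering only, like `sort -f`; ties fall back to the raw line.
    return sorted(unique, key=lambda s: (s.casefold(), s))


# Hypothetical sample input standing in for `git log --format='%aN <%aE>'`.
sample = [
    "Bob Example <bob@example.org>",
    "alice Example <alice@example.org>",
    "Bob Example <bob@example.org>",
]
contributors = build_contributors(sample)
print("\n".join(contributors))
```

Note that, as in the shell version, two lines that differ only in case are both kept; the .mailmap change above is what actually merges duplicate identities.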

Can we light this candle?

You're all very right that we should finally fix this. I like the latest patchset personally, but I don't feel comfortable with me or just a tiny few of us making a license choice for a repository to which hundreds of people, many outside of the ops team, have contributed. I just emailed our Legal team to ask for their opinion, pointing to this ticket and explaining that Apache2 is the license we seem to be agreeing on so far.

For other repositories on which we wanted to set/change the license, we have usually compiled a list of non-WMF contributors in a task description and then reached out to them. Then eventually we just moved forward with the licensing.

From a quick review of the top authors by commit count, there are a few volunteers we might want to reach out to:

There is a very long tail of authors with 1 or 2 commits; most probably those contributions are not eligible for copyright (e.g. a typo fix).

I hereby license all my existing contributions to the operations/puppet under the Apache 2.0 license = https://gerrit.wikimedia.org/r/#/c/183862/4/LICENSE.

With the latest updates to https://gerrit.wikimedia.org/r/#/c/183862 :

$ grep -v wikimedia.org CONTRIBUTORS | wc -l
     112

A number of the 112 non-Foundation author attributions are actually current or former Foundation employees who chose to contribute using non-Foundation email accounts. Even with them excluded however, there are around 100 people who we need to make a good faith attempt to contact for license approval. It has been suggested that a reasonable way to do this would be to send an email to the authors asking for a response within 4 weeks.
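
A stricter variant of the `grep -v wikimedia.org` count can be sketched in Python by comparing only the email domain. This is an illustration rather than what was actually run: the function name, the regular expression, and the sample addresses are assumptions. Unlike `grep -v`, it does not drop lines that merely contain the string somewhere (for example in a display name or a look-alike domain).

```python
import re

# Matches the trailing "<local@host>" part of a "Name <email>" line.
EMAIL_RE = re.compile(r"<([^<>@\s]+)@([^<>\s]+)>\s*$")


def non_foundation(lines, domain="wikimedia.org"):
    """Return the lines whose email host is not `domain` or a subdomain of it."""
    result = []
    for line in lines:
        match = EMAIL_RE.search(line)
        host = match.group(2).lower() if match else ""
        if host != domain and not host.endswith("." + domain):
            result.append(line)
    return result


# Hypothetical CONTRIBUTORS entries.
entries = [
    "A Staff <a@wikimedia.org>",
    "B Volunteer <b@example.org>",
    "C Staff <c@WIKIMEDIA.org>",
    "D Other <d@notwikimedia.org>",
]
print(len(non_foundation(entries)))
```

As noted above, a domain filter still cannot tell apart staff who committed from personal addresses, so the resulting list needs a manual pass either way.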

At the end of the response period we would likely need to evaluate the responses and further investigate the remaining contributions at the HEAD revision of the production branch which were made by authors who did not respond or who are not willing to agree to the new licensing terms. Such contributions may be factual (e.g. a hostname, IP address, etc.) and therefore questionable for licensability in the first place. They may on the other hand be something that we consider to be substantial, in which case we will need to replace/rewrite the unlicensed contribution.

I think that's totally fine. We can send those emails and if people don't reply in time, we can re-evaluate whether to re-ping them (and how often) and, failing that, what to do with their contributions.

OpenSSL also has an ongoing license migration from their old, custom license to Apache2, and they even went a step further and assumed implicit approval for people who don't reply (which I have mixed feelings about myself, but their process was certainly vetted by lots of company lawyers, so it's probably legally acceptable).

They may on the other hand be something that we consider to be substantial in which case we will need to replace/rewrite the unlicensed contribution.

That also happened for OpenSSL, BTW: https://www.openssl.org/blog/blog/2017/06/17/code-removal/ (But in that case they switched to a license incompatible with the LibreSSL fork, which doesn't apply for us).

I was wondering about this myself today and found this task. Without a license, it's not possible for someone to fork our work.

It's also important to figure out how copyright is claimed. Right now there are lots of @wikimedia.org copyright claims, other generic ones ("Wikimedia Foundation", "Wikimedia and contributors", etc.) and claims by non-WMF volunteers. I don't remember signing a CLA, but one would be interesting as a way to simplify copyright assignment, although that would be a major undertaking and having a license would be more urgent.

Dzahn raised the priority of this task from Low to Medium. May 6 2019, 6:17 PM

The Wikimedia Engineering Architecture Principles have now become an official policy. [1]

One part of it is "software we write MUST be published under a free license", which I guess raises the priority for this ticket. Changing from Low to Normal at least.

[1] https://lists.wikimedia.org/pipermail/wikitech-l/2019-May/092049.html

I hereby license all my existing contributions to the operations/puppet under the Apache 2.0 license

I hereby license all my existing contributions to the operations/puppet under the Apache 2.0 license.


Maybe we can get the patch from 2015 merged some day. The various comments on it are still valid I guess.

https://gerrit.wikimedia.org/r/c/operations/puppet/+/183862

Retroactively finding all contributors to the repository at once is a task which will be humongous and full of obstacles (think of contributions by volunteers who left etc.). And realistically too huge to make it enough of a priority to happen.

I think instead we should choose a more pragmatic way: Let's approach this from a per module level and indicate the license within the source files using SPDX headers. (https://en.wikipedia.org/wiki/Software_Package_Data_Exchange)
So once it has been clarified that

  • Apache 2.0 is our preferred license
  • All contributions by Wikimedia staff (let's say as indicated by commits using a wikimedia.org address) can be relicensed at large

Then on a per module basis we can inspect git log / git blame (careful with renames/refactors) to find all non-staff contributors and ask for their approval. Once they have approved, we add the SPDX header to the respective file, e.g.

# SPDX-License-Identifier: Apache-2.0

And this can trickle into the repo over time, no need for a central flag day. There might be a few random modules which have unreachable authors (and which might be removed in a future cleanup or Puppet purge anyway), but at least they are properly separated and we provided clarity for the classes that have been vetted.
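
To make the "add the header once a file is vetted" step concrete, here is a minimal Python sketch that prepends the tag to a file's text. It is an assumption-laden illustration, not an agreed tool: the function name is hypothetical, it only handles files that use `#` comments (Puppet, Ruby, Python, shell), and it deliberately leaves any file that already carries an SPDX tag untouched so that a pre-existing license is never overwritten.

```python
SPDX_LINE = "# SPDX-License-Identifier: Apache-2.0"


def add_spdx_header(text, header=SPDX_LINE):
    """Return `text` with the SPDX header added, unless a tag is already present."""
    if "SPDX-License-Identifier:" in text:
        return text  # keep whatever license the file already declares
    lines = text.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        # A shebang has to stay on the first line, so insert right after it.
        first = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
        return first + header + "\n" + "".join(lines[1:])
    return header + "\n" + text


print(add_spdx_header("#!/usr/bin/python3\nprint('hi')\n"))
```

Templates and other file types with different comment syntax would need their own header form, which this sketch does not attempt.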

Yes. With this time period, there might be even volunteers who have passed away. This is going to be next to impossible.

I'm happy with starting module by module and seeing what we can do about the complicated ones.

Change 768114 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] utils: create blame-stats script

https://gerrit.wikimedia.org/r/768114

See https://gerrit.wikimedia.org/r/768114. I have hacked together a quick script that uses https://github.com/mergestat/mergestat to get stats on who contributed code that is still in use, e.g. by parsing git blame. It's a bit slow, but it does have the capability to filter out wikimedia.org contributors and can work on a module-by-module level. Let me know what you think, and feel free to hack directly on my CR if you are interested.

Output looks like

$ utils/blame_stats.py  -e 'wikimedia.org' acme_chief
krenair@gmail.com:
--------------------
9062d8b7ba51589412f0a98d842f3e583db102ce:
        modules/acme_chief/files/designate-sync.py: 1-11,25-49,56-61,63-77
        modules/acme_chief/files/designate-tidyup.py: 1-11,24
ad1937532215cf8363bba74531404fffa7ab7eab:
        modules/acme_chief/files/designate-sync.py: 12-24,50-55,62
        modules/acme_chief/files/designate-tidyup.py: 12-23,25-40
071bde8c969bf4c5e8d37e135464d03e105bc4f1:
        modules/acme_chief/manifests/server.pp: 262
legoktm@member.fsf.org:
--------------------
f4e8ed232d6b902049fb770989f1a10852489d13:
        modules/acme_chief/manifests/cert.pp: 1
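
For readers curious how a report shaped like the one above can be assembled, here is a minimal Python sketch of the grouping step only. It is not the code from utils/blame_stats.py (which relies on mergestat); the function names, the `(email, commit, path, line)` record format, and the sample data are all hypothetical. It groups blamed lines by author, commit and file, skips one email domain, and collapses consecutive line numbers into ranges.

```python
from collections import defaultdict


def compress(numbers):
    """Collapse line numbers into a range string, e.g. [1, 2, 3, 7] -> '1-3,7'."""
    parts = []
    start = prev = None
    for n in sorted(set(numbers)):
        if start is None:
            start = prev = n
        elif n == prev + 1:
            prev = n
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = n
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def summarize(records, exclude_domain):
    """Group (email, commit, path, line) records, skipping one email domain."""
    grouped = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for email, commit, path, line in records:
        if email.lower().endswith("@" + exclude_domain):
            continue
        grouped[email][commit][path].append(line)
    return {
        email: {
            commit: {path: compress(lines) for path, lines in files.items()}
            for commit, files in commits.items()
        }
        for email, commits in grouped.items()
    }


# Hypothetical blame records for one module.
records = [
    ("vol@example.org", "abc123", "modules/demo/manifests/init.pp", 1),
    ("vol@example.org", "abc123", "modules/demo/manifests/init.pp", 2),
    ("staff@wikimedia.org", "def456", "modules/demo/manifests/init.pp", 3),
    ("vol@example.org", "abc123", "modules/demo/manifests/init.pp", 5),
]
print(summarize(records, "wikimedia.org"))
```

Producing the records themselves (and coping with renames and refactors, as cautioned earlier in the thread) is the hard part and is left to the real script.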

I hereby license all my existing contributions to the operations/puppet under the Apache 2.0 license.

If we can get an explicit approval by Legal to license all contributions from wikimedia.org email addresses under Apache 2.0, I can start making patches here and there.

Ack, I'll sort this out, but haven't found the time yet.

email sent to legal.

@jbond In the meantime, maybe we can add a lint rule that gives a -1 to any new Puppet (or other) file that doesn't have an SPDX-License-Identifier?

Change 786310 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] rake_modules: add check for spdk licence header

https://gerrit.wikimedia.org/r/786310

See https://gerrit.wikimedia.org/r/c/operations/puppet/+/786310
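
The real check lives in rake_modules in the change linked above. Purely as an illustration of the idea, and not a port of that change, a check of this kind could look like the following Python sketch. The function name, the `files` mapping of path to file text, and the choice to scan only the first five lines are assumptions.

```python
def missing_spdx(files, scan_lines=5):
    """Return sorted paths whose first `scan_lines` lines lack an SPDX tag.

    `files` maps a path to the text of a newly added file.
    """
    missing = []
    for path, text in files.items():
        head = text.splitlines()[:scan_lines]
        if not any("SPDX-License-Identifier:" in line for line in head):
            missing.append(path)
    return sorted(missing)


# Hypothetical set of files added by a patch.
new_files = {
    "modules/demo/manifests/init.pp": "# SPDX-License-Identifier: Apache-2.0\nclass demo {}\n",
    "modules/demo/files/run.sh": "#!/bin/sh\necho hi\n",
}
for path in missing_spdx(new_files):
    print(f"{path}: missing SPDX-License-Identifier header")
```

Restricting the check to newly added files keeps existing, not-yet-vetted modules from failing CI while the relicensing trickles in.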

Change 787708 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/puppet@production] Remove the puppet lvm module

https://gerrit.wikimedia.org/r/787708

https://gerrit.wikimedia.org/r/787708 removes the puppet lvm module which is GPL-2 and incompatible with apache 2.0. So that removes an interesting blocker in adopting a cross repo license.

Change 768114 merged by Jbond:

[operations/puppet@production] utils: create blame-stats script

https://gerrit.wikimedia.org/r/768114

We have had a response from Legal which states that it is fine to licence all *@wikimedia.org contributions under the Apache licence. As such, I think we can start to do this with some modules that we know have been developed completely internally, e.g. apereo_cas (which was almost exclusively developed by myself). In order to move a module to this new licensing model, I propose that we update modules to:

  • add an SPDX licence header to each file in the module
  • add the Apache licence file to the root of the module
  • create an SPDX file in the module directory root

I'll create a change to convert apereo_cas to demonstrate and critique the list above.

Further to this, we will need some way to collect and track authorisations from third-party contributors. I wonder if we could do this via a Phabricator form, or perhaps have a contributors file which also acts like a CLA: third-party contributors would first need to make a commit to this file adding their name and email before being able to make further contributions to the repo?

I hereby license all my current and future contributions to the operations/puppet under the Apache 2.0 license.

Change 786310 merged by Jbond:

[operations/puppet@production] rake_modules: add check for spdk licence header

https://gerrit.wikimedia.org/r/786310

Change 806473 had a related patch set uploaded (by Thcipriani; author: Thcipriani):

[operations/puppet@production] Docker homepage builder: relicense Apache-2.0

https://gerrit.wikimedia.org/r/806473

Can we clarify what the goal here is? More recently I've been good about throwing a GPL-3.0-or-later header on substantial scripts committed to puppet (e.g. https://codesearch.wmcloud.org/puppet/?q=%5C(C%5C)%20.*%20Kunal%20Mehta&i=nope&files=&excludeFiles=&repos=), do we actually want/need to relicense those to Apache 2.0? My understanding was that we wanted everything to just have a license, defaulting to Apache-2.0 if one wasn't already specified.

We have a handful of scripts which are imported from external sources (e.g. some RAID Icinga scripts from the DebiansSystemadmin repo) or which have pre-existing licenses. Those are fine to simply retain their existing license.

But I think we have a few cases where e.g. GPL was mostly picked to at least have a license at all (without any specific preference since we didn't have a default license until now). In such cases it might make sense to align such files by relicensing to Apache 2, but that's totally optional. When in doubt always stick with the pre-existing license.

In such cases it might make sense to align such files by relicensing to Apache 2

Starting off with the obligatory IANAL :). My understanding is that if we have something with a GPL licence, then any resulting body of work, whether that is a specific module or the entire puppet repo, would need to be licenced under GPL. This is why I, personally, have been treating GPL a bit differently to MIT/BSD. However, the licence of a piece of work is of course completely up to the original author, and I didn't/don't intend for my queries exploring a licence change to be construed as pressure.

I think in the case of Puppet code this might very well extend to a puppet source file (to the effect that the entire catalogue might be GPLed), but the files in here are scripts which are being executed and I'm very sure that this is a boundary to which the virality of the GPL does not extend :-)

Change 183862 abandoned by Jbond:

[operations/puppet@production] Add a default Apache 2.0 license

Reason:

Boldly closing this. As per the linked task, we have started to use SPDX tags to add licences to puppet code.

https://gerrit.wikimedia.org/r/183862

I'm resolving this task in favour of https://phabricator.wikimedia.org/T308013. We're not going to be able to assign a default license, but by means of SPDX headers we're nearing a very good approximation.