
Default license for operations/puppet
Open, MediumPublic

Description

operations/puppet uses a few different licenses that are explicitly stated in parts (e.g. different modules). However, there does not seem to be a default license in the root that applies unless otherwise stated.

So if code doesn't specify a license explicitly (which some code in the repo does not), no particular license clearly applies.


Version: wmf-deployment
Severity: normal
URL: https://phabricator.wikimedia.org/diffusion/OPUP/


Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 3:25 AM
bzimport set Reference to bz65270.
bzimport added a subscriber: Unknown Object (MLST).

What are the existing licenses? And do we have them because people are copying/pasting from other people's puppet confs or just because of random individual choice by our developers?

(Probably incomplete) list:

Apache 2.0
GPL (at least 2+ and 3+)
BSD
MIT

I think the main reason for the inconsistency is importing existing open source code. However, I didn't really look into whether the in-house modules are consistently licensed.

It would be good to ask some of the ops people about this. I'll link this on the ops list.

The puppet repository contains many small scripts imported directly from upstream repositories. I imagine not all of them properly reference where they came from. I also wouldn't be surprised if a few of them have been copied to the repository and modified beyond recognition. And I wouldn't be surprised either if some came from upstreams which never bothered to set a license.

(In reply to Matthew Flaschen from comment #2)

It would be good to ask some of the ops people about this. I'll link this
on the ops list.

That thread on ops@ ("License for operations/puppet") didn't trigger any responses so far.

(In reply to Andre Klapper from comment #4)

That thread on ops@ ("License for operations/puppet") didn't trigger any
responses so far.

Yes, the ops and the legal teams will probably have to take the lead on followup. Feel free to ping whoever you think is appropriate.

Aklapper set Security to None.

@mark, do you know what license we should put in a LICENSE file in the puppet repo?

For simple puppet confs, I'd recommend CC0 (mostly they are so factual it is unlikely there is much that is copyrightable in them anyway).

However, since there is other stuff in the repo that is copy-pasted from elsewhere, LICENSE should probably say something like:

This repository incorporates work under a variety of licenses. Users of particular files or modules in the repository should look at those individual files or modules to verify their licenses.

Except where one of the above licenses applies, any part of the repository created by the Wikimedia Foundation, its employees, or contributors to the Wikimedia Foundation is made available under the Creative Commons Zero license, whose terms are reproduced below.

[CC0]

In a better world, we'd be able to say specifically "should look at the LICENSE file in each module" but I suspect doing the software archaeology to unearth each of those would be painful. :/ Putting this down in my "examples" file...

hashar added a subscriber: hashar.Jan 9 2015, 4:10 PM

Some of my puppet stuff is borrowed from OpenStack with an Apache License iirc.

I don't think I am willing to use CC0.

faidon added a subscriber: faidon.Jan 9 2015, 4:16 PM

@LuisV_WMF, I think puppet manifests are very much "code", especially some of our quite complicated manifests. Think of it as a large collection of bash scripts, just implemented in a different (imperative) domain-specific language. There are statements, functions, classes, inheritance, include statements etc. (it's a mess of a language too). Besides that, we have tons of scripts in at least Bash, Python, Perl, Ruby that we've written within that repository.

FWIW, the way that puppet works (in our environment): each server runs a "puppet agent", which transmits a set of "facts" to a central host (called a "puppet master") to be used as input. The puppet master takes this input, combines it with some other input that is configured on the server, parses the so-called "manifests" (= what we have in our repository) and compiles a large document called a catalog, in a different format. The catalog is then transmitted to the agent, which applies it. This is a very simplified description; there are multiple other things at play here (such as Ruby code that gets executed on the server, Ruby code executed on the agents, a file server etc.) that I won't bore you with.
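
To make the flow above concrete for readers who have not used puppet, here is a toy model of it in Python. It is only a sketch of the facts → catalog → apply sequence described in this comment; the function names, the dictionary layout and the `applies_to` / `resources` keys are all made up for illustration and do not correspond to puppet's real APIs or data formats.

```python
def compile_catalog(manifests, facts, server_data):
    """Toy 'puppet master' step: combine agent facts with server-side input
    and evaluate the manifests into a flat list of resources (the catalog)."""
    # Hypothetical merge order: facts sent by the agent override server-side data.
    context = {**server_data, **facts}
    catalog = []
    for manifest in manifests:
        if manifest["applies_to"](context):
            catalog.extend(manifest["resources"](context))
    return catalog


def apply_catalog(catalog, state):
    """Toy 'puppet agent' step: bring the local state in line with the catalog."""
    for resource in catalog:
        state[resource["name"]] = resource["value"]
    return state
```

The point of the sketch is only that the manifests are evaluated like a program: what ends up in the catalog depends on conditions and inputs, which is part of why they read as code rather than as static configuration.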

With all that in mind, do you still think CC0 is a good choice for this? I certainly don't, but you're the lawyer :) Happy to give more details in case you need them.

I'd say that in the puppet community, Apache2 is probably the most common license for puppet modules. It's also upstream's default, for the original puppet code plus the configuration modules they've authored themselves (such as stdlib). Personally, I'd prefer something more copyleft, but OTOH puppet code usually runs internally on most installations, so it'd be of little benefit. AGPL wouldn't make sense either, as the "service" (= the puppet master) is also almost always internal. The only installations that would /not/ be internal would be ones based on openness such as other FL/OSS projects, which would probably respect the license anyway.

I realize a work's size doesn't say much about its copyrightability but just to show what's at stake here, lines of code (including whitespace and also including third-party code) in the puppet repository: 62524 puppet, 11274 Ruby, 17926 Python, 3788 Perl, 2391 shell. This is only by extension, so e.g. a Perl script named check_ssl would not be included in the figure above.
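
For anyone who wants to reproduce this kind of count, a minimal sketch follows. It assumes the figures were obtained by summing line counts per file extension; the extension-to-language map below is a guess, since the comment does not say which extensions were used, and the function name is hypothetical.

```python
from collections import Counter
from pathlib import Path

# Assumed mapping; the original comment does not list the extensions it counted.
EXTENSIONS = {
    ".pp": "puppet",
    ".rb": "ruby",
    ".py": "python",
    ".pl": "perl",
    ".sh": "shell",
}


def count_lines_by_language(root):
    """Sum line counts (blank lines included) per language, by file extension only."""
    root = Path(root)
    totals = Counter()
    for path in root.rglob("*"):
        # Skip git metadata and anything that is not a regular file.
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        language = EXTENSIONS.get(path.suffix)
        if language is None:
            continue  # extensionless scripts such as check_ssl are not counted
        with path.open("rb") as handle:
            totals[language] += sum(1 for _ in handle)
    return totals
```

As in the comment, this counts third-party code and whitespace too, and it misses scripts without an extension, so the result is a rough size indicator rather than a measure of original work.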

LuisV_WMF added a comment.EditedJan 12 2015, 7:27 PM

I don’t have a lot of time to respond in detail today, but I will try to do so soon. In the meantime, I’m confused - why is Apache acceptable where CC0 is not, @hashar? The attribution requirement, or…? I feel like I must be missing something.

I don’t have a lot of time to respond in detail today, but I will try to do so soon. In the meantime, I’m confused - why is Apache acceptable where CC0 is not, @hashar? The attribution requirement, or…? I feel like I must be missing something.

My main issue with CC0 is that one can drop the copyright / author right, which means my contribution might not be recognized. Moreover, in France waiving moral rights (especially attribution) is impossible. It works well for common law countries, but probably not for civil law countries. That imho renders the license moot, and anyway I would like to retain attribution.

Apache 2 seems to be used by puppet and the puppet modules, it retains the copyright so that seems fine to me.

But then I am not a lawyer. Any reason for pushing for CC0 ?

LuisV_WMF added a comment.EditedJan 12 2015, 10:44 PM

The basic theory for CC0 is:

  1. The patent grant in Apache is not useful for this sort of larger-than-I-realized, but still pretty minimal, code - we're not going to be getting code from third parties that contains patented material, so making them pledge to license these patents to us does not buy us very much.
  2. The attribution is not very valuable, because it isn't seen very much/by very many people. You'll get as much (if not more!) attribution from a proper CONTRIBUTORS file at the top-level.
  3. In practice, this code gets copied around and mangled very badly even by very good-faith people (like us!) so putting the fewest possible restrictions on its use is helpful to others without being very damaging to us (because of #1 and #2).
chasemp added a comment.EditedFeb 2 2015, 6:13 PM

This kind of stalled out here.

Am I right in thinking that this would only apply to code that does not have a more specific license and that where these cases exist now the current ambiguity is worse than a default CC0 at top level?

In regards to

This kind of stalled out here.

Am I right in thinking that this would only apply to code that does not have a more specific license and that where these cases exist now the current ambiguity is worse than a default CC0 at top level?

and

Yes and yes.

Is the consensus to oppose https://gerrit.wikimedia.org/r/#/c/183862/ ?

If CC0 is opposed, does anyone oppose Apache2?

Does anyone oppose Apache2 as a default?

Apache 2 is good to me.

Dzahn added a subscriber: Dzahn.EditedApr 28 2015, 9:50 PM

Puppet itself uses the Apache license, so that seems good because you can argue a reasonable assumption is that the puppet manifests will be under it as well. That made me say on https://gerrit.wikimedia.org/r/#/c/183862/2 that it should be that instead of CC.

https://puppetlabs.com/apache

but also:

http://www.apache.org/licenses/GPL-compatibility.html

`Apache 2 software can therefore be included in GPLv3 projects, because the GPLv3 license accepts our software into GPLv3 works. However, GPLv3 software cannot be included in Apache projects. The licenses are incompatible in one direction only, and it is a result of ASF's licensing philosophy and the GPLv3 authors' interpretation of copyright law.

This licensing incompatibility applies only when some Apache project software becomes a derivative work of some GPLv3 software, because then the Apache software would have to be distributed under GPLv3. This would be incompatible with ASF's requirement that all Apache software must be distributed under the Apache License 2.0.`

Note that there are also files in the repository that explicitly retain copyright, e.g. modules/wmflib/lib/puppet/parser/functions/ipresolve.rb:

# == Function: ipresolve( string $name_to_resolve, bool $ipv6 = false)
#
# Copyright (c) 2015 Wikimedia Foundation Inc.
#
# Performs a name resolution (for A AND AAAA records only) and returns
# an hash of arrays.

but maybe I should create a subtask for fixing the license situation for those files?

Restricted Application added a subscriber: Matanya. · View Herald TranscriptJun 17 2015, 7:19 AM
hashar added a comment.EditedJun 17 2015, 8:45 AM

Note that there are also files in the repository that explicitly retain copyright, e.g. modules/wmflib/lib/puppet/parser/functions/ipresolve.rb:

# == Function: ipresolve( string $name_to_resolve, bool $ipv6 = false)
#
# Copyright (c) 2015 Wikimedia Foundation Inc.
#
# Performs a name resolution (for A AND AAAA records only) and returns
# an hash of arrays.

but maybe I should create a subtask for fixing the license situation for those files?

In this specific case, the code has been written by Jeff Gerard, a WMF employee. I guess most of the code comes from WMF and thus applying a free license should be straightforward.

Looking at puppet, I found some code I wrote under a joint copyright agreement with WMF. I have added both a copyright notice and a license header (Apache 2.0 in this case). See modules/zuul/files/zuul-gearman.py.

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 14 2015, 8:24 PM
Restricted Application added a subscriber: JEumerus. · View Herald TranscriptApr 14 2016, 1:28 AM
ZhouZ moved this task from Backlog to Legal Done on the WMF-Legal board.Apr 14 2016, 1:28 AM

Change 282405 had a related patch set uploaded (by BryanDavis):
Add .mailmap to cleanup duplicate authors

https://gerrit.wikimedia.org/r/282405

Change 282405 merged by Alexandros Kosiaris:
Add .mailmap to cleanup duplicate authors

https://gerrit.wikimedia.org/r/282405

Bump.

I think that we should either

  1. merge @chasemp's patch, or
  2. amend it to say Apache 2.0 and merge that

Either one would be better than ambiguity.


Paladox updated the task description. (Show Details)Feb 15 2017, 10:23 PM
Paladox added a subscriber: Paladox.

I have BOLDly amended @chasemp's patch to:

  • Use Apache 2.0 as the default license
  • Add a NOTICE file that includes the per-file/module disclaimer suggested by @LuisVilla in T67270#964695
  • Add a CONTRIBUTORS file generated via git log --format='%aN <%aE>' | sort -f | uniq > CONTRIBUTORS
  • Update the README to be a bit prettier and to include a license section
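
The git log pipeline in the list above can be approximated in Python, which may be handier if the list needs further processing later. This is only a sketch: the function names are made up, and the sort order may differ slightly from `sort -f` depending on locale. Note that `%aN` is the author name after .mailmap has been applied, so cleaning up .mailmap reduces duplicates in the output.

```python
import subprocess


def git_authors(repo_path="."):
    """Return one 'Name <email>' line per commit, as printed by git log."""
    result = subprocess.run(
        ["git", "log", "--format=%aN <%aE>"],
        cwd=repo_path,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def unique_authors(log_lines):
    """Drop exact duplicates and sort case-insensitively (roughly `sort -f | uniq`)."""
    authors = {line.strip() for line in log_lines if line.strip()}
    # Fall back to the raw string so that ordering is deterministic on ties.
    return sorted(authors, key=lambda author: (author.casefold(), author))
```

Writing `"\n".join(unique_authors(git_authors()))` to a file named CONTRIBUTORS would then mirror the shell command.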

Can we light this candle?

You're all very right that we should finally fix this. I like the latest patchset personally, but I don't feel comfortable with me or just a tiny few of us making a license choice for a repository to which hundreds of people, many outside of the ops team, have contributed. I just emailed our Legal team to ask for their opinion, pointing to this ticket and explaining that Apache2 is the license we seem to be agreeing on so far.

hashar added a subscriber: scfc.Mar 22 2017, 9:57 PM

For other repositories on which we wanted to set/change the license, we have usually made a list of non-WMF contributors in the task description and then reached out to them. Then we eventually just moved forward with the licensing.

From a quick review of the top authors by commit count, there are a few volunteers we might want to reach out to:

There is a very long tail of authors with 1 or 2 commits; most probably those contributions are not eligible for copyright (e.g. a typo fix).

I hereby license all my existing contributions to the operations/puppet under the Apache 2.0 license = https://gerrit.wikimedia.org/r/#/c/183862/4/LICENSE.

hashar removed a subscriber: hashar.Jun 6 2017, 2:19 PM
bd808 added a comment.Sep 19 2017, 4:18 AM

With the latest updates to https://gerrit.wikimedia.org/r/#/c/183862 :

$ grep -v wikimedia.org CONTRIBUTORS | wc -l
     112
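
The same count can be taken in Python if the non-Foundation entries need to be listed rather than just counted. A minimal sketch, assuming CONTRIBUTORS has one "Name <email>" entry per line; the function name is hypothetical.

```python
import re

# Same pattern as the grep call; the unescaped "." matches any character, as it does in grep.
FOUNDATION_PATTERN = re.compile(r"wikimedia.org")


def non_foundation_authors(contributor_lines):
    """Return the lines that do not match the pattern (the `grep -v` step)."""
    return [line for line in contributor_lines if not FOUNDATION_PATTERN.search(line)]
```

Taking `len()` of the result corresponds to `wc -l`. The match is on the email address only, so it is just a proxy for who is or was a Foundation employee.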

A number of the 112 non-Foundation author attributions are actually current or former Foundation employees who chose to contribute using non-Foundation email accounts. Even with them excluded however, there are around 100 people who we need to make a good faith attempt to contact for license approval. It has been suggested that a reasonable way to do this would be to send an email to the authors asking for a response within 4 weeks.

At the end of the response period we would likely need to evaluate the responses and further investigate the remaining contributions at the HEAD revision of the production branch which were made by authors who did not respond or who are not willing to agree to the new licensing terms. Such contributions may be factual (e.g. a hostname, IP address, etc.) and therefore questionable for licensability in the first place. They may, on the other hand, be something that we consider to be substantial, in which case we will need to replace/rewrite the unlicensed contribution.

A number of the 112 non-Foundation author attributions are actually current or former Foundation employees who chose to contribute using non-Foundation email accounts. Even with them excluded however, there are around 100 people who we need to make a good faith attempt to contact for license approval. It has been suggested that a reasonable way to do this would be to send an email to the authors asking for a response within 4 weeks.

I think that's totally fine. We can send those emails and if people don't reply in time, we can re-evaluate whether to re-ping them (and how often) and, failing that, what to do with their contributions.

OpenSSL also has an ongoing license migration from their old, custom license to Apache2, and they even went a step further and assumed implicit approval for people who don't reply (which I have mixed feelings about myself, but their process was certainly vetted by lots of company lawyers, so it's probably legally acceptable).

They may on the other hand be something that we consider to be substantial in which case we will need to replace/rewrite the unlicensed contribution.

That also happened for OpenSSL, BTW: https://www.openssl.org/blog/blog/2017/06/17/code-removal/ (But in that case they switched to a license incompatible with the LibreSSL fork, which doesn't apply to us).

I was wondering about this myself today and found this task. Without a license, it's not possible for someone to fork our work.

It's also important to figure out how copyright is claimed. Right now there are lots of @wikimedia.org copyright claims, other generic ones ("Wikimedia Foundation", "Wikimedia and contributors", etc) and claims by non-WMF volunteers. I don't remember signing a CLA, but one would be an interesting way to simplify copyright assignment, although that would be a major undertaking and having a license is more urgent.

GTirloni removed a subscriber: GTirloni.Dec 20 2018, 6:42 PM
CDanis added a subscriber: CDanis.Feb 5 2019, 3:39 PM
jbond added a subscriber: jbond.Feb 5 2019, 3:41 PM
Dzahn raised the priority of this task from Low to Medium.May 6 2019, 6:17 PM

The Wikimedia Engineering Architecture Principles have now become official policy. [1]

One part of it is "software we write MUST be published under a free license", which I guess raises the priority for this ticket. Changing from Low to Normal at least.

[1] https://lists.wikimedia.org/pipermail/wikitech-l/2019-May/092049.html

TK-999 added a subscriber: TK-999.Jun 5 2020, 12:30 AM