
ResourceLoader: Convey license information in HTTP requests serving minified builds of javascript files
Closed, DeclinedPublic

Description

Author: kete

Description:
Nontrivial JavaScript should have a free license, so we can know what our browsers are doing. With libre licenses, we can study the JavaScript source code. Your wiki code could use a simple license declaration in the header as described at http://www.gnu.org/philosophy/javascript-trap.html#AppendixA

Regards


Version: unspecified
Severity: enhancement

Details

Reference
bz36866

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 12:25 AM
bzimport set Reference to bz36866.
bzimport added a subscriber: Unknown Object (MLST).

All of the content served from the WMF cluster is under the Creative Commons Attribution-ShareAlike License. I'll ask Legal if you've raised any new concerns here.

(In reply to comment #0)

Your wiki code could use a simple license declaration in the header as
described at http://www.gnu.org/philosophy/javascript-trap.html#AppendixA

I think you mean the labeling described here: https://www.gnu.org/licenses/javascript-labels.html. Since there are multiple licenses covering different JS files on any given page, I'm not sure what is needed.

But, just to be clear, this bug is requesting we follow this suggestion from javascript-labels.html:

On each page that uses JavaScript, include a link that points to the labels
page described above. Mark this link with the attribute rel="jslicense", so
that automated tools can find it. For example, your final link might look
like this:

 <a href="/about/javascript" rel="jslicense">JavaScript license
 information</a>

This link can be small, but it should be clearly visible to people who
visit your site.

In any case the JS code that is distributed with MediaWiki and served from there is GPL. For example, the jQuery library (included with MediaWiki) is covered by the GPL -- http://jquery.org/license/.

I believe files like Common.js -- http://en.wikipedia.org/wiki/MediaWiki:Common.js -- are under the CC-BY-SA/GFDL like the rest of the content on wikipedia since that is the license text displayed when editing these files on-wiki.

*** This bug has been confirmed by popular vote. ***

Greetings! And thanks to Kete for filing this bug and to Mark for replying and engaging with it.

This bug is real, it is a problem in terms of Wikimedia's stated commitment to free software, and it should be very easy to fix. Every user of Wikipedia is being served software, which they are running on their computers, with no license information explaining to them that the software is free.

The arrangement with Common.js and others (see http://en.wikipedia.org/wiki/MediaWiki:Common.js and similar) seems like a bug. It was my understanding that all of the JavaScript should be distributed under the free software license that MediaWiki is. If you download MediaWiki as a tarball, this seems pretty clear. It -- or whatever the situation is -- should be clear to users who download small parts of MediaWiki that happen to be written in JavaScript as part of browsing Wikipedia.

I think the javascript-labels approach would be a simple method to deal with this and to make the situation clear. If that won't work, or if people want help from the licensing compliance expert at the FSF, I'm happy to make the introduction and to invite him to participate in this bug.

I'm looking forward to getting this resolved! Let me know how I can help.

The bug is filed against MediaWiki, but Mark's response was about Wikipedia. Mako's comment suggests that the concern is actually "pieces of MediaWiki" -- perhaps static files distributed with MediaWiki that are served in minified form via load.php?

MediaWiki:Common.js is not part of MediaWiki. Distributing it under the GPL would require the consent of its authors. It was contributed to Wikipedia under the usual click-through licensing.

Either way, the minifier will strip ordinary file headers. A string header could be added which wouldn't be stripped; that would be much simpler than implementing "JavaScript License Web Labels".
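To illustrate the difference between a comment header and a string header, here is a minimal sketch. `toyMinify` is a hypothetical stand-in that only strips comments and collapses whitespace; it is not ResourceLoader's actual minifier, and the license wording is a placeholder.

```javascript
// Hypothetical toy minifier (NOT ResourceLoader's): removes block comments and
// whole-line // comments, then collapses whitespace. It does no real parsing,
// so comment markers inside string literals would confuse it.
function toyMinify(source) {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, '') // drop /* ... */ comments
    .replace(/^\s*\/\/.*$/gm, '')     // drop whole-line // comments
    .replace(/\s+/g, ' ')             // collapse whitespace runs
    .trim();
}

// Placeholder license wording, for illustration only.
const commentHeader = '/* License: GPL-2.0-or-later */\nfunction add(a, b) { return a + b; }';
const stringHeader = '"License: GPL-2.0-or-later";\nfunction add(a, b) { return a + b; }';

console.log(toyMinify(commentHeader)); // the comment header is gone
console.log(toyMinify(stringHeader));  // the string statement is still there
```

This only holds for a minifier that removes comments and whitespace. One that also drops unused expression statements would presumably strip the string as well.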

Thanks for the quick followup Tim!

If I'm following this right, you're suggesting that the correct way to go is to create string headers that make it clear that the JavaScript license is: (1) GPL (or whatever else) for the static files distributed with MediaWiki and (2) BY-SA/GFDL for the content contributed to MediaWiki.

The license for wiki-contributed content (which is only served through that particular wiki) is not always CC-BY-SA/GFDL. It is whatever is configured.

For Wikimedia wikis this is mostly CC-BY-SA 3.0/GFDL. Though Wikinews has CC-BY 2.5.

And other wikis outside the Wikimedia Foundation (since this is a MediaWiki software feature request) may have yet other licenses.

Just making sure we don't hardcode it.

The license string afaik only has to say what this file is licensed as. I don't see any point in making a list of possibilities (e.g. the 1), 2) thing, not sure if you meant that literally). Even more so, doing that isn't even practically possible because we also ship third-party files that have built-in license headers already.

So all we need to do is:

  • Make sure our .js files in the MediaWiki repository have license headers (which we already do for PHP files).

Do we have to include those headers in the package responses to the browser as well? That seems a bit weird to me, since those requests are not deliberate distributions of those applications (in that the user doesn't "see" the code). I mean, we don't send the PHP file headers to the browser either; that code is compiled and the user sees what it outputs. The same goes for images used in the interface as part of the software. They have license information in the repository for when someone downloads the software to use it or modify it.

But when visiting a site that uses the software, users are not shown the licenses of PHP, database, image, audio, or video files. So why would it be different for JavaScript files?

Meaning, we can put them in source code (we should definitely do that), but it is okay to strip them as part of the general minification when serving it as a package.

If we do need to include them, we would need to:

  • Figure out a way to simplify them (e.g. compare http://code.jquery.com/jquery.js to http://code.jquery.com/jquery.min.js)
  • Figure out how to deal with separators, because these JavaScript application requests are packages of sometimes hundreds of different JavaScript files
  • Deal with the fact that they don't just contain JavaScript files, but also CSS files and base64-embedded images. The CSS files would also need license headers, and then the base64-embedded images would need headers somehow?

I'm not objecting to it; if it is required, it is required. But I don't think anyone else in the world has even considered doing something crazy like that. So if at all possible, I really think we can keep them out of the production browser service.

My $0.02:

It seems like this could be reasonably implemented in ResourceLoader: add a tag for modules like "@license GPLv2" or similar, and when RL detects such a tag, it inserts the proper licensing info into the JavaScript. Then again, maybe we need to require this for each file, or else the debug mode would get confusing.

As for compiled (minified) files, I think RL might be able to either A) intelligently group modules into like-licensed chunks (so we distribute the GPLv2 files in one go, then the MIT files, then the GPLv3 files, and so on) or B) Tag each chunk with the proper licensing and be done with it. Adding in a few bytes for newlines and a few hundred bytes per license doesn't seem unreasonable to me.
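As a rough sketch of what options A and B might look like, assuming each module carries a license tag: the module names, licenses, and the two bundling functions below are all made up for illustration and are not part of ResourceLoader.

```javascript
// Hypothetical module list; names, licenses and code are made up for illustration.
const modules = [
  { name: 'example.core', license: 'GPL-2.0-or-later', code: 'var a=1;' },
  { name: 'example.lib', license: 'MIT', code: 'var b=2;' },
  { name: 'example.ui', license: 'GPL-2.0-or-later', code: 'var c=3;' },
];

// Option A: group like-licensed modules so each license notice appears once.
function bundleByLicense(mods) {
  const groups = new Map(); // license -> array of code strings, in first-seen order
  for (const mod of mods) {
    if (!groups.has(mod.license)) {
      groups.set(mod.license, []);
    }
    groups.get(mod.license).push(mod.code);
  }
  return [...groups]
    .map(([license, codes]) => `/* @license ${license} */\n${codes.join('\n')}`)
    .join('\n');
}

// Option B: keep the original order and tag every module separately.
function bundlePerModule(mods) {
  return mods
    .map((mod) => `/* @license ${mod.license} */\n${mod.code}`)
    .join('\n');
}

console.log(bundleByLicense(modules));
console.log(bundlePerModule(modules));
```

Note that option A reorders modules, so it would only be safe where load order does not matter. Option B costs one notice per module but leaves the order alone.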

We may also want to offer a preference and/or config option to turn this on and off. I'm sure server admins might want the option to say no, and I'm sure users might want the same option.

That said, thanks much to Mako for bringing this up, and I hope we can do something about it soon!

Greetings Krinkle! I have a few responses.

I think it is important to convey the license of the JavaScript to visitors of the site. The reason this does not apply to the PHP software is that the PHP is not actually distributed to visitors. Distribution is a legally meaningful step in terms of copyright, is very relevant in terms of the GPL, and is also substantively different from just letting someone use your software. When the PHP code is distributed (i.e., when someone downloads the code), we also convey the license.

This can be done very concisely, as Tim suggests, with a string at the top of the file that says "Copyright (c) WHOEVER 2012 - Distributed under the GNU General Public License Version 2 or any later version" or something like that. The benefit of the string-approach is that it won't be stripped out with the minifier like a comment would be.

We can follow the same approach for the stuff that is contributed to and served from wikis except, as others have pointed out, the license will have to be whatever the license of the wiki is, since this is apparently the license that it was contributed to us under. We can start doing this part on wikis right away.

I just looked through the JavaScript in the MW git repository and, basically except for jQuery and test stuff, none has any license information even in a comment header. About half of it does not even have a copyright statement, and some looks like it might have been taken/borrowed from elsewhere but has no indication of the license.

Instead of a string, we can use a bang comment.

It appears other libraries are following this convention as well, for "required" comments.

/*! jQuery v@1.8.0 jquery.com | jquery.org/license */

(In reply to comment #9)

This can be done very concisely, as Tim suggests, with a string at the top of
the file that says "Copyright (c) WHOEVER 2012 - Distributed under the GNU
General Public License Version 2 or any later version" or something like that.

A full list of authors is going to be hard to keep track of (major authors are sometimes listed at the top of the file). So whenever a file is part of MediaWiki core (e.g. not a third-party file with its own license header), we can refer to mediawiki.org/wiki/License for more information on the current license and list of authors. ("COPYING" and "CREDITS" in the repo are linked from there).

/*! Distributed as part of MediaWiki | mediawiki.org/wiki/License */

As far as I know a link is enough, it doesn't have to contain the license title (let alone the full GPL license text) directly in the header. We do the same for image embedding.

It'd be good if we could find a reasonably pragmatic solution to clarify the licensing status of files, including after RL minification. Insert a bang comment based on the site license for files served from the wiki, and ensure we consistently add bang comments to files that are served from MediaWiki itself, and that those comments survive RL minification?
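For the "based on the site license" part, a minimal sketch could look like the following. The configuration object and its field names are assumptions for illustration, not actual MediaWiki settings, and the license values are just an example of what a wiki might be configured with.

```javascript
// Hypothetical site configuration; the field names are assumptions, not MediaWiki settings.
const siteConfig = {
  licenseName: 'CC BY-SA 3.0',
  licenseUrl: 'https://creativecommons.org/licenses/by-sa/3.0/',
};

// Build a one-line bang comment from whatever license the wiki is configured with.
function siteLicenseBanner(config) {
  // Neutralize any comment terminator so a configured value cannot end the comment early.
  const safe = (text) => String(text).replace(/\*\//g, '* /');
  return `/*! ${safe(config.licenseName)} | ${safe(config.licenseUrl)} */`;
}

// Prepend the banner to a script served from the wiki (e.g. a page like MediaWiki:Common.js).
function withSiteLicense(code, config) {
  return `${siteLicenseBanner(config)}\n${code}`;
}

console.log(withSiteLicense('var x = 1;', siteConfig));
```

Because the banner is derived from configuration, nothing about CC-BY-SA or the GFDL is hardcoded; a wiki with a different license would get a different comment.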

Also, on wikitech-l, GNU LibreJS and "Seemingly proprietary Javascript" http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/67972
(Adding link not for drama but for keyword discoverability of this report.)

I'm going to raise this issue again after a talk by Luis Villa and others at Wikimania about problems with missing license information on MediaWiki code. I'm also changing the title to make the nature of this bug more clear, and I'm adding Luis to the CC list.

There are at least two workable solutions described here since 2012:

  1. Create strings with license information at the head of JavaScript files that are included in MediaWiki. This can be done immediately and will solve the issue.
  2. Add bang comments as Erik described in late 2012, which seems to be a standard used for "important" comments in other JavaScript programs. This will involve modifying the minifier to make sure that these important comments are preserved.

The benefit of 2 is that we can use that method to also add support for licensing for "content" JavaScript under the wiki's license (e.g., MediaWiki:Common.js on Wikipedia). But really, either of these would be fine. A LibreJS solution would be great but we don't need to let LibreJS complications hold up an otherwise easy-to-fix bug.

Change 153573 had a related patch set uploaded by Tim Starling:
In minifier, preserve comments that start with /*!

https://gerrit.wikimedia.org/r/153573
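For readers unfamiliar with the idea, here is a minimal sketch of what "preserve comments that start with /*!" means. It is a hypothetical toy and not the code from the Gerrit change: it uses a regular expression rather than a tokenizer, so comment markers inside string literals would confuse it.

```javascript
// Hypothetical toy minifier, not the code from the Gerrit change: strips
// /* ... */ comments unless they start with /*!, then collapses whitespace.
function minifyKeepingBangComments(source) {
  return source
    .replace(/\/\*(?!!)[\s\S]*?\*\//g, '') // (?!!) skips comments opening with /*!
    .replace(/\s+/g, ' ')                  // collapse whitespace runs
    .trim();
}

const input = [
  '/*! Distributed as part of MediaWiki | mediawiki.org/wiki/License */',
  '/* internal note: refactor later */',
  'var answer = 42;',
].join('\n');

console.log(minifyKeepingBangComments(input));
```

The ordinary comment is dropped while the bang comment stays at the top of the output.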

Fantastic! Thanks Tim!

Looks like support for (2) in Comment 14 is now handled. As Tim says in his commit log, all we're missing is adding license information inside "bang comments" to the JavaScript files that ship with MediaWiki. I think the only reasonable thing to assume is that, unless they are marked otherwise, any JavaScript in MediaWiki should be under the same license as MediaWiki itself.

Once this has gone through, I can try to ensure that MediaWiki:Common.js is appropriately labeled.

Change 153573 abandoned by Tim Starling:
In minifier, preserve comments that start with /*!

Reason:
Per Krinkle

https://gerrit.wikimedia.org/r/153573

Hi, I'm curious, what is the state of things regarding this bug? It's been three years since it was reported.

All known discussion and progress is in (or linked in) the comments of this task. :)
Age of a task is rather irrelevant, but if you have specific questions please ask them here!

I agree with @MarkAHershberger 's comment earlier. The "license labels" approach is much more realistic. It's also a widely adopted principle in many other contexts as well. I'll name some examples below.

We should not pursue the endless rabbit hole of adding comments everywhere, or refactoring our minification pipeline to group similarly licensed chunks together.

Examples of the separate label pointers:

  • Binary files, images, SVGs, HTML files etc. typically don't have a license included in the file, but rather are distributed with a separate license file that is "discoverable through an appropriate interface".
  • For a public server directory (or zip file) this is typically a file in the same directory, or index page footer, that one would naturally see.
  • jQuery git repositories have adopted the pragmatic principle of not putting license headers on every single file. A LICENSE file in the repository root suffices.
  • Chromium ships lots of compiled and uncompiled scripts and freely licensed interface components. Licenses are exposed via a special panel in the application settings dialog.

Worrying about people copying individual files without looking is imho an inherently invalid use case. More likely they'll copy just a few functions without looking. Should we start prefixing every single line of our program with a license indication just to be sure?

/*! @license .. */ class Everything {
/*! @license .. */   public function random() {
/*! @license .. */     return 42;
/*! @license .. */   }
/*! @license .. */ }

Back to the label pointers. I think the closest thing to a page with license labels is our very own Special:Version. We can build on that to fill any gaps.
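To make the label-page idea concrete, here is a rough sketch of generating a labels table plus the rel="jslicense" link quoted earlier in this task. The entries, URLs, and function names are placeholders, and the table id reflects my reading of the javascript-labels page, so it should be checked against that page before relying on it.

```javascript
// Hypothetical entries; file names and URLs are placeholders.
const scripts = [
  {
    file: '/load.php?modules=example&only=scripts',
    license: 'GPL-2.0-or-later',
    licenseUrl: 'https://www.gnu.org/licenses/old-licenses/gpl-2.0.html',
    source: '/source/example.js',
  },
];

// Escape text for safe use in HTML attributes and element content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// One row per served script: the script itself, its license, and its source.
function renderLabelsTable(entries) {
  const rows = entries.map((e) =>
    '<tr>' +
    `<td><a href="${escapeHtml(e.file)}">${escapeHtml(e.file)}</a></td>` +
    `<td><a href="${escapeHtml(e.licenseUrl)}">${escapeHtml(e.license)}</a></td>` +
    `<td><a href="${escapeHtml(e.source)}">${escapeHtml(e.source)}</a></td>` +
    '</tr>'
  );
  // Table id assumed from the javascript-labels page.
  return `<table id="jslicense-labels1">\n${rows.join('\n')}\n</table>`;
}

// The per-page link that points automated tools at the labels page.
function renderLabelsLink(labelsPageUrl) {
  return `<a href="${escapeHtml(labelsPageUrl)}" rel="jslicense">JavaScript license information</a>`;
}

console.log(renderLabelsTable(scripts));
console.log(renderLabelsLink('/about/javascript'));
```

Whether such a table would live on Special:Version or on a separate page is an open design choice; the sketch only shows the shape of the output.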

For people coming to this issue from Google, etc.: in the meantime, until this is resolved, the FSF has patched their instances to mostly satisfy LibreJS through a bit of a hack, by inserting license info into every page. Their instances are around 1.25 and 1.23. I made my own patch to do the same on MediaWiki 1.28, at https://iankelling.org/git/?p=mediawiki-librejs-patch;a=tree

Ugh, that patch is a copyright violation itself: it assigns a GPL header (and a wrong version of GPL, not used by MW) to code that may come under a different license (for example, we use jQuery, which comes under MIT), and assigns it to a nonexistent "Mediawiki Foundation", ignoring real authors' copyrights.

@MaxSem, other than the attribution issue, I doubt it.
It's minified JS (i.e. binary code) that had no license notice at all, but it's all from
GPLv3-compatible source code. I'll fix the attribution problem.

Minified JS is not binary code.

@MaxSem, I should have said it's a non-source form, aka object code, as GPLv3 defines it, often referred to as a binary.

Also, fsf used a patch like it, but not anymore.

@MaxSem there's also the issue that no url is provided for the source of the non-source js. Anyways, I'll also add to the readme mentioning the problems. At this point, it's incomplete and I don't recommend using it.

The GPL defines source as the "preferred form for modification" and
minified-JS is clearly not that.

We should provide licensing information and an offer/pointer to source
when we distribute the software.