MediaWiki ships a copy of swagger-ui with license problems
Closed, ResolvedPublic

Description

MediaWiki includes a copy of the swagger-ui NPM package in /resources/lib/swagger-ui that's affected by https://github.com/swagger-api/swagger-ui/issues/8317. It ships various libraries (swagger-ui-bundle.js.map seems to have a list of them) without complying with the requirement, present in most free software licenses, to include any license/copyright notices. And as we're now re-distributing that as a part of MediaWiki, we're also seemingly in violation of those licenses.
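One possible way to get that list out of the source map is sketched below. It assumes the map is a standard version 3 source map with a `sources` array, and that bundled libraries show up there under a `node_modules/` path; the function name and the sample paths are made up for illustration, not taken from the real file.

```python
import json
import re

# Captures the package name that follows "node_modules/" in a path,
# including scoped names such as "@babel/runtime".
_PKG_RE = re.compile(r"node_modules/((?:@[^/]+/)?[^/]+)")


def bundled_packages(source_map_text):
    """Return the sorted npm package names referenced by a source map."""
    source_map = json.loads(source_map_text)
    names = set()
    for source in source_map.get("sources", []):
        matches = _PKG_RE.findall(source)
        if matches:
            # Take the last match so nested node_modules directories
            # resolve to the innermost package.
            names.add(matches[-1])
    return sorted(names)


if __name__ == "__main__":
    # Hypothetical sample; a real run would read swagger-ui-bundle.js.map.
    sample = json.dumps({
        "version": 3,
        "sources": [
            "webpack://SwaggerUIBundle/./node_modules/highlight.js/lib/core.js",
            "webpack://SwaggerUIBundle/./node_modules/@babel/runtime/helpers/esm/extends.js",
            "webpack://SwaggerUIBundle/./src/core/index.js",
        ],
    })
    print(bundled_packages(sample))
```

This only names the packages; their license texts would still have to be collected from each package separately.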

Introduced by rMW84fe1b9ccd96: REST: Introduce discovery endpoint

Event Timeline

taavi updated the task description.

Tagging MW-1.43-release since this means I'll have to entirely patch out the Swagger functionality from the MediaWiki-Debian packages if this isn't fixed in the 1.43 release tarballs.

Reedy triaged this task as High priority. Dec 17 2024, 11:53 AM
Reedy added a project: Upstream.
Reedy moved this task from Backlog to Reported Upstream on the Upstream board.
Reedy updated the task description.
Reedy subscribed.

Noting that resources/lib/swagger-ui/LICENSE is there, but is a copy of the Apache 2.0.

There's no LICENSE.txt in the upstream repo either...

Yea, I'm not sure I fully understand the issue or the solution. I don't think we are expected/required to include license information in minified JS that we send to the client for execution. So I think the contents of the actual .js files are fine... am I wrong about this?

On the other hand, swagger-ui-bundle presumably contains code taken from various libraries it depends on. I don't see the license information for that code anywhere. How is this usually handled for minified bundles?

We do include the license file for SwaggerUI in our distribution, but perhaps we are lacking the license info for code that SwaggerUI depends on? Is that the issue here? Where would we get that information?

Per "And as we're now re-distributing that as a part of MediaWiki, we're also seemingly in violation of those licenses" -- just want to clarify that redistribution is explicitly allowed in the swagger.io license, unless I'm missing something. The content below was pulled directly from their license page:

Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:

(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

It also explicitly refers to using Apache 2.0 License, so if that's what's included in the non-minified versions, I would assume that's intentional and fine?

are free to use and licensed under the Apache 2.0 License.

That being said, we are somewhat outside of my wheelhouse. In addition to getting more context about how we handle this scenario in other parts of the code, I'm curious if there are folks in legal that should be weighing in for compliance?

Per "And as we're now re-distributing that as a part of MediaWiki, we're also seemingly in violation of those licenses" -- just want to clarify that redistribution is explicitly allowed in the swagger.io license, unless I'm missing something.

The problem is not with the Swagger code itself - it's Swagger's dependencies included in the bundle file that Swagger is distributing (and that MW is now re-distributing). For example, the bundle file includes the Highlight.js library (at around character 84,000), and its license requires that "Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer". But the highlight.js copyright notice is nowhere to be seen in the Swagger bundle, and consequently nowhere in the code that MediaWiki releases are now distributing.

Can we just include a file that has the URLs of all the licenses? Swagger depends on some 35 packages, it seems: https://github.com/swagger-api/swagger-ui/blob/master/package-lock.json#L12C1-L47C30

I would argue that the minified JavaScript is not the "source form", and certainly not the "preferred form for modification", and thus doesn't trigger the license requirements, which are conditional on distributing source files.

We may have an obligation to make the *source* files available upon request, but I don't think there are any restrictions in the Apache license for the form of the 'object' aka non-source forms of the library.

(GPL is a little weird, in that it does have the "[if the program] normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty" clause, which implicitly requires a copy of that copyright notice to be embedded in some way in the executable -- but Apache does not have that requirement.)

tl;dr distribution of the 'object form' is distinct from distribution of the 'source code', and the minimized version of the source is not the "preferred form for modification" and thus not the "source form" and thus does not trigger the license message inclusion requirements under Apache.

From https://www.apache.org/licenses/LICENSE-2.0 :

"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
...

  4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
  1. You must give any other recipients of the Work or Derivative Works a copy of this License; and
  2. You must cause any modified files to carry prominent notices stating that You changed the files; and
  3. You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
  4. If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

It's pretty clear from the definitions that the minimized distribution bundle is an "object" form of swagger.

Condition 1 is satisfied if we include a copy of the apache license in our release tarball (which we apparently do, in resources/lib/swagger-ui/LICENSE).

Condition 2 is satisfied because we do not make any modifications to the files in swagger's "object form", as far as I am aware.

Condition 3 is satisfied because we are not making any changes to the *source form* of swagger (and possibly not even distributing the source form, which is permitted by apache).

Condition 4 is satisfied if we include a copy of the upstream NOTICE file in our release tarball (is there an upstream NOTICE file?)

From https://github.com/swagger-api/swagger-ui/ I ran npm run deps-license:

> swagger-ui@5.18.2 deps-license
> license-checker --production --csv --out $npm_package_config_deps_check_dir/licenses.csv && license-checker --development --csv --out $npm_package_config_deps_check_dir/licenses-dev.csv

Which gives F58028339 and the breakdown:

$ cut -d,  -f2 .deps_check/licenses.csv |sort|uniq -c|sort -rg
    125 "MIT"
     37 "Apache-2.0"
      7 "ISC"
      5 "BSD-3-Clause"
      1 "Unlicense"
      1 "Python-2.0"
      1 "(MPL-2.0 OR Apache-2.0)"
      1 "(MIT OR WTFPL)"
      1 "(MIT OR CC0-1.0)"
      1 "(MIT AND BSD-3-Clause)"
      1 "license"
      1 "CC0-1.0"
      1 "(BSD-2-Clause OR MIT OR Apache-2.0)"
      1 "BSD-2-Clause"
      1 "0BSD"
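Note that the `"license"` row in that tally is just the CSV header being counted along with the data. A minimal Python sketch of the same tally is below, assuming the file has a header row with a `license` column (as that row suggests); the function name and the sample rows are invented for illustration. Unlike `cut -d,`, the `csv` module also copes with quoted fields that contain commas.

```python
import csv
import io
from collections import Counter


def count_licenses(csv_text, column="license"):
    """Count how often each license string appears, skipping the header row."""
    # DictReader consumes the first row as the header, so it is not counted.
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # Hypothetical rows in the assumed layout; a real run would read
    # .deps_check/licenses.csv instead.
    sample = (
        '"module name","license","repository"\n'
        '"a@1.0.0","MIT","https://example.org/a"\n'
        '"b@2.0.0","MIT","https://example.org/b"\n'
        '"c@3.0.0","Apache-2.0","https://example.org/c"\n'
    )
    for license_name, count in count_licenses(sample).most_common():
        print(count, license_name)
```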

That came from the very first commit of the repository. The directory was referred to by a webpack.check.js which was removed when they upgraded to Webpack v4. My guess is this task needs someone familiar with webpack to have it generate the list of licenses included in the bundle.

Upstream has

dist/swagger-ui-bundle.js
/*! For license information please see swagger-ui-bundle.js.LICENSE.txt */

That swagger-ui-bundle.js.LICENSE.txt does not exist in the upstream repository. The comment was added with the release of swagger-ui v3.34.0 which only has a few changes.

Among them: webpack 4.44.1 to 4.44.2 but, more importantly, terser-webpack-plugin 1.4.5 to 4.2.0, which is where the message comes from.

The plugin is terser-webpack-plugin (https://github.com/webpack-contrib/terser-webpack-plugin/). I had to look up "terser" in Wiktionary: it means more clean and, by extension, briefer or more concise. https://github.com/webpack/webpack/commit/71933e979e51c533b432658d5e37917f9e71595a has the rationale: license texts take up a lot of space, and thus any comment containing @license goes in its own file rather than in the bundle.

The extractComments doc has:

The terserOptions.format.comments option specifies whether the comment will be preserved

webpack/_config-builder.js has:

output: {
  comments: false,
},

The terser options have the description:

comments
(default "some") -- by default it keeps JSDoc-style comments that contain "@license", "@copyright", "@preserve" or start with !.
pass true or "all" to preserve all comments,
false to omit comments in the output,
a regular expression string (e.g. /^!/) or a function.

So maybe if that is false nothing is kept / copied to that LICENSE file.
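To make the quoted description concrete, here is a rough Python model of the `comments` setting. It is only an approximation of the behaviour as described in the text above, not terser's actual implementation; the function names are invented, the regular-expression and function settings are left out, and it says nothing about how the plugin extracts comments into a separate LICENSE.txt file.

```python
# Markers named in the option description quoted above.
PRESERVE_MARKERS = ("@license", "@copyright", "@preserve")


def keeps_by_default(comment_body):
    """Approximate the described "some" default for one comment body
    (the text between the comment delimiters)."""
    return comment_body.startswith("!") or any(
        marker in comment_body for marker in PRESERVE_MARKERS
    )


def surviving_comments(comment_bodies, comments="some"):
    """Return the comments left in the output for a given `comments` setting."""
    if comments is False:
        return []
    if comments is True or comments == "all":
        return list(comment_bodies)
    if comments == "some":
        return [body for body in comment_bodies if keeps_by_default(body)]
    raise ValueError("regex and function settings are not modelled in this sketch")


if __name__ == "__main__":
    bodies = [
        "! For license information please see swagger-ui-bundle.js.LICENSE.txt ",
        " eslint-disable no-console ",
        "*\n * @license MIT\n ",
    ]
    print(len(surviving_comments(bodies, "some")))
    print(len(surviving_comments(bodies, False)))
```

Under this model, `comments: false` drops every comment from the output, license banners included, which is consistent with the guess above but does not confirm it.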

That is the end of my adventure with webpack.

As the WMF-Legal project tag was added to this task, some general information to avoid wrong expectations:
Please note that public tasks in Wikimedia Phabricator are in general not a place to expect feedback from the Legal Team of the Wikimedia Foundation, due to the scope of the team and/or the nature of legal topics. See the project tag description.
Please see https://meta.wikimedia.org/wiki/Legal for when and how to contact the Legal Team. Thanks!

Just noting that I'm figuring out who specifically in Legal to reach out to. Will [hopefully] report back with more support soon!

Request for legal review is officially submitted!

There are both security and legal issues here.

Do we actually know exactly what is being shipped and bundled inside swagger-ui, so we can accurately monitor it for security issues? Earlier taavi flagged that it's bundling at least highlight.js, my guess is it's version 10.7.3 (based on https://github.com/swagger-api/swagger-ui/blob/5bf8e57e1be9a6992888b3db4c8fa27a44ea4e4d/package-lock.json#L13157), which per https://github.com/highlightjs/highlight.js/security is marked as "no longer supported". I see that this was looked at by the security team in T325558: Application Security Review Request: Swagger UI, but it's not clear to me if it was understood that all the dependencies were being bundled, and therefore also should've been examined.
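One way to check such a guess against a lock file rather than by reading it in the browser is sketched below. It assumes the npm lock file layout where lockfileVersion 2 and 3 keep a flat `packages` map keyed by install path and version 1 nests entries under `dependencies`; the function name is made up, the sample data only mirrors the guessed version, and the lookup covers the top-level install only, not nested copies.

```python
import json


def locked_version(lock_text, package):
    """Return the version pinned for a top-level package, or None if absent."""
    lock = json.loads(lock_text)
    # lockfileVersion 2 and 3: flat "packages" map keyed by install path.
    entry = lock.get("packages", {}).get("node_modules/" + package)
    if entry is None:
        # lockfileVersion 1: "dependencies" map keyed by package name.
        entry = lock.get("dependencies", {}).get(package)
    return entry.get("version") if entry else None


if __name__ == "__main__":
    # Hypothetical excerpt; a real run would read package-lock.json at the
    # exact commit that produced the shipped bundle.
    sample = json.dumps({
        "lockfileVersion": 2,
        "packages": {
            "": {"name": "swagger-ui"},
            "node_modules/highlight.js": {"version": "10.7.3"},
        },
    })
    print(locked_version(sample, "highlight.js"))
```

The lock file records what would be installed, so this still does not prove which version ended up inside the minified bundle.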

On the legal front, MediaWiki is distributed, via the tarball, as a GPL project. This includes bundled dependencies. Moreover we want to support distributions like Debian (and Fedora, etc.) in redistributing MediaWiki, which do more stringent license checks. Binary or source format doesn't really matter, as https://github.com/highlightjs/highlight.js/blob/main/LICENSE (BSD-3-clause) says in pretty clear language, "Redistributions of source code must retain the above copyright notice..." and "Redistributions in binary form must reproduce the above copyright notice...". Except we're not.

From a license compliance point of view, this isn't a complex, nuanced, or even interesting issue. It's just a shocking lack of willingness to comply with the terms of the free licenses of software we use in our free software.

We wouldn't tolerate anyone shipping MediaWiki like this.

We shouldn't tolerate MediaWiki doing it either.

As I wrote above, the minimized JS is not the "source form" by the definition used by the GPL, and so arguably doesn't trigger the requirements conditioned on distribution of source.

We also seem to be shifting goalposts here. Is the problem swagger, or is it highlight.js? BSD-with-advertising licenses are notoriously problematic, but the requirement there is that we publish the copyright *in the documentation*, not necessarily in the bytes we ship to browsers.

If the problem is advertising clause licenses, we could probably whip up a patch to add them to a readme somewhere without much of a problem.

Change #1133746 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] SwaggerUI: Include licenses of packages used by Swagger

https://gerrit.wikimedia.org/r/1133746

As I wrote above, the minimized JS is not the "source form" by the definition used by the GPL, and so arguably doesn't trigger the requirements conditioned on distribution of source.

The BSD license says:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

That sounds to me like we do need to include at least the BSD licenses with our distribution of the minified ("binary") form of the libraries.

Thanks for weighing in and helping us hold a high bar for open source standards. Based on the guidance that we received from the WMF Legal team, we are not currently in violation of the redistribution requirements. However, WMF Legal also agrees that including the missing licenses in our distribution, per Daniel's comment and related patch, enforces the spirit of being a good open source citizen and makes our license compliance clearer. For this release, we will include the missing licenses within the docs directory and will update the README for context about their inclusion. Once that change is merged, we will consider this ticket resolved.

Please note that this should also be a relatively short lived issue. We have work planned to rebuild the sandbox using Codex (https://phabricator.wikimedia.org/T388910), which will allow us to remove SwaggerUI as a MediaWiki dependency. Although we do not yet have an exact delivery date for the updated sandbox experience, we hope to see initial development start at the Wikimedia Hackathon in May. We will then prioritize putting on the finishing touches to productionalize it following the Hackathon.

Change #1133746 merged by jenkins-bot:

[mediawiki/core@master] swagger-ui: Add licenses of packages used by Swagger UI bundle

https://gerrit.wikimedia.org/r/1133746

Change #1136445 had a related patch set uploaded (by Jforrester; author: Daniel Kinzler):

[mediawiki/core@REL1_43] swagger-ui: Add licenses of packages used by Swagger UI bundle

https://gerrit.wikimedia.org/r/1136445

Change #1136445 merged by jenkins-bot:

[mediawiki/core@REL1_43] swagger-ui: Add licenses of packages used by Swagger UI bundle

https://gerrit.wikimedia.org/r/1136445

HCoplin-WMF added a subscriber: MSantos.

Thanks for the merge, @Reedy ! Moving this to 'Ready for demo' on our board, then we will close it out officially at our next sprint planning :)

@MSantos consider this resolved for MW 1.44; we will also need a backport for MW 1.43.