Support appropriate documentation of CC BY SA data on Commons
Closed, ResolvedPublic

Assigned To
Authored By
Doc_James
Aug 2 2018, 4:39 AM

Description

Here is an example of a Data-on-Commons interactive map on Wikipedia: https://en.wikipedia.org/wiki/Epidemiology_of_obesity

It is based on this underlying data https://commons.wikimedia.org/wiki/Data:Sandbox/Doc_James/Obesity_Males_CC-BY-SA.tab

The footer incorrectly says the data is CC-0 when it is in fact CC BY SA 4.0.

Event Timeline

eranroz added a project: JsonConfig.
eranroz updated the task description. (Show Details)
eranroz added subscribers: Yurik, Steinsplitter.

Change 450397 had a related patch set uploaded (by Eranroz; owner: Eranroz):
[mediawiki/extensions/JsonConfig@master] Adding support for CC-BY-SA

https://gerrit.wikimedia.org/r/450397

This mainly requires configuration changes and a little bit of coding (to indicate the license based on the current page rather than the config, if one is available) - https://gerrit.wikimedia.org/r/450397
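As a rough illustration of "license based on the current page rather than config", here is a minimal Python sketch. It is not the JsonConfig code (the extension is written in PHP); the function name `footer_license`, the `DEFAULT_LICENSE` constant and the dict layout are assumptions made for this example only.

```python
# Hypothetical sketch: pick the license shown in the footer of a data page.
# DEFAULT_LICENSE stands in for the site-wide value that comes from configuration.
DEFAULT_LICENSE = "CC0-1.0"


def footer_license(page_data, default=DEFAULT_LICENSE):
    """Return the page's own license code if it declares one, else the default."""
    license_code = page_data.get("license")
    if isinstance(license_code, str) and license_code.strip():
        return license_code.strip()
    return default


# Illustrative page contents (not copied from any real Data: page).
page_with_license = {"license": "CC-BY-SA-4.0", "data": []}
page_without_license = {"data": []}

print(footer_license(page_with_license))
print(footer_license(page_without_license))
```

With this shape, a page that declares CC-BY-SA-4.0 would no longer be labelled with the configured default.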

However, it requires some knowledge and thought about the appropriate settings to make sure we are compatible with the licenses that are valid on Commons. I appreciate @Steinsplitter as a well-known sysop on Commons who can give us good advice here.

Licenses: Should we support only CC-BY-SA 4.0 or any other type of CC-BY? and what about ODbL? (I'm less familiar with the last one)

@Yurik Are this task and the patch also relevant for T179440?

@eranroz thanks, I reviewed the code; it looks awesome but needs a few minor tweaks. The proper license codes are at https://spdx.org/licenses/ .

I think we should support ODbL-1.0, CC-BY-*, and CC-BY-SA-* (all known versions). In addition, we should also allow "CC-BY-SA-4.0+" (with the plus symbol) to indicate "or later". There are many BY-SA versions, so it is clearly a "work-in-progress" license. If a contribution is locked to a specific version, consumers cannot use it under the terms of a newer version of the same license. On the other hand, if a user is copying data from a source that didn't have the "or later" clause, they have to use the specific version without the plus.

Note that we do not need "3.0+" or "2.0+", because if someone published data under "2.0 or later", we can re-publish it as "4.0 or later".
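The proposal above can be sketched as a small validator. This is only an illustration in Python, not the patch itself: the function name `normalize_license`, the regular expression, and the set of CC version numbers are assumptions, and the authoritative identifiers are the ones on the SPDX page.

```python
import re

# Assumed shape of the Creative Commons attribution codes discussed above:
# CC-BY-<version> or CC-BY-SA-<version>, optionally followed by "+".
CC_PATTERN = re.compile(r"^CC-BY(-SA)?-(\d\.\d)(\+?)$")
KNOWN_CC_VERSIONS = {"1.0", "2.0", "2.5", "3.0", "4.0"}  # assumption for this sketch
LATEST_CC_VERSION = "4.0"
OTHER_ALLOWED = {"CC0-1.0", "ODbL-1.0"}


def normalize_license(code):
    """Return the code to store, or None if this sketch of the proposal rejects it."""
    if code in OTHER_ALLOWED:
        return code
    match = CC_PATTERN.match(code)
    if match is None or match.group(2) not in KNOWN_CC_VERSIONS:
        return None
    share_alike, plus = match.group(1), match.group(3)
    if plus:
        # Only BY-SA gets an "or later" form here, and "2.0 or later" can be
        # republished as "4.0 or later", so every "+" code collapses to 4.0+.
        if not share_alike:
            return None
        return f"CC-BY-SA-{LATEST_CC_VERSION}+"
    return code


for candidate in ["CC-BY-SA-2.0+", "CC-BY-SA-3.0", "CC-BY-4.0", "ODbL-1.0", "GFDL-1.3"]:
    print(candidate, "->", normalize_license(candidate))
```

Collapsing every "or later" code to 4.0+ keeps the allowed list short, which is the reason given above for not listing "3.0+" or "2.0+" separately.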

Last thought: alternatively we may only allow the ones that have "Y" in the "FSF Free/Libre?" column on the SPDX page to keep things simpler.

@MaxSem With respect to ODbL, Commons already allows it so should be no problem here either.

Change 450397 merged by jenkins-bot:
[mediawiki/extensions/JsonConfig@master] Adding support for CC-BY-SA

https://gerrit.wikimedia.org/r/450397

OK, we are almost done - the technical change is ready, so we can support it as of the next deployment.

Please review my change to the Tabular Data documentation - I tried to summarize the policy of preferring CC0 while still allowing CC-BY, CC-BY-SA and ODbL, and also added there:

Any templates that pull data from non-CC0 licensed datasets will need to comply with the relevant attribution terms.

Following the recent release that includes this patch, I announced it on Commons (here).

Mentioned in SAL (#wikimedia-operations) [2018-08-30T21:57:15Z] <jforrester@deploy1001> Synchronized php-1.32.0-wmf.19/extensions/JsonConfig/: Hot-deploy Ieaded578ffd revert of T200968 due to bugs (duration: 00m 51s)

Jdforrester-WMF subscribed.

We've had to revert this code due to some serious bugs and a lack of integration checks, I'm afraid. Hopefully it can be fixed up and re-applied soon.

Change 456506 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/extensions/JsonConfig@master] Re-apply "Adding support for CC-BY-SA"

https://gerrit.wikimedia.org/r/456506

For what it is worth, with the revert applied, non-CC0 maps on Commons no longer work. See, for example, c:Data:Highway_192_in_Iowa_(3).map, which shows the error message 'Parameter "license" must be one of the valid license codes, for example CC0-1.0' instead of a map.

Possibly related, if you edit c:Data:Highway_192_in_Iowa_(3).map you see "⧼jsonconfig-license-notice-box-CC0-1.0⧽" immediately above the edit box. Presumably this is supposed to be a more intelligible message about CC0 licensing.

Yeah, if you "fix" the license field to say CC0-1.0 it will work again.
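To make the failure mode concrete, here is a hedged Python sketch of a license check against an allowed list. The error text is the one quoted in this thread; everything else (the function name `validate_license`, the tuple layout, the sample page content, and the contents of the patched list) is hypothetical and not taken from JsonConfig.

```python
def validate_license(page_data, allowed_codes):
    """Return an error string if the page's license is not allowed, else None."""
    code = page_data.get("license")
    if code not in allowed_codes:
        # The first allowed code is used as the example in the message.
        return (
            'Parameter "license" must be one of the valid license codes, '
            f"for example {allowed_codes[0]}"
        )
    return None


# Hypothetical map page with a non-CC0 license.
map_page = {
    "license": "ODbL-1.0",
    "data": {"type": "FeatureCollection", "features": []},
}

after_revert = ("CC0-1.0",)                            # only CC0 is accepted
with_patch = ("CC0-1.0", "CC-BY-SA-4.0", "ODbL-1.0")   # assumed subset of the patched list

print(validate_license(map_page, after_revert))
print(validate_license(map_page, with_patch))
```

Under the reverted configuration the page is rejected until its license field is changed to CC0-1.0, which matches the behaviour described above.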

I've just now finished the hour-long deploy to get that message in production (oy, caching).

Where are we at with getting the patch fixed?

The main concern is that we lack integration tests, so it is hard to tell whether the code breaks important behavior. So basically we need to pick up the old patch (https://gerrit.wikimedia.org/r/#/c/456506/) and carefully test it with different configurations to fix all the issues.

Anyway, I think this is an important capability that may open opportunities for new contributions, and I'll get to it by early November (if no one picks it up before then).

@eranroz I'm happy to help with documentation on this. I've mostly written a page for Wikidata on map data, and these licensing issues need fixing before I can publish it:

https://www.wikidata.org/wiki/User:John_Cummings/Map_data

Thanks

@Mrjohncummings thank you for your comments; comments like these, with good practical use cases, really help to motivate me and other volunteers to improve the software.
Current status: I started working on it (with help and good comments from @Yurik and @Jdforrester-WMF), but we decided to revert the changes because they were buggy, and I haven't found the time to get back to it. I will try to get to it in the upcoming weekends.

Meanwhile, maybe James/Yurik have any idea how to approach the integration tests? (or more directly: what kinds of different configurations are deployed across different wikis that would be important to test before reapplying such patch)

Thanks very much @eranroz , let me know where I can be helpful as a muggle, I'd really like to get this working :)

@eranroz I do not think I can add anything to the integration testing.

@eranroz do you have a link to the abandoned/WIP patch?

@eranroz is there anything non-technical people can help with on this task?

I think the list of licenses which this supports should be the same list of licenses as Commons since it is being hosted on Commons? Is this correct?
If yes I think this should provide a full list if you add the lists up together? https://commons.wikimedia.org/wiki/Commons:Copyright_tags
If yes then we should make a long list and add it to the task description?

@eranroz I have the possibility of a very large donation of shape files (100,000+) for different species distributions, but this won't be possible until we can properly attribute the files. Is there anything I can do to help this happen?

Yes we have a number of collaborations that are waiting on this...

I uploaded a fixed patch to:
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/JsonConfig/+/456506/

I also tested it with the Dashiki extension and fixed an issue reported earlier (when we deployed this feature without enough testing, e.g. T203173).

In T200968#5295620, @Mrjohncummings wrote:

@eranroz is there anything non-technical people can help with on this task?

I think the list of licenses which this supports should be the same list of licenses as Commons since it is being hosted on Commons? Is this correct?
If yes I think this should provide a full list if you add the lists up together? https://commons.wikimedia.org/wiki/Commons:Copyright_tags
If yes then we should make a long list and add it to the task description?

The patch should provide basic support for licenses - this isn't the full list of licenses accepted on Commons (in particular, no GFDL), but it is comprehensive enough to cover much more content than CC0 alone... (CCs 0-4+ BY and no BY, ODbL-1.0)
See list in:
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/JsonConfig/+/456506/8/extension.json

Wonderful thanks. How long before this patch is applied?

Does this mean we need not wait for T155290 to be resolved before uploading CC-BY-SA data?

  • Once the above patch is reviewed, merged and deployed: yes.
  • I'm not sure T155290 can open the path to uploads other than CC0 because JsonConfig is tightly coupled to CC0 in many system messages.

Change 456506 merged by jenkins-bot:
[mediawiki/extensions/JsonConfig@master] Re-apply "Adding support for CC-BY-SA"

https://gerrit.wikimedia.org/r/456506

eranroz updated the task description. (Show Details)

With the patch merged a week ago and already deployed in production, it is now possible to set a license from the allowed list in:
https://www.mediawiki.org/wiki/Help:Tabular_Data