Page MenuHomePhabricator

[Spike] Audit existing licenses for attribution requirements
Open, HighPublic3 Estimated Story PointsSpike

Description

Description

The Attribution API aims to return all information required to appropriately attribute Wikimedia content. However, every Wikimedia project and file can have different licenses applied. We would therefore like to understand the full scope of licenses in use, so that we may infer what the most common and/or restrictive licenses available are.

Conditions of acceptance

  • Verify the scope of licenses in use across Wikimedia projects and media files.
  • Audit existing projects for global license type in use.
  • Audit commons files for most common media license.
    • Audit if/how frequently multiple licenses are used on a single file.
  • Audit Wikipedia project media files for most common licenses in use, if allowed to be different than the Wikipedia project as a whole.

Event Timeline

Hey @pmiazga -- hope the initial research is going well! Just flagging a comment that Birgit made earlier. Apparently Commons historically allowed multiple licenses to be used on the same image files. Based on the newer upload interfaces, it seems like they do push more for a single license, but we should poke around to determine if a) it is still technically possible to assign multiple licenses for a single image, and b) the proportion of images where that is true.

If that is a case we need to worry about, we should then have a discussion about what it might mean for the current license response. We're currently returning a string, so the two main options I see are 1) only return the most restrictive license, or 2) update that to be an array of all assigned licenses (likely still ordered by level of importance or restriction).

I also updated the description with slightly more context, and a note about multiple licenses being a thing.

An example files that has multiple licences: https://ga.wikipedia.org/wiki/%C3%8Domh%C3%A1:Roman_-_Portrait_of_the_Emperor_Marcus_Aurelius_-_Walters_23215.jpg

An example when licence is incorrectly applies - https://commons.wikimedia.org/wiki/File:Statue_of_Liberty_7.jpg
This returns a license for the object ( Statue of Liberty ), not the picture of the Statue.

Another example of dual-licence file - https://commons.wikimedia.org/wiki/File:Statue_of_Liberty_2.jpg ( both CC BY-SA 3.0 and GNU )
same goes with https://commons.wikimedia.org/wiki/File:%22_Igreja_S%C3%A3o_Jo%C3%A3o_em_Porto_Alegre,_Brasil_%22.JPG

aaron changed the subtype of this task from "Task" to "Spike".Thu, Feb 19, 4:08 PM
BPirkle renamed this task from Audit existing licenses for attribution requirements to [Spike] Audit existing licenses for attribution requirements.Thu, Feb 19, 4:08 PM
BPirkle set the point value for this task to 3.Thu, Feb 19, 4:12 PM

a) it is still technically possible to assign multiple licenses for a single image, and b) the proportion of images where that is true.

Looks like yes it is, because the license is not an attribute or property of the entity - it's stored as a content of page, therefore editors can put whatever they want, and I don't think there is anything stopping them from putting content like

== {{int:license-header}} ==
{{Self|GFDL|Cc-by-sa-3.0-migrated|cc-by-sa-2.5}}

From my quick click trough I see a pretty common case of having both GFDL and CC BY-SA used together.

My concern is the having the two licences, when one applies to the photographed object and the second to the photograph itself. How should we tackle this case?