Description
Users of the Attribution API may want or need to change their experience depending on the type of license applied to the reused content. For example, the Future Audiences team wants to display the specific license icon that matches the license in use. To support these use cases, we should make it easier for reusers to handle different license values confidently by normalizing them as consistently as we can.
Although the license type is a community-set, human-entered value, the Creative Commons organization defines style standards that we can adopt to transform the data before it is returned.
Conditions of acceptance
- Standardize how license titles are returned for the most common license types
- Follow the rules set out in https://creativecommons.org/licenses/list.en to support jurisdiction, IGO, and other permutations
- Always return correct capitalization; abbreviations are all caps (for example, CC BY-SA 3.0 IGO, CC BY-SA 2.0 DE)
- If 'Generic' is present, omit it (for example, CC BY-SA 2.0 Generic becomes CC BY-SA 2.0)
- If 'Unported' is present, keep it and spell out the full word
- For public domain licenses, always return "PDM" (Public Domain Mark), per CC standards
  - This includes both "Public domain" and "PD-XX" classifications
- If the license type is unknown, return the raw value unchanged
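A minimal Python sketch of the conditions above, assuming license titles arrive as free-text strings. The function normalize_license and the three patterns are hypothetical illustrations, not part of the Attribution API. The sketch simply upper-cases whatever follows the version (for example a jurisdiction code) instead of validating it against the CC list, and treating "PDM" / "PDM 1.0" as public domain is also an assumption.

```python
import re

# Hypothetical patterns; all are matched case-insensitively.
# CC licenses: "CC", the license elements, a version, and an optional
# suffix (jurisdiction code, IGO, Generic, or Unported).
CC_PATTERN = re.compile(
    r"^cc[\s-]+(by(?:-(?:nc|nd|sa))*)[\s-]+(\d\.\d)(?:\s+(.+))?$",
    re.IGNORECASE,
)
CC0_PATTERN = re.compile(r"^cc[\s-]?0(?:\s+(1\.0))?$", re.IGNORECASE)
# "Public domain", "PD-XX" style labels, and (assumed) "PDM" / "PDM 1.0".
PD_PATTERN = re.compile(
    r"^(?:public[\s-]+domain|pdm(?:\s+1\.0)?|pd-[\w-]+)$", re.IGNORECASE
)


def normalize_license(raw: str) -> str:
    """Return a CC-style license title, or the raw value if unrecognized."""
    text = " ".join(raw.split())  # trim and collapse internal whitespace

    if PD_PATTERN.match(text):
        return "PDM"

    cc0 = CC0_PATTERN.match(text)
    if cc0:
        return "CC0 1.0" if cc0.group(1) else "CC0"

    cc = CC_PATTERN.match(text)
    if not cc:
        return raw  # unknown license type: pass it through untouched

    elements, version, suffix = cc.groups()
    title = f"CC {elements.upper()} {version}"
    if suffix is None or suffix.lower() == "generic":
        return title  # 'Generic' is dropped entirely
    if suffix.lower() == "unported":
        return f"{title} Unported"  # keep the full word
    return f"{title} {suffix.upper()}"  # IGO or a jurisdiction code such as DE
```

For example, normalize_license("cc-by-sa-4.0") returns "CC BY-SA 4.0", while a value such as "GFDL" comes back exactly as it was passed in.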
Implementation details
Below is an image of some existing licenses that are being returned.
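Claude provided a regex to use as a starting point for all CC and PDM licenses. The whole alternation is wrapped in a non-capturing group so that the '\b' anchors apply to every branch. Note that the pattern is case-sensitive and does not cover the "Public domain" or "PD-XX" values, which need separate handling:
'\b(?:CC0(?:\s+1\.0)?|PDM\s+1\.0|CC BY(?:-(?:NC(?:-(?:ND|SA))?|ND|SA))? (?:1\.0|2\.0|2\.1|2\.5|3\.0|4\.0)(?:\s+(?:IGO|Unported|Generic|[A-Z]{2}(?:\s+[A-Z]+)*))?)\b'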
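A short Python sketch of how the starting-point regex could be used to detect license titles in text. It compiles the pattern with its whole alternation inside a non-capturing group, so '\b' anchors every branch (CC0, PDM, and CC BY*); without that group, Python applies the leading '\b' only to the CC0 and CC BY branches and the trailing one only to the CC BY branch. The helper find_license_titles is hypothetical.

```python
import re

# Starting-point pattern, split across lines for readability. The outer
# (?: ... ) group makes the leading and trailing \b apply to every branch.
LICENSE_TITLE = re.compile(
    r"\b(?:CC0(?:\s+1\.0)?"
    r"|PDM\s+1\.0"
    r"|CC BY(?:-(?:NC(?:-(?:ND|SA))?|ND|SA))?"
    r" (?:1\.0|2\.0|2\.1|2\.5|3\.0|4\.0)"
    r"(?:\s+(?:IGO|Unported|Generic|[A-Z]{2}(?:\s+[A-Z]+)*))?)\b"
)


def find_license_titles(text: str) -> list[str]:
    """Return every substring of text that looks like a CC or PDM title."""
    return [m.group(0) for m in LICENSE_TITLE.finditer(text)]
```

Because the pattern is case-sensitive, lowercase input such as "cc by-sa 4.0" is not detected; one option is to run detection only after normalizing capitalization.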