Group session/exercise on basic data modelling challenges for StructuredDataOnCommons at Wikimania-Hackathon-2019 - Thu 15 August 2019 in the afternoon.
Documented in Etherpad: https://etherpad.wikimedia.org/p/SDC_modelling_Wikimania2019
Session was attended by 10+ participants - many thanks!
Next: how do we process/summarize the input in the Etherpad, and how do we translate it into on-wiki input?
https://etherpad.wikimedia.org/p/SDC_modelling_Wikimania2019
Backing up the etherpad here, follow up at https://commons.wikimedia.org/wiki/Commons:Structured_data/Modeling
https://commons.wikimedia.org/wiki/Commons:Structured_data/Properties_table
https://commons.wikimedia.org/wiki/Special:MostTranscludedPages
https://commons.wikimedia.org/wiki/Commons:Infobox_templates
Wikidata projects
Test images:
https://commons.wikimedia.org/wiki/Wiki_Loves_Monuments_2018_winners#Winners First ten
Author (P50) can be used if the author has a Wikidata item
If the author does not have a Wikidata item, there are a couple other possibilities to identify the author:
Author properties
Numerous Wikidata properties are available for author IDs in various databases, but we should probably pull this information automatically from the Author item in most cases.
The role of each author could be specified as a qualifier to the Author or Author name string value using the Subject has role (P2868) property. For example, photographer, painter, architect, sculptor, etc.
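A minimal sketch of the role-as-qualifier idea, written as plain Python dictionaries rather than the real Wikibase JSON format. The function name, the dictionary layout, and the placeholder values (`Q_AUTHOR_PLACEHOLDER`, `Q_PHOTOGRAPHER_PLACEHOLDER`, "Jane Example") are assumptions for illustration only; the property IDs are the ones named in these notes.

```python
# Property IDs as named in the notes.
AUTHOR = "P50"
AUTHOR_NAME_STRING = "P2093"
SUBJECT_HAS_ROLE = "P2868"


def make_author_statement(value, role, is_item=True):
    """Build a simplified author statement with a role qualifier.

    value   -- a Q-id if the author has a Wikidata item, else a name string
    role    -- Q-id of the role (photographer, painter, ...), placeholder here
    is_item -- True: use author (P50); False: use author name string (P2093)
    """
    prop = AUTHOR if is_item else AUTHOR_NAME_STRING
    return {
        "property": prop,
        "value": value,
        "qualifiers": {SUBJECT_HAS_ROLE: [role]},
    }


# Hypothetical placeholder values, not real Wikidata items.
with_item = make_author_statement(
    "Q_AUTHOR_PLACEHOLDER", "Q_PHOTOGRAPHER_PLACEHOLDER"
)
name_only = make_author_statement(
    "Jane Example", "Q_PHOTOGRAPHER_PLACEHOLDER", is_item=False
)
```

The same qualifier is attached in both cases, so a consumer can read the role without caring whether the author has an item.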
A new property is probably needed for Author attribution (which accepts a string). This will likely go under the licensing data, however, rather than the authorship data. There are only 13 attribution templates per https://commons.wikimedia.org/wiki/Category:Attribution_templates
Uploader to be treated separately from authorship.
Get the data! If we look at 100,000 random images, what is in the source field ?
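One possible way to approach this, sketched under assumptions: given the wikitext of a sample of file pages (how the sample is fetched is left out), classify what the `source` field contains and tally the result. The category names ("own work", "url", and so on) and the regular expression are illustrative guesses, not an agreed classification.

```python
import re
from collections import Counter

# Matches "|source = value" up to the end of the line.
# [ \t]* (not \s*) after "=" so an empty field does not swallow the next line.
SOURCE_RE = re.compile(r"\|\s*source\s*=[ \t]*(.*)", re.IGNORECASE)


def classify_source(wikitext):
    """Return a rough category for the source field of one file page."""
    match = SOURCE_RE.search(wikitext)
    if match is None:
        return "missing"
    value = match.group(1).strip()
    if not value:
        return "empty"
    if value.lower().startswith("{{own"):
        return "own work"
    if re.match(r"\[?https?://", value):
        return "url"
    return "other"


def tally_sources(pages):
    """Count source categories over an iterable of wikitext strings."""
    return Counter(classify_source(page) for page in pages)


# Tiny hand-written sample standing in for the 100,000 random images.
sample = [
    "{{Information\n|description=x\n|source={{own}}\n|author=A\n}}",
    "{{Information\n|source=https://example.org/a.jpg\n}}",
    "{{Information\n|source=\n|author=B\n}}",
    "no template here",
]
counts = tally_sources(sample)
```

The "other" bucket is where most of the modelling questions are likely to sit, so in practice it would need to be split further after a first look at the data.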
Also: some operations -- rotation, colour modification, cropping, etc may have been undertaken by user prior to upload.
(eg : a photo of a 2D collage of objects)
Esp. important because these things may have different copyright status -- qualifiers below "depicts" statement? How to indicate things if there is no obvious Q-item for something in the image, but one nevertheless wants to identify it and record information relating to it? Should a "depicts" = "somevalue" statement be created to record information about particular parts of the image? This will often be handled by the Q-items for the value(s) of the depicts statements.
eg {{tl|BL cat credit}}
Need to both show complexity of copyright situation as well as straightforward information to end-user on usability of image
we need publication date besides creation dates (copyright relies on publication date, and in special situations creation date)
we need copyrightholder besides author
'attributed as': how to deal with that? 'author name string' (P2093), or a new property? P2093 will also be used for the normal names as mentioned, and attribution names will differ from those. Better to have a specific attribution property.
Restrictions
Portrait rights, that is a right for the depicted persons, how could we model that: 'depicts': qid/unknown and qualifier for rights?
usage restriction-property / reproduction restriction property
We already have 'copyright exemption' property: https://www.wikidata.org/wiki/Property:P7152
https://commons.wikimedia.org/wiki/Category:Non-copyright_restriction_templates
Traditional knowledge restrictions: https://www.loc.gov/collections/ancestral-voices/about-this-collection/rights-and-access/
Could we link a law or UN treaty to a certain subject and in that way notify a user if that subject is depicted?
We could use 'depicts' that links to wikidata. In wikidata there could be a property that links to certain restrictions
Swastika symbol, trademark etc.: a 'usage restriction' property in Wikidata, so we move the information out of Commons
Start with Creative Commons licenses, PD-licenses are difficult because template information is complex
****actually some contests can have their own item (WLM Italy = Q19960422) / Yes, then it's probably easier to have an item "Winner of Wiki Loves Monuments in Italy" and then just ranking and year (no, it's better not)
}}
https://commons.wikimedia.org/wiki/Template:Book
used on 785 273 pages (source: https://commons.wikimedia.org/w/index.php?title=Special:MostTranscludedPages&limit=300&offset=0 )
Does anyone know how many files are PDFs?
How often is each field used in this template?
Use case of https://commons.wikimedia.org/wiki/File:Gray356.png There are 2 templates, one specific to this image and one for the book it comes from. And instead of filling in the template every time, there is a pre-filled template: {{Gray's Anatomy}}.
FRBR
Book - Data Modeling for SDC
Followup actions:
title
wikidata title
description
legend
author
imgen
date
source
permission
license
map date
location
wikidata location
type
projection
scale
zoom
heading
latitude
longitude
warp status
warp url
set
wikidata set
sheet
book author
wd book author
book title
wikidata book
volume
page
language
publication place
publisher
print date
ISBN
LCCN
OCLC
institution
accession number
id
uri
dimensions
size
scan resolution
medium
technique
credit line
inscriptions
notes
other versions
references
demo
other fields
<s>* image P18</s> COMMENT: Not relevant for SDC
TO DO: additions to make, based on list of template fields above. Also note template fields that might need free-form text, or otherwise be unsuitable for putting into SDC statements (cf description pages vs Wikidata items for some existing maps)
ISSUE: How to deal with images that contain multiple maps, eg main map + one or more inset maps
Possibly we need to define some standard Q-items for sub-parts of an image, eg "Inset map", "Inset map #1", "Inset map #2", "sub map A", "sub map B" etc (the latter where there is no obvious main map / subsidiary map divide),
and then make statements "File:XYZ map" has parts "sub map 1", "sub map 2" etc, qualified with position and/or bounding box within the image, followed by statements such as
"File:XYZ" depicts "Paris", applies to part: "sub map 1".
It's not a beautiful data model, but it might be workable.
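A rough sketch of how that model could look as data, assuming "has parts" maps to has part(s) (P527) and "applies to part" maps to applies to part (P518); the notes do not name those property IDs, so treat them as assumptions. The dictionary layout, the region strings, the second place name, and the helper function are hypothetical and only illustrate the shape of the statements.

```python
# Simplified statements on a hypothetical "File:XYZ map".
statements = [
    # The file has two sub maps, each located by a region qualifier (P2677).
    {"property": "P527", "value": "sub map 1",
     "qualifiers": {"P2677": "pct:0,0,50,100"}},
    {"property": "P527", "value": "sub map 2",
     "qualifiers": {"P2677": "pct:50,0,50,100"}},
    # depicts (P180), scoped to one sub map via applies to part (P518).
    {"property": "P180", "value": "Paris",
     "qualifiers": {"P518": "sub map 1"}},
    {"property": "P180", "value": "Lyon",
     "qualifiers": {"P518": "sub map 2"}},
    # An unscoped depicts statement applies to the image as a whole.
    {"property": "P180", "value": "France"},
]


def depicted_by_part(stmts):
    """Group depicts (P180) values by the part they apply to."""
    grouped = {}
    for stmt in stmts:
        if stmt["property"] != "P180":
            continue
        part = stmt.get("qualifiers", {}).get("P518", "whole image")
        grouped.setdefault(part, []).append(stmt["value"])
    return grouped


by_part = depicted_by_part(statements)
```

In a real model the part values would be the standard Q-items proposed above rather than free strings.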
QUESTION: Should every map have its Wikidata item? If so, which metadata will be stored on Wikidata (related to the object), and which metadata is specific to the file?
depicts (P180)
located at street address (P6375)
collection (P195)
creator (P170)
subject has role (P2868)
inception (P571)
start time (P580), end time (P582) / earliest date (P1319), latest date (P1326)
date depicted (P2913)
start time (P580), end time (P582) / earliest date (P1319), latest date (P1326)
set in period (P2408)
P31
Commons compatible image available at URL (P4765)
described at URL (P973)
inscription (P1684)
collection (P195)
height (P2048)
width (P2049)
depicts (P180) + preferred rank.
relative position within image (P2677)
aspect ratio (P2061)
relative position within image (P2677)
checksum (P4092)
determination method (P459)
field of view (P4036)
focal length (P2151) meters and millimeters
digital representation of (P6243)
coordinates of the point of view (P1259)
Easy cases to start with:
Albin and Susanna: we need to be able to indicate provenance of specific statements, eg that they come from tools, AI... - would be good to do this in a uniform way and to have community consensus about it; human translation / various points of transformation and interpretation of the metadata / create a Phabricator ticket about this