When the author/source is indicated with a template, it is categorized as "Files with no machine-readable author/source"
Open, Needs Triage, Public

Description

https://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&diff=300614036&oldid=300613999

Example: https://commons.wikimedia.org/wiki/File:The_Men,_Yerevan_-_2018-05-10_-_Andy_Mabbett_-_01.jpg

Title says it all, really. I've looked, and I don't think this is a template issue, because an empty page is also categorized as "Files with no machine-readable author".

https://commons.wikimedia.org/wiki/File:Portugal_(Ch.-Fl._112-2832).jpg

Same problem, but this file is categorized as having no machine-readable source.

(title changed from When the author is indicated with a Creator template, it is categorized as "Files with no machine-readable author")

Event Timeline

In this email sent Oct 6 2014, Gergo Tisza wrote:

Starting this Tuesday (on Commons) or Thursday (all other wikis), files which do not have machine-parseable author, source, license or description will be automatically added to tracking categories (one category for each).
The name of the categories will be determined by the following messages:

commonsmetadata-trackingcategory-no-license
commonsmetadata-trackingcategory-no-description
commonsmetadata-trackingcategory-no-author
commonsmetadata-trackingcategory-no-source

Translatewiki link: https://translatewiki.net/w/i.php?title=Special:Translate&group=ext-commonsmetadata

If you would rather not have these tracking categories on your wiki, you can achieve that by setting the content of the local message to "-" (a single dash character).

Links to the local message pages are available from [[Special:TrackingCategories]].

<end quote>

That is the mechanism by which those categories are added. According to Commons [[Special:TrackingCategories]], files are added to Category:Files with no machine-readable author when "The file does not have a machine-readable information template, or its author field is not filled out." This file uses the regular Information template, which adds the same machine-readable marking that 90% of files on Commons have. So the issue must be that adding a Creator template in the author field confuses the algorithm that looks for the machine-readable marking. My guess is that we will need to understand precisely what the algorithm is looking for and figure out how to make the Creator template compatible with it.
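To make the mechanism from the quoted email concrete, here is a minimal Python sketch of how the category selection might work. It is not the CommonsMetadata code (the extension is written in PHP); the function name, the field names and the handling of unknown message keys are assumptions made for illustration. Only the four message keys and the "-" convention come from the email.

```python
# Hypothetical sketch, NOT the real CommonsMetadata implementation.
# The message keys are taken from the quoted email; everything else is assumed.
TRACKING_MESSAGES = {
    "license": "commonsmetadata-trackingcategory-no-license",
    "description": "commonsmetadata-trackingcategory-no-description",
    "author": "commonsmetadata-trackingcategory-no-author",
    "source": "commonsmetadata-trackingcategory-no-source",
}


def tracking_categories(parsed_fields, local_messages):
    """Return the tracking categories for every field that could not be parsed.

    parsed_fields:  dict mapping field name -> extracted value (None/empty = missing)
    local_messages: dict mapping message key -> category name; "-" disables it
    """
    categories = []
    for field, message_key in TRACKING_MESSAGES.items():
        value = parsed_fields.get(field) or ""
        if value.strip():
            continue  # field was parsed, nothing to track
        # Assumption: a message that is not defined locally is treated as disabled.
        name = local_messages.get(message_key, "-")
        if name != "-":
            categories.append(name)
    return categories
```

Under this reading, a file lands in the author category only when the parser returns nothing for the author field, which is why the behavior reported here looks like a parsing bug rather than a configuration issue.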

By the way, machine-readable marking added by the Information template is:

`<td id="fileinfotpl_aut" class="fileinfo-paramfield" lang="{{int:lang}}">{{int:wm-license-information-author}}</td>
<td>{{ #if: {{{author| }}} | {{{author}}} | {{Author missing}} }}</td>`

Running the metadata extraction by hand I get

"Artist" => """
  <bdi><a href="https://www.wikidata.org/wiki/Q15136093" class="extiw" title="d:Q15136093">Andy Mabbett</a>\n
  </bdi>
  """,

(the API result shows the same thing) so in theory CommonsMetadata should not categorize it like that.
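For anyone who wants to repeat the API check, a small Python sketch follows. It uses the MediaWiki `imageinfo` query with `iiprop=extmetadata`; the helper names and the User-Agent string are made up for this example, and the response handling assumes the default JSON format in which `pages` is keyed by page ID.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://commons.wikimedia.org/w/api.php"


def extmetadata_url(title):
    """Build the query URL that asks for the extended metadata of one file."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "titles": title,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def artist_from_response(data):
    """Return the Artist value from a decoded API response, or None if absent."""
    for page in data.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            artist = info.get("extmetadata", {}).get("Artist")
            if artist:
                return artist.get("value")
    return None


def fetch_artist(title):
    """Fetch and decode the metadata for `title` (performs a network request)."""
    # The User-Agent value is a placeholder chosen for this sketch.
    request = Request(extmetadata_url(title),
                      headers={"User-Agent": "extmetadata-check-sketch/0.1"})
    with urlopen(request) as response:
        return artist_from_response(json.load(response))
```

Calling `fetch_artist("File:The_Men,_Yerevan_-_2018-05-10_-_Andy_Mabbett_-_01.jpg")` should return the HTML fragment for the Artist field if the extraction works as described above.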

(As a side note, when the author is missing, the field should not have the fileinfotpl_aut ID; otherwise the machine-readable metadata will have "This file is lacking author information" as the author name.)

You might also want to consider cases like the Artwork and Photograph templates, which have an artist or photographer field rather than an author field.

In reply to the side note quoted here: "(As a side note, when the author is missing, the field should not have the fileinfotpl_aut ID; otherwise the machine-readable metadata will have "This file is lacking author information" as the author name.)"

I simplified the wikitext code a bit so that it is easier to read. The actual code snippet is:

`

<!-- Author -->
<tr style="vertical-align: top">
<td {{#switch: {{{author|{{{Author|}}} }}} |- |-- |= |#default= id="fileinfotpl_aut" }} class="fileinfo-paramfield" lang="{{int:lang}}">{{int:wm-license-information-author}}</td>
<td>{{ #if: {{{author|{{{Author|}}} }}} | {{Information/author processing|author={{{author|{{{Author|}}} }}}}} | {{Author missing}} }}</td>
</tr>
`

So the field does not have the fileinfotpl_aut ID when the author is missing.
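The `#switch` in that snippet can be read as the following Python sketch. This is only my paraphrase of the wikitext, not code that runs anywhere on Commons; the function and parameter names are invented for illustration.

```python
def author_label_attrs(author=None, author_capitalized=None):
    """Mimic the #switch on the label cell of the author row.

    `author` and `author_capitalized` stand for the template parameters
    `author` and `Author`; None means the parameter was not passed at all.
    """
    # {{{author|{{{Author|}}} }}}: use `author` if it was passed (even if empty),
    # otherwise fall back to `Author`, otherwise to the empty string.
    if author is not None:
        value = author
    elif author_capitalized is not None:
        value = author_capitalized
    else:
        value = ""
    value = value.strip()  # #switch compares the trimmed value
    # The cases "-", "--" and "" produce no attribute; #default adds the ID.
    if value in ("-", "--", ""):
        return ""
    return 'id="fileinfotpl_aut"'
```

For example, `author_label_attrs(author="{{Creator:Andy Mabbett}}")` yields the ID attribute, while `author_label_attrs()` and `author_label_attrs(author="-")` yield an empty string.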

As a different side note, each row of the Information template (and of other infoboxes) has two <td> cells: one with the field name and one with the field value. The fileinfotpl_aut marking is on the cell with the name, not the one with the value. That makes very little sense to me, especially since other infoboxes, like the Creator or Institution templates, add machine-readable tags to the cell with the actual information.
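Because the ID sits on the label cell, a parser has to find that cell and then read the next <td> in the row. The Python sketch below shows one way to do that with the standard library; it illustrates the layout described above and is not how CommonsMetadata is actually implemented. The class and function names are invented, and it assumes the label cell contains no nested <td>.

```python
from html.parser import HTMLParser


class FieldValueParser(HTMLParser):
    """Collect the text of the <td> that follows the <td> carrying marker_id."""

    def __init__(self, marker_id):
        super().__init__()
        self.marker_id = marker_id
        self.state = "search"  # search -> label -> await_value -> value -> done
        self.depth = 0         # nesting depth of <td> elements inside the value cell
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        if self.state == "search" and dict(attrs).get("id") == self.marker_id:
            self.state = "label"       # found the cell holding the field name
        elif self.state == "await_value":
            self.state = "value"       # the next cell holds the field value
            self.depth = 1
        elif self.state == "value":
            self.depth += 1            # nested table inside the value cell

    def handle_endtag(self, tag):
        if tag != "td":
            return
        if self.state == "label":
            self.state = "await_value"
        elif self.state == "value":
            self.depth -= 1
            if self.depth == 0:
                self.state = "done"

    def handle_data(self, data):
        if self.state == "value":
            self.chunks.append(data)


def extract_field(html, marker_id="fileinfotpl_aut"):
    """Return the plain text of the value cell, or None if the marker is absent."""
    parser = FieldValueParser(marker_id)
    parser.feed(html)
    parser.close()
    if parser.state != "done":
        return None
    return "".join(parser.chunks).strip()
```

If the marking were on the value cell instead, the state machine would collapse to "find the element with this ID and read its contents", which is the simpler arrangement the Creator and Institution templates already use.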

AlexisJazz renamed this task from When the author is indicated with a Creator template, it is categorized as "Files with no machine-readable author" to When the author/source is indicated with a template, it is categorized as "Files with no machine-readable author/source". May 24 2018, 10:30 AM
AlexisJazz updated the task description.
Vvjjkkii renamed this task from When the author/source is indicated with a template, it is categorized as "Files with no machine-readable author/source" to i5caaaaaaa. Jul 1 2018, 1:10 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
Yann renamed this task from i5caaaaaaa to When the author/source is indicated with a template, it is categorized as "Files with no machine-readable author/source". Jul 1 2018, 10:50 AM
Yann raised the priority of this task from High to Needs Triage.
Yann updated the task description.
Yann added a subscriber: Aklapper.

Maybe the first step in resolving this issue should be to publish (or link to an existing publication of) the machine-readable marking that the software expects. At the moment those categories are useless, as they are filled with millions of files that have correct templates and author/source info.

As I said earlier, this looks like a software bug, not a problem with the markup. That said, fully machine-readable authorship data would be nice. There's some work at https://commons.wikimedia.org/wiki/Commons:Structured_data/Modeling/Author.