
Store and serve annotations in W3C standard format
Open, Needs Triage · Public

Description

The relevant W3C standards are:

As I understand it, the way the pieces fit together is:

  1. On https://commons.wikimedia.org/wiki/File:Douglas_adams_portrait_cropped.jpg we insert in the <head>:
<link rel="http://www.w3.org/ns/oa#annotationService" href="https://commons.wikimedia.org/wiki/File_annotations:Douglas_adams_portrait_cropped.jpg"/>

(This could also appear in an HTTP Link header.) Spec: https://www.w3.org/TR/annotation-protocol/#discovery-of-annotation-containers

  2. This makes the File_annotations page an "Annotation Container", which can be retrieved with GET per https://www.w3.org/TR/annotation-protocol/#container-retrieval. In particular, you should be able to use the Accept header to request the application/ld+json type (note that this differs slightly from the usual application/json type).
  3. The returned AnnotationContainer should have the form: (see this on the JSON-LD playground)
HTTP/1.1 200 OK
Content-Type: application/ld+json; profile="http://www.w3.org/ns/anno.jsonld"

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "https://commons.wikimedia.org/wiki/File_annotations:Douglas_adams_portrait_cropped.jpg",
  "type": [
    "http://www.w3.org/ns/ldp#BasicContainer",
    "AnnotationCollection"
  ],
  "modified": "2017-05-06T12:00:00Z",
  "label": "Annotations for File:Douglas_adams_portrait_cropped.jpg",
  "first": {
    "id": "https://commons.wikimedia.org/wiki/File_annotations:Douglas_adams_portrait_cropped.jpg?page=0",
    "type": "AnnotationPage",
    "items": [
      {
        "id": "https://commons.wikimedia.org/wiki/File_annotations:Douglas_adams_portrait_cropped.jpg#a1",
        "type": "Annotation",
        "body": {
          "type": "http://www.wikidata.org/prop/direct/P1442",
          "source": "https://www.wikidata.org/entity/Q42"
        },
        "target": [
          {
            "selector": {
              "type": "CssSelector",
              "value": "#file img[data-file-width]"
            },
            "source": "https://commons.wikimedia.org/wiki/File:Douglas_adams_portrait_cropped.jpg",
            "state": {
              "type": "TimeState",
              "sourceDate": "2017-05-06T13:30:00Z"
            }
          },
          {
            "source": "https://upload.wikimedia.org/wikipedia/commons/c/c0/Douglas_adams_portrait_cropped.jpg",
            "state": {
              "type": "TimeState",
              "sourceDate": "2017-05-06T13:30:00Z"
            }
          }
        ]
      },
      {
        "id": "https://commons.wikimedia.org/wiki/File_annotations:Douglas_adams_portrait_cropped.jpg#a2",
        "type": "Annotation",
        "body": [
          {
            "type": "TextualBody",
            "format": "text/wikitext",
            "value": "Free '''text''' annotation"
          },
          {
            "type": "TextualBody",
            "format": "text/html; charset=utf-8; profile=\"https://www.mediawiki.org/wiki/Specs/HTML/1.4.0\"",
            "value": "Free <b>text</b> annotation"
          }
        ],
        "target": {
          "selector": {
            "type": "FragmentSelector",
            "conformsTo": "http://www.w3.org/TR/media-frags/",
            "value": "xywh=50,50,640,480"
          },
          "source": "https://upload.wikimedia.org/wikipedia/commons/c/c0/Douglas_adams_portrait_cropped.jpg",
          "state": {
            "type": "TimeState",
            "sourceDate": "2017-05-06T13:30:00Z"
          }
        }
      }
    ]
  },
  "total": 2
}
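
The discovery and retrieval steps above can be sketched from the client side as follows. This is only a rough illustration under stated assumptions: the function names (`find_annotation_container`, `build_container_request`, `iter_annotations`) are hypothetical, the request is built but never sent because the File_annotations endpoint is a proposal, and only annotations embedded directly in `first` are read.

```python
import json
from html.parser import HTMLParser
from urllib.request import Request

# Link relation and media type taken from the Web Annotation Protocol.
ANNOTATION_SERVICE_REL = "http://www.w3.org/ns/oa#annotationService"
ANNO_MEDIA_TYPE = 'application/ld+json; profile="http://www.w3.org/ns/anno.jsonld"'


class _LinkFinder(HTMLParser):
    """Remembers the href of the first <link> carrying the annotationService rel."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        # rel is a space-separated list of tokens.
        if ANNOTATION_SERVICE_REL in (attr.get("rel") or "").split():
            self.href = attr.get("href")


def find_annotation_container(html):
    """Step 1: return the container URL advertised in the page, or None."""
    finder = _LinkFinder()
    finder.feed(html)
    return finder.href


def build_container_request(container_url):
    """Step 2: prepare (but do not send) a GET asking for the JSON-LD form."""
    return Request(container_url, headers={"Accept": ANNO_MEDIA_TYPE})


def iter_annotations(container):
    """Step 3: yield the annotations embedded in the container's first page.

    If "first" is only a URL (not an embedded page), nothing is yielded;
    following page links is left out of this sketch.
    """
    first = container.get("first")
    if isinstance(first, dict):
        yield from first.get("items", [])


if __name__ == "__main__":
    page = (
        '<html><head><link rel="http://www.w3.org/ns/oa#annotationService" '
        'href="https://commons.wikimedia.org/wiki/'
        'File_annotations:Douglas_adams_portrait_cropped.jpg"/></head></html>'
    )
    url = find_annotation_container(page)
    request = build_container_request(url)
    print(request.full_url, request.get_header("Accept"))
    sample = json.loads('{"first": {"items": [{"id": "#a1"}, {"id": "#a2"}]}}')
    print([a["id"] for a in iter_annotations(sample)])
```

A real client would also need to check the HTTP Link header, since the link may appear there instead of in the <head>.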

Open questions:

  • Writing a permalink that refers to an image is harder than I expected. We have nice permalinks for the File:... page, but that page is only the metadata for the image. The actual image is served from (say) https://upload.wikimedia.org/wikipedia/commons/c/c0/Douglas_adams_portrait_cropped.jpg and is moved to an archive URL like https://upload.wikimedia.org/wikipedia/commons/archive/c/c0/20100416225428%21Douglas_adams_portrait_cropped.jpg if/when the image is updated. We apparently can't know the archive URL without predicting when the file will be updated. As a fallback, we're using the Memento mechanism (which isn't actually implemented yet, see T164654, although there is Extension:Memento). @GWicke suggests content-hash-based URLs (T149847). Multi-content revisions might also provide a fix.
  • We can support multiple targets, but if you include the File:...jpg page as a target, it seems you need a fairly complicated XPath or CSS selector to extract the actual image on the page.
  • How to handle multiple resolutions of the image? This is related to the previous item, as the image embedded in the HTML File:...jpg page may not be full-size.
  • Is there a way to avoid repeating so much? In particular the target array gets very repetitive, especially if we add all the scaled versions of the image as alternate targets.
  • We're using the http://schema.org/image relation, but we'd rather use the Wikidata P18 relation. To do that, we need Wikidata to export a proper vocabulary (https://phabricator.wikimedia.org/T44063#3241034). It seems that https://github.com/schemaorg/schemaorg/issues/1186#issuecomment-221991582 gives us some ways to do this.
    • Tweaked to use the Wikidata P1442 relation as a type on the body, but I'm not convinced this is the correct way to express this semantic triple.
  • We're not really using the paging interface. If the number of annotations got really large, we might need to figure out how to name each page of annotations.
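
On the paging question, one possible approach is sketched below. It is not a decided design: the `paginate` helper and its `page_size` default are hypothetical, and the `?page=N` naming simply extends the `?page=0` id used in the example above. The `partOf`, `startIndex`, `next` and `prev` properties are the ones the Web Annotation Data Model defines for an AnnotationPage.

```python
def paginate(container_id, annotations, page_size=50):
    """Split a list of annotations into AnnotationPage dicts.

    Pages are named "<container_id>?page=N" (an assumption, not a spec
    requirement) and linked to their neighbours with "prev"/"next".
    """
    pages = []
    for start in range(0, len(annotations), page_size):
        pages.append({
            "id": f"{container_id}?page={start // page_size}",
            "type": "AnnotationPage",
            "partOf": container_id,
            # Index of this page's first annotation within the whole collection.
            "startIndex": start,
            "items": annotations[start:start + page_size],
        })
    # Link neighbouring pages once all ids are known.
    for i, page in enumerate(pages):
        if i > 0:
            page["prev"] = pages[i - 1]["id"]
        if i + 1 < len(pages):
            page["next"] = pages[i + 1]["id"]
    return pages


if __name__ == "__main__":
    container = "https://commons.wikimedia.org/wiki/File_annotations:Douglas_adams_portrait_cropped.jpg"
    items = [{"id": f"{container}#a{n}", "type": "Annotation"} for n in range(1, 6)]
    for page in paginate(container, items, page_size=2):
        print(page["id"], len(page["items"]), page.get("next"))
```

With this naming, the container's `first` would point at `?page=0` and its `last` at the final page, while `total` stays the count over all pages.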

Event Timeline

cscott created this task. May 6 2017, 9:36 PM
Restricted Application added a project: Multimedia. May 6 2017, 9:36 PM
Restricted Application added a subscriber: Aklapper.
cscott updated the task description. May 6 2017, 11:52 PM

Change 379669 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[purtle@master] Tweak JSON-LD support to generate more idiomatic JSON-LD

https://gerrit.wikimedia.org/r/379669

cscott updated the task description. Oct 13 2017, 4:54 PM
cscott added a subscriber: GWicke.

Change 379669 merged by jenkins-bot:
[purtle@master] Tweak JSON-LD support to generate more idiomatic JSON-LD

https://gerrit.wikimedia.org/r/379669

Ramsey-WMF moved this task from Untriaged to Tracking on the Multimedia board. Nov 27 2017, 10:50 PM