
Design a system for serving images with depicts suggestions needing confirmation
Closed, Invalid · Public

Description

We need a system for serving images with depicts suggestions needing confirmation.

Criteria:
Images fall out of the queue once they have one or more label candidates approved. If a user submits votes but only rejects or abstains on each candidate, approving none, the image remains in the queue. N.B. This is v1 behavior and may change, so we should design for that change as best we can.

Open questions:
Should we favor completeness (showing all suggested tags that meet the minimum cutoff) or breadth (getting initial confirmations for as many images as possible)? The consensus view seems to be the latter.

Proposal
Present the suggestions on a Special page similar to https://commons.wikimedia.org/wiki/Special:UncategorizedFiles by implementing an ImageQueryPage subclass.

Ideally we would add some logic to prominently feature any images needing label confirmation that the logged-in user has uploaded.
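
A minimal sketch of such an ImageQueryPage subclass, for illustration only (untested; the depicts_suggestions table and its ds_* columns are hypothetical stand-ins for wherever suggestion state actually lives):

```php
<?php
// Hypothetical sketch, modeled on UncategorizedImagesPage
// (Special:UncategorizedFiles). Lists files that still have
// unconfirmed depicts suggestions.
class SpecialImagesWithUnconfirmedDepicts extends ImageQueryPage {

	public function __construct( $name = 'ImagesWithUnconfirmedDepicts' ) {
		parent::__construct( $name );
	}

	public function isExpensive() {
		return true;
	}

	public function getQueryInfo() {
		return [
			// "depicts_suggestions" is an invented table holding
			// per-file suggestion state.
			'tables' => [ 'image', 'depicts_suggestions' ],
			'fields' => [
				'namespace' => NS_FILE,
				'title' => 'img_name',
			],
			// Images fall out of the queue once any candidate is
			// approved, so select only rows with no approved label.
			'conds' => [ 'ds_approved_count' => 0 ],
			'join_conds' => [
				'depicts_suggestions' => [ 'JOIN', 'ds_img_name = img_name' ],
			],
		];
	}
}
```

The real query would depend on where the suggestion and vote data ends up being stored, but the ImageQueryPage machinery gives us the gallery rendering and query caching that Special:UncategorizedFiles uses.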

Event Timeline

Restricted Application added a subscriber: Aklapper.
Mholloway updated the task description.
Mholloway renamed this task from "Design a system for serving images with depicts suggetsions needing confirmation" to "Design a system for serving images with depicts suggestions needing confirmation". · Jul 23 2019, 3:38 PM

It would be nice -- and not just for depicts statements -- for there to be an additional rank, beyond "preferred", "normal", and "deprecated", so that statements could be given the rank "suggested by machine, but further confirmation desired".

This might be appropriate eg for statements inferred from current Commons categories for a file.

A gadget could present users with a set of images that had a particular value for a particular property with this rank, and ask them to approve/reject.

It would be good for unassessed suggestions for statements (and not necessarily just "depicts" statements) to be accessible via the SPARQL query service, in the same way as regular statements, but possibly set with a lower trust level.

One model that has been floated in discussions around the Wikimania Hackathon would be to initially mark such statements as "deprecated" with qualifier "reason for deprecation" (P2241) = "automated inference, as yet unconfirmed" or similar. Another qualifier could be used to record the "assessed probability" projected by the AI system. It would also be useful to add a reference to record which AI system had generated the prediction.

This approach would make unassessed suggestions fully visible to SPARQL query writers, if they wished their queries to see them, ie by using the p:Pxxx form of the property rather than the wdt:Pxxx form of the property in their queries; and also, if desired, to threshold their queries to only take account of suggestions with probabilities assessed above a particular level.
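
For illustration, here is a hedged sketch of such a query (following the modelling described above; wd:Qxxxx stands in for a hypothetical "automated inference, as yet unconfirmed" item and pq:Pyyyy for a hypothetical "assessed probability" qualifier, neither of which currently exists):

```sparql
# Sketch: "depicts" (P180) suggestions that are still machine-generated
# and unconfirmed, i.e. deprecated statements with "reason for
# deprecation" (P2241) pointing at a hypothetical "automated inference,
# as yet unconfirmed" item, thresholded on a hypothetical
# "assessed probability" qualifier.
SELECT ?file ?label ?score WHERE {
  ?file p:P180 ?statement .            # full statement form, so that
  ?statement ps:P180 ?label ;          # deprecated ranks are visible
             wikibase:rank wikibase:DeprecatedRank ;
             pq:P2241 wd:Qxxxx ;       # reason: unconfirmed inference
             pq:Pyyyy ?score .         # assessed probability
  FILTER ( ?score >= 0.9 )             # only fairly confident suggestions
}
```

A query using only the wdt:P180 form would skip these statements entirely, since wdt: covers only non-deprecated ("truthy") statements; the full p:/ps: form makes them opt-in.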

It also has the advantage of being an open, accessible system -- anyone can add statements of this form, allowing all comers to run their own approaches and add the resulting suggestions into the system.

Thanks for commenting, @Jheald. Can you share what you would use this capability for?

Thanks @Abit

So for example this afternoon @Miriam is going to be leading a workshop session introducing image classifiers and clustering algorithms, with a view that Commons users can start to explore writing their own large-scale machine-learning image analysis tools. It would be good if there were syntax in place so they could write the results of those explorations to Commons SDC, so that they were there and accessible for the community to then refine, extract, or take further for each image. Similarly, two days ago @Multichill led a small group session to try to think through the generic pipeline and workflow for machine-learning contributions, and what kind of open framework was needed to support bulk contributions of that kind from all comers and any set or subset of images. @Fuzheado too has been talking about some of the investigations he has been doing with machine vision and the Metropolitan Museum collection. All those voices, and more, I think would have useful input for this conversation.

Myself, I have been to meet-ups where a researcher from eg the world-class Oxford machine vision group has been looking for new challenges that they can throw algorithms and students at. At a meeting of worldwide GLAM coordinators two days ago, others had had this experience too. Over the last 15 years Wikipedia has been a very popular source of data for academic researchers to investigate at scale, and with SDC the images on Commons will become very attractive as a very big, accessible, worldwide-visible set of partially-tagged images for use as raw data for investigation. So I see the potential for a lot of interest from research groups worldwide, some of whom may be very good indeed. We need an open framework so that such groups can write their results back to Commons and enrich it for all, even though, as the products of ML, those results are going to be (at best) probability assessments for further investigation rather than definite yes/no binary facts.

One of the characteristics of machine learning is that there is a huge range of different approaches being researched, with models of different types, each trained on different training data, specialised for different challenges, different niches, and different applications -- and for the entire research community a constant pressure to come up with something new that nobody has quite done in that way or for that problem before. So even if we find an excellent general classifier and apply it centrally, there will always be room for new approaches and new specialisations -- and the openness, availability, and increasingly accessible description of Commons images should increasingly make them a honeypot for external researchers.

In turn they may apply models specially trained for specific domains -- so for example a general classifier might identify the presence of a car or a cat in an image, but they might be working on a classifier trained specifically for a specialist domain, and might be able to refine that tagging suggestion into a more precise identification, eg a VW Golf or a 1951 Buick, a Persian cat or a Burmese. Or their research may be taking an entirely different approach -- eg this demo, which rather than trying to do a top-down classification of all images, instead adopts a "shooting" approach: taking a single classification term, trying to learn its characteristics, then trying to identify images that match those specific characteristics. Also, there are things that the most common general classifiers find hard. One might have thought that "Map of Australia" would be a relatively easy thing to identify, with that country's characteristic outline. But I would like to see some code that reliably can. So however good our in-house classifier may be or may become, we should (also?) have a framework that makes it possible for other actors to register provisional assessments.

One can also identify different moments of opportunity. There is a crucial moment: when the image is first uploaded. For that moment, we want to have as good an in-house classifier as we can, so that we can immediately offer suggestions to the initial uploader -- because perhaps nobody will ever be as interested in that file again as they are at that moment. But we also have an existing store of 40 million images to work through. Obviously, the more that our in-house classifier can do to classify such images the better. But for this set we want to be able to gather together suggestions from as many actors as we possibly can -- including tools developed by regular Commons users. So it becomes hugely advantageous to have a structure that regular Commons users can easily write to, and also easily extract from at scale, eg to allow users to develop new crowd-judgment apps (the Wikidata game is great, but a different judgment app might make sense for a set of images with group membership estimated at 98% than for one estimated at 66% -- plus, it's a valuable and interesting thing to give community tool-makers the chance to make).

Finally, I said above that this wasn't just about machine vision; and there will be other types of statement beyond "depicts" that it would be good to be able to record values for, marked in some way as 'provisional'. Just as one example of this, one of the strongest sources of information we want to tap into is the category system. But the contents of Commons categories are not homogeneous. Take eg a category of images of the inside of cathedral X: for 95% of cases or more it may be appropriate to write statements "Depicts = Cathedral X" with qualifier "Depicted part = Interior", but the category might also contain a close-up of a candle which just happened to be located within the cathedral. For that one might want to write "Location = Cathedral X", but only "Depicts = Church Candle". And of course many categories are a lot more random than that. So this is the kind of case where it would be good for bots to be able to write statements straight to SDC, but for those statements to somehow be marked as provisional, so query-writers could choose to include them or not. The SDC scheme I set out above might be one syntax for doing this.

Thanks for these use cases @Jheald -- they are very interesting, and I appreciate you explaining the demand as well. We're trying to finish the prototype for the Commons Query Service by the end of the month, so I don't think we can incorporate them in this first iteration, but we should consider them for the next. Copying @dr0ptp4kt so he's aware of these use cases for his machine vision conversations as well.

(Edited formatting so pinging Jheald worked correctly.)

@Mholloway Can you update this ticket with the appropriate priority and column? Thanks!

Attachment: photo_2019-08-22_08-37-01.jpg (1×750 px, 84 KB)
For the record, here's a photo of the collaborative flip-chart that came out of @Multichill's small-group session on Wednesday 14th at the Hackathon.

The aim was to try to identify the different generic stages of the workflow in machine vision, and to think how each stage could be made as "pluggable" as possible -- so that users could plug in their own approach at any stage and report their results from that stage back into the overall pipeline.

The stages identified were:

  1. Selection (ie from categories, from search, new files, PagePile etc);
  2. Processing, leading to classification suggestion;
  3. Database recording of the suggestion (eg in the structured data, or an accessible MySql table);
  4. Judgment (ie human assessment of the suggestions) via some tool, eg the Wikidata distributed game, or some gadget or extension, or some pop-up on the file page;
  5. Feedback: a mechanism to record the result of the judgment, available as calibration feedback to the suggestion step.
Jdforrester-WMF subscribed.

The extension has been archived per T352884.