
[XL] Evaluate 'depicts' annotations added via CAT
Closed, Resolved (Public)

Description

Some community members are concerned that 'depicts' annotations added via CAT (including additions via the ISA tool, which uses the MachineVision extension API to get suggestions)* are doing more harm than good.

To try and measure this objectively we could select a random sample of 'depicts' annotations added via Special:SuggestedTags or ISA and make an interface to rate them as "good" or "bad", in a way similar to how we've classified image suggestions in the past. Then we could allow the community (or ambassadors) to use the interface to rate the annotations, and come up with a reasonably objective idea of whether they're good or bad overall.

Once that's done, we can report back to the community and they can decide whether to turn CAT off.

(An alternative would be to use a more complex rubric for rating "depicts", similar to https://commons.wikimedia.org/wiki/User:Rhododendrites_(WMF)/Suggested_Edits/data )


  • it looks like the part of the ISA tool that uses machine-vision suggestions has not been deployed so far, so that may be unaffected

Event Timeline

allow the community (or ambassadors) to use the interface to rate the annotations

Piling extra work such as this onto an already stretched volunteer community is pointless, bordering on harmful; as I pointed out when you raised the same suggestion on Commons, four days before you opened this ticket [1].

Community members have identified the problems and described them to you repeatedly, with ample examples, since February 2020. [2, 3, 4]

You have been asked questions - reasonable questions - about the approach time and time again since that date; these remain unanswered. [2, 3]

You now propose to take "the next few months" [5] doing nothing to actually fix the issue, while holding no meaningful discussion about the problems - which you have yet to acknowledge exist, using weaselly phrases like "Some community members are concerned" - or their solution with community members.

[1] https://commons.wikimedia.org/wiki/Commons_talk:Structured_data/Computer-aided_tagging#WMF_response

[2] https://commons.wikimedia.org/w/index.php?title=Commons_talk:Structured_data/Computer-aided_tagging/Archive_2020#Bad_tags,_nagging,_and_no_tags

[3] https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2020/02#Misplaced_invitation_to_%22tag%22_images

[4] https://commons.wikimedia.org/wiki/Commons_talk:Structured_data/Computer-aided_tagging#Large_numbers_of_trash_tags?

[5] https://commons.wikimedia.org/wiki/Commons_talk:Structured_data/Computer-aided_tagging#Update_on_Computer-aided_Tagging_-_We're_on_it!

MarkTraceur renamed this task from Evaluate 'depicts' annotations added via CAT (incl the ISA tool) to [XL] Evaluate 'depicts' annotations added via CAT (incl the ISA tool). Jul 26 2023, 5:27 PM
Cparle renamed this task from [XL] Evaluate 'depicts' annotations added via CAT (incl the ISA tool) to [XL] Evaluate 'depicts' annotations added via CAT. Aug 17 2023, 12:53 PM
Cparle updated the task description.

At Wikimania this week, Mariana Fossati and Sunshine Fionah Komusana of Whose Knowledge? talked about some of the challenges of using structured data to describe the images of women contributed through #VisibleWikiWomen (consent, privacy, biases in automated description): https://www.youtube.com/live/nSsVDaCJyZ8?feature=share&t=800

Because of @Pigsonthewing’s reservations about the WMF asking the community to evaluate depicts annotations, I took a random sample of depicts annotations added in 2023 via Special:SuggestedTags and evaluated them myself. Here are the results:

Total annotations rated: 1000
Annotations rated “bad”: 734
Annotations rated “ok”: 180
Annotations rated “good”: 86

Here’s a more detailed breakdown with reasons for all “bad” or “ok” rated images:

Rating | Reason | Count
Bad | Image is a scan of a text (or mostly-text) document, and so probably should not have a "depicts" annotation. | 218
Bad | Depicts annotation is present in image, but only as an incidental part (e.g. "road surface" for an image of a car) | 149
Bad | Depicts annotation is not present in image | 112
Bad | Depicts annotation is too general to be useful (e.g. "automotive design" for an image of a car, or "blue" for an image of the sky) | 90
Bad | Depicts annotation is abstract or invisible (e.g. "happiness" or "electricity" or "visual arts") | 60
Bad | Depicts annotation is general, when we already have a more specific annotation (e.g. "plant" when we already have "oak") | 50
Bad | Depicts annotation is a part of a pre-existing annotation (e.g. "tire" when we already have "car") | 25
Bad | Depicts annotation used in the wrong sense (e.g. the mathematical concept "slope" for an image of a hill) | 18
Bad | Other | 10
Bad | Only part of the item described in the depicts annotation is visible (e.g. "airplane" when only a wing is visible) | 2
Ok | Depicts annotation is more general than we would like, but might be useful anyway (e.g. image of a house annotated with "building" when there are no other annotations) | 79
Ok | Depicts annotation is general when we already have a more specific annotation, but might be useful anyway (e.g. "dog" when we already have "poodle") | 59
Ok | Depicts annotation only describes one aspect of the image, but might be useful anyway (e.g. image of a cemetery annotated with "tombstone" when there are no other annotations) | 38
Ok | Other | 4

What next?

Only 8.6% of the “depicts” annotations added via the tool were rated “good”, while 73.4% were rated “bad”.

The CAT tool uses a “blocklist” to reject suggested annotations from Google that contain images of people. If we extended it to reject suggested annotations that:
a) indicate the image might be a scan of a document (e.g. “text”, “document”, “line”, “calligraphy”)
b) are abstract (e.g. “happiness”, “sharing”, “color, tint and tone”)
c) are mathematical concepts (e.g. “slope”)
… then we might be able to reduce the proportion of “bad” images.
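A blocklist extension along these lines could be sketched as follows. This is a minimal illustration only: the label sets and the `filter_suggestions` helper are assumptions for this sketch, not the actual MachineVision configuration or code.

```python
# Hypothetical sketch of an extended suggestion blocklist. The label
# sets below are illustrative examples from this ticket, not the real
# MachineVision extension configuration.

DOCUMENT_SCAN_LABELS = {"text", "document", "line", "calligraphy"}
ABSTRACT_LABELS = {"happiness", "sharing", "color, tint and tone"}
MATH_CONCEPT_LABELS = {"slope"}

BLOCKLIST = DOCUMENT_SCAN_LABELS | ABSTRACT_LABELS | MATH_CONCEPT_LABELS

def filter_suggestions(labels):
    """Drop suggested labels that appear on the blocklist (case-insensitive)."""
    return [label for label in labels if label.lower() not in BLOCKLIST]

print(filter_suggestions(["Text", "oak", "slope", "house"]))  # ['oak', 'house']
```

A real implementation would presumably match against label IDs rather than display strings, but the filtering principle is the same.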

For example if we successfully detected all the document scans, and eliminated all the abstract suggestions plus eliminated “slope” as a suggestion in the test sample, we’d remove 296 “bad” images from the sample.

This leaves us with 438 “bad” images out of 704 images remaining, which is still a “bad” proportion of 62%, and a “good” proportion of only 12%.
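The arithmetic above can be checked directly; this small sketch uses only the figures already quoted in this comment.

```python
# Sanity check of the projected proportions from the sample numbers above.
total, bad, ok, good = 1000, 734, 180, 86

# Suggestions the proposed blocklist additions would have caught:
# 218 document scans + 60 abstract labels + 18 uses of "slope",
# all from the "bad" pile.
removed = 218 + 60 + 18

remaining = total - removed      # 704 annotations left in the sample
bad_remaining = bad - removed    # 438 still rated "bad"

print(f"bad:  {bad_remaining / remaining:.0%}")   # 62%
print(f"good: {good / remaining:.0%}")            # 12%
```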

So even if our mitigation measures are very successful, more than 6 out of 10 depicts annotations added by the tool are likely to be bad.

It’s possible that a UI redesign might reduce this further, but given that we have no roadmap for one, and that our “good”-rated images are outnumbered 2:1 by “ok”-rated images (images where the quality of the annotation is not objectively good but might be acceptable), turning off the tool seems to be our best option.

Comparing with the rules listed at https://commons.wikimedia.org/wiki/Commons:Depicts, I find the sample analysis above too harsh. I would consider these good:

Depicts annotation is present in image, but only as an incidental part (e.g. "road surface" for an image of a car) <-- That's why we have "Prominent"
Depicts annotation is too general to be useful (e.g. "automotive design" for an image of a car, or "blue" for an image of the sky) <-- As long as not "Depicts annotation is general, when we already have a more specific annotation" I would consider this OK
Depicts annotation is abstract or invisible (e.g. "happiness" or "electricity" or "visual arts") <-- We need search for abstract concepts to return enough illustrations too.
Only part of the item described in the depicts annotation is visible (e.g. "airplane" when only a wing is visible) <-- Unless we have an "airplane wing" item this is OK, because many other objects have wings.
Depicts annotation only describes one aspect of the image, but might be useful anyway (e.g. image of a cemetery annotated with "tombstone" when there are no other annotations)

In other words, 425 would be "good". Combined with Cparle's great improvement ideas, the extension would greatly improve searchability.

(e.g. "automotive design" for an image of a car, or "blue" for an image of the sky) <-- As long as not "Depicts annotation is general, when we already have a more specific annotation" I would consider this OK

Consensus on Commons clearly disagrees. Such statements are in the process of being removed.

We need search for abstract concepts

This is the crux of the problem. "Depicts" is meant as a structured way of saying what an image shows. It is not a tool for improving general searchability. If the latter is the use case, then a new property ("keywords", say) should be proposed.

This task is complete. Shouldn't the ticket be closed?