
Add an image: caption validation (PLACEHOLDER)
Open, Needs Triage, Public

Description

NOTE: a full validation experience will not be part of Iteration 1. This is for a future iteration. For the minimal version, see T293161: Add an image: minimal caption validation.

Placeholder task for validation rules on the caption entered by the user. These rules may include:

  • Minimum and maximum length.
  • Not allowing the filename to be included.
  • Not allowing the same caption as a previous image added by the user.
  • Checking that it is in the content language for the article.

This task will also include the user experience for displaying the warning message.

Mockup as of 2021-10-08:

image.png (1×1 px, 1 MB)

Figma: https://www.figma.com/file/ULhJr1isDstRbGE5vjYDsr/Add-images-structured-task?node-id=3050%3A9628

Event Timeline

> Minimum and maximum length.
> Not allowing the filename to be included.

These are trivial to do.
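For illustration, a minimal sketch of those two checks in Python; the length limits and warning codes here are hypothetical stand-ins, not agreed-upon values:

```python
import os.path

# Hypothetical limits -- the actual values would be decided per-wiki.
MIN_LENGTH = 5
MAX_LENGTH = 250

def validate_caption(caption: str, filename: str) -> list[str]:
    """Return a list of warning codes for a proposed image caption.

    `filename` is the image's file page title, e.g. "File:Example_photo.jpg".
    """
    warnings = []
    text = caption.strip()
    if len(text) < MIN_LENGTH:
        warnings.append('too-short')
    if len(text) > MAX_LENGTH:
        warnings.append('too-long')
    # Compare against the base filename, ignoring the namespace prefix,
    # the extension, case, and underscore/space differences.
    base = filename.split(':', 1)[-1]
    base = os.path.splitext(base)[0].replace('_', ' ').lower()
    if base and base in text.replace('_', ' ').lower():
        warnings.append('contains-filename')
    return warnings
```

For example, `validate_caption("Example photo", "File:Example_photo.jpg")` returns `['contains-filename']`.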

> Not allowing the same caption as a previous image added by the user.

Slightly more complicated because we'd have to store previous captions somewhere. Still fairly easy, but is it useful? Someone giving the exact same caption for multiple images seems unlikely.
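If we did store them, the comparison itself is simple. A sketch assuming an in-memory per-user history; the store, the size limit, and the normalization rules are all assumptions, not a decided design:

```python
class CaptionHistory:
    """Track a user's recent captions to flag exact repeats.

    In production this would live in a per-user store; here it is in-memory.
    """

    def __init__(self, max_entries: int = 50):
        self.max_entries = max_entries
        self._seen: list[str] = []

    @staticmethod
    def _normalize(caption: str) -> str:
        # Treat captions as duplicates regardless of case and extra whitespace.
        return ' '.join(caption.split()).lower()

    def is_repeat(self, caption: str) -> bool:
        return self._normalize(caption) in self._seen

    def record(self, caption: str) -> None:
        self._seen.append(self._normalize(caption))
        # Keep only the most recent entries.
        del self._seen[:-self.max_entries]
```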

> Checking that it is in the content language for the article.

Language detection is a complicated problem. It requires dictionaries, which are probably too large to do this on the client side. We'd have to find which open-source tool does a good-enough job, and set it up as a web service. Unless someone already did that for another product, this isn't really feasible IMO.
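A dictionary-free heuristic can at best catch wrong-script input (e.g. Latin text submitted on a Cyrillic-script wiki); it cannot distinguish languages that share a script, which is exactly why real detection needs the heavier tooling described above. A sketch of that limited check, purely for illustration:

```python
import unicodedata

def dominant_script_matches(caption: str, expected_script: str) -> bool:
    """Crude check: do most letters in the caption use the expected script?

    `expected_script` is a Unicode script name prefix as it appears in
    character names, e.g. 'LATIN', 'CYRILLIC', 'ARABIC'. This cannot tell
    French from English -- it only catches wrong-script input.
    """
    letters = [ch for ch in caption if ch.isalpha()]
    if not letters:
        return True  # nothing to judge
    matching = sum(
        1 for ch in letters
        if unicodedata.name(ch, '').startswith(expected_script)
    )
    return matching / len(letters) > 0.5
```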

> Minimum and maximum length.
> Not allowing the filename to be included.
>
> These are trivial to do.
>
> Not allowing the same caption as a previous image added by the user.
>
> Slightly more complicated because we'd have to store previous captions somewhere. Still fairly easy, but is it useful? Someone giving the exact same caption for multiple images seems unlikely.

I agree that disallowing previous captions doesn't seem like very useful validation.

> Checking that it is in the content language for the article.
>
> Language detection is a complicated problem. It requires dictionaries, which are probably too large to do this on the client side. We'd have to find which open-source tool does a good-enough job, and set it up as a web service. Unless someone already did that for another product, this isn't really feasible IMO.

I know that Android did this for their app version of the Add caption task; I'm not sure if it uses something Android-specific, but @Dbrant may have more info. Also, is it possible that CX uses the dictionaries we are after for V1 languages? cc @Pginer-WMF

> Also, is it possible that CX uses the dictionaries we are after for V1 languages? cc @Pginer-WMF

In Content Translation we have not been doing language detection; the focus has been on content mapping across two languages we knew in advance. In particular:

  • Finding which could be the equivalent sections across two versions of an article in different languages. For which we use a database of equivalent section titles across different languages.
  • Finding which could be the equivalent template parameters across templates in two different languages. For this, [[ https://github.com/digitalTranshumant/templatesAlignment/tree/master | a machine learning approach ]] based on multilingual fastText vectors was used.
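For context, the core of such an alignment reduces to nearest-neighbor search over cross-lingual embeddings. The toy sketch below uses hand-made 2-D vectors standing in for fastText embeddings; it is an illustration of the general idea, not the linked project's actual code:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def align_parameters(source_vecs: dict, target_vecs: dict) -> dict:
    """Map each source template parameter to the target parameter whose
    (pre-computed, cross-lingual) embedding is most similar."""
    return {
        s_name: max(target_vecs, key=lambda t: cosine(s_vec, target_vecs[t]))
        for s_name, s_vec in source_vecs.items()
    }
```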

I don't know if any of those underlying resources, or the MT services that Content Translation provides, could be useful as part of the process to support detection. @santhosh may know more about language detection. I recall talking about this functionality when T98728 was explored.

> I know that Android did this for their app version of the Add caption task; I'm not sure if it uses something Android-specific, but @Dbrant may have more info?

This was indeed specific to Android -- our language detection uses Google's ML Kit.

> Minimum and maximum length.
> Not allowing the filename to be included.
>
> These are trivial to do.
>
> Not allowing the same caption as a previous image added by the user.
>
> Slightly more complicated because we'd have to store previous captions somewhere. Still fairly easy, but is it useful? Someone giving the exact same caption for multiple images seems unlikely.
>
> I agree that disallowing previous captions doesn't seem like very useful validation.

My use case for disallowing previous captions is a user copy/pasting the same thing into each caption, e.g. "A good image for the article." I don't think I have evidence to say this will happen, but I could imagine users doing it. We'll see when we have the caption data from Iteration 1.

Not something we're currently working on, so I'm moving back to the main board.