
UploadWizard requires description at least 5 characters long, does not work well in languages like Japanese or Korean
Open, Needs Triage, Public

Description

As reported on Commons:Upload_Wizard_feedback#short_descriptions

Latest updates force users to enter descriptions of at least five characters, which are often unnecessary in languages like Chinese and Korean. How about transforming this filter into a warning that users could override?

I can confirm that entering a short description in Japanese does indeed trigger:

WARNING: This entry is too short. Please make sure this entry is at least 5 characters.

Event Timeline

Restricted Application added subscribers: revi, Aklapper.

In languages that use ideograms, such as Chinese and Japanese, and in Korean with its own alphabet, five characters are often more than is needed. For example, I often upload close-ups that serve purely as portraits. The context is irrelevant because the photos zoom in on the subjects, so to be as concise as possible I write only their names as descriptions.

The filter only applies to UploadWizard. Uploads via f2c, v2c, etc. and edits to description pages are not affected. It is true that I could change the description however I like after uploading, but it would be better if the filter were turned into a warning that users could override, or into a minimum of 5 or 6 bytes (one ideogram takes up 3 bytes in UTF-8).
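To make the byte arithmetic concrete, here is a minimal Python sketch. It is not UploadWizard code, and it assumes UTF-8 encoding; the helper name `char_and_byte_len` is made up for illustration. It compares the character count with the byte count for a three-ideogram name and for an ASCII word.

```python
def char_and_byte_len(text):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# A three-ideogram name: 3 characters, 9 bytes in UTF-8.
print(char_and_byte_len("陳大明"))   # (3, 9)

# An ASCII word: 6 characters, 6 bytes.
print(char_and_byte_len("banana"))  # (6, 6)
```

Under a 5-character minimum the name is rejected, while under a 5- or 6-byte minimum it would pass.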

Examples:
https://commons.wikimedia.org/wiki/File:Sharon_Kwok_20140503_1.png
https://commons.wikimedia.org/wiki/File:Joyce_Cheng_20180908.png

I realised this dumb filter can be circumvented by inserting spaces. For example, 陳大明 is not allowed but 陳 大 明 is OK. I'm just going to bypass it this way if I have to.
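The workaround is consistent with a check that counts spaces as characters. The sketch below illustrates that assumption in Python; the actual UploadWizard check is not shown in this task, and `passes_raw` and `passes_ignoring_whitespace` are hypothetical names.

```python
MIN_CHARS = 5  # the limit quoted in the warning message


def passes_raw(text):
    """Count every character, spaces included (assumed current behaviour)."""
    return len(text) >= MIN_CHARS


def passes_ignoring_whitespace(text):
    """Count only non-whitespace characters (a hypothetical stricter variant)."""
    return len("".join(text.split())) >= MIN_CHARS


print(passes_raw("陳大明"))                    # False: 3 characters
print(passes_raw("陳 大 明"))                  # True: 3 ideograms + 2 spaces = 5
print(passes_ignoring_whitespace("陳 大 明"))  # False: still only 3 ideograms
```

Closing the loophole by ignoring whitespace would only make short CJK descriptions harder to enter, which is the opposite of what is being asked for here.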

Change #1121724 had a related patch set uploaded (by Mdaniels5757; author: Mdaniels5757):

[mediawiki/extensions/UploadWizard@master] Use bytes for minimum caption and description lengths

https://gerrit.wikimedia.org/r/1121724

Change #1121724 abandoned by Mdaniels5757:

[mediawiki/extensions/UploadWizard@master] Use bytes for minimum caption and description lengths

https://gerrit.wikimedia.org/r/1121724

(Moving the conversation here from gerrit, as the patch was abandoned):

Displaying the number of bytes to users is actively terrible, and that is a large part of why this was done the way it was. We could change the limit to 4 characters, or set the limit at (8 bytes AND 5 characters) or whatever, but talking about bytes to users is not a good idea.

Different languages have radically different numbers of bytes-to-characters/-ideograms/-glyphs, and for some it's not "stable" (different characters have different numbers of bytes). Can we get consensus about what we want to do here in a way that doesn't only privilege people speaking the major languages like Chinese and English?
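As a quick illustration of how unstable the bytes-per-character ratio is, this Python sketch prints the UTF-8 width of single characters from a few scripts. The sample characters are my own choice and UTF-8 is assumed.

```python
def utf8_width(ch):
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


samples = [
    "a",           # Latin letter
    "\u00e9",      # é, precomposed Latin letter with acute accent
    "\u8549",      # 蕉, CJK ideograph
    "\ud55c",      # 한, Hangul syllable
    "\U0001F600",  # 😀, emoji outside the Basic Multilingual Plane
]

for ch in samples:
    print(ch, utf8_width(ch))
# a 1
# é 2
# 蕉 3
# 한 3
# 😀 4
```

Any single byte threshold therefore corresponds to a different number of visible characters depending on the script.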

Here's a simple scenario.

Is it acceptable to caption https://commons.wikimedia.org/wiki/File:Banana-Single.jpg "banana"?

If so, then the filter should be set to allow captioning it "蕉".

How to achieve that? I don't know. But that's the principle the filter should follow.

(8 bytes AND 5 characters)

Do you mean OR?
The minimum allowed should be 3 bytes (which is 1 kanji). 8 bytes is too many.
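The difference between the AND and OR readings can be sketched in Python as follows. This treats both numbers as minimums and assumes UTF-8; the function names are hypothetical and this is not the rule from the abandoned patch.

```python
MIN_BYTES = 8  # from the "(8 bytes AND 5 characters)" suggestion
MIN_CHARS = 5


def lengths(text):
    """Return (character count, UTF-8 byte count)."""
    return len(text), len(text.encode("utf-8"))


def passes_and(text):
    """Both minimums must be met."""
    chars, nbytes = lengths(text)
    return chars >= MIN_CHARS and nbytes >= MIN_BYTES


def passes_or(text):
    """Meeting either minimum is enough."""
    chars, nbytes = lengths(text)
    return chars >= MIN_CHARS or nbytes >= MIN_BYTES


for s in ["蕉", "陳大明", "banana"]:
    print(s, passes_and(s), passes_or(s))
# 蕉 False False
# 陳大明 False True
# banana False True
```

Under these assumptions the AND reading rejects even "banana" (6 bytes), while the OR reading accepts a three-ideogram name but still rejects the single ideogram "蕉".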

And the simplest solution is to tell the user "the given caption is too short" without stating the exact number. A 3-byte minimum (3 ASCII letters) is almost impossible to fail: very few meaningful words are shorter than that.
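A minimal Python sketch of that proposal, assuming UTF-8 and a 3-byte minimum, might look like this. `validate_description` and the exact message wording are hypothetical, not existing UploadWizard names.

```python
MIN_BYTES = 3  # proposed threshold: one ideogram, or three ASCII letters


def validate_description(text):
    """Return None if the text is long enough, else a generic warning.

    The warning deliberately does not mention bytes or any number.
    """
    if len(text.strip().encode("utf-8")) < MIN_BYTES:
        return "The given caption is too short."
    return None


print(validate_description("蕉"))      # None: 3 bytes
print(validate_description("banana"))  # None: 6 bytes
print(validate_description("ab"))      # The given caption is too short.
```

Because the message never states the threshold, users are not exposed to byte counts, which addresses the concern raised earlier in the thread.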