
5-character limit is too restrictive for structured data on Commons for character-based languages
Open, Needs Triage · Public · BUG REPORT

Description

I tried to add Li Wenliang's name in Chinese to the description and caption on this image, but I wasn't allowed to proceed because in Mandarin his name is only three characters: 李文亮


I agree that a 5-character limit makes sense in English and other languages with a phonetic alphabet, but it is too restrictive for character-based languages, where a single character can be an entire word.

BTW, I got around this by adding whitespace to the end of the name, which is probably not the intended behavior either. :)
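To illustrate the workaround described above, here is a minimal sketch in Python. It assumes the limit is enforced by counting characters in the raw input without trimming; that is a guess based on the behavior reported here, not the actual UploadWizard code. The function names and the `MIN_CAPTION_LENGTH` constant are hypothetical.

```python
# Hypothetical constant: the 5-character minimum named in this report.
MIN_CAPTION_LENGTH = 5


def passes_raw_check(caption: str) -> bool:
    """Assumed behavior: count every character, including trailing spaces."""
    return len(caption) >= MIN_CAPTION_LENGTH


def passes_trimmed_check(caption: str) -> bool:
    """Alternative: ignore leading and trailing whitespace before counting."""
    return len(caption.strip()) >= MIN_CAPTION_LENGTH


name = "李文亮"        # three characters, a complete name
padded = name + "  "  # the whitespace workaround: two trailing spaces

print(passes_raw_check(name))        # too short for a raw length check
print(passes_raw_check(padded))      # padding pushes it over the limit
print(passes_trimmed_check(padded))  # a trimming check would still reject it
```

If the check trimmed its input, the padding would not help, which may be one reason the workaround could behave differently from one screen to another.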

Event Timeline

Mvolz created this task. Feb 12 2020, 4:36 AM
Restricted Application added a subscriber: Aklapper. Feb 12 2020, 4:36 AM
Ramsey-WMF added a subscriber: Ramsey-WMF.

Any ideas on this one, Matthias?

matthiasmullie added a comment. Edited · Feb 18 2020, 8:59 AM

Changing this would be easy, except that there's probably no single limit that makes sense for all languages, and compiling a list of sensible limits per language does not seem feasible.
And even so, this limit was added to prevent spam (T234756) to some extent. Just because Chinese can have meaningful content in 1 character doesn't make it any more immune to spam.
There is also an even stricter abuse filter (9 characters) that will prevent this kind of input, even if we were to lower the character limit.

So, realistically, keeping the character limit (so that it continues to block some of the unwanted content) and leaning on our users to come up with clever tricks around it (for when it does prevent meaningful contributions; e.g. by adding whitespace as above) is not a bad option, IMO.

The only other thing I can think of doing would be to soften the character limit. E.g.:

  • only require X number of characters for anonymous users - logged in accounts can add any amount of characters
  • instead of blocking submission of short content in UW, it could just show a warning

It'd slightly open the door for more garbage content, but not too much, IMO - accounts are easily banned and likely won't abuse this.

Thoughts?
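The two softening options above could be combined roughly as follows. This is only a sketch in Python, not UploadWizard code: the `User` and `Verdict` types, the `check_caption` function, and the choice to block anonymous users while merely warning logged-in accounts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"  # caption is long enough
    WARN = "warn"      # caption is short, but submission is allowed
    BLOCK = "block"    # caption is short and submission is refused


@dataclass(frozen=True)
class User:
    """Hypothetical stand-in for whatever the real session object exposes."""
    is_anonymous: bool


# Hypothetical default: the 5-character minimum discussed in this task.
MIN_LENGTH = 5


def check_caption(caption: str, user: User, min_length: int = MIN_LENGTH) -> Verdict:
    """Soften the limit: hard-block only anonymous users, warn everyone else."""
    # Trim first so that whitespace padding does not count toward the limit.
    if len(caption.strip()) >= min_length:
        return Verdict.ACCEPT
    if user.is_anonymous:
        return Verdict.BLOCK
    return Verdict.WARN


print(check_caption("李文亮", User(is_anonymous=True)))
print(check_caption("李文亮", User(is_anonymous=False)))
```

Under these assumptions a logged-in user could save a three-character caption such as 李文亮 after seeing a warning, while anonymous input would still be held to the minimum. Swapping the `is_anonymous` test for an autoconfirmed check would give a stricter variant.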

I think that for languages like Chinese and Japanese, although a caption CAN be shorter than in alphabetic languages, it doesn't hurt much to ask for a slightly longer sentence. Essentially, what we're asking the user to do is provide a little more detail in the caption, which isn't a bad thing regardless of the language. As Matthias mentioned above, the community has already set up AbuseFilter rules to combat spam that are even stricter than ours. I think we should be encouraging longer captions in general, so I'd be inclined to leave things as they are.

Mvolz added a comment. Feb 20 2020, 4:02 PM

> So, realistically, keeping the character limit (so that it continues to block some of the unwanted content) and leaning on our users to come up with clever tricks around it (for when it does prevent meaningful contributions; e.g. by adding whitespace as above) is not a bad option, IMO.

I should note that the whitespace trick was inconsistent, I think. I believe I tried it here first and it didn't work, but when I tried it again on a different screen after giving up here, it did work. (Maybe I only did it for one field the first time, and that was why?)

> The only other thing I can think of doing would be to soften the character limit. E.g.:
>
>   • only require X number of characters for anonymous users - logged in accounts can add any amount of characters

I'd personally be happy with this, or with limiting it to autoconfirmed users.

>   • instead of blocking submission of short content in UW, it could just show a warning

I'd be happy with this too.

> It'd slightly open the door for more garbage content, but not too much, IMO - accounts are easily banned and likely won't abuse this.
>
> Thoughts?