
Update web UI for increased comment length
Closed, ResolvedPublic

Description

We will soon be able to start turning on the refactored comment storage, which allows comments of up to 1000 Unicode characters. API clients will be able to use this immediately, but everything in the web UI that applies a limit will still apply the old limit of 255 bytes.

When the server-side $wgCommentTableSchemaMigrationStage variable is set to MIGRATION_OLD (0), frontend code should continue to apply the 255-byte limit. When that variable is set to any other value (other expected values are 1, 2, and 3), frontend code should limit things to 1000 Unicode characters. Note that JavaScript and HTML5 maxlength work with UTF-16 code units, not code points: e.g. "😭".length returns 2, but MediaWiki will count it as only 1. See also T180497: OO.ui.TextInputWidget's maxLength option limits by UTF-16 code units, not characters as documented.
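The rule above can be sketched in plain JavaScript. This is only an illustration: the function names and the shape of the returned object are made up here and are not MediaWiki APIs, and the stage value is assumed to reach the frontend as a plain number.

```javascript
// Hypothetical helpers for illustration; none of these are MediaWiki APIs.
// CommentTableSchemaMigrationStage: temporary, remove together with the variable.
const MIGRATION_OLD = 0;

// Length in Unicode code points. Array.from() iterates a string by
// code point, so a surrogate pair such as "😭" counts once.
function codePointLength(text) {
  return Array.from(text).length;
}

// Length in bytes of the UTF-8 encoding.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Choose the limit and the unit it is measured in for a given stage value.
function commentLimitFor(stage) {
  return stage === MIGRATION_OLD
    ? { max: 255, measure: utf8ByteLength }
    : { max: 1000, measure: codePointLength };
}

// True when the text is within the limit that applies at this stage.
function fitsCommentLimit(text, stage) {
  const limit = commentLimitFor(stage);
  return limit.measure(text) <= limit.max;
}
```

With these definitions, a comment of 256 ASCII letters is rejected at stage 0 and accepted at stages 1 to 3, while 1000 emoji fit at stages 1 to 3 even though their `.length` is 2000.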

Note that $wgCommentTableSchemaMigrationStage is temporary, so it should be made easy to search for in relevant code for future removal; naming variables to contain "CommentTableSchemaMigrationStage" or including that string in nearby comments is recommended. Extensions not using "compatibility policy = rel" may want to treat it as MIGRATION_OLD (0) when it is unset and the CommentStore class does not exist, and as MIGRATION_NEW (3) when it is unset and the CommentStore class exists.
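The fallback suggested for extensions amounts to a small decision table. The sketch below writes it in JavaScript only to make the logic concrete; a real extension would make this decision server-side in PHP. The function name and both parameters are hypothetical.

```javascript
// CommentTableSchemaMigrationStage: temporary shim, remove together with the variable.
const MIGRATION_OLD = 0;
const MIGRATION_NEW = 3;

// configuredStage: the value of the variable, or undefined/null when it is unset.
// commentStoreExists: whether the CommentStore class is available.
// Both parameters are assumptions of this sketch.
function effectiveCommentTableSchemaMigrationStage(configuredStage, commentStoreExists) {
  if (configuredStage !== undefined && configuredStage !== null) {
    // An explicitly configured value always wins.
    return configuredStage;
  }
  // Unset: infer the stage from whether CommentStore is present.
  return commentStoreExists ? MIGRATION_NEW : MIGRATION_OLD;
}
```

Keeping "CommentTableSchemaMigrationStage" in the function name and in the comment follows the recommendation above, so that the shim turns up in a search when the variable is removed.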

Places that may need attention for this task include:

  • The edit summary in the edit form, including the JavaScript that adds the byte counter.
  • The edit summary in VE, if it uses different code.
  • The reason field in the protection form.
  • The reason field in the page move form.
  • The reason field in the page deletion form.
  • The reason field in the file deletion form.
  • The reason field in the page undeletion form.
  • The reason field in the revision deletion form.
  • The reason field in the block form.
  • The reason field in the user rights change form.
  • Anything else that creates an entry in the logging table with a user-entered log_comment.

Note that extensions storing comments in their own tables don't need updating for this task, but may want to be updated to take advantage of CommentStore themselves.

Event Timeline

Is there a reason why CommentStore::COMMENT_CHARACTER_LIMIT (the magical "1000" value) is a constant rather than a configuration variable? We should probably get that value into mw.config, rather than change all the hardcoded 255s to hardcoded 1000s.

By "Unicode characters", do you mean "Unicode codepoints"? A "character" is a term that is notoriously hard to define.

Do you think it's important to make the limit 1000 codepoints, rather than 1000 UTF-16 code units? If we made it count UTF-16 code units, we could remove (or at least, stop using) all the code counting byte length, rather than having to adapt it to count codepoint length (or probably, duplicate it).

(…eh, we probably should make it count codepoints, just to make our software excellent, plus sooner or later someone is probably going to want a thousand emojis)

Is there a reason why CommentStore::COMMENT_CHARACTER_LIMIT (the magical "1000" value) is a constant rather than a configuration variable?

Because no one made a case for it to be configurable.

We should probably get that value into mw.config, rather than change all the hardcoded 255s to hardcoded 1000s.

I note that there's no need for it to be a configuration variable to be in mw.config, ResourceLoaderStartUpModule doesn't care how the value is determined.
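One way the frontend could consume exported limits instead of hardcoding 255 or 1000 is sketched below. The two key names, and the convention that only one of them holds a number at a time, are assumptions made for this sketch rather than anything settled in this discussion. The `config` argument stands in for mw.config and can be any object with a `get( key )` method, such as a `Map`.

```javascript
// Key names are assumptions for this sketch, not confirmed exports.
const CODE_POINT_LIMIT_KEY = 'wgCommentCodePointLimit';
const BYTE_LIMIT_KEY = 'wgCommentByteLimit';

// Read the comment limit from a config-like object (anything with get(key)).
// Returns the unit to count in and the maximum allowed in that unit.
function readCommentLimit(config) {
  const codePointLimit = config.get(CODE_POINT_LIMIT_KEY);
  if (typeof codePointLimit === 'number') {
    return { unit: 'codepoints', max: codePointLimit };
  }
  const byteLimit = config.get(BYTE_LIMIT_KEY);
  if (typeof byteLimit === 'number') {
    return { unit: 'bytes', max: byteLimit };
  }
  // Neither value exported: fall back to the old 255-byte limit.
  return { unit: 'bytes', max: 255 };
}

// Usage with a Map standing in for mw.config:
const limit = readCommentLimit(new Map([[CODE_POINT_LIMIT_KEY, 1000]]));
console.log(limit.unit, limit.max);
```

The point of the indirection is that callers never see a literal limit, so changing the value later only touches the place that exports it.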

By "Unicode characters", do you mean "Unicode codepoints"? A "character" is a term that is notoriously hard to define.

Yes. And true enough.

Do you think it's important to make the limit 1000 codepoints, rather than 1000 UTF-16 code units? If we made it count UTF-16 code units, we could remove (or at least, stop using) all the code counting byte length, rather than having to adapt it to count codepoint length (or probably, duplicate it).

Well, the limit already is 1000 codepoints in the code that has been merged (and already activated on Beta Cluster). In PHP we have all strings encoded as UTF-8 and mb_strlen() reports the length in codepoints. To measure the length in UTF-16 code units, we'd have to convert the string to that encoding and then use half the byte length.
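JavaScript has the opposite default to the PHP situation described here: `.length` already reports UTF-16 code units, and codepoints are what need extra work. The following sketch, with illustrative function names, derives both lengths from the same string to show how they relate.

```javascript
// Length in code points (the unit mb_strlen() is said above to report in PHP).
function codePointLength(text) {
  return Array.from(text).length;
}

// Length in UTF-16 code units, derived from the code points:
// anything above U+FFFF needs a surrogate pair, i.e. two code units.
function utf16CodeUnitLength(text) {
  let units = 0;
  for (const ch of text) {
    units += ch.codePointAt(0) > 0xffff ? 2 : 1;
  }
  return units;
}

const sample = "a😭";
// Code points, derived code units, and the built-in .length (also code units).
console.log(codePointLength(sample), utf16CodeUnitLength(sample), sample.length);
// prints: 2 3 3
```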

And while "codepoints" will still be confusing to end users for situations with combining characters, doing code units will additionally be confusing with generic emojis or languages with characters outside the BMP. You'd still want the counter to indicate that.
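The difference a user would see in a remaining-characters counter can be illustrated with a small sketch. The function names are hypothetical and the limit of 1000 is taken from the task description.

```javascript
// Remaining allowance when counting Unicode code points.
function remainingCodePoints(text, max) {
  return max - Array.from(text).length;
}

// Remaining allowance when counting UTF-16 code units (what .length reports).
function remainingCodeUnits(text, max) {
  return max - text.length;
}

const combining = "e\u0301"; // "e" followed by a combining acute accent, shown as one glyph
const emoji = "😭";          // one code point outside the BMP

// A combining sequence costs 2 under either scheme.
console.log(remainingCodePoints(combining, 1000), remainingCodeUnits(combining, 1000));
// An emoji costs 1 when counting code points but 2 when counting code units.
console.log(remainingCodePoints(emoji, 1000), remainingCodeUnits(emoji, 1000));
```

Counting codepoints leaves only the combining-character case surprising, whereas counting code units adds a second surprise for every character outside the BMP.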

Change 409168 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/core@master] Allow limiting comment length by characters rather than bytes in JS

https://gerrit.wikimedia.org/r/409168

This commit implements the "backend" part of the changes, and only updates the edit form and the protection form, as an example. Assuming we agree with the approach I took there, the same update can be easily done for the other instances, but I have no plans to work on that myself right now.

Change 409168 merged by jenkins-bot:
[mediawiki/core@master] Allow limiting comment length by characters rather than bytes in JS

https://gerrit.wikimedia.org/r/409168

Change 414732 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Update more forms to limit comments by codepoints rather than bytes

https://gerrit.wikimedia.org/r/414732

Change 415159 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/VisualEditor@master] ve.ui.MWSaveDialog: Allow limiting comment length by characters rather than bytes

https://gerrit.wikimedia.org/r/415159

Change 415159 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] ve.ui.MWSaveDialog: Allow limiting comment length by characters rather than bytes

https://gerrit.wikimedia.org/r/415159

Sorry, thought I had +2'ed that but clearly I was imagining it. Will (re?-)review.

Change 414732 merged by jenkins-bot:
[mediawiki/core@master] Update more forms to limit comments by codepoints rather than bytes

https://gerrit.wikimedia.org/r/414732

The 2018 Wikitext-Editor still shows the limit of 255 characters.

It shows the new limit on the Beta Cluster, e.g. https://en.wikipedia.beta.wmflabs.org/wiki/. The patch for VE should be included in 1.31.0-wmf.24; see https://www.mediawiki.org/wiki/MediaWiki_1.31/Roadmap for the deployment schedule.

Anomie claimed this task.

Let's call this done, since nothing has shown up for the "any other stuff" checkbox.

Cirdan subscribed.

@Anomie There is at least one commonly used comment field left: T194588: Increase limit for edit summary in Flagged Revisions to COMMENT_CHARACTER_LIMIT.

It might be almost as easy as the patch I suggested, but probably some more changes are required.

Better to use that task for tracking that issue.