
Get the character count instead of the number of bytes
Open, Needs Triage · Public · Feature



  • Show the character count in "Page information".
  • Add a configuration variable that controls whether the history page shows size differences in bytes or in characters.

In languages other than English, the number of characters often differs from the number of bytes, and editors care about characters, not bytes.
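A quick illustration of the gap, as a Python sketch. The sample strings are made up for this example, and "character" here is assumed to mean one Unicode code point in UTF-8 text:

```python
def byte_and_char_counts(text):
    """Return (UTF-8 byte count, Unicode code point count) for text."""
    return len(text.encode("utf-8")), len(text)


# Escapes are used so the exact code points are unambiguous.
samples = [
    "Spain",               # ASCII only
    "Espa\u00f1a",         # "España" with a precomposed n-tilde
    "\u65e5\u672c\u8a9e",  # three CJK characters
]

for sample in samples:
    n_bytes, n_chars = byte_and_char_counts(sample)
    print(f"{sample}: {n_bytes} bytes, {n_chars} characters")
```

Only the pure-ASCII string has matching counts; every non-ASCII character adds one to three extra bytes in UTF-8.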

Event Timeline

Restricted Application added a subscriber: Aklapper. · Feb 4 2020, 10:38 AM
RazeSoldier renamed this task from [Feature request] Get the character count instead of the number of bytes to Get the character count instead of the number of bytes. · Feb 4 2020, 10:39 AM
RazeSoldier changed the subtype of this task from "Task" to "Feature Request".

@Aklapper What should I do to move this request toward approval? If people agree with it, I can write the code.

Good question... I personally don't see (yet) why we should add yet another configuration variable...
How is this related to MediaWiki-ContentHandler or MediaWiki-Page-History? This sounds like it is about action=info instead?

My idea is to add a method similar to getSize() (maybe getCharCount()) to the Content implementations that returns the number of characters. That is why I tagged MediaWiki-ContentHandler.

I also want this reflected on the history page: +10 or -10 would be a number of characters, not a number of bytes. That is why I tagged MediaWiki-Page-History. As for the new configuration variable, I wanted to make this an optional feature, since not everyone would welcome the change.
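A rough sketch of that idea in Python. This is not MediaWiki code: the class `TextContent`, the methods `get_size()` and `get_char_count()`, and the `unit` parameter (standing in for the proposed configuration variable) are all hypothetical names used for illustration.

```python
class TextContent:
    """Hypothetical Python stand-in for a text Content object."""

    def __init__(self, text):
        self.text = text

    def get_size(self):
        # Size in bytes, assuming the text is stored as UTF-8.
        return len(self.text.encode("utf-8"))

    def get_char_count(self):
        # Proposed counterpart: number of Unicode code points.
        return len(self.text)


def history_diff(old, new, unit="bytes"):
    """Signed size change between two revisions, in the chosen unit."""
    if unit == "bytes":
        return new.get_size() - old.get_size()
    if unit == "chars":
        return new.get_char_count() - old.get_char_count()
    raise ValueError(f"unknown unit: {unit!r}")


old_rev = TextContent("\u65e5\u672c")        # two CJK characters
new_rev = TextContent("\u65e5\u672c\u8a9e")  # one more CJK character added

print(f"{history_diff(old_rev, new_rev, 'bytes'):+d} bytes")
print(f"{history_diff(old_rev, new_rev, 'chars'):+d} characters")
```

Adding a single CJK character shows up as +3 in bytes but +1 in characters, which is the difference the optional setting would expose on the history page.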

The_RedBurn added a subscriber: The_RedBurn. · Edited · Apr 24 2020, 8:07 AM

Great idea! Here's an example of a change that ends up with the same size while actually adding characters:
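A made-up edit with this property (not the one linked above), sketched in Python. The strings are invented for illustration, and sizes are assumed to be UTF-8 byte counts:

```python
def utf8_size(text):
    """Size of text in bytes when encoded as UTF-8."""
    return len(text.encode("utf-8"))


old_text = "caf\u00e9"  # "café": 4 code points, the accented e takes 2 bytes
new_text = "cafe!"      # accent dropped, "!" appended: 5 code points

byte_delta = utf8_size(new_text) - utf8_size(old_text)
char_delta = len(new_text) - len(old_text)

print(f"byte diff: {byte_delta:+d}, character diff: {char_delta:+d}")
```

A byte-based history entry would report no size change for this edit even though one character was added.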

The Spanish article for Spain has a current size of 260 578 bytes but actually only contains 254 162 characters.

And here's an extreme example:

Like @Aklapper, I don't think the configuration variable is actually needed, since the only use of a version's size is to know how much text it contains. The byte size currently shown is simply an easy but imperfect estimate of that.