
Get the character count instead of the number of bytes
Open, Needs Triage | Public | Feature

Description

Proposal

  • Show character count in "Page information".
  • Add a configuration variable that controls whether the history page shows diff sizes in bytes or in characters.

Cause
In languages other than English, the number of characters often differs from the number of bytes, because non-ASCII characters take more than one byte each in UTF-8. Editors care about characters, not bytes.
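To make the gap concrete, here is a minimal Python sketch that compares the two counts for a few sample strings. It assumes "character" means a Unicode code point (not a grapheme cluster) and that the byte size is measured on the UTF-8 encoding. The helper names `char_count` and `utf8_byte_count` are made up for this illustration and are not part of MediaWiki.

```python
# Illustration only: hypothetical helpers, not MediaWiki code.

def char_count(text: str) -> int:
    """Number of Unicode code points in the text."""
    return len(text)


def utf8_byte_count(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Plain ASCII, a Latin word with one accented letter, and a CJK word.
samples = ["Haiti", "Ha\u00efti", "\u65e5\u672c\u8a9e"]

for sample in samples:
    print(sample, char_count(sample), utf8_byte_count(sample))
```

For pure ASCII text the two numbers match; each accented Latin letter adds one extra byte, and each of the CJK characters in the last sample takes three bytes.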

Event Timeline

RazeSoldier renamed this task from [Feature request] Get the character count instead of the number of bytes to Get the character count instead of the number of bytes. Feb 4 2020, 10:39 AM
RazeSoldier changed the subtype of this task from "Task" to "Feature Request".

@Aklapper What should I do to get this request approved? If people agree with it, I can write the code.

My idea is to add a method similar to getSize() (maybe getCharCount()) to the Content implementations, returning the number of characters. That is why I tagged MediaWiki-ContentHandler.
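As a rough sketch of that idea, the following Python model puts a character-count method next to a byte-size method on a text content object. It is not MediaWiki code: `TextContent`, `get_size` and `get_char_count` are hypothetical stand-ins for the PHP Content implementation, getSize() and the proposed getCharCount(), and "character" is again assumed to mean a Unicode code point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextContent:
    """Hypothetical stand-in for a text-based Content object."""

    text: str

    def get_size(self) -> int:
        # Size in bytes of the UTF-8 encoded text (models getSize()).
        return len(self.text.encode("utf-8"))

    def get_char_count(self) -> int:
        # Number of Unicode code points (models the proposed getCharCount()).
        return len(self.text)


content = TextContent("Espa\u00f1a")
print(content.get_size(), content.get_char_count())
```

Keeping both methods on the same object would let callers such as "Page information" show either number without re-reading the text.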

I also want this reflected on the history page, so that +10 or -10 would be a number of characters, not a number of bytes. That is why I tagged MediaWiki-Page-history. As for the new configuration variable, I wanted to make this an optional feature; after all, not everyone will welcome this change.
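A small Python sketch of how such a switch could behave on the history page follows. The `unit` parameter stands in for the proposed configuration variable, and both its values ("bytes", "chars") and the function names are assumptions made for this example, not an existing MediaWiki setting.

```python
def revision_length(text: str, unit: str) -> int:
    """Length of a revision's text in the requested unit."""
    if unit == "bytes":
        return len(text.encode("utf-8"))
    if unit == "chars":
        return len(text)  # Unicode code points
    raise ValueError(f"unknown unit: {unit!r}")


def format_size_diff(old_text: str, new_text: str, unit: str = "bytes") -> str:
    """Signed size change between two revisions, as a history page might show it."""
    delta = revision_length(new_text, unit) - revision_length(old_text, unit)
    return f"{delta:+d}" if delta != 0 else "0"


# Replacing one accented letter (2 bytes) with two ASCII letters (2 bytes):
old, new = "caf\u00e9", "cafes"
print(format_size_diff(old, new, "bytes"))  # byte size is unchanged
print(format_size_diff(old, new, "chars"))  # one character was added
```

With the byte unit this edit shows no size change, while the character unit reports the added character.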

Great idea! Here's an example of a change that ends up having the same size while actually adding characters: https://fr.wikipedia.org/w/index.php?title=Ha%C3%AFti&type=revision&diff=169953710&oldid=169807136

The Spanish article for Spain has a current size of 260 578 bytes but actually only contains 254 162 characters.

And here's an extreme example:

Screenshot_2020-04-24 Différences entre versions de « Aide Bac à sable » — Wikipédia.png (255×1 px, 11 KB)

Screenshot_2020-04-24 Historique des versions de « Aide Bac à sable » — Wikipédia(1).png (62×886 px, 12 KB)

Like @Aklapper, I don't think the configuration variable is actually needed, since the only use of the revision size is to know how much text the revision contains. The size currently shown is simply an easy and imperfect way to estimate that.