
Use IEC units (KiB, MiB, etc.) and not SI units (KB, MB)
Open, Low, Public, Feature

Description

Author: zefling

Description:
Why doesn't MediaWiki use KiB, MiB, etc.?

https://en.wikipedia.org/wiki/Binary_prefix


Version: 1.22.0
Severity: enhancement

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 2:02 AM
bzimport set Reference to bz52687.
bzimport added a subscriber: Unknown Object (MLST).

The messages are defined in MessagesEn.php:
'size-bytes' => '$1 B', # only translate this message to other languages if you have to change it
'size-kilobytes' => '$1 KB', # only translate this message to other languages if you have to change it
'size-megabytes' => '$1 MB', # only translate this message to other languages if you have to change it
'size-gigabytes' => '$1 GB', # only translate this message to other languages if you have to change it
'size-terabytes' => '$1 TB', # only translate this message to other languages if you have to change it
'size-petabytes' => '$1 PB', # only translate this message to other languages if you have to change it
'size-exabytes' => '$1 EB', # only translate this message to other languages if you have to change it
'size-zetabytes' => '$1 ZB', # only translate this message to other languages if you have to change it
'size-yottabytes' => '$1 YB', # only translate this message to other languages if you have to change it

This is not an i18n issue. Some languages have opted to use binary prefixes already by translating these messages.

This issue is about "What people are used to" vs. "What is technically correct".

This should probably be discussed on English communities if such a change is wanted. And this might be a really good bikeshed.

Note: SI units are power-of-ten based, IEC units are power-of-two based.
Also see http://en.wikipedia.org/wiki/Binary_prefix
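
To make the note above concrete, here is a minimal sketch in Python. It is not MediaWiki's actual formatter; `format_size` and the two prefix tables are hypothetical names used only for illustration. It formats the same byte count on either scale:

```python
# Hypothetical helper for illustration only, not MediaWiki code.
# SI prefixes step by powers of 1000, IEC prefixes by powers of 1024.
SI_PREFIXES = ["", "k", "M", "G", "T", "P", "E", "Z", "Y"]
IEC_PREFIXES = ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"]


def format_size(n_bytes: int, binary: bool) -> str:
    """Format a byte count using IEC (binary=True) or SI (binary=False) prefixes."""
    base = 1024 if binary else 1000
    prefixes = IEC_PREFIXES if binary else SI_PREFIXES
    value = float(n_bytes)
    index = 0
    # Divide by the base until the value fits under the next prefix.
    while value >= base and index < len(prefixes) - 1:
        value /= base
        index += 1
    return f"{value:.1f} {prefixes[index]}B"
```

With this sketch, 1,000,000 bytes comes out as "1.0 MB" on the decimal scale and as "976.6 KiB" on the binary one.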

zefling wrote:

Okay, in the French version it's Kio (kibioctet), Mio (mébioctet). The French community has already decided.

Coming here from https://translatewiki.net/wiki/Thread:Support/File_size_messages .

As the consumers vs. hard disk producers wars clearly show, the vast majority of the population interprets "GB" as "GiB". The binary prefixes, while greatly appreciated and used by enthusiasts like me, are not so widely adopted globally. Some languages and operating systems are more precise and use them.

It's not a trivial decision.

D.U.Thibault wrote:

"This issue is about "What people are used to" vs. "What is technically
correct"."

A tiresome argument. The pseudo-SI symbols are just plain wrong at worst, ambiguous at best. There is absolutely no reason to be wishy-washy about this: no ordinary consumer is going to complain of "not understanding the symbols" once he's been pointed to the abundant literature that explains the issue. People are not stupid, they actually like learning new things.

(In reply to comment #5)

A tiresome argument. The pseudo-SI symbols are just plain wrong at worst,
ambiguous at best. There is absolutely no reason to be wishy-washy about
this:
no ordinary consumer is going to complain of "not understanding the symbols"
once he's been pointed to the abundant literature that explains the issue.
People are not stupid, they actually like learning new things.

Can you clarify if by "pseudo-SI symbols" you mean GB or GiB? Your comment can be read in both ways.

D.U.Thibault wrote:

(In reply to comment #6)

(In reply to comment #5)

A tiresome argument. The pseudo-SI symbols are just plain wrong at worst,
ambiguous at best.

Can you clarify if by "pseudo-SI symbols" you mean GB or GiB? Your comment
can be read in both ways.

Sadly true. I meant KB, GB, etc. Pseudo-SI because they look like SI (the upper case K is wrong) but their values are different.

(In reply to comment #7)

Sadly true. I meant KB, GB, etc. Pseudo-SI because they look like SI (the upper
case K is wrong) but their values are different.

Thanks for clarifying. For what it's worth, the upper K is a separate issue, and a mistake that not all languages and sectors allow themselves: see https://translatewiki.net/w/i.php?title=Special%3ATranslations&message=MediaWiki%3ASize-kilobytes&namespace=1256

D.U.Thibault wrote:

(In reply to Nemo from comment #8)

For what it's worth, the upper K is a separate issue,
and a mistake that not all languages and sectors allow themselves: see
https://translatewiki.net/w/i.php?title=Special%3ATranslations&message=MediaWiki%3ASize-kilobytes&namespace=1256

That link lists the various translations for the "Size-kilobytes" message...whose description clearly states it's measured in kibibytes. The resulting mess is awesome. Some languages have the correct kibi symbol, others use an ambiguous K, and some use a plainly wrong k. Not to mention those who use a Greek Chi or some such.

So you know all those languages well enough to know what's right and wrong? I remind you that we live in a world where some English-speaking countries still use ancient era/middle ages units like "foot" and multiply by 12 instead of 10. :)

(In reply to Nemo from comment #11)

So you know all those languages well enough to know what's right and wrong? I remind
you that we live in a world where some English-speaking countries still use ancient
era/middle ages units like "foot" and multiply by 12 instead of 10. :)

To be fair, if we counted in base 12, things would be so much better, as 12 has many more factors than 10, which is a good quality in a unit of measure, since you are more likely to divide things out evenly.


As for the actual proposal - I'm all for changing to KiB, at least in English, but it's not something that I care enough about to have a flamewar over, so I don't exactly want to upload a patch for it ;)

D.U.Thibault wrote:

(In reply to Nemo from comment #11)

So you know all those languages well enough to know what's right and wrong? I remind
you that we live in a world where some English-speaking countries still use ancient
era/middle ages units like "foot" and multiply by 12 instead of 10. :)

Symbols, when backed by IEEE, IEC, ISO or whatever, are translingual. No translation required. The nature of the quantity being reported by the message is *not* influenced by the language it is in.

That may be the ideal, but it is far from true in practice. Most languages using non-latin scripts will at least transliterate the symbols.

gerritbot subscribed.

Change 179450 had a related patch set uploaded (by Nemo bis):
mediawiki.inspect: Use binary prefixes for human sizes

https://gerrit.wikimedia.org/r/179450

Patch-For-Review

Nemo_bis renamed this task from Use IEC units (KiB, MiB, etc.) and not SI units (KB, MB) on file description to Use IEC units (KiB, MiB, etc.) and not SI units (KB, MB). Jan 23 2015, 9:39 PM
Nemo_bis changed the task status from Open to Stalled.
Nemo_bis set Security to None.
Jdforrester-WMF assigned this task to Fomafix.
Jdforrester-WMF edited projects, added Multimedia; removed Patch-For-Review.

Change 179450 merged by jenkins-bot:
mediawiki.inspect: Use binary prefixes for human sizes

https://gerrit.wikimedia.org/r/179450

Nemo_bis changed the task status from Resolved to Invalid. Jan 6 2016, 2:12 PM

The original task is about more than one JavaScript module, so this is not resolved. However, there is little gain in keeping it open unless it's clarified what we really want to change.

This is not an i18n issue. Some languages have opted to use binary prefixes already by translating these messages.

Or in other words, please change the translations (including en-GB, I guess, as the UK formally uses ISO units nowadays).

Jdforrester-WMF subscribed.

The original task is about more than one JavaScript module, so this is not resolved. However, there is little gain in keeping it open unless it's clarified what we really want to change.

This is not an i18n issue. Some languages have opted to use binary prefixes already by translating these messages.

Or in other words, please change the translations (including en-GB, I guess, as the UK formally uses ISO units nowadays).

It's not a question of language, though. If the number scale is different then the number format is different too – 5.0 KiB == 5.1 KB. You can't just magically change the output unit string without changing the formatter.
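
The arithmetic behind that point, as a small Python sketch (the variable names are illustrative, not MediaWiki code): the same 5120 bytes gives a different number on each scale, so the divisor and the unit string have to change together.

```python
# Illustrative arithmetic only; none of these names come from MediaWiki.
size_in_bytes = 5 * 1024  # 5120 bytes

binary_value = size_in_bytes / 1024   # 5.0, value on the power-of-two scale
decimal_value = size_in_bytes / 1000  # 5.12, value on the power-of-ten scale

correct_iec = f"{binary_value:.1f} KiB"  # divisor 1024 paired with an IEC unit
correct_si = f"{decimal_value:.1f} kB"   # divisor 1000 paired with an SI unit

# Swapping only the unit string keeps the 1024-based number
# but labels it with a 1000-based unit, which misstates the size.
mislabeled = f"{binary_value:.1f} kB"
```

Here `correct_iec` and `correct_si` describe the same file, while `mislabeled` understates it by about 2.4%.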

Fomafix subscribed.

@Aklapper Changing the task (232095) name to "Message should use binary prefix symbols (SI units)" is misleading because binary prefixes are not SI. They are IEC.

Why has this been languishing for over three years? All that I'm asking is that the English messages use the unambiguous prefixes or, failing that, that the qqq documentation state which scale is used (powers of 1000 vs powers of 1024). No change is required to the underlying code, for crying out loud.

@Aklapper Changing the task (232095) name to "Message should use binary prefix symbols (SI units)" is misleading because binary prefixes are not SI. They are IEC.

Feel free to edit and rename.

Why has this been languishing for over three years?

See https://www.mediawiki.org/wiki/Bug_management/Development_prioritization

Why has this been languishing for over three years?

There is (or at least has been) opposition to using IEC units in English. This is not a technical issue. The status quo stays as long as there isn't (developer) consensus to change it (or until the consensus requirement is dropped by delegating the decision to an authority). Translators are free to use correct units, and the base is explained in the message documentation.

Anyone can submit a patch to change the units, but as far as I remember earlier patches have been abandoned or reverted.

https://gerrit.wikimedia.org/r/179450 has not been reverted, and the messages in the developer module mediawiki.inspect report KiB instead of KB or kB.

Changing the content of the messages 'size-kilobytes', 'size-megabytes', 'size-gigabytes' is bad. Better introduce new messages 'size-kibibytes', 'size-mebibytes', 'size-gibibytes' with the content $1 KiB, $1 MiB, $1 GiB.

Why would you introduce new messages? Then you would need to update all callers separately. They are already documented to use base 2 so they are just using the wrong units.

It is bad to have message keys that don't match the content.

Then this is a good chance to fix this, too.

Change 535171 had a related patch set uploaded (by Fomafix; owner: Fomafix):
[mediawiki/core@master] Use IEC prefixes instead of SI prefixes for byte sizes

https://gerrit.wikimedia.org/r/535171

@Nikerabbit "Translators are free to use correct units, and the base is explained in the message documentation." Not always. https://translatewiki.net/wiki/Intuition:X%27stools-memory/qqq is a good example. It is highly doubtful that this message's parameter is actual megabytes, because it deals with memory. I raised the question in 2017 and the non-answer was that "This message is a part of old Xtools tool, please contribute translations of new Xtools [instead] via Special:Translate/xtools". If the message is that old, why is it still in the frigging database?

@Urhixidur This task is about MediaWiki. Issues with X'tools should be handled separately in an appropriate place. The message you linked is not in use anywhere. We keep old messages around because it is more hassle to remove them.

In T54687#574233, @Nikerabbit wrote:

That may be the ideal, but it is far from true in practice. Most languages using non-latin scripts will at least transliterate the symbols.

There's an IEC standard for the multiplier, but "byte" has no standard. Even its name in English is confusing: technically what is meant should be "octet", notably for networking and storage, where a byte of data may be coded on more bits while keeping only 8 significant bits (the other bits being used for error detection or autocorrection in a "mesh encoding"), on fewer bits with compression layers, or on a variable number of bits (for clock autoadjustment, for electromagnetic stabilization, or to avoid side effects like resonance/transductance and to protect other parallel data wires from various field effects that are too easily amplified on long links, creating additional noise that can dramatically reduce the SNR and the usable bandwidth or the lifetime of electronic components like capacitors, insulators in flashable NAND cells and, more generally, field-effect transistors).

Once the new messages are in, the instances of use of the old messages will need to be fixed. For instance, page sizes listed by Special:ListFiles are not in kilo/mega/giga/etc. bytes, they are in kibi/mebi/gibi/etc. bytes. Users would appreciate having a preference switch in their user settings that allows them to see file sizes using either set of prefixes.

  • MediaWiki:Size-kilobytes
  • MediaWiki:Size-megabytes
  • MediaWiki:Size-gigabytes
  • MediaWiki:Size-terabytes
  • MediaWiki:Size-petabytes
  • MediaWiki:Size-exabytes
  • MediaWiki:Size-zetabytes (this should be renamed to MediaWiki:Size-zettabytes)
  • MediaWiki:Size-yottabytes

The following set of messages also needs to be fixed:

  • MediaWiki:Size-kilopixel
  • MediaWiki:Size-megapixel
  • MediaWiki:Size-gigapixel
  • MediaWiki:Size-terapixel
  • MediaWiki:Size-petapixel
  • MediaWiki:Size-exapixel
  • MediaWiki:Size-zetapixel (this should be renamed to MediaWiki:Size-zettapixel)
  • MediaWiki:Size-yottapixel

The *pixel messages should also be made plural for consistency's sake (e.g. MediaWiki:Size-kilopixels, etc.).

Followup stuff would be a separate followup task...

Users would appreciate having a preference switch in their user settings

@Urhixidur: Citation needed. :) No, "users" would not appreciate having a preference switch.

I agree with keeping those unchanged as SI units:

  • MediaWiki:Size-kilobytes
  • MediaWiki:Size-megabytes
  • MediaWiki:Size-gigabytes
  • MediaWiki:Size-terabytes
  • MediaWiki:Size-petabytes
  • MediaWiki:Size-exabytes
  • MediaWiki:Size-zetabytes (this should be renamed to MediaWiki:Size-zettabytes)
  • MediaWiki:Size-yottabytes

But we should also have separate translations for IEC units:

  • MediaWiki:Size-kibibytes
  • MediaWiki:Size-mebibytes
  • MediaWiki:Size-gibibytes
  • MediaWiki:Size-tebibytes
  • MediaWiki:Size-pebibytes
  • MediaWiki:Size-exbibytes
  • MediaWiki:Size-zebibytes
  • MediaWiki:Size-yobibytes

Then the other software components can be fixed to use the units that they really compute: if they compute using powers of 1024, not 1000 (notably for file sizes and memory sizes), then IEC units should be used. For network speeds, SI units should be used.
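
As a rough sketch of how a caller could pick the right message once both sets exist (Python, purely illustrative): the `size-kibibytes` family of keys is only proposed in this task, and `BASE_FOR_QUANTITY`, `MESSAGE_KEYS` and `message_key` are hypothetical names, not MediaWiki APIs.

```python
# Hypothetical sketch; none of these names are MediaWiki APIs.
# Which scale each kind of quantity is computed on, following the comment above.
BASE_FOR_QUANTITY = {
    "file_size": 1024,
    "memory_size": 1024,
    "network_speed": 1000,
}

# Message keys per base, indexed by the power of that base.
# The decimal keys exist today; the binary keys are only proposed in this task.
MESSAGE_KEYS = {
    1000: ["size-bytes", "size-kilobytes", "size-megabytes",
           "size-gigabytes", "size-terabytes"],
    1024: ["size-bytes", "size-kibibytes", "size-mebibytes",
           "size-gibibytes", "size-tebibytes"],
}


def message_key(quantity: str, power: int) -> str:
    """Return the message key matching the scale the quantity is computed on."""
    base = BASE_FOR_QUANTITY[quantity]
    return MESSAGE_KEYS[base][power]
```

For example, `message_key("file_size", 1)` selects `size-kibibytes`, while `message_key("network_speed", 2)` selects `size-megabytes`. The point of the table is that the choice of key follows from the divisor the component uses, not from the language.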

IEC units do not seem to have any uses for sizes in pixels.

For hardware storage, there's no single unit, because capacities depend on other factors, notably the reliability design for a given MTBF. Variable amounts of space are used for data synchronization (on mechanical storage, magnetic and/or optical), error correction and remapped sectors, plus an internal space reserved by the firmware for this remapping, and a part used to store the firmware itself. The firmware may be updated, which could even change the reported available storage size: it could shrink over time, or grow again after a low-level reformatting that erases and cleans up errors and improves the synchronization and reliability after some internal measurements. This affects both mechanical storage and solid-state storage in NAND/NOR chips or other technologies.

Below filesystems but above physical storage there are other software features (notably for RAID management) where additional space is used to control the data layout and improve reliability, as well as space reserved for managing the storage (notably the partition schemes, or space reserved to store and change the effective layout, possibly dynamically). Some partitioning systems use a lot of reserved space (it can be up to 20%, and this tends to grow with larger storage spaces); RAID itself can add up to 33% or 67% reserved for one or two "parity columns" for resilience, at the price of lower performance, and up to 50 or 67% for mirroring. All these rates are cumulative! So in the end the usable storage space (for data files) is generally much below the total capacity of the hardware storage.

Indicating any "size" without specifying the method of measurement makes no sense. There's no single "size" measurement.

I created T283958 for renaming zeta to zetta.

Change 697099 had a related patch set uploaded (by Fomafix; author: Fomafix):

[mediawiki/extensions/MediaUploader@master] Use IEC prefixes instead of SI prefixes for byte sizes

https://gerrit.wikimedia.org/r/697099

Change 697100 had a related patch set uploaded (by Fomafix; author: Fomafix):

[mediawiki/extensions/UploadWizard@master] Use IEC prefixes instead of SI prefixes for byte sizes

https://gerrit.wikimedia.org/r/697100

Change 697101 had a related patch set uploaded (by Fomafix; author: Fomafix):

[mediawiki/extensions/MediaSearch@master] Use IEC prefixes instead of SI prefixes for byte sizes

https://gerrit.wikimedia.org/r/697101

Change 697104 had a related patch set uploaded (by Fomafix; author: Fomafix):

[mediawiki/extensions/MultiUpload@master] Use IEC prefixes instead of SI prefixes for byte sizes

https://gerrit.wikimedia.org/r/697104

Change 697105 had a related patch set uploaded (by Fomafix; author: Fomafix):

[mediawiki/extensions/PerformanceInspector@master] Use IEC prefixes instead of SI prefixes for byte sizes

https://gerrit.wikimedia.org/r/697105

Change 697126 had a related patch set uploaded (by Fomafix; author: Fomafix):

[mediawiki/extensions/WikibaseMediaInfo@master] Use IEC prefixes instead of SI prefixes for byte sizes

https://gerrit.wikimedia.org/r/697126

Change 697180 had a related patch set uploaded (by Fomafix; author: Fomafix):

[mediawiki/core@master] Use IEC prefixes instead of SI prefixes for byte sizes

https://gerrit.wikimedia.org/r/697180

Change 697180 merged by jenkins-bot:

[mediawiki/core@master] Use IEC prefixes instead of SI prefixes for byte sizes (docs+backend)

https://gerrit.wikimedia.org/r/697180

Change 697126 abandoned by Fomafix:

[mediawiki/extensions/WikibaseMediaInfo@master] Use IEC prefixes instead of SI prefixes for byte sizes

Reason:

resources/mediasearch-vue/mixins/searchResult.js was removed in cdc1ac5145128faf1ab05b532435d24a729f1520.

https://gerrit.wikimedia.org/r/697126

Change 697105 abandoned by Krinkle:

[mediawiki/extensions/PerformanceInspector@master] Use IEC prefixes instead of SI prefixes for byte sizes

Reason:

Extension no longer used. Might be archived or taken over if there is interest.

https://gerrit.wikimedia.org/r/697105

Change 697104 abandoned by Fomafix:

[mediawiki/extensions/MultiUpload@master] Use IEC prefixes instead of SI prefixes for byte sizes

Reason:

The extension MultiUpload is ARCHIVED (https://phabricator.wikimedia.org/T268667).

https://gerrit.wikimedia.org/r/697104

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:13 AM