
[[Wikimedia:Ia-upload-zip-file-too-large/en]]: MB vs MiB
Closed, Declined (Public)

Description

Is the "MB" unit really decimal-based (millions of bytes in common usage, i.e. 10^6 = 1000*1000)?
Shouldn't it be "MiB" (binary-based "mebibytes", i.e. 2^20 = 1024*1024)?

Decimal units are typically used for network bandwidth or for physical disk drive sizes (used by hardware manufacturers).

But file sizes are a software concept tied to filesystems, which rely on OS routines that allocate storage in powers of 2, not powers of 10 (including for paging, caching, indexing, data compression...).
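To make the gap concrete, here is a minimal Python sketch comparing the two readings of the same number. The value 600 is only an example chosen for illustration, and the variable names are hypothetical.

```python
# Compare the decimal (MB) and binary (MiB) readings of the same numeric limit.
MB = 1000 * 1000   # SI megabyte: 10**6 bytes
MIB = 1024 * 1024  # IEC mebibyte: 2**20 bytes

limit = 600  # example value, not taken from any configuration

decimal_bytes = limit * MB    # what "600 MB" means in powers of 10
binary_bytes = limit * MIB    # what "600 MiB" means in powers of 2
gap = binary_bytes - decimal_bytes

print(decimal_bytes)  # 600000000
print(binary_bytes)   # 629145600
print(gap)            # 29145600
```

At this scale the two readings differ by roughly 29 million bytes, which is why the choice of symbol matters for a stated limit.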


URL: https://translatewiki.net/wiki/Wikimedia:Ia-upload-zip-file-too-large/en

Event Timeline

Aklapper added a subscriber: DannyS712.

This string seems to be located in https://tools.wmflabs.org/ia-upload/ / https://wikitech.wikimedia.org/wiki/Tool:IA_Upload hence adding IA Upload.

Also see endless discussions in T54687: Use IEC units (KiB, MiB, etc.) and not SI units (KB, MB) for being technically correct vs being understood by users, basically.

IEC units are standard in the UI of file explorers even if they do not always use the correct unit symbol, or use shorter symbols (like M instead of MB or MiB) for more compact presentations in table views (e.g. in MacOS) or on consoles (e.g. in Linux with "df -h", intended to be "human readable").

Designers should in any case make the effort to specify the correct unit so users don't have to guess what is meant, as here with a "maximum file size". The difference is usually not a problem for sizes in kilobytes or kibibytes, but it is noticeable for sizes in mega/giga- or mebi/gibi-bytes: users may think their file sizes are OK for upload when they are not on the target system. These messages should therefore be as precise as possible about the limits.
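One possible way to remove the guesswork is to print both readings with unambiguous symbols. The following Python sketch shows the idea; `format_size` is a hypothetical helper written for this illustration, not a function of any existing tool.

```python
def format_size(size_bytes: int) -> str:
    """Return a size in both unit systems, each with its unambiguous symbol."""
    mib = size_bytes / (1024 * 1024)  # binary reading (IEC)
    mb = size_bytes / (1000 * 1000)   # decimal reading (SI)
    return f"{mib:.1f} MiB ({mb:.1f} MB)"


print(format_size(629_145_600))  # 600.0 MiB (629.1 MB)
```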

Aklapper renamed this task from [[Wikimedia:Ia-upload-zip-file-too-large/en]] translation issue to [[Wikimedia:Ia-upload-zip-file-too-large/en]]: MB vs MiB. Dec 17 2019, 10:19 AM

Is the "MB" unit really decimal-based (millions of bytes in common usage, i.e. 10^6 = 1000*1000)?
Shouldn't it be "MiB" (binary-based "mebibytes", i.e. 2^20 = 1024*1024)?

I'm a bit confused... where does it say that it's decimal-based? Because it's not; it's binary-based, as you correctly say it should be.

The help text for that translation message is "Warning when the zip file is too large. $1 is the file's size, $2 is the configured maximum (both integers, in MB)". I'd think most people would interpret the 'MB' here as a standard megabyte, i.e. 1024 * 1024 bytes.

The code in question is as follows:

			$maxSizeInMb = 600;
			$sizeInMb = round( $iaData['files'][$jp2Filename]['size'] / ( 1024 * 1024 ) );
			if ( $sizeInMb > $maxSizeInMb ) {
				$msgParams = [ $sizeInMb, $maxSizeInMb ];
				$warning = $this->app['i18n']->message( 'zip-file-too-large', $msgParams )
					. ' ' . $this->app['i18n']->message( 'watch-log' );
			}
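For readers who want to try the numbers, the check above can be mirrored roughly as follows in Python. This is only a sketch: the function names are hypothetical, and `math.floor(x + 0.5)` is used as an assumption to imitate PHP's `round()` for non-negative values, since Python's built-in `round()` rounds halves to even.

```python
import math

MIB = 1024 * 1024
MAX_SIZE = 600  # same value as $maxSizeInMb in the quoted code


def size_in_units(size_bytes: int) -> int:
    """Convert bytes to whole units of 1024*1024 bytes, rounding half up."""
    return math.floor(size_bytes / MIB + 0.5)


def is_too_large(size_bytes: int) -> bool:
    """Mirror of the quoted comparison: rounded size against the limit."""
    return size_in_units(size_bytes) > MAX_SIZE


# A file of 610,000,000 bytes: a decimal-based file manager shows "610 MB".
print(size_in_units(610_000_000), is_too_large(610_000_000))  # 582 False
```

So a file that a decimal-based file manager reports as larger than 600 MB is still accepted, because the comparison is made in units of 1024 * 1024 bytes. Note also that, because of the rounding, a file slightly above 600 such units (below 600.5) still passes.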

I translated it into French as "Mio" (which is the approved symbol for mebibytes, i.e. binary), but this was reverted to "Mo", which is decimal in French, and then I was blocked for that!
The reason given was that if "MB" is used in English, "Mo" should be used in French, on the assumption that "MB" unambiguously indicates decimal in English, which of course it does not.

There was no way to explain that file sizes are measured in binary units (except in recent versions of Mac OS X, 10.8+, which now makes the English message really ambiguous between Windows/Linux and Mac OS X).

So such use of "MB" in English should always be documented, indicating that "MB" is just "customary" (and not even that, given MacOS vs. all other OSes, for messages intended for users who don't know what the actual file size limit is), and should be understood as a unit in powers of 2, not powers of 10. "Customary units" are a thing of the past; they have been deprecated in many standards, except by JEDEC, now for flashable NVRAM and SSD devices. IEEE made a clear statement, and ISO too has agreed to endorse the IEEE-proposed prefixes (with a lowercase 'i' in the second position of the symbol and "bi" as the second syllable, replacing the Greek-inherited syllable of the SI prefixes for unit multiples).

Once again, Apple creates its own "standard" and is inconsistent with its own specifications.

And anyway, the translate interface in TranslateWiki.net does not display enough information for tracking changes in review. You can review many strings, and sometimes this will conflict with what other people did in a few strings, even when we are careful. Don't be surprised that so many strings are translated there and then no longer reviewed. Tons of unreviewed messages then contain lots of typos or inconsistent terminology, which does not help refine the translations to be precise enough for users, when English is full of traps: its confusion of many terms, its very free style of capitalization, the absence of distinction between verbs and nouns, and many missing prepositions. We always have to "guess" and need a clear understanding of the intent and usage, but the documentation of messages frequently gives no context of use, no formatting constraints, and no indication of the effective parser that will be used for replaceable variables or optional templates/keywords that can be added: even in the Wikimedia or MediaWiki namespace, messages are not always parsed using the wikicode parser.

Adding the documentation or fixing it is also part of the job of tracking what was really meant and how translated messages will be used.

most people would interpret the 'MB' here as a standard megabyte, i.e. 1024 * 1024 bytes.

Even if they would not, they'd still be able to upload their file, whether that file is 1000 bytes large or 1024 bytes large. The question to me is rather if average humans trying to use software have any idea what "MiB" is, and if it is more likely that they might have seen "MB" before. Apart from any "technically correct" stuff.

And anyway, the translate interface in TranslateWiki.net does not display enough information for tracking changes in review.

Please stay on-topic in tasks; this has been asked for before. Thanks.

I think that as MediaWiki uses multiples of 1024 for things such as $wgMaxUploadSize (whose default is 100 MB), we should do the same here.