Byte size not pluralized in search results with interface in French
Open, Low, Public

Description

How to reproduce:
Run a wiki search whose results include pages smaller than 1 KiB (example of search).

Current:
Size is displayed as « 200 octet ».

Expected:
Size should be displayed as « 200 octets ».

As a reminder, in French, zero is singular. So the « 0 octet » texts are correct.

The message is Search-result-size, and it should be accompanied by Size-bytes. The messages seem to be correct… so maybe the error is somewhere in the code that determines whether the value is plural.

Also note the issue happens only in French. There is no such issue when displaying in English or Spanish, for instance.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Gehel triaged this task as Low priority. Mar 20 2024, 1:38 PM
Gehel moved this task from needs triage to Current work on the Discovery-Search board.
Gehel edited projects, added Discovery-Search (Current work); removed Discovery-Search.
EBernhardson added a project: I18n.
EBernhardson subscribed.

The interface message referenced above is search-result-size. The English version, provided by the developers, is as follows:

$1 ({{PLURAL:$2|1 word|$2 words}})

This renders as:

1 KB (187 words)

The French translation, provided by translatewiki:

$1 ($2⎵mot{{PLURAL:$2||s}})

And renders as:

207 octet (23 mots)
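To make the point concrete, here is a minimal Python sketch of how the two messages above expand. It is a toy model, not MediaWiki's parser: `render`, `is_singular` and the simplified plural rule (French treats 0 and 1 as singular, English only 1) are assumptions for illustration, and the ⎵ in the French message is written as a plain space. The size text is passed in already formatted, so the PLURAL in the message only ever affects the word count.

```python
import re

# Matches a two-form {{PLURAL:count|singular|plural}} construct.
PLURAL_RE = re.compile(r"\{\{PLURAL:([^|}]*)\|([^|}]*)\|([^|}]*)\}\}")

EN = "$1 ({{PLURAL:$2|1 word|$2 words}})"
FR = "$1 ($2 mot{{PLURAL:$2||s}})"  # plain space stands in for the ⎵


def is_singular(lang, n):
    # Simplified rule (assumption): French counts 0 and 1 as singular.
    return n in (0, 1) if lang == "fr" else n == 1


def render(template, lang, size_text, word_count):
    # Substitute parameters first, then resolve PLURAL on the real number.
    text = template.replace("$1", size_text).replace("$2", str(word_count))

    def expand(match):
        n = int(match.group(1))
        return match.group(2) if is_singular(lang, n) else match.group(3)

    return PLURAL_RE.sub(expand, text)


print(render(EN, "en", "1 KB", 187))
print(render(FR, "fr", "207 octet", 23))
```

The word count is pluralized correctly in both languages, while the size text arrives from elsewhere and is inserted untouched.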

In both cases the leading size is not supplied by the translated message itself. Instead we pass a size parameter in $1, and the language handling formats the size as appropriate for the language. This formatter receives a number of bytes and rounds it to an appropriate display magnitude. Example usage:

$ mwscript shell.php --wiki=testwiki
Psy Shell v0.12.0 (PHP 7.4.33 — cli) by Justin Hileman
> $langFactory = MediaWiki\MediaWikiServices::getInstance()->getLanguageFactory();
= MediaWiki\Languages\LanguageFactory {#775}

> $msg = new MediaWiki\Message\Message( 'example', [], $langFactory->getLanguage( 'fr' ) );
= MediaWiki\Message\Message {#8774}

> sudo $msg->message = '$1'
= "$1"

> $msg->sizeParams( [ 200 ] )->text();
= "200 octet"

So this is a general problem with how byte sizes are formatted in French. English in comparison pluralizes bytes:

> $msg = new MediaWiki\Message\Message( 'example', [], $langFactory->getLanguage( 'en' ) );
= MediaWiki\Message\Message {#6696}

> sudo $msg->message = '$1'
= "$1"

> $msg->sizeParams( [ 200 ] )->text();
= "200 bytes"

These appear to resolve to the size-bytes message. Curiously, that message does format as expected when invoked directly:

> (new MediaWiki\Message\Message( 'size-bytes', [], $langFactory->getLanguage( 'en' ) ))->params( [ 200 ] )->text()
= "200 bytes"

> (new MediaWiki\Message\Message( 'size-bytes', [], $langFactory->getLanguage( 'fr' ) ))->params( [ 200 ] )->text()
= "200 octets"

This simplifies down to Language::formatSize. Querying it with a singular and a plural value exposes what is happening: English is always plural, and French is never plural.

> $langFactory->getLanguage( 'fr' )->formatSize( 200 );
= "200 octet"

> $langFactory->getLanguage( 'fr' )->formatSize( 1 );
= "1 octet"

> $langFactory->getLanguage( 'en' )->formatSize( 200 );
= "200 bytes"

> $langFactory->getLanguage( 'en' )->formatSize( 1 );
= "1 bytes"
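For orientation, the magnitude selection that happens before the message lookup might look roughly like the Python sketch below. This is an assumption-laden model, not the MediaWiki source: the 1024 boundary, the truncated prefix list, the rounding policy and the `size-<prefix>bytes` key pattern are illustrative guesses, and `pick_magnitude` is a hypothetical name. Only `size-bytes` is confirmed in this task.

```python
# Assumed, truncated list of unit prefixes; index 0 means plain bytes.
PREFIXES = ["", "kilo", "mega", "giga", "tera"]


def pick_magnitude(size_bytes, boundary=1024):
    """Return (scaled value, message key) for a byte count."""
    size = float(size_bytes)
    index = 0
    # Divide down until the value fits below the boundary.
    while size >= boundary and index < len(PREFIXES) - 1:
        size /= boundary
        index += 1
    # Assumption: small units are shown without decimals.
    decimals = 2 if index > 1 else 0
    return round(size, decimals), "size-" + PREFIXES[index] + "bytes"


print(pick_magnitude(200))
print(pick_magnitude(2048))
```

Anything under the boundary stays in bytes and maps to size-bytes, which is the case the 200-byte examples above hit.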

The underlying problem seems to be in Language::formatComputingNumbers, which works roughly as follows. In our case $msg = 'size-bytes'. The size-bytes message includes a plural handler, but this code renders the message without any parameters and only afterwards substitutes the number, so the plural handler never sees the actual value.

$text = $this->msg( $msg )->text();
return str_replace( '$1', $this->formatNum( $size ), $text );

I suspect this was applied as some form of optimization. I would typically expect this code to do the following:

return $this->msg( $msg )->numParams( [ $size ] )->text();
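The difference between the two orderings can be modeled outside MediaWiki with a small Python sketch. Everything here is an assumption for illustration: the message wording in `MESSAGES`, the simplified plural rule, and in particular the guess that a PLURAL whose count is still the literal `$1` gets evaluated as if the count were 0. Under that guess the model reproduces the outputs seen above, but it has not been checked against the real parser.

```python
import re

PLURAL_RE = re.compile(r"\{\{PLURAL:([^|}]*)\|([^|}]*)\|([^|}]*)\}\}")

# Assumed wording of size-bytes; the real messages may differ.
MESSAGES = {
    "en": "$1 {{PLURAL:$1|byte|bytes}}",
    "fr": "$1 {{PLURAL:$1|octet|octets}}",
}


def is_singular(lang, n):
    # Simplified rule (assumption): French counts 0 and 1 as singular.
    return n in (0, 1) if lang == "fr" else n == 1


def expand_plurals(text, lang):
    def expand(match):
        raw = match.group(1)
        # Hypothesis: a non-numeric count such as "$1" is treated as 0.
        n = int(raw) if raw.isdigit() else 0
        return match.group(2) if is_singular(lang, n) else match.group(3)

    return PLURAL_RE.sub(expand, text)


def format_current(lang, size):
    # Mirrors the str_replace approach: plural resolved before the number is known.
    return expand_plurals(MESSAGES[lang], lang).replace("$1", str(size))


def format_expected(lang, size):
    # Mirrors passing the number as a parameter: substitute, then resolve plural.
    return expand_plurals(MESSAGES[lang].replace("$1", str(size)), lang)


for lang in ("fr", "en"):
    for size in (1, 200):
        print(lang, format_current(lang, size), "|", format_expected(lang, size))
```

If the hypothesis holds, it would also explain why English looks always plural and French never plural: 0 takes the plural form in English and the singular form in French.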

In general, this is not a search-specific problem. The language team will have to weigh in on their preferred fix.

Gehel added subscribers: Nemo_bis, Gehel.

Removing Search Platform as this isn't directly related to Search.

@Nemo_bis: is this something you could help with?

Someone could probably also ask Language-Team to investigate, they exist :)