
Chinese Language Converter is not working in the sidebar table of the contents in Vector-2022
Open, High | Public | Bug Report

Assigned To: None
Authored By: 50829, Apr 26 2022, 3:46 AM

Description

What happens?:
I find that Traditional/Simplified Chinese conversion doesn't work in the sidebar table of contents, as shown in the following screenshot. Could somebody fix it? Thanks a lot.

Request URL: this

image.png (565×1 px, 120 KB)

What should have happened instead?:
The Chinese characters in the sidebar table of contents should be converted like those in articles.

Event Timeline

50829 renamed this task from "Chinese Traditional and Simplified Conversion in" to "Chinese Traditional and Simplified Conversion in the sidebar table of the contents". (Apr 26 2022, 3:47 AM)

Hi @50829, thanks for taking the time to report this! Please provide a full link to a page where a problem can be seen by others.

StevenSun renamed this task from "Chinese Traditional and Simplified Conversion in the sidebar table of the contents" to "Chinese Language Converter is not working in the sidebar table of the contents". (Apr 26 2022, 7:53 AM)

Not sure whether this is related to T295187 (Chinese conversion no longer work in the table of content). Is it the same ToC issue?

image.png (939×471 px, 47 KB)
image.png (573×478 px, 17 KB)
Chinese Wikipedia | Meta-Wiki
Stang renamed this task from "Chinese Language Converter is not working in the sidebar table of the contents" to "Chinese Language Converter is not working in the sidebar table of the contents in Vector-2022". (Apr 26 2022, 10:34 PM)
Stang updated the task description. (Show Details)
Stang changed the subtype of this task from "Task" to "Bug Report".

Change 787842 had a related patch set uploaded (by Func; author: Func):

[mediawiki/skins/Vector@master] SkinVector22: Apply language converter on section titles

https://gerrit.wikimedia.org/r/787842

@Jdlrobson Does the new ToC use Parsoid? IIRC Parsoid is not able to handle Chinese variant conversion properly now.

@Diskdance no, it's using the same code as before, so in theory it should work exactly the same as the old table of contents. I can see that it doesn't, though (and I can reproduce that locally).

Notice 问 in the Vector skin but 問 in Vector 2022:

Vector 2022 | Vector
{F35264673,size=full} | {F35264675,size=full}

The headings seem to be working fine.

I've marked this as a blocker for further deployment to make sure this gets looked at.

Jdlrobson added a subscriber: JMcLeod_WMF.

@JMcLeod_WMF this is an issue in the parser, so I believe the Content Transform team is best placed to fix it. If not, please let me know which team we should talk to.

The language converter runs on the legacy TOC here: https://github.com/wikimedia/mediawiki/blob/master/includes/parser/ParserOutput.php#L487

It does not run on getSections, so presumably it should also run there, with appropriate refactoring/abstraction: https://github.com/wikimedia/mediawiki/blob/master/includes/parser/ParserOutput.php#L730
For example:

public function getSections() {
    $services = MediaWikiServices::getInstance();
    $languageFactory = $services->getLanguageFactory();
    $languageConverterFactory = $services->getLanguageConverterFactory();
    // T303329: this should migrate out of extension data
    $langCode = $this->getExtensionData( 'core:target-lang' )
        // This is a temporary fallback while the ParserCache fills
        ?? $services->getContentLanguage()->getCode();
    $langConv = $languageConverterFactory->getLanguageConverter(
        $languageFactory->getLanguage( $langCode )
    );
    $variant = $this->getExtensionData( 'core:target-lang-variant' )
        // This is a temporary fallback while the ParserCache fills
        ?? $langConv->getPreferredVariant();
    // Convert each section title to the target variant before returning it
    return array_map( static function ( $item ) use ( $langConv, $variant ) {
        $item['line'] = $langConv->convertTo( $item['line'], $variant );
        return $item;
    }, $this->mSections );
}

Thanks, @Jdlrobson. We'll (re-)review based on your notes. @ssastry, any thoughts on this in light of Jon's comments?

Producing a converted TOC by converting section names independently of the content will give unexpected results, especially with inline conversion rules.

For example:

-{H|through=>zh-hant:via;}-
== through ==
through (ˈvīə) means...

-{H|through=>zh-hant:pass;}-
== through ==
through (pas) means...

should produce this result in zh-hant:

== via ==
via (ˈvīə) means...

== pass ==
pass (pas) means...

but will become like this:

== via ==
via (ˈvīə) means...

== via ==
via (pas) means...

or this:

== pass ==
pass (ˈvīə) means...

== pass ==
pass (pas) means...

if converted independently.
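
The difference can be reproduced with a toy model. This is only a sketch in Python, not MediaWiki's converter: it assumes, as the example above implies, that a hidden rule takes effect from the point where it appears, and it writes both source headings as `through`. The names `convert_in_order` and `convert_headings_independently` are hypothetical.

```python
import re

# Matches a hidden rule of the form used above, e.g. -{H|through=>zh-hant:via;}-
RULE = re.compile(r"-\{H\|(\w+)=>zh-hant:(\w+);\}-")

LINES = [
    "-{H|through=>zh-hant:via;}-",
    "== through ==",
    "through (ˈvīə) means...",
    "-{H|through=>zh-hant:pass;}-",
    "== through ==",
    "through (pas) means...",
]


def apply_table(table, text):
    """Replace every whole-word source term with its target term."""
    for src, dst in table.items():
        text = re.sub(rf"\b{re.escape(src)}\b", dst, text)
    return text


def convert_in_order(lines):
    """Convert line by line; each rule updates the table when it is reached."""
    table, out = {}, []
    for line in lines:
        match = RULE.fullmatch(line)
        if match:
            table[match.group(1)] = match.group(2)
            continue
        out.append(apply_table(table, line))
    return out


def convert_headings_independently(lines):
    """Collect all rules first (the last one wins), then convert headings alone."""
    table = {}
    for line in lines:
        match = RULE.fullmatch(line)
        if match:
            table[match.group(1)] = match.group(2)
    return [apply_table(table, l) for l in lines if l.startswith("==")]


print(convert_in_order(LINES))
print(convert_headings_independently(LINES))
```

In this model the in-order pass yields the headings `== via ==` and `== pass ==`, while the independent pass yields `== pass ==` twice, because the headings no longer see the rule that was in force at their position.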


CC:
@Jdlrobson
@JMcLeod_WMF

@Jdlrobson, the Content-Transform-Team tag is to capture incoming tasks from people who may not know which component the task applies to. The Content Transform Team retains responsibility for (and visibility of) this task via the MediaWiki-Parser tag, so I am going to untag Content-Transform-Team for now.

@Jdlrobson, @SLopes-WMF and I are still formalizing the team processes; once that is done, we will document this.

@Winston_Sung is this information available during parsing, or does the conversion have to happen afterwards?
Another place this conversion could happen is in the construction of tocraw: https://github.com/wikimedia/mediawiki/blob/master/includes/parser/Parser.php#L4447
I'm guessing there is currently a bug in the API too? (e.g. are the sections here converted? https://zh.wikipedia.org/wiki/Special:ApiSandbox#action=parse&format=json&variant=zh-cn&page=Wikipedia%3A%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88%2F%E6%B1%82%E5%8A%A9&prop=sections)
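
To check the API question outside the sandbox, the same request can be built as a plain URL. A minimal sketch in Python, using the parameters from the ApiSandbox link above; the helper name `build_sections_query` is hypothetical, and the code only builds the URL without sending it.

```python
from urllib.parse import urlencode

# Standard Action API endpoint for Chinese Wikipedia
API_ENDPOINT = "https://zh.wikipedia.org/w/api.php"


def build_sections_query(page, variant):
    """Build an action=parse request that returns only the section list."""
    params = {
        "action": "parse",
        "format": "json",
        "variant": variant,
        "page": page,
        "prop": "sections",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = build_sections_query("Wikipedia:互助客栈/求助", "zh-cn")
print(url)
```

Fetching this URL once with `zh-cn` and once with `zh-tw`, then comparing the `line` fields under `parse.sections`, would show whether the API converts section titles.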