Handle conversion between traditional and simplified Chinese
Closed, Invalid (Public)

Description

Example: https://zh.wikipedia.org/w/index.php?search=張軒&title=Special%3A搜索&fulltext=1&ns0=1

All search results match traditional Chinese characters, and even if there is an article with such a name in simplified Chinese, it is not displayed in search results.

Event Timeline

TJones subscribed.

Search does already convert Traditional Chinese characters to Simplified. The problem here, as with T257922, is ranking.
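To illustrate what "converting Traditional to Simplified" means for matching, here is a minimal sketch. It is not how the search backend actually does it; the mapping table `T2S` and both function names are hypothetical, and the table covers only the characters that appear in the examples on this task.

```python
# Toy Traditional -> Simplified table. Hypothetical and deliberately tiny:
# it only covers characters used in this task's example queries.
T2S = str.maketrans({"張": "张", "軒": "轩", "內": "内", "閣": "阁"})


def fold_to_simplified(text: str) -> str:
    """Replace each known Traditional character with its Simplified form."""
    return text.translate(T2S)


def same_after_folding(a: str, b: str) -> bool:
    """True if two strings are identical once both are folded to Simplified."""
    return fold_to_simplified(a) == fold_to_simplified(b)


# Both spellings of the longer example query fold to the same string.
print(same_after_folding("第4次吉田內閣", "第4次吉田内阁"))
```

If query and index text are both folded like this, either script can match the same documents, which is why the remaining issue is ranking rather than matching.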

I think a significant factor in the poorer-than-expected ranking is the length of the query. For example, searching for either 第4次吉田内阁 (simplified) or 第4次吉田內閣 (traditional) gives the expected article as the first result.

With only two characters in the query, and one of them being the common surname 張/张 (Zhang), it's not too surprising that the other articles that matched the two characters were ranked more highly.

We do take "exact matches" into consideration when ranking, as well as title matches, so exact title matches are ranked highly. So, when searching for 張軒 (traditional), 12 of the top 20 results have 張 in the title. With 张轩 (simplified), 15 of the top 20 have 张 in the title.
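A count like "N of the top 20 have the character in the title" can be checked with a few lines. This is only a sketch: `count_titles_containing` is a hypothetical helper, and the titles in the example are made-up placeholders, not real search results.

```python
def count_titles_containing(titles, char, top_n=20):
    """Count how many of the first top_n titles contain the given character."""
    return sum(1 for title in titles[:top_n] if char in title)


# Placeholder titles for illustration only, not real search results.
example_titles = ["张A", "李B", "张C", "王D"]
print(count_titles_containing(example_titles, "张"))
```

Run against the real top-20 titles for each spelling of the query, this gives the kind of counts quoted above.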

"even if there is an article with such a name in simplified Chinese, it is not displayed in search results."

It is not displayed in the top 20 search results. In this case, it is 29th. It is also the 3rd suggestion from the completion suggester (the search box in the upper corner). (See results from API).
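For anyone who wants to check the position of a title themselves, the sketch below builds a full-text search URL for the MediaWiki Action API (`action=query` with `list=search`) and finds the 1-based rank of a title in a list of result titles. The helper names `fulltext_search_url` and `rank_of_title` are hypothetical, the limit of 50 is an arbitrary choice, and fetching and parsing the JSON response is left out.

```python
from urllib.parse import urlencode

API = "https://zh.wikipedia.org/w/api.php"


def fulltext_search_url(query, limit=50):
    """Build a full-text search request URL (action=query, list=search)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,  # ask for more than the default page of results
        "format": "json",
    }
    return API + "?" + urlencode(params)


def rank_of_title(titles, wanted):
    """Return the 1-based position of wanted in titles, or None if absent."""
    for position, title in enumerate(titles, start=1):
        if title == wanted:
            return position
    return None


print(fulltext_search_url("張軒"))
```

The titles passed to `rank_of_title` would come from the `query.search[*].title` fields of the JSON response, in the order returned.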

This kind of ranking can happen even without writing system complexities. Searching for Dog × Cat on English Wikipedia, the exact title match is 7th in the results.

I did notice that there is a message offering to let you create the page even though it exists. I've opened a bug to address that: