This is a very old bug affecting languages with variants (e.g. Chinese), and it is still not fixed. I reported it at least three years ago through Bugzilla under another account.
This bug affects almost every site that runs MediaWiki in Chinese, Gan, Inuktitut, Kazakh, Kurdish, Serbian, Tachelhit, Tajik, or Uzbek. For each page, search engines index an arbitrary language-variant link from the sitemap.
The following is an example segment of https://zh.moegirl.org/sitemap/sitemap-zhmoegirl-NS_0-0.xml.gz . The first entry is the canonical URL, and the remaining four are variant links.
MediaWiki generates all five links for the same page, "Bios", each with the same priority of 1.0. A search engine then picks one of the five more or less at random. Usually that is a variant link (a 4 in 5 chance if the pick is uniform), so users who read variant A land on the article in variant B, which they may not be able to read.
<url>
  <loc>https://zh.moegirl.org/Bios</loc>
  <lastmod>2016-11-25T04:20:47Z</lastmod>
  <priority>1.0</priority>
</url>
<url>
  <loc>https://zh.moegirl.org/zh-hans/Bios</loc>
  <lastmod>2016-11-25T04:20:47Z</lastmod>
  <priority>1.0</priority>
</url>
<url>
  <loc>https://zh.moegirl.org/zh-hant/Bios</loc>
  <lastmod>2016-11-25T04:20:47Z</lastmod>
  <priority>1.0</priority>
</url>
<url>
  <loc>https://zh.moegirl.org/zh-cn/Bios</loc>
  <lastmod>2016-11-25T04:20:47Z</lastmod>
  <priority>1.0</priority>
</url>
<url>
  <loc>https://zh.moegirl.org/zh-tw/Bios</loc>
  <lastmod>2016-11-25T04:20:47Z</lastmod>
  <priority>1.0</priority>
</url>
- This does serious damage to the user experience. For example, Traditional Chinese readers, especially the younger generation, are often unable to read Simplified Chinese, while Simplified Chinese readers understand only part of Traditional Chinese. It is like searching for a simple English article titled "My Little Pony Friendship Is Magic" and having Wikipedia give you an article written in Hebrew.
As a quick and simple fix, we could remove all language-variant links from the sitemap and keep only the canonical URL, since MediaWiki can detect the user's language setting and serve the proper variant. (If the built-in detection does not work, the UniversalLanguageSelector extension can be used.)
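Until this is fixed in MediaWiki itself, a site admin could approximate the quick fix by post-processing the generated sitemap. The following is only a rough sketch in Python, not the proposed MediaWiki patch. It assumes variant links always look like `/<variant-code>/<title>`, as in the example above, and that the full sitemap file declares the standard sitemaps.org namespace. `strip_variant_urls`, `is_variant_url` and `VARIANT_CODES` are names I made up for illustration, and the variant list would have to be adjusted per wiki.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: variant codes copied from the example segment; other wikis differ.
VARIANT_CODES = {"zh-hans", "zh-hant", "zh-cn", "zh-tw"}


def is_variant_url(url, variants=VARIANT_CODES):
    """Return True if the first path segment is a variant code followed by a title."""
    segments = urlparse(url).path.lstrip("/").split("/", 1)
    return len(segments) == 2 and segments[0].lower() in variants and segments[1] != ""


def strip_variant_urls(sitemap_xml, variants=VARIANT_CODES):
    """Drop every <url> entry whose <loc> is a variant link; keep canonical ones."""
    ET.register_namespace("", SITEMAP_NS)  # keep output free of ns0: prefixes
    root = ET.fromstring(sitemap_xml)
    for url in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is not None and loc.text and is_variant_url(loc.text.strip(), variants):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")
```

Note that a page whose own title happens to be a variant code (e.g. `https://zh.moegirl.org/zh-cn`) is kept, because nothing follows the first path segment.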
As a more proper fix, the priority of language-variant links should be reduced relative to the canonical link. However, at least one other bug, T108443, needs to be fixed before the proper URL gets indexed.
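To illustrate what reduced priority could look like in the output, here is a rough Python sketch that rewrites an existing sitemap. It is not MediaWiki code. The value `0.5` is an arbitrary choice of mine, `lower_variant_priority` and `VARIANT_CODES` are hypothetical names, and it assumes the same `/<variant-code>/<title>` URL layout as the example above. Keep in mind that `<priority>` is only a hint in the sitemap protocol, so search engines are not obliged to honor it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: variant codes copied from the example segment; other wikis differ.
VARIANT_CODES = {"zh-hans", "zh-hant", "zh-cn", "zh-tw"}


def lower_variant_priority(sitemap_xml, variant_priority="0.5", variants=VARIANT_CODES):
    """Set <priority> of variant links to variant_priority; leave canonical links alone."""
    ET.register_namespace("", SITEMAP_NS)  # keep output free of ns0: prefixes
    root = ET.fromstring(sitemap_xml)
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not loc.text:
            continue
        segments = urlparse(loc.text.strip()).path.lstrip("/").split("/", 1)
        if len(segments) == 2 and segments[0].lower() in variants:
            priority = url.find(f"{{{SITEMAP_NS}}}priority")
            if priority is None:
                # Entry had no <priority>; add one so the variant is explicitly lower.
                priority = ET.SubElement(url, f"{{{SITEMAP_NS}}}priority")
            priority.text = variant_priority
    return ET.tostring(root, encoding="unicode")
```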