
Sitemap doesn't count language variant entries against the url_limit, causing it to be rejected
Open, Low, Public

Description

I am currently using Manual:GenerateSitemap.php to generate my sitemap.

Is it possible to disable the language variant links in the sitemap?
For example, the sitemap contains URLs for each language variant, all of which actually point to the same page.

I only want to keep the link to "http://zh.moegirl.org/Help:DynamicPageList". How can I do that?

<url>
  <loc>http://zh.moegirl.org/Help:DynamicPageList</loc>
  <lastmod>2013-03-25T01:55:56Z</lastmod>
  <priority>0.5</priority>
</url>
<url>
  <loc>http://zh.moegirl.org/zh-cn/Help:DynamicPageList</loc>
  <lastmod>2013-03-25T01:55:56Z</lastmod>
  <priority>0.5</priority>
</url>
<url>
  <loc>http://zh.moegirl.org/zh-tw/Help:DynamicPageList</loc>
  <lastmod>2013-03-25T01:55:56Z</lastmod>
  <priority>0.5</priority>
</url>
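
As a stopgap, the variant entries could be filtered out after the sitemap has been generated. The following is only a minimal sketch of that idea, not a feature of GenerateSitemap.php: the helper name `is_variant_url` and the `VARIANT_PREFIXES` tuple are hypothetical, and the prefixes are simply the ones visible in the example URLs above.

```python
from urllib.parse import urlparse

# Assumption: variant codes appear as the first path segment,
# as in the example URLs (zh-cn, zh-tw). Adjust for your wiki.
VARIANT_PREFIXES = ("zh-cn", "zh-tw")


def is_variant_url(loc, variants=VARIANT_PREFIXES):
    """Return True if the first path segment of loc is a variant code."""
    first_segment = urlparse(loc).path.lstrip("/").split("/", 1)[0]
    return first_segment in variants


locs = [
    "http://zh.moegirl.org/Help:DynamicPageList",
    "http://zh.moegirl.org/zh-cn/Help:DynamicPageList",
    "http://zh.moegirl.org/zh-tw/Help:DynamicPageList",
]

# Keep only the entries that are not language variant URLs.
canonical = [loc for loc in locs if not is_variant_url(loc)]
print(canonical)
```

This only checks the first path segment, so a page whose title really begins with "zh-cn/" would also be dropped; whether that matters depends on the wiki.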


Version: 1.22.4
Severity: enhancement

Details

Reference
bz63098

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 3:03 AM
bzimport set Reference to bz63098.
bzimport added a subscriber: Unknown Object (MLST).
Zoglun created this task. Mar 26 2014, 7:01 AM

Hi zoglun. This does not sound like something is wrong in the code of MediaWiki (a so-called "bug"), but instead like a support request (how to change settings, questions about how to do something, etc.). bugzilla.wikimedia.org is only for specific bug reports and enhancement requests.
Please use https://www.mediawiki.org/wiki/Project:Support_desk for support requests to make sure that functionality does not exist; afterwards an enhancement request can be filed in Bugzilla. Thanks!

This is quite confusing. I asked the same question at the support desk and people said I should report it here: https://www.mediawiki.org/wiki/Thread:Project:Support_desk/How_to_separate_sitemap_into_series_small_file%3F

Neither Google nor Baidu (the largest search engine in China) accepts my sitemap, because it is "over maximum links limitation".

There is a setting, "$wgDefaultLanguageVariant = "zh-cn";", which allows me to set the default language variant while still keeping the variant function. But GenerateSitemap.php still generates three links for three exactly identical pages.

The purpose of sitemaps is to allow search engines to collect pages. When these engines refuse to record the links in sitemap.xml, that's a bug, isn't it?

(In reply to zoglun from comment #2)

I asked the same question at the support desk and people said I should report it here

That's very good, and it is information which is welcome when reporting a ticket here. :)

The phrasing of this ticket as a question makes it sound like you simply do not know if it's possible and have a support question; however the comments in the Support Desk ("I don't think you can split the sitemap up like that") make clear that it's very likely not possible currently and hence a valid feature request.

Nbdd0121 claimed this task. May 22 2016, 8:54 PM
Nbdd0121 added a subscriber: Nbdd0121.

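This is indeed a bug. The 50,000 URL limit is currently applied to the number of articles rather than the number of entries generated, so a file can exceed the limit when each article produces additional language variant entries.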
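
The difference between the two ways of counting can be sketched as follows. This is an illustrative Python sketch only, not the code from GenerateSitemap.php or from the related patch; the function name `split_into_files` and the way variant entries are built are assumptions made for the example.

```python
URL_LIMIT = 50_000  # the per-file URL limit mentioned above


def split_into_files(pages, variants, url_limit=URL_LIMIT):
    """Group sitemap entries into files, counting every emitted entry.

    Each page yields one entry for itself plus one per language variant,
    and all of them count against url_limit.
    """
    entries_per_page = 1 + len(variants)
    files, current = [], []
    for page in pages:
        # Start a new file if this page's entries would not fit.
        if current and len(current) + entries_per_page > url_limit:
            files.append(current)
            current = []
        current.append(page)
        current.extend(f"{variant}/{page}" for variant in variants)
    if current:
        files.append(current)
    return files


# Small demonstration with a limit of 6 entries per file.
pages = [f"Page_{i}" for i in range(5)]
files = split_into_files(pages, ["zh-cn", "zh-tw"], url_limit=6)
print([len(f) for f in files])
```

Counting pages instead would put all five pages (15 entries) into a single file here, which is the behaviour described in this task.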

Change 290143 had a related patch set uploaded (by Nbdd0121):
Count language variant sitemap entries for url_limit

https://gerrit.wikimedia.org/r/290143

Shizhao moved this task from Backlog to Non-WMF Chinese sites on the Chinese-Sites board.
Ciencia_Al_Poder renamed this task from Separate sitemap into a series of smaller files (for search engines) to Sitemap doesn't count language variant entries against the url_limit, causing it to be rejected. Aug 1 2019, 6:16 PM