Steps to replicate the issue (include links if applicable):
- Generate a sitemap using generateSitemap.php
What happens?:
- JSON pages are included in the sitemap. Some wikis have a lot of them, and they sometimes rank above regular pages in search engine results, which is rarely what we want.
What should have happened instead?:
- Only pages whose content model is wikitext should be included in the sitemap.
Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):
MediaWiki 1.43
Other information (browser name/version, screenshots, etc.):
The select should read:
private function getPageRes( $namespace ) {
    return $this->dbr->newSelectQueryBuilder()
        ->select( [ 'page_namespace', 'page_title', 'page_touched', 'page_is_redirect', 'pp_propname' ] )
        ->from( 'page' )
        ->leftJoin( 'page_props', null, [ 'page_id = pp_page', 'pp_propname' => 'noindex' ] )
        ->where( [ 'page_namespace' => $namespace ] )
        ->where( [ 'page_content_model' => 'wikitext' ] )
        ->caller( __METHOD__ )->fetchResultSet();
}