
The maintenance script generateSitemap.php should only show pages that are of content model wikitext
Open, Needs Triage | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Generate a sitemap using generateSitemap.php

What happens?:

  • JSON pages are included in the sitemap. Some wikis have a lot of them, and they sometimes rank above content pages in search engine results, which is rarely what we want.

What should have happened instead?:

  • Only pages whose content model is wikitext should be included

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

MW1.43

Other information (browser name/version, screenshots, etc.):

The select should read:

	private function getPageRes( $namespace ) {
		return $this->dbr->newSelectQueryBuilder()
			->select( [ 'page_namespace', 'page_title', 'page_touched', 'page_is_redirect', 'pp_propname' ] )
			->from( 'page' )
			->leftJoin( 'page_props', null, [ 'page_id = pp_page', 'pp_propname' => 'noindex' ] )
			->where( [ 'page_namespace' => $namespace ] )
			// Proposed addition: skip pages whose content model is not wikitext (e.g. JSON pages)
			->where( [ 'page_content_model' => 'wikitext' ] )
			->caller( __METHOD__ )->fetchResultSet();
	}
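
To illustrate the intended effect outside MediaWiki, here is a minimal plain-PHP sketch. The sample rows and the helper name filterWikitextRows are hypothetical; they only mimic what the extra page_content_model condition would do and are not the actual generateSitemap.php code.

```php
<?php
// Hypothetical sample rows standing in for the `page` table; not real wiki data.
$rows = [
	[ 'page_title' => 'Main_Page', 'page_content_model' => 'wikitext' ],
	[ 'page_title' => 'Data.json', 'page_content_model' => 'json' ],
	[ 'page_title' => 'Help', 'page_content_model' => 'wikitext' ],
];

/**
 * Keep only rows whose content model is wikitext, mirroring the
 * extra where() condition proposed above (hypothetical helper).
 */
function filterWikitextRows( array $rows ): array {
	return array_values( array_filter(
		$rows,
		static fn ( array $row ): bool => $row['page_content_model'] === 'wikitext'
	) );
}

// Titles that would remain candidates for the sitemap in this sketch.
$titles = array_column( filterWikitextRows( $rows ), 'page_title' );
echo implode( "\n", $titles ), "\n";
```

With these sample rows, the JSON page is dropped and the two wikitext pages remain.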

Event Timeline

Hello, is there something I can do to help move this forward? Would it help to provide a patch or any additional information?

Thanks for taking a look at the code. You are very welcome to use a Developer Account to submit the proposed code changes as a Git branch directly into Gerrit, which makes it easier to review and provide feedback. If you don't want to set up Git/Gerrit, you can also use the Gerrit Patch Uploader. Thanks again.