CirrusSearch: Monitor the difference between the number of pages in the index and the number of pages in Special:Statistics
Open, Lowest, Public

Description

Monitor the difference between the number of pages in the search index and the number of pages reported by Special:Statistics. This came up because foundationwiki had no pages in its content index for quite some time without us knowing it. We should know about this.

I don't know what the ratio between pages in Special:Statistics and the search index ought to be, nor whether it would be better to check against some other count. The problem is that those counts are expensive to execute in MySQL.


Version: unspecified
Severity: minor
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=58953

Details

Reference
bz58972

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 22 2014, 2:27 AM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz58972.
bzimport added a subscriber: Unknown Object (MLST).

So we've got a script for this but it needs some tidying up. Would be nice to have this info somewhere so we can tell when the numbers diverge.

I've rolled Cirrus out to more wikis so the script is more useful than before. I've also fixed some math problems so we can get better data.

Maybe alert when we're > 40%? That's almost always a problem on our end. Less than that and you could very easily hit a wiki with a high redirect:page ratio and alert pointlessly.
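
A minimal sketch of that threshold check, in Python for illustration only. It assumes "divergence" means the share of SiteStats pages missing from the index; the thread does not pin down the exact definition, and the function names and the 0.40 default are hypothetical.

```python
# Hypothetical helper: none of these names come from CirrusSearch.
# Assumption: divergence = share of SiteStats pages absent from the index.

def divergence(site_stats_pages, indexed_pages):
    """Return the fraction of SiteStats pages missing from the search index."""
    if site_stats_pages <= 0:
        # An empty wiki has nothing to compare against.
        return 0.0
    missing = max(site_stats_pages - indexed_pages, 0)
    return missing / site_stats_pages


def should_alert(site_stats_pages, indexed_pages, threshold=0.40):
    """Alert only when the divergence exceeds the threshold."""
    return divergence(site_stats_pages, indexed_pages) > threshold


# An empty content index (the foundationwiki case) always alerts.
empty_index_alert = should_alert(1000, 0)
# A wiki with many redirects may sit well below the threshold.
redirect_heavy_alert = should_alert(1000, 700)
```

Under this definition an empty index gives a divergence of 1.0, so it trips any threshold below 100%, while a wiki at 30% stays quiet.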

Bleh, this script isn't all that useful. See for example:

demon@terbium:~$ mwscript extensions/CirrusSearch/maintenance/checkCounts.php zhwikibooks
SiteStats=12273
Elasticsearch=7258
Percentage=51%

But I just finished a forced reindex of this entire wiki to prove my point... this is hard to measure :(
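
Part of why this is hard to measure: the same two counts give quite different percentages depending on how the comparison is defined. The sketch below (Python, illustration only) parses the `key=value` output shown above and computes three candidate ratios. The thread does not say which formula checkCounts.php uses, so the function and variable names here are hypothetical and none of these is claimed to be the script's own calculation.

```python
# Hypothetical parser for the key=value lines printed in the comment above.

def parse_counts(output):
    """Turn 'Key=123' / 'Key=51%' lines into a dict of integers."""
    counts = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip lines that are not key=value pairs
        counts[key] = int(value.rstrip("%"))
    return counts


sample = """SiteStats=12273
Elasticsearch=7258
Percentage=51%"""

counts = parse_counts(sample)
site, es = counts["SiteStats"], counts["Elasticsearch"]

# Three plausible definitions, all from the same two numbers:
indexed_share = es / site                             # share of pages indexed
missing_share = (site - es) / site                    # share of pages missing
symmetric_diff = abs(site - es) / ((site + es) / 2)   # gap relative to the mean

for name, ratio in [("indexed", indexed_share),
                    ("missing", missing_share),
                    ("symmetric", symmetric_diff)]:
    print(f"{name}: {ratio:.0%}")
```

Whichever definition a monitor uses, the alert threshold has to be stated against that same definition, and none of them accounts for redirects or other pages that are legitimately absent from the index.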

Restricted Application added a subscriber: Aklapper.