ApiFeatureUsage extension requests full cluster state from elasticsearch
Closed, Resolved · Public

Description

This doesn't seem like something we want to do regularly. In production, this takes ~3s and returns ~30MB of data from Elasticsearch to PHP.

Stack trace where this happens:

#0 /vagrant/mediawiki/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Request.php(171): Elastica\Transport\Http->exec(Elastica\Request, array)
#1 /vagrant/mediawiki/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Client.php(662): Elastica\Request->send()
#2 /vagrant/mediawiki/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Cluster.php(55): Elastica\Client->request(string, string)
#3 /vagrant/mediawiki/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Cluster.php(46): Elastica\Cluster->refresh()
#4 /vagrant/mediawiki/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Client.php(498): Elastica\Cluster->__construct(Elastica\Client)
#5 /vagrant/mediawiki/extensions/ApiFeatureUsage/ApiFeatureUsageQueryEngineElastica.php(42): Elastica\Client->getCluster()
#6 /vagrant/mediawiki/extensions/ApiFeatureUsage/ApiFeatureUsageQueryEngineElastica.php(148): ApiFeatureUsageQueryEngineElastica->getIndexNames()
#7 /vagrant/mediawiki/extensions/ApiFeatureUsage/SpecialApiFeatureUsage.php(17): ApiFeatureUsageQueryEngineElastica->suggestDateRange()
#8 /vagrant/mediawiki/includes/specialpage/SpecialPage.php(522): SpecialApiFeatureUsage->execute(NULL)
#9 /vagrant/mediawiki/includes/specialpage/SpecialPageFactory.php(576): SpecialPage->run(NULL)
#10 /vagrant/mediawiki/includes/MediaWiki.php(285): SpecialPageFactory::executePath(Title, RequestContext)
#11 /vagrant/mediawiki/includes/MediaWiki.php(860): MediaWiki->performRequest()
#12 /vagrant/mediawiki/includes/MediaWiki.php(521): MediaWiki->main()
#13 /vagrant/mediawiki/index.php(43): MediaWiki->run()
#14 /var/www/w/index.php(5): include(string)
#15 {main}

Details

Related Gerrit Patches:
mediawiki/extensions/ApiFeatureUsage : master : Use the _aliases endpoint for fast index names retrieval

Event Timeline

Restricted Application added a subscriber: Aklapper. · Mar 15 2017, 10:44 PM
Anomie added a subscriber: Anomie. · Mar 16 2017, 1:51 PM

The data is stored in daily indexes so it can be properly expired. The getIndexNames() call is used to determine which daily indexes are available in two places:

  • So the extension can supply useful default values for the start and end date fields.
  • So it can warn the user when there is no data (i.e. no indexes) for part or all of the requested date range.
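
To make the two requirements concrete, here is a minimal sketch of how a list of daily index names could be turned into a suggested date range and a list of days with no data. The function names and the default 'apifeatureusage-' prefix are assumptions chosen for illustration (the name format follows the sample index names in this task); this is not the extension's actual code.

```php
<?php
// Hypothetical helpers for illustration; not the extension's actual code.

/**
 * Turn daily index names such as "apifeatureusage-2017.02.14" into a
 * sorted list of "Y-m-d" date strings. Names that do not match the
 * expected pattern, or that encode an invalid date, are skipped.
 */
function indexNamesToDates( array $indexNames, string $prefix = 'apifeatureusage-' ): array {
	$dates = [];
	$pattern = '/^' . preg_quote( $prefix, '/' ) . '(\d{4})\.(\d{2})\.(\d{2})$/';
	foreach ( $indexNames as $name ) {
		if ( preg_match( $pattern, $name, $m )
			&& checkdate( (int)$m[2], (int)$m[3], (int)$m[1] )
		) {
			$dates[] = "$m[1]-$m[2]-$m[3]";
		}
	}
	sort( $dates );
	return $dates;
}

/** Suggest [ start, end ] from the available dates, or null when there is no data. */
function suggestRange( array $dates ): ?array {
	return $dates ? [ $dates[0], $dates[count( $dates ) - 1] ] : null;
}

/** List the days in [ $start, $end ] (inclusive) for which no index exists. */
function missingDays( array $dates, string $start, string $end ): array {
	$have = array_flip( $dates );
	$utc = new DateTimeZone( 'UTC' );
	$last = new DateTimeImmutable( $end, $utc );
	$missing = [];
	for ( $day = new DateTimeImmutable( $start, $utc ); $day <= $last; $day = $day->modify( '+1 day' ) ) {
		$key = $day->format( 'Y-m-d' );
		if ( !isset( $have[$key] ) ) {
			$missing[] = $key;
		}
	}
	return $missing;
}
```

Note that both requirements only need the index names themselves, which is why fetching the full cluster state is more than this use case calls for.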

Apparently Elastica is requesting much more than just the list of index names when it's asked for the list of index names. If you know of a more efficient way to satisfy the two requirements, please let me know.

I think the fastest way to retrieve a list of index names is to use wildcard expansion with the _aliases endpoint:

curl -XGET 'elastic1020.eqiad.wmnet:9200/apifeatureusage*/_aliases'

This is very fast, and the response is very small if no aliases are set for these indices.
The output looks like:

{
  "apifeatureusage-2017.02.14" : {
    "aliases" : { }
  },
  "apifeatureusage-2017.03.04" : {
    "aliases" : { }
  },
[...]

Sadly, I'm not sure Elastica has a nice PHP API for it, so we may have to use the request API directly:

$response = $this->getClient()->request( 'apifeatureusage*/_aliases' );
if ( $response->isOK() ) {
        $indexNames = array_keys( $response->getData() );
} else {
        // Error handling
}

Thanks!

> Sadly, I'm not sure Elastica has a nice PHP API for it, so we may have to use the request API directly:
>
> $response = $this->getClient()->request( 'apifeatureusage*/_aliases' );
> if ( $response->isOK() ) {
>         $indexNames = array_keys( $response->getData() );
> } else {
>         // Error handling
> }

That looks good enough to me. When I read the first half of your post, I was afraid it would come down to making requests with MWHttpRequest.

I'll give it a little more time in case anyone else has comments before I start to code. Or, if anyone wants to beat me to it, I note that the first line should probably look more like ->request( urlencode( $this->options['indexPrefix'] ) . '*/_aliases' ) and would probably fit right into ApiFeatureUsageQueryEngineElastica::getIndexNames().
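
As a rough sketch of that suggestion, the path building and the name extraction could be separated from the HTTP call so they can be exercised without a cluster. The function names and the $send callable are hypothetical: $send stands in for the Elastica client request and is assumed to return the decoded response body, or null on failure. The 'apifeatureusage-' prefix in the usage note is likewise only an assumed example value for the indexPrefix option.

```php
<?php
// Sketch only: $send is a hypothetical stand-in for the Elastica client call,
// injected so the logic can run without an Elasticsearch cluster.

// Build the request path for an index prefix: the URL-encoded prefix
// followed by a wildcard and the _aliases endpoint.
function aliasesPath( string $indexPrefix ): string {
	return urlencode( $indexPrefix ) . '*/_aliases';
}

/**
 * Fetch the names of all indexes matching the prefix.
 *
 * @param string $indexPrefix Prefix shared by the daily indexes
 * @param callable $send function ( string $path ): ?array, returning the
 *  decoded response body or null on failure (assumed contract)
 * @return string[] Index names; empty if the request failed or nothing matched
 */
function fetchIndexNames( string $indexPrefix, callable $send ): array {
	$data = $send( aliasesPath( $indexPrefix ) );
	// The response is keyed by index name, so the keys are all we need.
	return is_array( $data ) ? array_keys( $data ) : [];
}
```

With a prefix of 'apifeatureusage-', aliasesPath() yields 'apifeatureusage-*/_aliases', since urlencode() leaves letters and hyphens untouched. Returning an empty array on failure is a simplification: it makes a failed request indistinguishable from "no data", so real code would likely want to log or surface the error instead.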

Change 343082 had a related patch set uploaded (by DCausse):
[mediawiki/extensions/ApiFeatureUsage] Use the _aliases endpoint for fast index names retrieval

https://gerrit.wikimedia.org/r/343082

Change 343082 merged by jenkins-bot:
[mediawiki/extensions/ApiFeatureUsage] Use the _aliases endpoint for fast index names retrieval

https://gerrit.wikimedia.org/r/343082

Anomie closed this task as Resolved. · Mar 17 2017, 3:19 PM
Anomie assigned this task to dcausse.