
Re-enable Elasticsearch on wikis that have had it disabled
Open, Needs Triage, Public

Description

This appears to have happened due to events that should be prevented by resolving T333559.

Currently there are 46:

WikiSetting::where(['name' => "wwExtEnableElasticSearch", 'value' => 0])->with('wiki')->get()->pluck('wiki')->where('deleted_at', null)->count()
=> 46

Event Timeline

I believe that the snippet I posted above did not successfully filter out deleted wikis. Manually inspecting the output of WikiSetting::where(['name' => "wwExtEnableElasticSearch", 'value' => 0])->with('wiki')->get()->pluck('wiki')->where('deleted_at', null), I currently only see 7.
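
The likely cause: if Wiki uses standard Eloquent soft deletes, with('wiki') loads null for deleted wikis, and Collection::where('deleted_at', null) keeps those nulls (data_get(null, 'deleted_at') returns null, which loosely equals null). A sketch of a corrected count that simply drops the nulls first (assuming standard soft deletes; the result should match the 7 seen by manual inspection):

WikiSetting::where(['name' => "wwExtEnableElasticSearch", 'value' => 0])->with('wiki')->get()->pluck('wiki')->filter()->count()
=> 7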

List of affected wikis:

  • magnus.wikibase.cloud
  • sciencefictionhistory.wikibase.cloud
  • danish-elite-network.wikibase.cloud
  • nigeria-ethnic-groups.wikibase.cloud
  • agschemas.wikibase.cloud
  • plant-collective.wikibase.cloud
  • gridobjecttwo.wikibase.cloud

Currently, re-indexing fails for the wikis above, while it runs smoothly for my test wiki, for example:

$ kubectl exec -ti deployments/api-app-backend -- php artisan job:dispatchNow CirrusSearch\\ForceSearchIndex domain wmde-test-deer.wikibase.cloud 0 1000
[2023-04-27 10:31:07] production.INFO: App\Jobs\CirrusSearch\ForceSearchIndex::handleResponse: Finished batch! Indexed 13 pages. From id 0 to 1000  

Meanwhile, it fails for other test wikis of ours, e.g. rose-collection.wikibase.cloud.
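
For reference, the failing run would be the same invocation with the domain swapped:

$ kubectl exec -ti deployments/api-app-backend -- php artisan job:dispatchNow CirrusSearch\\ForceSearchIndex domain rose-collection.wikibase.cloud 0 1000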

I can't see an error message in the API pod logs, and there is no entry in the DB (apidb.failed_jobs) either. It fails with the fairly non-descriptive error message we are used to (due to T308122): https://phabricator.wikimedia.org/P47288

I also created a new wiki called test-wmde-deer-230427.wikibase.cloud, for which indexing failed at first too, but it succeeded when re-run a few minutes later.

I noticed that the 7 user wikis mentioned above were all created in March 2023 but that's probably irrelevant.

By running the Laravel job ForceSearchIndex via the k8s job script forceSearchIndexFrom.sh, we could see an error message in the logs:

 $ kubectl logs force-search-index-from-jvzhn-77kw4 
[mwdb_2b5a928b9b-mwt_dceea3d363_] index(es) do not exist. Did you forget to run updateSearchIndexConfig?
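
To see what actually exists on the Elasticsearch side, the _cat indices API can list anything matching the wiki's index prefix (a sketch; the elasticsearch:9200 host is an assumption about our cluster's service name):

$ curl -s "http://elasticsearch:9200/_cat/indices/mwdb_2b5a928b9b*?v"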

Funnily enough, if we run the ElasticSearchIndexInit Job (which runs the updateSearchIndexConfig script), it fails because the index already exists.

We assume that the ES indices for these wikis are in an inconsistent state that our application logic can't handle.
If we delete the index (via the ES REST API), we end up in another inconsistent state, presumably because CirrusSearch also needs to be re-initialized somehow.
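
For reference, the delete-then-reinitialize sequence would look roughly like this (a sketch; the host, pod and index names are placeholders, and UpdateSearchIndexConfig.php is the CirrusSearch maintenance script that the init job wraps):

$ curl -s -X DELETE "http://elasticsearch:9200/<index-name>"
$ kubectl exec -ti <mediawiki-pod> -- php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php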

Note that the following procedure worked fine earlier today for rose-collection.wikibase.cloud, but not for magnus.wikibase.cloud or sciencefictionhistory.wikibase.cloud:

  • WBS_FORCE_SEARCH_INDEX_FROM=2023-01-01T00:00:00Z WBS_DOMAIN=rose-collection.wikibase.cloud ./loadElasticsearchAndRunMWJobs.sh
  • WBS_DOMAIN=rose-collection.wikibase.cloud ./forceSearchIndexFrom.sh

I wrote earlier that the creation date of these wikis is probably irrelevant, but I noticed later today that it falls within a period in which we had some trouble with ES and also upgraded MediaWiki from 1.37 to 1.38. Not sure if that's important, but I wanted to note it here.

Reindexing perotchelibou.wikibase.cloud fails as well. Note that this wiki was created in 2022 and has had no edits since August 2022. This makes me think the problem is maybe not limited to wikis created or edited during the MediaWiki upgrade period, i.e. March 2023.

First moving to ES7, then reevaluating whether this is still a problem / needs further investigation.
In addition, we need to analyze which Wikibases are affected.
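
For that analysis, a tinker snippet along these lines could list the affected domains (a sketch; it assumes the Wiki model has a domain attribute and standard soft deletes):

WikiSetting::where(['name' => "wwExtEnableElasticSearch", 'value' => 0])->with('wiki')->get()->pluck('wiki')->filter()->pluck('domain')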