
Configure MediaWiki to support two Elasticsearch clusters
Closed, Resolved · Public

Description

  • Support running two Elasticsearch clusters in wbstack/mediawiki
  • Support the --cluster parameter when running maintenance scripts:
    • Update ApiWbStackElasticSearchInit.php and ApiWbStackForceSearchIndex.php in wbstack/mediawiki
    • Update CirrusSearch jobs in wbstack/api
  • Update MediaWiki chart in wbstack/charts to support new environment variables

Patches:

Event Timeline

Guide on how to smoke test these changes:

Fetch the changes

Checkout the following:

Set up your development environment
  1. Start with a clean environment:
make minikube-delete
make minikube-start
make init-local
make apply-local
make minikube-tunnel
  2. Skaffold the following services:
skaffold run -m mediawiki-139
skaffold run -m api
  3. Configure port-forwarding for both Elasticsearch clusters:
kubectl port-forward elasticsearch-master-0 9200:9200
kubectl port-forward elasticsearch-honey-master-0 9201:9200
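Before starting the checks below, it can help to confirm both forwarded ports actually answer. A convenience sketch (not part of the original guide) using the standard Elasticsearch `_cluster/health` endpoint; the ports match the port-forward commands above, and `unreachable` is printed when a port-forward is not up yet:

```shell
# Confirm both forwarded clusters answer before starting the smoke test.
# Prints each cluster's health status, or "unreachable" if the
# port-forward is not up yet.
health_report=""
for port in 9200 9201; do
  status=$(curl -s --max-time 2 "http://localhost:${port}/_cluster/health" \
    | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
  line="cluster on :${port} -> ${status:-unreachable}"
  echo "$line"
  health_report="${health_report}${line}; "
done
```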
(A) Verify that the new changes maintain the status quo when only one Elasticsearch cluster is configured
  1. Create a wiki with the name wiki1 via http://www.wbaas.localhost/dashboard
  2. Query the database for wiki1's id and name:
kubectl --context minikube-wbaas exec -it sql-mariadb-primary-0 -- /bin/bash -c 'mysql -u root -p${MARIADB_ROOT_PASSWORD} -e "SELECT wikis.id, sitename, prefix, name FROM apidb.wikis LEFT JOIN apidb.wiki_dbs ON wikis.id = wiki_dbs.wiki_id;"'
  3. Verify two new indices (i.e., NAME_general_first, NAME_content_first) were created on the primary Elasticsearch cluster using wiki1's name (from step 2):
curl http://localhost:9200/_cat/indices?v
  4. Create a new item on wiki1 via http://wiki1.wbaas.localhost/wiki/Special:NewItem
  5. Verify docs.count is 1 for wiki1's NAME_content_first entry (from step 2) on the primary Elasticsearch cluster:
curl http://localhost:9200/_cat/indices?v
  6. Verify the following CirrusSearch jobs for wiki1 exit without an error on the primary Elasticsearch cluster:
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\ElasticSearchIndexInit 1 default"
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\QueueSearchIndexBatches 1 default"
  7. Verify the following CirrusSearch job for wiki1 exits with "... Indexed 1 pages on primary. ..." on the primary Elasticsearch cluster:
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\ForceSearchIndex id 1 0 1000 default"
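The three job dispatches above recur throughout this guide with different wiki ids and cluster names. They can be wrapped in a small loop; a sketch (the `RUN`, `WIKI_ID`, and `CLUSTER` variables are convenience names introduced here, not part of the guide; it is a dry run by default and only echoes the commands):

```shell
# Dispatch all three CirrusSearch jobs for one wiki against one cluster.
# RUN="echo kubectl" is a dry run that prints the commands; set
# RUN="kubectl" to actually dispatch them.
RUN="echo kubectl"
WIKI_ID=1          # wiki id from the database query above
CLUSTER=default    # or "secondary"
dispatched=""
for job in "ElasticSearchIndexInit ${WIKI_ID} ${CLUSTER}" \
           "QueueSearchIndexBatches ${WIKI_ID} ${CLUSTER}" \
           "ForceSearchIndex id ${WIKI_ID} 0 1000 ${CLUSTER}"; do
  $RUN exec -it deployments/api-scheduler -- \
    bash -c "php artisan job:dispatchNow CirrusSearch\\\\${job}"
  dispatched="${dispatched}${job}; "
done
```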
(B) Verify that a wiki can still be deleted with the new changes and only one Elasticsearch cluster configured
  1. Create a wiki with the name wiki2 via http://www.wbaas.localhost/dashboard
  2. Query the database for wiki2's id and name:
kubectl --context minikube-wbaas exec -it sql-mariadb-primary-0 -- /bin/bash -c 'mysql -u root -p${MARIADB_ROOT_PASSWORD} -e "SELECT wikis.id, sitename, prefix, name FROM apidb.wikis LEFT JOIN apidb.wiki_dbs ON wikis.id = wiki_dbs.wiki_id;"'
  3. Verify two new indices (i.e., NAME_general_first, NAME_content_first) were created on the primary Elasticsearch cluster using wiki2's name (from step 2):
curl http://localhost:9200/_cat/indices?v
  4. Delete wiki2 via http://www.wbaas.localhost/dashboard
  5. Hard delete wiki2 by running the following:
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow DeleteWikiDispatcherJob"
  6. Verify wiki2's tables have been removed from the database:
kubectl --context minikube-wbaas exec -it sql-mariadb-primary-0 -- /bin/bash -c 'mysql -u root -p${MARIADB_ROOT_PASSWORD} -e "SELECT wikis.id, sitename, prefix, name FROM apidb.wikis LEFT JOIN apidb.wiki_dbs ON wikis.id = wiki_dbs.wiki_id;"'
  7. Verify wiki2's Elasticsearch indices have been deleted:
curl http://localhost:9200/_cat/indices?v
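The `_cat/indices` checks in sections A and B (and later D and E) can be done by eye, but filtering the listing for the wiki's indices is less error-prone. A sketch against an illustrative listing: the `mywiki_abc` name and the sample rows are made up; in practice, pipe the `curl` output from the commands above instead of the embedded string:

```shell
# Illustrative _cat/indices output; in practice use:
#   curl -s "http://localhost:9200/_cat/indices?v"
listing='health status index                    docs.count
green  open   mywiki_abc_general_first 0
green  open   mywiki_abc_content_first 1'

# Count the wiki's two expected indices
# (NAME_general_first, NAME_content_first).
WIKI_NAME="mywiki_abc"
found=$(printf '%s\n' "$listing" \
  | grep -Ec "${WIKI_NAME}_(general|content)_first")
echo "indices found for ${WIKI_NAME}: ${found}"   # 2 after creation, 0 after deletion
```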
Fetch the changes to enable both Elasticsearch clusters

Checkout the following:

Update your development environment

Skaffold the following services again:

skaffold run -m mediawiki-139
skaffold run -m api
(C) Verify that an existing wiki can use the new secondary Elasticsearch cluster
  1. Verify the secondary Elasticsearch cluster is empty:
curl http://localhost:9201/_cat/indices?v
  2. Verify the following CirrusSearch jobs for wiki1 exit without an error on the secondary Elasticsearch cluster:
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\ElasticSearchIndexInit 1 secondary"
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\QueueSearchIndexBatches 1 secondary"
  3. Verify the following CirrusSearch job for wiki1 exits with "... Indexed 1 pages on secondary. ..." on the secondary Elasticsearch cluster (wait for the job queue to run, or reload the page):
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\ForceSearchIndex id 1 0 1000 secondary"
  4. Verify wiki1 now has indices on both Elasticsearch clusters:
curl http://localhost:9200/_cat/indices?v
curl http://localhost:9201/_cat/indices?v

Note: If the listings above aren't updating, try querying the index directly:

curl http://localhost:9201/NAME_content_first/_search
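When querying the index directly, the hit count is buried in the JSON. A sketch extracting it with `sed`: the `response` string here is an illustrative fragment of an Elasticsearch 7+ `_search` response, not real output; in practice, pipe the `curl` output from the command above:

```shell
# Illustrative fragment of a _search response; in practice use:
#   curl -s "http://localhost:9201/NAME_content_first/_search"
response='{"hits":{"total":{"value":1,"relation":"eq"},"hits":[]}}'

# Pull out hits.total.value (works without jq).
hit_count=$(printf '%s' "$response" \
  | sed -n 's/.*"total":{"value":\([0-9]*\).*/\1/p')
echo "hits: ${hit_count}"   # expect 1 once the item has been indexed
```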
(D) Verify that a new wiki will use both Elasticsearch clusters
  1. Create a wiki with the name wiki3 via http://www.wbaas.localhost/dashboard
  2. Query the database for wiki3's id and name:
kubectl --context minikube-wbaas exec -it sql-mariadb-primary-0 -- /bin/bash -c 'mysql -u root -p${MARIADB_ROOT_PASSWORD} -e "SELECT wikis.id, sitename, prefix, name FROM apidb.wikis LEFT JOIN apidb.wiki_dbs ON wikis.id = wiki_dbs.wiki_id;"'
  3. Verify two new indices (i.e., NAME_general_first, NAME_content_first) were created on both Elasticsearch clusters using wiki3's name (from step 2):
curl http://localhost:9200/_cat/indices?v
curl http://localhost:9201/_cat/indices?v
  4. Create a new item on wiki3 via http://wiki3.wbaas.localhost/wiki/Special:NewItem
  5. Verify docs.count is 1 for wiki3's NAME_content_first entry (from step 2) on both Elasticsearch clusters:
curl http://localhost:9200/_cat/indices?v
curl http://localhost:9201/_cat/indices?v
(E) Verify that a wiki can be deleted when both Elasticsearch clusters are used
  1. Delete wiki1 via http://www.wbaas.localhost/dashboard
  2. Hard delete wiki1 by running the following:
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow DeleteWikiDispatcherJob"
  3. Verify wiki1's tables have been removed from the database:
kubectl --context minikube-wbaas exec -it sql-mariadb-primary-0 -- /bin/bash -c 'mysql -u root -p${MARIADB_ROOT_PASSWORD} -e "SELECT wikis.id, sitename, prefix, name FROM apidb.wikis LEFT JOIN apidb.wiki_dbs ON wikis.id = wiki_dbs.wiki_id;"'
  4. Verify wiki1's Elasticsearch indices have been deleted from both clusters:
curl http://localhost:9200/_cat/indices?v
curl http://localhost:9201/_cat/indices?v
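The final deletion check can be done in one pass over both clusters. A sketch (assumes the port-forwards from the setup section are still running; `mywiki_abc` stands in for wiki1's actual name from the earlier database query):

```shell
# Verify no indices for the deleted wiki remain on either cluster.
WIKI_NAME="mywiki_abc"
total_leftovers=0
for port in 9200 9201; do
  leftovers=$(curl -s --max-time 2 "http://localhost:${port}/_cat/indices" \
    | grep -c "${WIKI_NAME}")
  echo ":${port} leftover indices for ${WIKI_NAME}: ${leftovers}"
  total_leftovers=$((total_leftovers + leftovers))
done
echo "total leftovers: ${total_leftovers}"   # expect 0 after a successful hard delete
```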

There still seems to be a bug in C.2.

This step doesn't always exit cleanly on the first run, but it does still create the indices:

kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\ElasticSearchIndexInit 1 secondary"
Andrew-WMDE updated the task description.
Andrew-WMDE moved this task from Doing to In Review on the Wikibase Cloud (WB Cloud Sprint 20) board.
Tarrow changed the task status from Open to Stalled. Jun 5 2023, 9:23 AM
Tarrow removed dang as the assignee of this task.
Tarrow added a subscriber: dang.

Task was marked stalled both to reflect reality (it had been hanging around since mid-May with no movement) and to focus on other work that was in flight at the same time (I think specifically T330389).

Marked it as open again

Right, unfortunately I'm now off for a few days and am therefore about to un-assign myself from reviewing this. Sorry that I haven't managed to make much progress on it, but I do have a couple of thoughts.

There are a couple of inline comments, but I would also add some general ones here. While it would definitely make the patches harder to write and prepare, they would probably be easier to review if they were broken up a little more, particularly in an order where merging some of the work felt very low risk.

For example, I wasn't sure if I could merge the api or MW changes without then breaking ES functionality. While nothing would change in production without a corresponding chart or repo change, merging a breaking change here would leave us unable to push and deploy new api patches.

The other big-picture thought I had here was that having the platform api talk directly to ES is an inconvenience we don't even benefit from at this point, since we don't use the delete functionality. Without it, all the ES config could be encapsulated in MW, and we wouldn't need to worry about passing things like cluster names over the wire.

We had a look at the api and mediawiki patches and decided that the main thing coupling them was changing the parameters of the api calls that trigger the elasticsearch jobs to run on mediawiki.

We wondered if instead it would be possible to remove these and have MediaWiki always write to all configured clusters when these apis are called. This would mean that all of the changes that are being made on both sides (api and mediawiki) would not need to happen at the same time (because they probably don't need to happen at all).

When, mid-migration, we need to call specific clusters one at a time, we could do this another way (for example, following the "use a kubernetes job" pattern we are generally aiming towards).

We'll look at adding on (or adjusting?) the mediawiki patch to see what this would look like.

Tarrow changed the task status from Open to Stalled. Jul 6 2023, 9:36 AM

Documenting some failed updates from dailies over the last few days: marked stalled since we at least want a second cluster setup that is working for all devs before people move further on this. See: T335853

Tarrow changed the task status from Stalled to Open. Jul 17 2023, 12:42 PM

Marked open again now that there is a second cluster both in the local environment and in staging

Fring removed Fring as the assignee of this task. Jul 18 2023, 10:25 AM
Fring subscribed.
Tarrow removed Tarrow as the assignee of this task.
Tarrow claimed this task.