
Configure MediaWiki to support two Elasticsearch clusters
Closed, Resolved · Public

Description

  • Support running two Elasticsearch clusters in wbstack/mediawiki
  • Support the --cluster parameter when running maintenance scripts:
    • Update ApiWbStackElasticSearchInit.php and ApiWbStackForceSearchIndex.php in wbstack/mediawiki
    • Update CirrusSearch jobs in wbstack/api
  • Update MediaWiki chart in wbstack/charts to support new environment variables

Patches:

Event Timeline

Guide on how to smoke test these changes:

Fetch the changes

Checkout the following:

Set up your development environment
  1. Start with a clean environment:
make minikube-delete
make minikube-start
make init-local
make apply-local
make minikube-tunnel
  2. Skaffold the following services:
skaffold run -m mediawiki-139
skaffold run -m api
  3. Configure port-forwarding for both Elasticsearch clusters:
kubectl port-forward elasticsearch-master-0 9200:9200
kubectl port-forward elasticsearch-honey-master-0 9201:9200
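Before starting the checks below, it can help to confirm both forwarded ports actually answer. A convenience sketch (not part of the original guide) using the standard Elasticsearch `_cluster/health` endpoint; the ports match the port-forward commands above, and `unreachable` is printed when a port-forward is not up yet:

```shell
# Confirm both forwarded clusters answer before starting the smoke test.
# Prints each cluster's health status, or "unreachable" if the
# port-forward is not up yet.
health_report=""
for port in 9200 9201; do
  status=$(curl -s --max-time 2 "http://localhost:${port}/_cluster/health" \
    | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
  line="cluster on :${port} -> ${status:-unreachable}"
  echo "$line"
  health_report="${health_report}${line}; "
done
```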
(A) Verify that the new changes maintain the status quo when only one Elasticsearch cluster is configured
  1. Create a wiki with the name wiki1 via http://www.wbaas.localhost/dashboard
  2. Query the database for wiki1's id and name:
kubectl --context minikube-wbaas exec -it sql-mariadb-primary-0 -- /bin/bash -c 'mysql -u root -p${MARIADB_ROOT_PASSWORD} -e "SELECT wikis.id, sitename, prefix, name FROM apidb.wikis LEFT JOIN apidb.wiki_dbs ON wikis.id = wiki_dbs.wiki_id;"'
  3. Verify two new indices (i.e., NAME_general_first, NAME_content_first) were created on the primary Elasticsearch cluster using wiki1's name (from step 2):
curl http://localhost:9200/_cat/indices?v
  4. Create a new item on wiki1 via http://wiki1.wbaas.localhost/wiki/Special:NewItem
  5. Verify docs.count is 1 for wiki1's NAME_content_first entry (from step 2) on the primary Elasticsearch cluster:
curl http://localhost:9200/_cat/indices?v
  6. Verify the following CirrusSearch jobs for wiki1 exit without an error on the primary Elasticsearch cluster:
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\ElasticSearchIndexInit 1 default"
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\QueueSearchIndexBatches 1 default"
  7. Verify the following CirrusSearch job for wiki1 exits with "... Indexed 1 pages on primary. ..." on the primary Elasticsearch cluster:
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\ForceSearchIndex id 1 0 1000 default"
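The three job dispatches above recur throughout this guide with different wiki ids and cluster names. They can be wrapped in a small loop; a sketch (the `RUN`, `WIKI_ID`, and `CLUSTER` variables are convenience names introduced here, not part of the guide; it is a dry run by default and only echoes the commands):

```shell
# Dispatch all three CirrusSearch jobs for one wiki against one cluster.
# RUN="echo kubectl" is a dry run that prints the commands; set
# RUN="kubectl" to actually dispatch them.
RUN="echo kubectl"
WIKI_ID=1          # wiki id from the database query above
CLUSTER=default    # or "secondary"
dispatched=""
for job in "ElasticSearchIndexInit ${WIKI_ID} ${CLUSTER}" \
           "QueueSearchIndexBatches ${WIKI_ID} ${CLUSTER}" \
           "ForceSearchIndex id ${WIKI_ID} 0 1000 ${CLUSTER}"; do
  $RUN exec -it deployments/api-scheduler -- \
    bash -c "php artisan job:dispatchNow CirrusSearch\\\\${job}"
  dispatched="${dispatched}${job}; "
done
```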
(B) Verify that a wiki can still be deleted with the new changes and only one Elasticsearch cluster configured
  1. Create a wiki with the name wiki2 via http://www.wbaas.localhost/dashboard
  2. Query the database for wiki2's id and name:
kubectl --context minikube-wbaas exec -it sql-mariadb-primary-0 -- /bin/bash -c 'mysql -u root -p${MARIADB_ROOT_PASSWORD} -e "SELECT wikis.id, sitename, prefix, name FROM apidb.wikis LEFT JOIN apidb.wiki_dbs ON wikis.id = wiki_dbs.wiki_id;"'
  3. Verify two new indices (i.e., NAME_general_first, NAME_content_first) were created on the primary Elasticsearch cluster using wiki2's name (from step 2):
curl http://localhost:9200/_cat/indices?v
  4. Delete wiki2 via http://www.wbaas.localhost/dashboard
  5. Hard delete wiki2 by running the following:
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow DeleteWikiDispatcherJob"
  6. Verify wiki2's tables have been removed from the database:
kubectl --context minikube-wbaas exec -it sql-mariadb-primary-0 -- /bin/bash -c 'mysql -u root -p${MARIADB_ROOT_PASSWORD} -e "SELECT wikis.id, sitename, prefix, name FROM apidb.wikis LEFT JOIN apidb.wiki_dbs ON wikis.id = wiki_dbs.wiki_id;"'
  7. Verify wiki2's Elasticsearch indices have been deleted:
curl http://localhost:9200/_cat/indices?v
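The `_cat/indices` checks in sections A and B (and later D and E) can be done by eye, but filtering the listing for the wiki's indices is less error-prone. A sketch against an illustrative listing: the `mywiki_abc` name and the sample rows are made up; in practice, pipe the `curl` output from the commands above instead of the embedded string:

```shell
# Illustrative _cat/indices output; in practice use:
#   curl -s "http://localhost:9200/_cat/indices?v"
listing='health status index                    docs.count
green  open   mywiki_abc_general_first 0
green  open   mywiki_abc_content_first 1'

# Count the wiki's two expected indices
# (NAME_general_first, NAME_content_first).
WIKI_NAME="mywiki_abc"
found=$(printf '%s\n' "$listing" \
  | grep -Ec "${WIKI_NAME}_(general|content)_first")
echo "indices found for ${WIKI_NAME}: ${found}"   # 2 after creation, 0 after deletion
```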
Fetch the changes to enable both Elasticsearch clusters

Checkout the following:

Update your development environment

Skaffold the following services again:

skaffold run -m mediawiki-139
skaffold run -m api
(C) Verify that an existing wiki can use the new secondary Elasticsearch cluster
  1. Verify the secondary Elasticsearch cluster is empty:
curl http://localhost:9201/_cat/indices?v
  2. Verify the following CirrusSearch jobs for wiki1 exit without an error on the secondary Elasticsearch cluster:
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\ElasticSearchIndexInit 1 secondary"
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\QueueSearchIndexBatches 1 secondary"
  3. Verify the following CirrusSearch job for wiki1 exits with "... Indexed 1 pages on secondary. ..." on the secondary Elasticsearch cluster (wait for the job queue to run, or reload the page):
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\ForceSearchIndex id 1 0 1000 secondary"
  4. Verify wiki1 now has indices on both Elasticsearch clusters:
curl http://localhost:9200/_cat/indices?v
curl http://localhost:9201/_cat/indices?v

Note: If the listings above aren't updating, try querying the index directly:

curl http://localhost:9201/NAME_content_first/_search
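When querying the index directly, the hit count is buried in the JSON. A sketch extracting it with `sed`: the `response` string here is an illustrative fragment of an Elasticsearch 7+ `_search` response, not real output; in practice, pipe the `curl` output from the command above:

```shell
# Illustrative fragment of a _search response; in practice use:
#   curl -s "http://localhost:9201/NAME_content_first/_search"
response='{"hits":{"total":{"value":1,"relation":"eq"},"hits":[]}}'

# Pull out hits.total.value (works without jq).
hit_count=$(printf '%s' "$response" \
  | sed -n 's/.*"total":{"value":\([0-9]*\).*/\1/p')
echo "hits: ${hit_count}"   # expect 1 once the item has been indexed
```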
(D) Verify that a new wiki will use both Elasticsearch clusters
  1. Create a wiki with the name wiki3 via http://www.wbaas.localhost/dashboard
  2. Query the database for wiki3's id and name:
kubectl --context minikube-wbaas exec -it sql-mariadb-primary-0 -- /bin/bash -c 'mysql -u root -p${MARIADB_ROOT_PASSWORD} -e "SELECT wikis.id, sitename, prefix, name FROM apidb.wikis LEFT JOIN apidb.wiki_dbs ON wikis.id = wiki_dbs.wiki_id;"'
  3. Verify two new indices (i.e., NAME_general_first, NAME_content_first) were created on both Elasticsearch clusters using wiki3's name (from step 2):
curl http://localhost:9200/_cat/indices?v
curl http://localhost:9201/_cat/indices?v
  4. Create a new item on wiki3 via http://wiki3.wbaas.localhost/wiki/Special:NewItem
  5. Verify docs.count is 1 for wiki3's NAME_content_first entry (from step 2) on both Elasticsearch clusters:
curl http://localhost:9200/_cat/indices?v
curl http://localhost:9201/_cat/indices?v
(E) Verify that a wiki can be deleted when both Elasticsearch clusters are used
  1. Delete wiki1 via http://www.wbaas.localhost/dashboard
  2. Hard delete wiki1 by running the following:
kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow DeleteWikiDispatcherJob"
  3. Verify wiki1's tables have been removed from the database:
kubectl --context minikube-wbaas exec -it sql-mariadb-primary-0 -- /bin/bash -c 'mysql -u root -p${MARIADB_ROOT_PASSWORD} -e "SELECT wikis.id, sitename, prefix, name FROM apidb.wikis LEFT JOIN apidb.wiki_dbs ON wikis.id = wiki_dbs.wiki_id;"'
  4. Verify wiki1's Elasticsearch indices have been deleted from both clusters:
curl http://localhost:9200/_cat/indices?v
curl http://localhost:9201/_cat/indices?v
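The final deletion check can be done in one pass over both clusters. A sketch (assumes the port-forwards from the setup section are still running; `mywiki_abc` stands in for wiki1's actual name from the earlier database query):

```shell
# Verify no indices for the deleted wiki remain on either cluster.
WIKI_NAME="mywiki_abc"
total_leftovers=0
for port in 9200 9201; do
  leftovers=$(curl -s --max-time 2 "http://localhost:${port}/_cat/indices" \
    | grep -c "${WIKI_NAME}")
  echo ":${port} leftover indices for ${WIKI_NAME}: ${leftovers}"
  total_leftovers=$((total_leftovers + leftovers))
done
echo "total leftovers: ${total_leftovers}"   # expect 0 after a successful hard delete
```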

There still seems to be a bug in C.2.

This step doesn't always exit cleanly on the first run, but it does still create the indices:

kubectl exec -it deployments/api-scheduler -- bash -c "php artisan job:dispatchNow CirrusSearch\\\\ElasticSearchIndexInit 1 secondary"
Andrew-WMDE updated the task description.
Andrew-WMDE moved this task from Doing to In Review on the Wikibase Cloud (WB Cloud Sprint 20) board.
Tarrow changed the task status from Open to Stalled. Jun 5 2023, 9:23 AM
Tarrow removed dang as the assignee of this task.
Tarrow added a subscriber: dang.

Task was marked stalled both to reflect reality (it had been hanging around since mid-May with no movement) and to focus on other work that was in flight at the same time (I think specifically T330389).

Marked it as open again

Right, unfortunately I'm now off for a few days and am therefore about to un-assign myself from reviewing this. Sorry that I haven't managed to make much progress on it, but I do have a couple of thoughts.

There are a couple of inline comments, but I would also add some general ones here. While it would definitely make the patches harder to write and prepare, they would probably be easier to review if they were broken up a little more, particularly in an order where merging some of the work felt very low risk.

For example, I wasn't sure if I could merge the api or MW changes without then breaking ES functionality. While nothing would change in production without a corresponding chart or repo change, merging a breaking change here would leave us unable to push and deploy new api patches.

The other big-picture thought I had here was that having the platform api talk directly to ES is an inconvenience we don't even benefit from at this point, since we don't use the delete functionality. Without it, all the ES config could be encapsulated in MW, and we wouldn't need to worry about passing things like cluster names over the wire.

We had a look at the api and mediawiki patches and decided that the main thing coupling them was changing the parameters of the api calls that trigger the elasticsearch jobs to run on mediawiki.

We wondered if instead it would be possible to remove these and have MediaWiki always write to all configured clusters when these apis are called. This would mean that all of the changes that are being made on both sides (api and mediawiki) would not need to happen at the same time (because they probably don't need to happen at all).

When, mid-migration, we need to call specific clusters one at a time, we could do this another way (for example, following the "use a kubernetes job" pattern we are generally aiming towards).

We'll look at adding on (or adjusting?) the mediawiki patch to see what this would look like.

Tarrow changed the task status from Open to Stalled. Jul 6 2023, 9:36 AM

Documenting some failed updates from dailies over the last few days: marked stalled since we at least want a second cluster setup that is working for all devs before people move further on this. See: T335853

Tarrow changed the task status from Stalled to Open. Jul 17 2023, 12:42 PM

Marked open again now that there is a second cluster both in the local environment and in staging

Fring removed Fring as the assignee of this task. Jul 18 2023, 10:25 AM
Fring subscribed.
Tarrow removed Tarrow as the assignee of this task.
Tarrow claimed this task.