
🔍Investigate ways to reduce the resources consumed by ES
Open, High, Public

Description

There could be a variety of ways to reduce the resources consumed by ES including:

  • reducing the number of indices or shards
  • using more modern or patched versions of ES
  • changing config settings or optimising the installation

Patches:

Event Timeline

Tarrow renamed this task from Investigate ways to reduce the requirements of ES to Investigate ways to reduce the resources consumed by ES. Jan 19 2023, 5:31 PM
Tarrow created this task.

This task was already partially started by @Andrew-WMDE; created this ticket so we don't lose his work

Change 881642 had a related patch set uploaded (by Tarrow; author: Andrew-WMDE):

[mediawiki/extensions/CirrusSearch@REL1_37] [POC] Support for sharing indices across multiple wikis

https://gerrit.wikimedia.org/r/881642

We had explored this possibility in the past (see T139496), but we never finished/enabled this work because we realized that the perf benefits were not worth the effort (T148554).
But I believe that there are a few pieces in CirrusSearch that were written for this purpose and that you could use to run this POC:

  • conflicting IDs: page ids are used as the elasticsearch doc_id and thus won't work well if multiple wikis have overlapping page_ids; $wgCirrusSearchPrefixIds can be set to true to prefix page_ids with the wiki and use that as the elastic doc_id.
  • the basename of the index can already be forced with $wgCirrusSearchIndexBaseName; have you considered testing with this setting instead of adding a new sharedName option to UpdateOneSearchIndexConfig?
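The doc_id prefixing mentioned above can be sketched roughly as follows. This is an illustration of the idea only; the exact separator and format CirrusSearch uses when $wgCirrusSearchPrefixIds is enabled may differ.

```python
def cirrus_doc_id(page_id: int, wiki_id: str, prefix_ids: bool = False) -> str:
    """Build an Elasticsearch doc_id for a page.

    Mimics the idea behind $wgCirrusSearchPrefixIds: when several wikis
    share one index, bare page_ids collide, so the wiki id is prepended.
    (The exact format used by CirrusSearch may differ.)
    """
    if prefix_ids:
        return f"{wiki_id}|{page_id}"
    return str(page_id)

# Without prefixing, page 1 on two different wikis collides:
assert cirrus_doc_id(1, "wiki1") == cirrus_doc_id(1, "wiki2")
# With prefixing, the doc_ids stay distinct:
assert cirrus_doc_id(1, "wiki1", prefix_ids=True) != cirrus_doc_id(1, "wiki2", prefix_ids=True)
```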

The missing bits are, I think, on the query side: as all docs will be stored in the same index, the generated query must attach a "wiki: current_wiki" filter to all searches (we have an indexed field named wiki).
There are certainly other problems to solve but it's hard to anticipate all of them at this point.
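The query-side change could look something like the following sketch, which wraps an arbitrary user query in a bool filter on the indexed wiki field (the field name comes from the comment above; the surrounding query shape is illustrative, not the exact DSL CirrusSearch generates):

```python
def add_wiki_filter(query: dict, current_wiki: str) -> dict:
    """Wrap an Elasticsearch query so it only matches documents belonging
    to current_wiki, via a term filter on the indexed 'wiki' field."""
    return {
        "bool": {
            "must": [query],
            "filter": [{"term": {"wiki": current_wiki}}],
        }
    }

q = add_wiki_filter({"match": {"title": "cat"}}, "wiki1")
assert q["bool"]["filter"] == [{"term": {"wiki": "wiki1"}}]
```

Because it is a non-scoring filter clause, the per-wiki restriction should not affect relevance ranking of the wrapped query.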

Might there be any benefit from cloning (ES 7.4+, so maybe MW 1.39+) a default index per MediaWiki release used by Wikibase.cloud, rather than generating a new one for each instance, which presumably creates a lot of duplicate content if default messages are indexed?

It seems to hardlink from a readonly index on compatible filesystems, rather than make new files (though I don't know whether the subsequent 'recovery' creates a copy of the old data or simply references it from the readonly index).

Apologies if this does not make sense, or is being done already as part of the creation process - I know little about how ES works or the details of its use in MediaWiki/WB.C.

One way to test this would be to add this gerrit patch to the patchUrls section of pacman.yaml.

It would be interesting to know if this does indeed work in combination with the "platform api", even on some old version of our MediaWiki image.

A few manual steps to see if this thing is working would be to create two wikis and add some content to both of them.

For example, an item in wiki1 called cat1 and an item in wiki2 called cat2. It would then be good to confirm that there are no results on wiki1 for cat2, but that the search results we would normally expect are still there. Similarly, it would be good to confirm that searching for Q1 returns results from only one wiki.
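Assuming both wikis write into one shared index, the isolation check above can be simulated with a toy in-memory index; everything here (the prefixed doc_ids, the per-document wiki field) is illustrative, not the real CirrusSearch data model:

```python
# Toy shared index: each doc records which wiki it belongs to.
shared_index = [
    {"wiki": "wiki1", "doc_id": "wiki1|1", "label": "cat1"},
    {"wiki": "wiki2", "doc_id": "wiki2|1", "label": "cat2"},
]

def search(index, current_wiki, text):
    """Return matching docs, always filtered to the current wiki."""
    return [d for d in index if d["wiki"] == current_wiki and text in d["label"]]

# cat2 must not leak into wiki1's results...
assert search(shared_index, "wiki1", "cat2") == []
# ...while wiki1's own content is still found:
assert [d["label"] for d in search(shared_index, "wiki1", "cat1")] == ["cat1"]
```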

Today I tried this patch on a local wikibase.cloud cluster and mediawiki image 1.37-7.4-20220621-fp-beta-0 (slightly modified to contain the patched code).

I used mediawiki 481e0b535... and wbaas-deploy c54a972...
and added the patch URL to pacman.yaml

$ git diff pacman.yaml
diff --git a/pacman.yaml b/pacman.yaml
index c0121f437..ae1d55c18 100644
--- a/pacman.yaml
+++ b/pacman.yaml
@@ -244,6 +244,8 @@
   - Gruntfile.js
   - Doxyfile
 - name: CirrusSearch
+  patchUrls:
+  - https://gerrit.wikimedia.org/r/changes/mediawiki%2Fextensions%2FCirrusSearch~881642/revisions/2/patch?download
   artifactUrl: https://codeload.github.com/wikimedia/mediawiki-extensions-CirrusSearch/zip/e9fe241ff135f666dc1837cedb1afd5b8b78a338
   artifactLevel: 1
   destination: ./dist/extensions/CirrusSearch

After that I ran sync.sh and built a local image, which I used in the helmfile local env config.

Results:

Creating wikis worked. I created several wikis where the Q1 Item label was the same and/or slightly different, and in both cases the searchbox suggestions led to the correct item. I also didn't see suggestions for items from other wikis.

I also looked a bit at the ES status via elasticHQ (docker run --network host elastichq/elasticsearch-hq and kubectl port-forward elasticsearch-master-0 9002).

There I could confirm that only the shared indices exist for the wikis (mw_cirrus_metastore_first, wiki_content_first and wiki_general_first),
as well as the correct aliases for the wikis.


Same results for mediawiki image 1.38-7.4-20230323-0 (used with wbaas-deploy d39ace9...)

also: the patched maint script UpdateOneSearchIndexConfig.php didn't change between the last 1.37 image and this 1.38 image. I'm now looking at our current mediawiki upstream at 1.39 and I see some changes when diffing, so I will first check whether the patch is still compatible with that version.

For 1.39 I tried to port the patch; it currently only lives here: https://phabricator.wikimedia.org/P46080

wbaas-deploy 6dec041...
mediawiki 1.39-7.4-20230328-0

At first it looked like it was working, but then I noticed that for every wiki other than the first one, ES wasn't enabled (which decreased my confidence in my earlier results - maybe I was looking at the wrong search suggestions without realizing it).

I created a third wiki, and for all but the first one there was a failed wbstackElasticSearchInit log: https://phabricator.wikimedia.org/P46081
Looking at the existing ES aliases in that setup, though, it looked like the aliases had been created successfully earlier.

Adding to the last experiment with 1.39: I found out that it works fine for the other wikis if the ES WikiSetting gets enabled manually. My assumption is that it probably just "fails" because the wikibase.cloud API's approach to checking whether the job completed successfully is to look for specific strings in the output, which is quite error-prone. I assume the output changed and this would work after T333559 is merged. (edit: tried it with that fix locally, and it worked fine!)

Here's the formatted output and stack trace of a failed job from that setup: https://phabricator.wikimedia.org/P46555

Change 881642 abandoned by Umherirrender:

[mediawiki/extensions/CirrusSearch@REL1_37] [POC] Support for sharing indices across multiple wikis

Reason:

This branch is EOL, please upload in the master branch if still relevant/needed

https://gerrit.wikimedia.org/r/881642

notes from a call about this:

An upstream patch to try this out is visible at https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/1010190

We're dusting off and reconsidering this attempt to alias many indices to one. The suggestion is that we make a replica ES cluster in staging, which would cost around 1kEUR/month (though hopefully we won't need it for a full month).

We have maybe around 18GB of ES storage, although we think a lot of that might be index overhead, e.g. for empty wikis.

Sticking everything in one index is claimed to have an approximate limit of 200 million docs and 50GB, but we're well below that right now, and in the future we could also try aliasing to a small number of upstream indices if we get near this limit.

We think that we could set this new cluster as the write-only cluster for a bit, populate it, and then try to move it to become a read cluster while keeping the existing many-index cluster as write-only.
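The staged cutover described above might be sketched like this; the cluster names and phase ordering here are assumptions for illustration, not a confirmed plan:

```python
# Hypothetical phases of the migration: dual-write while the new
# shared-index cluster is populated, then flip reads over to it.
phases = [
    # 1. New cluster receives writes alongside the old one while it is
    #    being populated; reads stay on the old many-index cluster.
    {"write": ["many-index", "shared-index"], "read": "many-index"},
    # 2. Once populated, move reads to the shared-index cluster while the
    #    old many-index cluster keeps receiving writes as a fallback.
    {"write": ["many-index", "shared-index"], "read": "shared-index"},
]

# The new cluster is written to throughout, so it never misses updates.
assert all("shared-index" in p["write"] for p in phases)
```

Keeping the old cluster writable in both phases means a rollback is just pointing reads back at it.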

We want to try this on staging first; we also want to document both how the aliasing technique works and the mechanics of doing the test migration.

Anton.Kokh renamed this task from Investigate ways to reduce the resources consumed by ES to 🔍Investigate ways to reduce the resources consumed by ES. Wed, May 29, 1:47 PM
Anton.Kokh triaged this task as High priority.
Andrew-WMDE claimed this task.
Andrew-WMDE moved this task from Doing to Done on the Wikibase Cloud (Kanban Board Q2 2024) board.