Maniphest T194678

Update OtherIndex to operate on a cluster other than the one holding the wiki
Closed, ResolvedPublic
Assigned To

Authored By

	EBernhardson
	May 14 2018, 6:51 PM

Description

To allow storing wikis on separate clusters but still supporting the OtherIndex (on-wiki commons multimedia search with duplicates removal) functionality we need CirrusSearch to understand, in this limited circumstance, that other clusters exist and send indexing/search operations the right way.

Details

	Subject	Repo	Branch	Lines +/-
	OtherIndex support multiple clusters	mediawiki/extensions/CirrusSearch	master	+485 -73

Related Objects

Status	Assigned	Task
Resolved	EBernhardson	T183281 [epic] ELK upgrade to 6.x (elasticsearch, kibana, logstash)
Resolved	None	T183282 [epic] Search cluster upgrade to 6.x
Resolved	debt	T193654 [epic] Run multiple elasticsearch clusters on same hardware
Resolved	EBernhardson	T194678 Update OtherIndex to operate on a cluster other than the one holding the wiki

Event Timeline

EBernhardson triaged this task as Medium priority. May 14 2018, 6:51 PM

EBernhardson created this task.

For search we might consider Cross Cluster Search. This was added in 5.4, came out of beta in 6.0, and is the blessed replacement for tribe nodes. It essentially allows us to query the other index as if it were local by prefixing the index with the cluster name, such as large:commonswiki_file. This allows us to ignore the question of which cluster (eqiad, codfw?) to read from in the cirrus code, relegating it to elasticsearch configuration.
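A minimal sketch of the index naming this relies on, assuming the remote cluster is registered under the alias `large`. The helper name `crossClusterIndexName` is hypothetical and is not a CirrusSearch function.

```php
<?php
// Hypothetical helper for illustration; not part of CirrusSearch.
function crossClusterIndexName( string $indexName, ?string $remoteCluster = null ): string {
    // Cross-cluster search addresses a remote index as "<cluster alias>:<index>".
    // With no remote cluster the plain index name is used.
    return $remoteCluster === null ? $indexName : $remoteCluster . ':' . $indexName;
}

echo crossClusterIndexName( 'commonswiki_file', 'large' ), "\n"; // large:commonswiki_file
echo crossClusterIndexName( 'commonswiki_file' ), "\n";          // commonswiki_file
```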

For indexing we need to be a little more involved. The use cases look to be:

Default operation issues writes to all clusters. Writes to clusters not accepting writes are backed up into the job queue.
Maintenance scripts issue writes to a single cluster. Multi-cluster operations are almost always handled by separate invocations.

When we perform OtherIndex updates to a wiki on a separate cluster from commonswiki we need to map to the correct cluster.

A file created on qqwiki in cluster eqiad-small01 needs to send writes to eqiad-large and codfw-large.
A maintenance script on qqwiki in cluster eqiad-small01 needs to send writes only to eqiad-large. Sending to only the local cluster, or to all configured clusters, is currently gated on a flag called 'same-cluster'.

Current config: [ NS_FILE => 'commonswiki_file' ]

Proposal 1: [ NS_FILE => [ ['codfw-large', 'eqiad-large'], 'commonswiki_file' ] ]

Since search time is handled by cluster configuration, the only thing we really need here is the list of all clusters that need writes. This will easily handle the standard operation. For maintenance scripts though, we need to know that when passed a wiki connection to eqiad-small01 we should only choose eqiad-large. We could perhaps hardcode a naming convention and match only cluster names that share the same prefix before a - delimiter.
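The naming convention idea could look roughly like the sketch below. The function names and the `$sameCluster` parameter (standing in for the 'same-cluster' flag) are hypothetical, and the code assumes the datacenter is always the part of the cluster name before the first `-`.

```php
<?php
// Hypothetical helpers sketching Proposal 1; none of these exist in CirrusSearch.

/** Datacenter prefix of a cluster name: everything before the first "-". */
function clusterPrefix( string $cluster ): string {
    return explode( '-', $cluster, 2 )[0];
}

/**
 * @param string[] $writeClusters All clusters holding the other index
 * @param string $wikiCluster Cluster the current wiki connection points at
 * @param bool $sameCluster True when only the matching cluster should be written
 * @return string[] Clusters that should receive the OtherIndex write
 */
function otherIndexWriteTargets( array $writeClusters, string $wikiCluster, bool $sameCluster ): array {
    if ( !$sameCluster ) {
        // Default operation: write everywhere.
        return $writeClusters;
    }
    $prefix = clusterPrefix( $wikiCluster );
    // Maintenance scripts: keep only clusters sharing the datacenter prefix.
    return array_values( array_filter(
        $writeClusters,
        function ( string $candidate ) use ( $prefix ): bool {
            return clusterPrefix( $candidate ) === $prefix;
        }
    ) );
}

$clusters = [ 'codfw-large', 'eqiad-large' ];
echo json_encode( otherIndexWriteTargets( $clusters, 'eqiad-small01', false ) ), "\n"; // ["codfw-large","eqiad-large"]
echo json_encode( otherIndexWriteTargets( $clusters, 'eqiad-small01', true ) ), "\n";  // ["eqiad-large"]
```

The weak point is that the relationship between clusters lives only in their names, so a cluster that does not follow the convention silently receives no writes.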

Proposal 2:

I think part of the reason all of the above feels awkward is that we are treating all clusters as equals, identified mostly by a name, and not imposing the logical structure we want on them. To give them a better structure we need names for the groups. I propose, roughly:

$wgCirrusSearchReplicaGroup = 'small-01';
$wgCirrusSearchReplicaGroups = [
  'small-01' => [
    'eqiad' => 'eqiad-small-01',
    'codfw' => 'codfw-small-01',
  ],
  'small-02' => [
    'eqiad' => 'eqiad-small-02',
    'codfw' => 'codfw-small-02',
  ],
  'large' => [
    'eqiad' => 'eqiad-large',
    ...
  ],
];

The values inside the arrays are names in wgCirrusSearchClusters, and everything that deals with names of clusters would primarily deal with the inner structure, specified by the local replica group. OtherIndex operations, instead of referring to a separate cluster, will refer to the appropriate replica group.
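A rough sketch of how an OtherIndex write could be routed with replica groups. The helper `siblingCluster` is hypothetical, and the `'codfw' => 'codfw-large'` entry is an assumption taken from the cluster names used earlier in this task, since the proposal leaves that part of the 'large' group elided.

```php
<?php
// Illustrative config only; 'codfw' => 'codfw-large' is an assumed value.
$replicaGroups = [
    'small-01' => [ 'eqiad' => 'eqiad-small-01', 'codfw' => 'codfw-small-01' ],
    'large' => [ 'eqiad' => 'eqiad-large', 'codfw' => 'codfw-large' ],
];

/**
 * Hypothetical helper: map a cluster of the local replica group to the
 * cluster with the same key (datacenter) in another replica group.
 * Returns null when either side is not configured.
 */
function siblingCluster( array $groups, string $localGroup, string $localCluster, string $targetGroup ): ?string {
    $key = array_search( $localCluster, $groups[$localGroup] ?? [], true );
    if ( $key === false ) {
        return null;
    }
    return $groups[$targetGroup][$key] ?? null;
}

// Maintenance script connected to eqiad-small-01 writes only to the sibling.
echo siblingCluster( $replicaGroups, 'small-01', 'eqiad-small-01', 'large' ), "\n"; // eqiad-large

// Default operation writes to every cluster of the target group.
echo json_encode( array_values( $replicaGroups['large'] ) ), "\n"; // ["eqiad-large","codfw-large"]
```

Compared with matching on name prefixes, the relationship between clusters is explicit in configuration here, so cluster names are free to change.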

In proposal 2 does it mean we get rid of wgCirrusSearchWriteClusters?
On one hand I like the simplicity of the first proposal but I'd go with something like:

[ NS_FILE => [
   'read_cluster' => 'logical_name_used_by_crosscluster_search', # OR '__local__' if mw-config detects this wiki belongs to the same cluster
   'write_clusters' => [
         'eqiad-large-01',
         'codfw-large-01',
   ],
   'index_name' => 'commonswiki_file',
] ];

But I think this will require extra logic in mw-config, both to adjust the value properly for every wiki and to ensure that switching between clusters remains a relatively easy config change.

And in the end I think OtherIndex is not the sole component that requires some adaptation. Crosswiki search will likely require some changes unless we guarantee that all the wikis that can be accessed from a crosswiki search live on the same cluster. It's unclear to me how to do this: we managed to get crosswiki working using only the data available in SiteMatrix, but here it looks like we will also need another piece of information to determine which cluster to query.

For splitting wikis between clusters and ensuring sister searches stay on the same cluster, I was hoping I could get by with a test case in mw-config that pokes at the SiteMatrix configuration (unfortunately without the SiteMatrix code in mw-debug test suite) and verifies things all belong on the "correct" clusters.

To actually choose what goes where though, I'm not sure. I'm half tempted to generate a whitelist for the large cluster, and distribute the rest of the wikis between the small clusters based on the first letter of the wiki, with some letter chosen as the cutoff point between the two clusters. It wouldn't be perfectly balanced, but it would be enough, and I hope it's reasonable to assume all sister wikis will have the same first letter of their wikiid.
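As a sketch of that assignment rule: the function name, the whitelist contents and the cutoff letter `m` below are all made up for illustration, and the group names are borrowed from Proposal 2. It assumes wiki ids are non-empty lowercase ASCII strings.

```php
<?php
// Hypothetical sketch; whitelist and cutoff letter are illustrative values.
function replicaGroupForWiki( string $wikiId, array $largeWhitelist, string $cutoff ): string {
    if ( in_array( $wikiId, $largeWhitelist, true ) ) {
        return 'large';
    }
    // Ids whose first letter sorts before the cutoff go to small-01,
    // everything else goes to small-02.
    return strcmp( $wikiId[0], $cutoff ) < 0 ? 'small-01' : 'small-02';
}

$whitelist = [ 'commonswiki', 'enwiki' ];
echo replicaGroupForWiki( 'commonswiki', $whitelist, 'm' ), "\n";  // large
echo replicaGroupForWiki( 'frwiki', $whitelist, 'm' ), "\n";       // small-01
echo replicaGroupForWiki( 'frwiktionary', $whitelist, 'm' ), "\n"; // small-01
echo replicaGroupForWiki( 'ptwiki', $whitelist, 'm' ), "\n";       // small-02
```

Sister wikis sharing a language prefix land in the same small group under this rule, but a whitelisted wiki is separated from its non-whitelisted sisters, which is the case the mw-config test would need to catch.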

EBjune moved this task from needs triage to Up Next on the Discovery-Search board. May 31 2018, 5:28 PM

EBernhardson claimed this task. Jun 27 2018, 10:11 PM

EBernhardson moved this task from Up Next to Current work on the Discovery-Search board.

EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search.

EBernhardson moved this task from Incoming to not in use - please delete on the Discovery-Search (Current work) board.

In T194678#4207854, @dcausse wrote:

[ NS_FILE => [
   'read_cluster' => 'logical_name_used_by_crosscluster_search', # OR '__local__' if mw-config detects this wiki belongs to the same cluster
   'write_clusters' => [
         'eqiad-large-01',
         'codfw-large-01',
   ],
   'index_name' => 'commonswiki_file',
] ];

Ideally, we want to have a generic wgCirrusSearchExtraIndexes configuration that doesn't need to be dynamically built per-wiki by mediawiki-config. Another option would be to configure a mapping from "some cluster" to appropriate extra index cluster.

The configuration would become something like:

[ NS_FILE => [
    'indexName' => 'commonswiki_file',
    'clusters' => [
        'eqiad-a' => 'eqiad-a',
        'eqiad-b' => 'eqiad-a',
        'eqiad-c' => 'eqiad-a',
        'codfw-a' => 'codfw-a',
        'codfw-b' => 'codfw-a',
        'codfw-c' => 'codfw-a',
    ],
] ]

This makes the simplifying assumption that elasticsearch cross-cluster search is configured with the same cluster names we use in cirrussearch. Using anything else seems like it would be overcomplicated anyway. If we add the assumption that any unconfigured mapping points to the current cluster, then this can also reasonably easily handle the existing configuration without modification. When the value is configured as a string instead of an array we can treat it as ['indexName' => 'foo', 'clusters' => []]. When we look up the mapping in clusters we follow the assumption that unconfigured clusters write to themselves.
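The normalization and lookup rules described here can be sketched as follows. The function names are hypothetical and this is not the code from the merged patch; it only illustrates the two stated assumptions (a string expands to an empty mapping, and unconfigured clusters map to themselves).

```php
<?php
// Hypothetical helpers; not the merged CirrusSearch implementation.

/** Expand the legacy string form into the array form. */
function normalizeExtraIndex( $config ): array {
    if ( is_string( $config ) ) {
        return [ 'indexName' => $config, 'clusters' => [] ];
    }
    // Array form: make sure the 'clusters' key exists.
    return $config + [ 'clusters' => [] ];
}

/** Cluster holding the extra index for an operation issued against $cluster. */
function extraIndexCluster( array $config, string $cluster ): string {
    // Unconfigured clusters are assumed to hold the extra index themselves.
    return $config['clusters'][$cluster] ?? $cluster;
}

/** Index name to search from $cluster, prefixed only when the index is remote. */
function extraIndexSearchName( array $config, string $cluster ): string {
    $target = extraIndexCluster( $config, $cluster );
    return $target === $cluster
        ? $config['indexName']
        : $target . ':' . $config['indexName'];
}

$new = normalizeExtraIndex( [
    'indexName' => 'commonswiki_file',
    'clusters' => [ 'eqiad-b' => 'eqiad-a', 'codfw-b' => 'codfw-a' ],
] );
$old = normalizeExtraIndex( 'commonswiki_file' );

echo extraIndexCluster( $new, 'codfw-b' ), "\n";    // codfw-a
echo extraIndexSearchName( $new, 'eqiad-b' ), "\n"; // eqiad-a:commonswiki_file
echo extraIndexSearchName( $old, 'eqiad-a' ), "\n"; // commonswiki_file
```

Because the fallback is the current cluster, the identity entries such as 'eqiad-a' => 'eqiad-a' could be omitted from the mapping without changing the result.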

Change 443009 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] OtherIndex support multiple clusters

https://gerrit.wikimedia.org/r/443009

gerritbot added a project: Patch-For-Review. Jul 2 2018, 8:40 PM

EBernhardson moved this task from not in use - please delete to Needs review on the Discovery-Search (Current work) board. Jul 2 2018, 10:49 PM

Change 443009 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] OtherIndex support multiple clusters

https://gerrit.wikimedia.org/r/443009

ReleaseTaggerBot added a project: MW-1.32-notes (WMF-deploy-2018-07-10 (1.32.0-wmf.12)). Jul 4 2018, 3:00 PM

Just to add a note: I discovered this morning that elastic will refuse to boot if the other cluster, configured like:

search:
        remote:
                other:
                        seeds: 10.11.12.1:9300

is not running.

This means we can't make clusters dependent on each other.

Mentioned in SAL (#wikimedia-operations) [2018-07-05T18:55:28Z] <ebernhardson> T194678 pause cirrussearch writes to codfw to check how kafka+mirrormaker responds

Mentioned in SAL (#wikimedia-operations) [2018-07-05T19:07:19Z] <ebernhardson> T194678 un-pause cirrussearch writes to codfw

EBernhardson mentioned this in T196032: Huge messages in eqiad.mediawiki.job.cirrusSearchElasticaWrite (and other?) topics. Jul 5 2018, 7:10 PM

This means we can't make clusters dependent on each other.

As discussed on IRC, this isn't going to be a problem for our current plans, as we only need cross cluster search from the new small clusters back to the large cluster.

dcausse moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board. Jul 10 2018, 5:17 PM

debt closed this task as Resolved. Jul 13 2018, 7:11 PM
