maintenance script to copy the ES index from one cluster to another
Closed, Resolved · Public · 3 Story Points

Description

I think we can build the index in codfw, and in the future labs replica, much more quickly than from a dump. Basically we just need to run the current Reindexer, but with the source in one cluster and the sink in the other.

This needs to fit into the basic plan for bootstrapping an additional cluster:

  1. Create mappings for all wikis in codfw
  2. Deploy operations/mediawiki-config to also send writes to codfw for one index
  3. Watch things for a bit
  4. Turn on a few more, then a few more, then the rest.
  5. Copy/rebuild into the same index accepting writes
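
Step 5 copies into an index that is already receiving live writes, so the copy presumably must not overwrite a document that was updated in codfw after the source copy was read. The sketch below shows one possible way to handle that, using a per-document version number. The `Doc` class, the `version` field, and `copy_without_clobbering` are hypothetical names for illustration only; this is an assumption about how the conflict could be handled, not a description of what the Reindexer actually does.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    body: str
    version: int  # hypothetical monotonically increasing revision number


def copy_without_clobbering(source: dict, target: dict) -> int:
    """Copy docs from source to target, keeping any target doc that is
    already at the same or a newer version. Returns the number copied."""
    copied = 0
    for doc_id, doc in source.items():
        existing = target.get(doc_id)
        if existing is not None and existing.version >= doc.version:
            continue  # a live write already delivered this or a newer revision
        target[doc_id] = Doc(doc.body, doc.version)
        copied += 1
    return copied


# In-memory stand-ins for the same index on two clusters.
source_index = {"1": Doc("old text", 1), "2": Doc("unchanged", 1)}
codfw_index = {"1": Doc("edited while copying", 2)}  # arrived via live writes

copied = copy_without_clobbering(source_index, codfw_index)
```

Here only document "2" is copied; document "1" keeps the newer revision that reached codfw through the live write path.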

Additionally, by having the reindexing contained in a single process, we could use standard tools like trickle[1] to rate limit our WAN traffic (the Reindexer can also fork to increase the speed if needed).

[1] http://linux.die.net/man/1/trickle
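
For illustration, the kind of throttling that trickle applies from outside the process can be sketched inside a process as well. This is not trickle and not part of the Reindexer; the `Throttle` class and its parameters are hypothetical. It only shows the idea of capping average throughput by sleeping whenever the sender gets ahead of the allowed rate. A fake clock is injected so the example runs instantly.

```python
import time


class Throttle:
    """Cap average throughput at roughly max_bytes_per_sec by sleeping
    whenever more bytes have been sent than the rate allows so far."""

    def __init__(self, max_bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(max_bytes_per_sec)
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.sent = 0

    def account(self, nbytes):
        """Record nbytes as sent and sleep if ahead of schedule.
        Returns the delay that was applied, in seconds."""
        self.sent += nbytes
        earliest = self.start + self.sent / self.rate
        delay = earliest - self.clock()
        if delay > 0:
            self.sleep(delay)
            return delay
        return 0.0


class FakeClock:
    """Test double: sleeping just advances the reported time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


fake = FakeClock()
throttle = Throttle(1000, clock=fake, sleep=fake.sleep)
# Three 500-byte batches at 1000 bytes/s should each wait half a second.
delays = [throttle.account(500) for _ in range(3)]
```

An external tool has the advantage that the limit can be changed without touching the maintenance script, which is why a single-process reindex is convenient here.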

Event Timeline

EBernhardson raised the priority of this task from to Needs Triage.
EBernhardson updated the task description.
EBernhardson added subscribers: EBernhardson, chasemp.
Restricted Application added a project: Discovery. Sep 18 2015, 4:13 AM
Restricted Application added a subscriber: Aklapper.
EBernhardson updated the task description. Sep 18 2015, 4:16 AM
EBernhardson set Security to None.
EBernhardson updated the task description. Sep 18 2015, 5:16 AM

updateOneSearchIndexConfig basically already does everything we need. It even takes the connection to read from (via the $index and $oldTypes constructor arguments) and the connection to write to (via the $connection and $types constructor arguments) independently.

updateOneSearchIndexConfig currently always creates a new index, so it isn't directly usable. Maybe a flag? Maybe shared code in a new maint class? Not sure yet.
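
A minimal sketch of the idea discussed above, in Python rather than the script's own PHP: a reindexer that takes its source and target independently, plus a flag that decides whether the target index is created or an existing one is reused. Everything here is hypothetical (the `Cluster` and `Reindexer` classes, the `create_target` flag, the `enwiki_content` index name) and uses in-memory dictionaries in place of real cluster connections; it is not the actual updateOneSearchIndexConfig code.

```python
class Cluster:
    """Hypothetical in-memory stand-in for one search cluster:
    maps index name -> {doc id -> document}."""

    def __init__(self, name):
        self.name = name
        self.indexes = {}


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


class Reindexer:
    def __init__(self, source, source_index, target, target_index,
                 create_target=True, batch_size=2):
        self.source = source
        self.source_index = source_index
        self.target = target
        self.target_index = target_index
        self.create_target = create_target
        self.batch_size = batch_size

    def run(self):
        """Copy every document from the source index into the target index.
        Returns the number of documents copied."""
        if self.create_target:
            if self.target_index in self.target.indexes:
                raise ValueError("target index already exists")
            self.target.indexes[self.target_index] = {}
        elif self.target_index not in self.target.indexes:
            raise KeyError("target index does not exist")

        dest = self.target.indexes[self.target_index]
        docs = list(self.source.indexes[self.source_index].items())
        copied = 0
        for batch in batches(docs, self.batch_size):
            dest.update(batch)  # one bulk write per batch of (id, doc) pairs
            copied += len(batch)
        return copied


source = Cluster("source")
source.indexes["enwiki_content"] = {"1": "a", "2": "b", "3": "c"}
codfw = Cluster("codfw")
codfw.indexes["enwiki_content"] = {"9": "live write"}  # already accepting writes

copied = Reindexer(source, "enwiki_content", codfw, "enwiki_content",
                   create_target=False).run()
```

With `create_target=False` the existing codfw index is reused, so the document that arrived through live writes survives alongside the three copied ones.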

This comment was removed by EBernhardson.
Smalyshev claimed this task. Oct 7 2015, 7:15 PM

Change 244247 had a related patch set uploaded (by Smalyshev):
Use correct indexes/connections for old and new types.

https://gerrit.wikimedia.org/r/244247

Change 244247 merged by jenkins-bot:
Split connection to source and target.

https://gerrit.wikimedia.org/r/244247

Change 246243 had a related patch set uploaded (by DCausse):
Split connection to source and target.

https://gerrit.wikimedia.org/r/246243

Change 246243 merged by jenkins-bot:
Split connection to source and target.

https://gerrit.wikimedia.org/r/246243

Smalyshev closed this task as Resolved. Oct 14 2015, 5:34 PM

Change 247379 had a related patch set uploaded (by EBernhardson):
Split connection to source and target.

https://gerrit.wikimedia.org/r/247379

Change 247379 merged by jenkins-bot:
Split connection to source and target.

https://gerrit.wikimedia.org/r/247379