Initialize CirrusSearch on cloudelastic
Closed, Resolved (Public)

Description

  • Open firewall on cloudelastic machines to allow connections from mwmaint* and mw job runners to cloudelastic
  • Add cloudelastic to wgCirrusSearchClusters on all non-private wikis. Do not add to wgCirrusSearchWriteClusters initially
  • Set wgCirrusSearchDropDelayedJobsAfter to 15 minutes for cloudelastic
  • Initialize all indices
  • Import cirrussearch dumps from dumps.wikimedia.org
  • Add cloudelastic to wgCirrusSearchWriteClusters
  • Reindex updates made between when the dumps were created and when writes were enabled

Event Timeline

EBernhardson renamed this task from Initialize CirrusSeearch on cloudelastic to Initialize CirrusSearch on cloudelastic. Apr 10 2019, 4:23 PM
EBernhardson updated the task description.
EBernhardson updated the task description.

Open firewall on cloudelastic machines to allow connections from mwmaint*, mw job runners and cloudelastic

A quick look into our puppet code and ferm configuration does not show an obvious variable that would identify either the mwmaint* or the job runner nodes. I'll keep digging!

Change 502829 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] cloudelastic: allow jobrunners and mwmaint nodes to access cloudelastic

https://gerrit.wikimedia.org/r/502829

Change 502832 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/mediawiki-config@master] [cirrus] add cloudelastic service

https://gerrit.wikimedia.org/r/502832

Change 502829 merged by Gehel:
[operations/puppet@production] cloudelastic: allow jobrunners and mwmaint nodes to access cloudelastic

https://gerrit.wikimedia.org/r/502829

Change 507219 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] Add cloudelastic servers to wgCirrusSearchClusters

https://gerrit.wikimedia.org/r/507219

Change 507219 merged by jenkins-bot:
[operations/mediawiki-config@master] Add cloudelastic servers to wgCirrusSearchClusters

https://gerrit.wikimedia.org/r/507219

Mentioned in SAL (#wikimedia-operations) [2019-04-29T23:54:21Z] <ebernhardson@deploy1001> Synchronized tests/: T220625 Add cloudelastic servers to wgCirrusSearchClusters (1/5) (duration: 00m 53s)

Mentioned in SAL (#wikimedia-operations) [2019-04-29T23:55:33Z] <ebernhardson@deploy1001> Synchronized wmf-config/LabsServices.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (2/5) (duration: 00m 52s)

Mentioned in SAL (#wikimedia-operations) [2019-04-29T23:56:47Z] <ebernhardson@deploy1001> Synchronized wmf-config/ProductionServices.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (3/5) (duration: 00m 50s)

Mentioned in SAL (#wikimedia-operations) [2019-04-29T23:58:34Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (4/5) (duration: 00m 52s)

Mentioned in SAL (#wikimedia-operations) [2019-04-29T23:59:46Z] <ebernhardson@deploy1001> Synchronized wmf-config/CirrusSearch-production.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (5/5) (duration: 00m 52s)

Created indices using the following. First created the index for testwiki and verified it made it into a sane state.

expanddblist private > ~/private_wikis
expanddblist all | grep -vFf ~/private_wikis | while read wiki; do mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki=$wiki --cluster=cloudelastic; done

It turns out we can't set the max shards per node on a per-cluster basis; we only have one value, used on all clusters. It shouldn't be a big change, but this needs to be updated to take a per-cluster value, much like our other configurations.

We should additionally set the refresh interval to ~15 minutes for cloudelastic to help push back on expectations of update recency.
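
As a rough illustration of the refresh interval suggestion above, here is a minimal sketch of the underlying Elasticsearch index setting (`index.refresh_interval`, applied through the update index settings API). The helper names, host, port and index name below are hypothetical placeholders, and in practice this would presumably be set through CirrusSearch's own configuration rather than by hand. The script only prints the command; it does not contact any cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JSON body for the Elasticsearch
# update-index-settings API with a given refresh interval.
refresh_settings_body() {
    local interval="$1"
    printf '{"index":{"refresh_interval":"%s"}}\n' "$interval"
}

# Hypothetical helper: print (not run) the curl command that would apply
# the setting. The base URL and index name are placeholders, not values
# taken from this task.
print_refresh_command() {
    local base_url="$1" index="$2" interval="$3"
    printf "curl -s -XPUT '%s/%s/_settings' -H 'Content-Type: application/json' -d '%s'\n" \
        "$base_url" "$index" "$(refresh_settings_body "$interval")"
}

# Example: a 15 minute refresh interval on an assumed index name.
print_refresh_command "https://cloudelastic.example:9243" "testwiki_content" "15m"
```

Printing the command first makes it easy to review before anything is applied to a live index.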

Importing group0 wikis from dumps with the following script:

DUMPDATE=20190429
DUMPURL=https://dumps.wikimedia.your.org/other/cirrussearch/${DUMPDATE}
expanddblist group1 | while read -r WIKI; do
    # Look up the port used by the cloudelastic cluster assigned to this wiki
    CIRRUSPORT=$(echo 'echo (new CirrusSearch\SearchConfig)->getClusterAssignment()->getServerList( "cloudelastic" )[0]["port"];' | mwscript eval.php --wiki="${WIKI}")
    for INDEXTYPE in content general; do
        DUMPFILE="${WIKI}-${DUMPDATE}-cirrussearch-${INDEXTYPE}.json.gz"
        echo "Importing $DUMPFILE"
        echo
        # Download the dump through the web proxy, then post it to _bulk in
        # batches of 100 two-line records, bypassing the proxy for cloudelastic
        https_proxy=http://webproxy.eqiad.wmnet:8080/ curl -s "${DUMPURL}/${DUMPFILE}" | zcat | pv -c -N bytes | pv -c -N lines -l | \
            bin/parallel --blocksize 20971520 --pipe -L 2 -N 100 -j20 "https_proxy='' curl -s https://cloudelastic.wikimedia.org:${CIRRUSPORT}/${WIKI}_${INDEXTYPE}/_bulk --data-binary @- -H 'Content-Type: application/x-ndjson' >/dev/null"
    done
done

Change 507609 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] Start writing to cloudelastic from testwiki

https://gerrit.wikimedia.org/r/507609

Change 507609 merged by jenkins-bot:
[operations/mediawiki-config@master] Start writing to cloudelastic from testwiki

https://gerrit.wikimedia.org/r/507609

Mentioned in SAL (#wikimedia-operations) [2019-05-01T16:58:46Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T220625 Start writing to cloudelastic from testwiki (duration: 01m 01s)

Change 507703 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] Start writing to cloudelastic for group0

https://gerrit.wikimedia.org/r/507703

Change 507703 merged by jenkins-bot:
[operations/mediawiki-config@master] Start writing to cloudelastic for group0

https://gerrit.wikimedia.org/r/507703

Mentioned in SAL (#wikimedia-operations) [2019-05-01T23:19:18Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T220625 Start writing to cloudelastic for group0 (duration: 01m 05s)

group0 indices have all been imported from dumps, live writes have been enabled, and forceSearchIndex.php has been used to catch up on any changes missed. Additionally, saneitize.php has been run over them to ensure everything is in an expected state. Not sure how practical it will be to use saneitize.php on the larger wikis.
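
The catch-up step described above could be sketched roughly as follows. This is not the exact invocation that was used: the timestamps are illustrative guesses at the window between the dump date and the time writes were enabled, the helper name is hypothetical, and the `--from`, `--to` and `--cluster` options are assumed to behave as in the CirrusSearch maintenance scripts. The function only prints the commands so they can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read wiki names from stdin and print (not run) one
# forceSearchIndex.php command per wiki, covering the window between the
# dump date and the moment live writes were enabled.
print_catchup_commands() {
    local from="$1" to="$2" cluster="$3" wiki
    while read -r wiki; do
        # Skip blank lines in the input list
        [ -n "$wiki" ] || continue
        printf 'mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki=%s --cluster=%s --from %s --to %s\n' \
            "$wiki" "$cluster" "$from" "$to"
    done
}

# Example with assumed timestamps and two wikis chosen for illustration.
printf 'testwiki\nmediawikiwiki\n' |
    print_catchup_commands 2019-04-22T00:00:00Z 2019-05-02T00:00:00Z cloudelastic
```

In a real run the wiki list would more likely come from something like `expanddblist group0`, as in the earlier commands in this task.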

I've additionally started importing group1 and group2 wikis from the dumps. This will likely take many days. They are all importing from the 20190429 dumps. group0 was imported from the 20190422 dumps.

@Gehel Something I haven't been able to figure out: I can't find the cloudelastic servers in the grafana 'Cluster overview' dashboard. Are they expected to show up there, and is there anything we should do to help things along?

Change 508732 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] Limit the clusters archive index is written to

https://gerrit.wikimedia.org/r/508732

Change 508733 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] Configure wgCirrusSearchPrivateClusters

https://gerrit.wikimedia.org/r/508733

Change 508733 merged by jenkins-bot:
[operations/mediawiki-config@master] Configure wgCirrusSearchPrivateClusters

https://gerrit.wikimedia.org/r/508733

Mentioned in SAL (#wikimedia-operations) [2019-05-07T23:31:08Z] <ebernhardson@deploy1001> Synchronized wmf-config/CirrusSearch-production.php: T220625 Configure wgCirrusSearchPrivateClusters (duration: 00m 58s)

Change 508732 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Limit the clusters archive index is written to

https://gerrit.wikimedia.org/r/508732

Change 509110 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@wmf/1.34.0-wmf.4] Limit the clusters archive index is written to

https://gerrit.wikimedia.org/r/509110

Change 509110 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@wmf/1.34.0-wmf.4] Limit the clusters archive index is written to

https://gerrit.wikimedia.org/r/509110

Mentioned in SAL (#wikimedia-operations) [2019-05-09T23:43:36Z] <ebernhardson@deploy1001> Synchronized php-1.34.0-wmf.4/extensions/CirrusSearch/: T220625 Limit the clusters archive index is written to (duration: 00m 59s)

Mentioned in SAL (#wikimedia-operations) [2019-05-09T23:52:11Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T220625: Dont write to private wikis on cloudelastic (duration: 00m 50s)

cloudelastic is still only taking group0 updates. Rolling out group1 updates is waiting on figuring out a proper inbound load balancer for job runners -> cloudelastic. Without one, a single cloudelastic host being unavailable will result in a constant stream of errors in logstash.

EBernhardson triaged this task as Medium priority.

Change 502832 abandoned by DCausse:
[cirrus] add cloudelastic service

Reason:
will be using lvs

https://gerrit.wikimedia.org/r/502832

Waiting on Ops to let us know about load balancing and how this should work in the future.

Change 528260 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] Repoint cloudelastic at LB dns

https://gerrit.wikimedia.org/r/528260

Change 528263 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] Temporarily stop writing to cloudelastic

https://gerrit.wikimedia.org/r/528263

Mentioned in SAL (#wikimedia-operations) [2019-08-05T20:49:14Z] <ebernhardson> nuke all search indices on cloudelastic preparing for fresh imports and live updates T220625

Change 528263 abandoned by EBernhardson:
Temporarily stop writing to cloudelastic

Reason:
wasn't necessary

https://gerrit.wikimedia.org/r/528263

Change 528260 merged by jenkins-bot:
[operations/mediawiki-config@master] Repoint cloudelastic at LB dns

https://gerrit.wikimedia.org/r/528260

Mentioned in SAL (#wikimedia-operations) [2019-08-05T23:03:15Z] <urbanecm@deploy1001> Synchronized wmf-config/ProductionServices.php: SWAT: 87b428d: Repoint cloudelastic at LB dns (T220625) (duration: 00m 48s)

Change 528503 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] Turn on cloudelastic writes for group1

https://gerrit.wikimedia.org/r/528503

Change 528503 merged by jenkins-bot:
[operations/mediawiki-config@master] Turn on cloudelastic writes for group1

https://gerrit.wikimedia.org/r/528503

Mentioned in SAL (#wikimedia-operations) [2019-08-06T16:19:56Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T220625: Turn on cloudelastic writes for group1 (duration: 00m 47s)

Mentioned in SAL (#wikimedia-operations) [2019-08-06T16:40:37Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T220625: Re-sync enable group1 on cloudelastic, job runners are claiming its not enabled while app servers are sending jobs (duration: 00m 47s)

Mentioned in SAL (#wikimedia-operations) [2019-08-07T23:07:27Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T220625: Send writes for all non-private wikis to cloudelastic (duration: 01m 02s)

All wikis are writing to cloudelastic now. It will still be a few days to catch up on writes since July 29, the day the dump was made. Also, somehow importing commonswiki_file only imported ~25M out of 50M items. The saneitizer is working on fixing that, but it will take a bit.
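
A shortfall like the commonswiki_file one above can be expressed as a simple coverage percentage. The sketch below is only an illustration: the helper name is hypothetical, and the two counts are the rough figures from the comment, not measured values. In practice the actual count could presumably be read from the Elasticsearch `_count` API for the index, and the expected count from the source cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: integer percentage of expected documents that are
# actually present in an index after an import.
import_coverage_pct() {
    local actual="$1" expected="$2"
    if [ "$expected" -le 0 ]; then
        echo "expected count must be positive" >&2
        return 1
    fi
    echo $(( actual * 100 / expected ))
}

# Rough figures from the comment above: ~25M of ~50M documents imported.
import_coverage_pct 25000000 50000000   # prints 50
```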

Change 529993 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200

https://gerrit.wikimedia.org/r/529993

Change 529993 merged by Ppchelko:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200

https://gerrit.wikimedia.org/r/529993

Mentioned in SAL (#wikimedia-operations) [2019-08-13T19:15:46Z] <ppchelko@deploy1001> Started deploy [cpjobqueue/deploy@f1a562e]: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 T220625

Mentioned in SAL (#wikimedia-operations) [2019-08-13T19:17:17Z] <ppchelko@deploy1001> deploy aborted: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 T220625 (duration: 01m 30s)

Mentioned in SAL (#wikimedia-operations) [2019-08-13T19:22:40Z] <ppchelko@deploy1001> Started deploy [cpjobqueue/deploy@3882ddb]: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 T220625

Mentioned in SAL (#wikimedia-operations) [2019-08-13T19:23:38Z] <ppchelko@deploy1001> Finished deploy [cpjobqueue/deploy@3882ddb]: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 T220625 (duration: 00m 58s)

Change 624237 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] cloudelastic: remove temporarily increased timeout

https://gerrit.wikimedia.org/r/624237

Change 624237 merged by Ryan Kemper:
[operations/puppet@production] cloudelastic: remove temporarily increased timeout

https://gerrit.wikimedia.org/r/624237