Perform In-Place Re-index to apply newest shard settings
Closed, Resolved, Public

Description

Discrete Steps

  • Craft set of commands
  • Actually run the commands
    • STARTED: monitor tmux session reindex on mwmaint1002 under shell user ryankemper

Commands and how I got them

modified reindex function from https://wikitech.wikimedia.org/wiki/Search#In_place_reindex:

function reindex_single_es_index() {
    cluster="$1"
    wiki="$2"
    es_index_suffix="$3"
    mkdir -p "$HOME/cirrus_log/"
    reindex_log="$HOME/cirrus_log/$wiki.$cluster.reindex.log"
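    if [ -z "$cluster" ] || [ -z "$wiki" ] || [ -z "$es_index_suffix" ]; then
        echo "Usage: reindex_single_es_index [cluster] [wiki] [es_index_suffix]"
        return 1
    fi
    # Take the timestamp in UTC; %M is minutes (%m would repeat the month).
    export REINDEX_START=$(TZ=UTC date +%Y-%m-%dT%H:%M:%SZ)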
    echo "Started at $REINDEX_START" > "$reindex_log"
    mwscript extensions/CirrusSearch/maintenance/UpdateOneSearchIndexConfig.php --wiki $wiki --cluster $cluster --indexType $es_index_suffix --reindexAndRemoveOk --indexIdentifier now 2>&1 | tee -a "$reindex_log" && \
    mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --deletes | tee -a "$reindex_log" && \
    mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --queue | tee -a "$reindex_log" && \
    mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --archive | tee -a "$reindex_log"
}

quick Ruby script to generate every [cluster, index] combination, to reduce the chance of human error:

# https://phabricator.wikimedia.org/T256928 -> https://phabricator.wikimedia.org/T257387

clusters = ['eqiad', 'codfw', 'cloudelastic']
indices  = ['enwiki_content', 'dewiki_content', 'commonswiki_file', 'viwiki_general']

final_output = ""
clusters.each do |cluster|
  indices.each_with_index do |es_index, idx|
    es_index_base_name, es_index_type = es_index.split("_")
    command = "reindex_single_es_index #{cluster} #{es_index_base_name} #{es_index_type}"
    # Chain every command except the last one in a cluster with "&& \".
    command += " && \\\n" unless idx == indices.size - 1
    final_output += command
  end
  final_output += "\n\n"
end

puts final_output.chomp # remove extra trailing newline

result (after some manual randomization of the order of indices within a cluster):

reindex_single_es_index eqiad enwiki content && \
reindex_single_es_index eqiad dewiki content && \
reindex_single_es_index eqiad commonswiki file && \
reindex_single_es_index eqiad viwiki general

reindex_single_es_index codfw dewiki content && \
reindex_single_es_index codfw viwiki general && \
reindex_single_es_index codfw commonswiki file && \
reindex_single_es_index codfw enwiki content


reindex_single_es_index cloudelastic viwiki general && \
reindex_single_es_index cloudelastic dewiki content && \
reindex_single_es_index cloudelastic enwiki content && \
reindex_single_es_index cloudelastic commonswiki file

So, we'll want a single tmux session divided into 3 panes, where each pane is running one of those 3 blocks.

Since we're not reindexing every wiki, this shouldn't take as long as a normal full reindex. We are doing the biggest wikis, though, so I would still expect it to take quite some time.
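The three-pane tmux layout described above could also be generated rather than set up by hand. The following is only a sketch: the helper names (`pane_command`, `tmux_commands`) are hypothetical, it assumes tmux's default `base-index` and `pane-base-index` of 0, and it assumes `reindex_single_es_index` is already defined in the shell that each pane starts. It prints the tmux commands instead of running them, so they can be reviewed first.

```ruby
require 'shellwords'

# Per-cluster order copied from the result blocks in the task description.
PLAN = {
  'eqiad'        => %w[enwiki_content dewiki_content commonswiki_file viwiki_general],
  'codfw'        => %w[dewiki_content viwiki_general commonswiki_file enwiki_content],
  'cloudelastic' => %w[viwiki_general dewiki_content enwiki_content commonswiki_file],
}.freeze

# Hypothetical helper: one "a && b && c" command line for a single pane.
def pane_command(cluster, indices)
  indices.map do |es_index|
    wiki, index_type = es_index.split('_', 2)
    "reindex_single_es_index #{cluster} #{wiki} #{index_type}"
  end.join(' && ')
end

# Hypothetical helper: tmux invocations that create one pane per command line.
def tmux_commands(session, pane_commands)
  cmds = ["tmux new-session -d -s #{session}"]
  # One extra pane per additional command, re-balancing after each split.
  (pane_commands.size - 1).times do
    cmds << "tmux split-window -h -t #{session}"
    cmds << "tmux select-layout -t #{session} even-horizontal"
  end
  pane_commands.each_with_index do |line, pane|
    # Assumption: window 0 and pane numbering starting at 0 (tmux defaults).
    cmds << "tmux send-keys -t #{session}:0.#{pane} #{Shellwords.escape(line)} Enter"
  end
  cmds
end

puts tmux_commands('reindex', PLAN.map { |cluster, indices| pane_command(cluster, indices) })
```

Printing the commands keeps the same review-before-run workflow as the generator script above; the output can be pasted into a shell on the maintenance host once it looks right.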

Event Timeline

RKemper updated the task description.

@Gehel: Commands are ready for review; see the task description above (the final code block specifically). To verify that the list of wikis I used in the Ruby script is correct, see the changes in https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/608965/8/wmf-config/InitialiseSettings.php.

Erik had two optimizations that both make sense. I've listed them in descending order of forecasted performance gain:

(1) Since we're not re-doing every index within a wiki (e.g. we only need to change commonswiki_file, not every commonswiki index), we can call UpdateOneSearchIndexConfig manually so that we only update the indices we care about.

(2) Since extensions/CirrusSearch/maintenance/ForceSearchIndex.php (intentionally) runs poorly indexed queries, we should shuffle the order in which ForceSearchIndex runs across the 3 separate tmux panes (i.e. across the clusters). This should help us avoid thrashing, since each of the wikis we are updating happens to be in a separate database group, as confirmed by the dblists/s?.dblist files in the mediawiki-config repo.

I'll work on getting those two changes integrated before the next round of review.
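One systematic alternative to shuffling by hand for (2) is to give each cluster a rotated copy of the index list, so that at any given step no two clusters are on the same wiki. This is only a sketch, not what was actually run: `staggered_plan` and `command_block` are hypothetical names, and the guarantee only holds while the panes advance in lockstep, which real reindex durations will not respect exactly.

```ruby
CLUSTERS = ['eqiad', 'codfw', 'cloudelastic'].freeze
INDICES  = ['enwiki_content', 'dewiki_content', 'commonswiki_file', 'viwiki_general'].freeze

# Hypothetical helper: cluster N gets the index list rotated by N positions.
# Needs at least as many indices as clusters to avoid same-step collisions.
def staggered_plan(clusters, indices)
  clusters.each_with_index.map do |cluster, offset|
    [cluster, indices.rotate(offset)]
  end.to_h
end

# Hypothetical helper: renders one cluster's commands as an "&& \" chain.
def command_block(cluster, ordered_indices)
  ordered_indices.map do |es_index|
    wiki, index_type = es_index.split('_', 2)
    "reindex_single_es_index #{cluster} #{wiki} #{index_type}"
  end.join(" && \\\n")
end

plan = staggered_plan(CLUSTERS, INDICES)
puts plan.map { |cluster, ordered| command_block(cluster, ordered) }.join("\n\n")
```

Rotation is deterministic, so the generated blocks are reproducible for review, whereas a random shuffle could still line two clusters up on the same wiki by chance.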

Updated the main ticket body with the results after applying the two optimizations described in my previous comment.

I'd forgotten to update the state, but this ticket has been done for more than a week.

Marked resolved.