
Repair and reload all cassandra-2 data tables but the 2 big ones
Open, In Progress, High, Public

Description

The two big tables NOT to repair and reload are:

  • pageviews_per_article
  • mediarequests_per_file

This is needed to overcome the problem of some data not being replicated correctly, and therefore missing from the single-rack snapshot.
Steps to perform:

  • Repair
  • Snapshot
  • Transfer
  • Load
  • QA

Event Timeline

BTullis changed the task status from Open to In Progress.Tue, Sep 21, 10:04 AM
BTullis claimed this task.
BTullis triaged this task as High priority.
BTullis moved this task from Next Up to In Progress on the Analytics-Kanban board.

I have completed the single-rack repair of all tables with the exception of the two biggest.

To begin, I created the following list of keyspaces in a file named repair_keyspaces:

local_group_default_T_editors_bycountry
local_group_default_T_lgc_pagecounts_per_project
local_group_default_T_mediarequest_per_referer
local_group_default_T_mediarequest_top_files
local_group_default_T_pageviews_per_project
local_group_default_T_pageviews_per_project_v2
local_group_default_T_top_bycountry
local_group_default_T_top_pageviews
local_group_default_T_top_percountry
local_group_default_T_unique_devices

I then executed a nodetool repair for each of these keyspaces with the commands:

btullis@aqs1004:~$ for keyspace in $(cat repair_keyspaces);do sudo nodetool-a repair --full $keyspace ; done
btullis@aqs1004:~$ for keyspace in $(cat repair_keyspaces);do sudo nodetool-b repair --full $keyspace ; done
btullis@aqs1007:~$ for keyspace in $(cat repair_keyspaces);do sudo nodetool-a repair --full $keyspace ; done
btullis@aqs1007:~$ for keyspace in $(cat repair_keyspaces);do sudo nodetool-b repair --full $keyspace ; done

These completed without error.

I then repaired the meta tables in the two keyspaces which contain large data tables.

btullis@aqs1004:~$ sudo nodetool-a repair --full local_group_default_T_mediarequest_per_file meta
btullis@aqs1004:~$ sudo nodetool-b repair --full local_group_default_T_mediarequest_per_file meta
btullis@aqs1004:~$ sudo nodetool-a repair --full local_group_default_T_pageviews_per_article_flat meta
btullis@aqs1004:~$ sudo nodetool-b repair --full local_group_default_T_pageviews_per_article_flat meta
btullis@aqs1007:~$ sudo nodetool-a repair --full local_group_default_T_mediarequest_per_file meta
btullis@aqs1007:~$ sudo nodetool-b repair --full local_group_default_T_mediarequest_per_file meta
btullis@aqs1007:~$ sudo nodetool-a repair --full local_group_default_T_pageviews_per_article_flat meta
btullis@aqs1007:~$ sudo nodetool-b repair --full local_group_default_T_pageviews_per_article_flat meta

Once again, these completed without error.
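For the record, those eight invocations follow a regular pattern and could equally be generated by a loop. A dry-run sketch (the leading echo prints each command instead of executing it; nodetool-a / nodetool-b are the site's per-instance wrapper scripts):

```shell
#!/bin/sh
# Dry-run sketch of the meta-table repairs above: one command per
# instance/keyspace pair on a given host. Drop the leading "echo"
# to actually execute.
for instance in a b; do
  for keyspace in local_group_default_T_mediarequest_per_file \
                  local_group_default_T_pageviews_per_article_flat; do
    echo sudo nodetool-${instance} repair --full ${keyspace} meta
  done
done
```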

That means that all tables in all keyspaces have been fully repaired in a single rack, with the exception of the two largest.

Now that the repair has completed, we can carry out the next steps.
Building upon the experience from T291473: Test snapshot-reload from all instances using pageview-top data table, I will use scripts as much as possible to reduce the chance of errors.
Broadly, the steps will be:

  1. Check that the loading job is not happening before making a snapshot.
  2. Use a single cumin command to create four new snapshots that are tagged with this ticket ID on aqs1004 and aqs1007.
  3. Use a single cumin command to remove the previous snapshots on the destination servers: aqs1010 and aqs1011.
  4. Copy and adapt the transfer_rsync_T291473 script to rsync every table from the four snapshots, excluding the two large ones, to the destination hosts.
  5. Copy and adapt the reload_T291473 script to rename and reload the snapshots in sequence on the destination hosts.

Snapshot command:
We need snapshots across 10 keyspaces on 4 instances (a and b on each of aqs1004 and aqs1007).

sudo cumin 'aqs100[4,7].eqiad.wmnet' --mode async 'nodetool-a snapshot -t T291469' 'nodetool-b snapshot -t T291469'

This will create a snapshot across all keyspaces, which is more than we need.

We could then delete the snapshots we don't need with:

sudo cumin 'aqs100[4,7].eqiad.wmnet' --mode async 'nodetool-a clearsnapshot -t T291469 local_group_default_T_mediarequest_per_file local_group_default_T_pageviews_per_article_flat' 'nodetool-b clearsnapshot -t T291469 local_group_default_T_mediarequest_per_file local_group_default_T_pageviews_per_article_flat'
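As a cross-check that the clearsnapshot arguments cover exactly the keyspaces we are excluding, the list can be derived by subtracting the keep-list from the full keyspace list. A sketch with an illustrative subset of keyspaces (the filenames here are hypothetical):

```shell
#!/bin/sh
# Sketch: derive the keyspaces whose snapshots should be cleared by
# subtracting the keep-list (repair_keyspaces) from the full keyspace
# list. File contents are an illustrative subset, not the full lists.
printf '%s\n' \
  local_group_default_T_editors_bycountry \
  local_group_default_T_mediarequest_per_file \
  local_group_default_T_pageviews_per_article_flat \
  > all_keyspaces
printf '%s\n' local_group_default_T_editors_bycountry > repair_keyspaces
# -v invert match, -x whole line, -F fixed strings, -f patterns from file
grep -vxFf repair_keyspaces all_keyspaces
```

With the illustrative lists above this prints the two large-table keyspaces, i.e. exactly the clearsnapshot arguments.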

@JAllemandou - What about the local_group_default_T_pageviews_per_project tables?
Since the data table for this doesn't exist on the new cluster, what should we do about it? Do we need to do anything different about the data and meta tables for it?

Should we still snapshot it and transfer/reload just the meta table?
Should we omit the snapshot for this keyspace altogether?


I think we should omit the keyspace entirely. These tables should have been deleted long ago, and omitting them will prove that (nothing will fail on the new cluster, hopefully).

OK, thanks.
We already imported the meta table in that keyspace, didn't we? So should we delete the whole local_group_default_T_pageviews_per_project keyspace from the aqs_new cluster?

Right, in that case we can do this:

sudo cumin 'aqs100[4,7].eqiad.wmnet' --mode async 'nodetool-a snapshot -t T291469' 'nodetool-b snapshot -t T291469'
sudo cumin 'aqs100[4,7].eqiad.wmnet' --mode async 'nodetool-a clearsnapshot -t T291469 local_group_default_T_mediarequest_per_file local_group_default_T_pageviews_per_article_flat local_group_default_T_pageviews_per_project' 'nodetool-b clearsnapshot -t T291469 local_group_default_T_mediarequest_per_file local_group_default_T_pageviews_per_article_flat local_group_default_T_pageviews_per_project'

That's creating a named snapshot across all keyspaces, then deleting the snapshot for the keyspaces we are not going to use.

  • The oozie loader job isn't running.

Creating the snapshots now.

btullis@cumin1001:~$ sudo cumin --mode async 'aqs100[4,7].eqiad.wmnet' 'nodetool-a snapshot -t T291469' 'nodetool-b snapshot -t T291469'

Mentioned in SAL (#wikimedia-analytics) [2021-09-24T15:06:18Z] <btullis> btullis@cumin1001:~$ sudo cumin --mode async 'aqs100[4,7].eqiad.wmnet' 'nodetool-a snapshot -t T291469' 'nodetool-b snapshot -t T291469'

Snapshots created successfully.

2 hosts will be targeted:
aqs[1004,1007].eqiad.wmnet
Ok to proceed on 2 hosts? Enter the number of affected hosts to confirm or "q" to quit 2
===== NODE GROUP =====                                                                                                                                                                                             
(4) aqs[1004,1007].eqiad.wmnet                                                                                                                                                                                     
----- OUTPUT -----                                                                                                                                                                                                 
Requested creating snapshot(s) for [all keyspaces] with snapshot name [T291469]                                                                                                                                    
Snapshot directory: T291469                                                                                                                                                                                        
================                                                                                                                                                                                                   
PASS |███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100% (2/2) [01:16<00:00, 41.05s/hosts]          
FAIL |                                                                                                                                                                   |   0% (0/2) [01:16<?, ?hosts/s]
100.0% (2/2) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.

Here is the reworked rsync transfer script. Trying a dry-run now.

#!/bin/bash

DOMAIN='eqiad.wmnet'
SOURCES=('aqs1004' 'aqs1007')
DESTINATIONS=('aqs1011' 'aqs1010')
INSTANCES=('a' 'b')
INCLUDES=$(cat <<-END
	local_group_default_T_editors_bycountry
	local_group_default_T_editors_bycountry/data-*
	local_group_default_T_editors_bycountry/data-*/snapshots
	local_group_default_T_editors_bycountry/data-*/snapshots/T291469/***
	local_group_default_T_lgc_pagecounts_per_project
	local_group_default_T_lgc_pagecounts_per_project/data-*
	local_group_default_T_lgc_pagecounts_per_project/data-*/snapshots
	local_group_default_T_lgc_pagecounts_per_project/data-*/snapshots/T291469/***
	local_group_default_T_mediarequest_per_referer
	local_group_default_T_mediarequest_per_referer/data-*
	local_group_default_T_mediarequest_per_referer/data-*/snapshots
	local_group_default_T_mediarequest_per_referer/data-*/snapshots/T291469/***
	local_group_default_T_mediarequest_top_files
	local_group_default_T_mediarequest_top_files/data-*
	local_group_default_T_mediarequest_top_files/data-*/snapshots
	local_group_default_T_mediarequest_top_files/data-*/snapshots/T291469/***
	local_group_default_T_pageviews_per_project_v2
	local_group_default_T_pageviews_per_project_v2/data-*
	local_group_default_T_pageviews_per_project_v2/data-*/snapshots
	local_group_default_T_pageviews_per_project_v2/data-*/snapshots/T291469/***
	local_group_default_T_top_bycountry
	local_group_default_T_top_bycountry/data-*
	local_group_default_T_top_bycountry/data-*/snapshots
	local_group_default_T_top_bycountry/data-*/snapshots/T291469/***
	local_group_default_T_top_pageviews
	local_group_default_T_top_pageviews/data-*
	local_group_default_T_top_pageviews/data-*/snapshots
	local_group_default_T_top_pageviews/data-*/snapshots/T291469/***
	local_group_default_T_top_percountry
	local_group_default_T_top_percountry/data-*
	local_group_default_T_top_percountry/data-*/snapshots
	local_group_default_T_top_percountry/data-*/snapshots/T291469/***
	local_group_default_T_unique_devices
	local_group_default_T_unique_devices/data-*
	local_group_default_T_unique_devices/data-*/snapshots
	local_group_default_T_unique_devices/data-*/snapshots/T291469/***
END
)
RSYNC=/usr/bin/rsync
RSYNCOPTIONS='-av'
EXCLUDES='--exclude "*"'

for i in $INCLUDES
do
  RSYNCINCLUDES+="--include $i "
done

for i in $(seq 0 1)
do
  for INSTANCE in "${INSTANCES[@]}"
  do
    echo ''
    echo "### Transferring ${SOURCES[$i]}.$DOMAIN instance $INSTANCE to ${DESTINATIONS[$i]}.$DOMAIN"
    sudo cumin --force "${SOURCES[$i]}.$DOMAIN" "$RSYNC $RSYNCOPTIONS $RSYNCINCLUDES $EXCLUDES /srv/cassandra-${INSTANCE}/data/ ${DESTINATIONS[$i]}.$DOMAIN::transfer_cassandra_${INSTANCE}_tmp"
  done
done
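The include/exclude construction in this script relies on an rsync filter subtlety: every parent directory of a wanted path must be matched by an --include rule before the final catch-all --exclude "*", and a trailing *** matches a directory plus everything beneath it. A minimal sketch of the list-to-flags expansion (paths illustrative; note that set -f stops the shell glob-expanding the * patterns, which the script above relies on simply not matching anything in the working directory):

```shell
#!/bin/sh
# Sketch of building the --include flag list from a here-doc, as in
# the transfer script above. "set -f" disables pathname expansion so
# patterns such as data-* reach rsync literally.
set -f
INCLUDES=$(cat <<END
ks1
ks1/data-*
ks1/data-*/snapshots
ks1/data-*/snapshots/T291469/***
END
)
RSYNCINCLUDES=''
for i in $INCLUDES; do
  RSYNCINCLUDES="${RSYNCINCLUDES}--include $i "
done
echo "$RSYNCINCLUDES"
```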

I have emptied the destination directories on aqs1010 and aqs1011. This cumin command shows that there are no files present.

btullis@cumin1001:~$ sudo cumin --mode async 'aqs101[0,1].eqiad.wmnet' 'tree /srv/cassandra-a/tmp' 'tree /srv/cassandra-b/tmp'
2 hosts will be targeted:
aqs[1010-1011].eqiad.wmnet
Ok to proceed on 2 hosts? Enter the number of affected hosts to confirm or "q" to quit 2
===== NODE GROUP =====                                                                                                                                                                                             
(2) aqs[1010-1011].eqiad.wmnet                                                                                                                                                                                     
----- OUTPUT -----                                                                                                                                                                                                 
/srv/cassandra-b/tmp                                                                                                                                                                                               
                                                                                                                                                                                                                   
0 directories, 0 files
===== NODE GROUP =====                                                                                                                                                                                             
(2) aqs[1010-1011].eqiad.wmnet                                                                                                                                                                                     
----- OUTPUT -----                                                                                                                                                                                                 
/srv/cassandra-a/tmp                                                                                                                                                                                               
                                                                                                                                                                                                                   
0 directories, 0 files
================                                                                                                                                                                                                   
PASS |███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100% (2/2) [00:00<00:00,  2.09hosts/s]          
FAIL |                                                                                                                                                                   |   0% (0/2) [00:00<?, ?hosts/s]
100.0% (2/2) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.

Dry-run succeeded. Running the rsync now.

The rsync appears to have been successful. No errors shown from the script output and the directory sizes all appear to be correct.

btullis@cumin1001:~$ sudo cumin --mode async 'aqs101[0,1].eqiad.wmnet' 'tree /srv/cassandra-a/tmp|tail -n2' 'tree /srv/cassandra-b/tmp|tail -n2'
2 hosts will be targeted:
aqs[1010-1011].eqiad.wmnet
Ok to proceed on 2 hosts? Enter the number of affected hosts to confirm or "q" to quit 2
===== NODE GROUP =====                                                                                                                                                                                             
(1) aqs1010.eqiad.wmnet                                                                                                                                                                                            
----- OUTPUT -----                                                                                                                                                                                                 
                                                                                                                                                                                                                   
37 directories, 1329 files                                                                                                                                                                                         
===== NODE GROUP =====                                                                                                                                                                                             
(1) aqs1011.eqiad.wmnet                                                                                                                                                                                            
----- OUTPUT -----                                                                                                                                                                                                 
                                                                                                                                                                                                                   
37 directories, 1137 files                                                                                                                                                                                         
===== NODE GROUP =====                                                                                                                                                                                             
(1) aqs1010.eqiad.wmnet                                                                                                                                                                                            
----- OUTPUT -----                                                                                                                                                                                                 
                                                                                                                                                                                                                   
37 directories, 1097 files                                                                                                                                                                                         
===== NODE GROUP =====                                                                                                                                                                                             
(1) aqs1011.eqiad.wmnet                                                                                                                                                                                            
----- OUTPUT -----                                                                                                                                                                                                 
                                                                                                                                                                                                                   
37 directories, 1209 files                                                                                                                                                                                         
================                                                                                                                                                                                                   
PASS |███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100% (2/2) [00:00<00:00,  2.05hosts/s]          
FAIL |                                                                                                                                                                   |   0% (0/2) [00:00<?, ?hosts/s]
100.0% (2/2) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.

This is the reload script. I will run it on Monday.

#!/bin/bash

set -e

DOMAIN='eqiad.wmnet'
HOSTS=('aqs1010' 'aqs1011')
INSTANCES=('a' 'b')
KEYSPACES=$(cat <<-END
	local_group_default_T_editors_bycountry
	local_group_default_T_lgc_pagecounts_per_project
	local_group_default_T_mediarequest_per_referer
	local_group_default_T_mediarequest_top_files
	local_group_default_T_pageviews_per_project_v2
	local_group_default_T_top_bycountry
	local_group_default_T_top_pageviews
	local_group_default_T_top_percountry
	local_group_default_T_unique_devices
END
)
USERNAME=cassandra
PASSWORD=cassandra

for KEYSPACE in ${KEYSPACES}
do
  echo "### Working on keyspace ${KEYSPACE}"
  for i in $(seq 0 1)
  do
    for INSTANCE in "${INSTANCES[@]}"
    do
      echo ''
      DATAPATHS=$(sudo cumin --force --no-progress --no-color -o txt "${HOSTS[$i]}.${DOMAIN}" "find /srv/cassandra-${INSTANCE}/tmp/${KEYSPACE} -maxdepth 1 -mindepth 1 -type d"| \
        awk 'x==1 { print $0 } /_____FORMATTED_OUTPUT_____/ { x=1 }'| \
        awk '{print $2}')
      for DATAPATH in ${DATAPATHS}
      do
        TABLE=$(echo ${DATAPATH} | awk -F / '{print $NF}' | awk -F - '{print $1}')
        DESTPATH=/srv/cassandra-${INSTANCE}/tmp/${KEYSPACE}/${TABLE}
        echo "### Creating destination directory for table ${TABLE} in keyspace ${KEYSPACE}"
        sudo cumin --force "${HOSTS[$i]}.${DOMAIN}" "mkdir -p ${DESTPATH}"
        echo "### Moving table ${TABLE} in keyspace ${KEYSPACE} for instance ${INSTANCE} on ${HOSTS[$i]} to ${DESTPATH}"
        sudo cumin --force --mode async "${HOSTS[$i]}.${DOMAIN}" "find ${DATAPATH} -type f -exec mv {} ${DESTPATH} \;" "find ${DATAPATH} -type d -empty -delete"
        echo "### Reloading table ${TABLE} in keyspace ${KEYSPACE} on host ${HOSTS[$i]}.${DOMAIN} instance ${INSTANCE}"
        sudo cumin --force "${HOSTS[$i]}.$DOMAIN" "sstableloader -u ${USERNAME} -f /etc/cassandra-${INSTANCE}/cassandra.yaml -d ${HOSTS[$i]}-${INSTANCE} -pw ${PASSWORD} ${DESTPATH}"
        echo ''
      done
    done
  done
done
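The TABLE extraction in the script above parses Cassandra's table-directory naming, where each directory is named <table>-<uuid>: take the last path component, then keep everything before the first hyphen. A standalone sketch (the path is illustrative):

```shell
#!/bin/sh
# Sketch of the TABLE extraction used in the reload script: the table
# name is the last path component up to the first hyphen. The path
# here is illustrative; in the snapshot layout the table is "data".
DATAPATH=/srv/cassandra-a/tmp/local_group_default_T_unique_devices/data-0fa85f70
TABLE=$(echo "${DATAPATH}" | awk -F / '{print $NF}' | awk -F - '{print $1}')
echo "${TABLE}"
```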

Reload of all 4 snapshots of these repaired tables is under way.

The load of all of the repaired tables from the 4 instances on aqs1004 and aqs1007 is complete.
@JAllemandou - Is there a testing process that you would like to carry out now?

Thanks @BTullis :) Assigning the task to myself for the last round of checks, then I'll close it.

Cleared these snapshots.

btullis@aqs1007:~$ sudo nodetool-a clearsnapshot -t T291469
Requested clearing snapshot(s) for [all keyspaces] with snapshot name [T291469]
btullis@aqs1007:~$ sudo nodetool-b clearsnapshot -t T291469
Requested clearing snapshot(s) for [all keyspaces] with snapshot name [T291469]
btullis@aqs1004:~$ sudo nodetool-a clearsnapshot -t T291469
Requested clearing snapshot(s) for [all keyspaces] with snapshot name [T291469]
btullis@aqs1004:~$ sudo nodetool-b clearsnapshot -t T291469
Requested clearing snapshot(s) for [all keyspaces] with snapshot name [T291469]
JAllemandou updated the task description.

I checked all data of the small tables between old and new - they match :)