
Check AQS with cassandra (serving + data)
Closed, Resolved (Public)

Description

To be checked:

  • Druid endpoints are functioning and return the same results as the old cluster
  • Data loaded using the oozie job is the same in both clusters (using AQS endpoints and cassandra direct calls for pageview-per-articles and mediarequest-per-file)
  • Data loaded from the original cluster is the same in both clusters (using AQS endpoints and cassandra direct calls for pageview-per-articles and mediarequest-per-file)
  • Data with special characters (tabs for pageview) is loaded and retrieved as expected in both clusters (manual AQS endpoints tests on specific examples)
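
The "same in both clusters" checks above boil down to comparing two result sets row by row. A minimal Python sketch of that comparison follows. It assumes each response has already been fetched and parsed into a dict with an "items" list of rows; the function names and the choice of key fields are hypothetical and not taken from the AQS code base.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def index_items(payload: Dict[str, Any], key_fields: Tuple[str, ...]) -> Dict[Key, Row]:
    """Index the rows of one parsed response by the chosen key fields."""
    return {
        tuple(item[field] for field in key_fields): item
        for item in payload.get("items", [])
    }


def compare_payloads(
    old: Dict[str, Any], new: Dict[str, Any], key_fields: Tuple[str, ...]
) -> Dict[str, List[Key]]:
    """Report keys missing from, added to, or changed in the new cluster's response."""
    old_rows = index_items(old, key_fields)
    new_rows = index_items(new, key_fields)
    shared = old_rows.keys() & new_rows.keys()
    return {
        "missing_in_new": sorted(old_rows.keys() - new_rows.keys()),
        "extra_in_new": sorted(new_rows.keys() - old_rows.keys()),
        "different": sorted(k for k in shared if old_rows[k] != new_rows[k]),
    }


if __name__ == "__main__":
    # Made-up sample rows; the article title contains a tab on purpose.
    old = {"items": [{"article": "Foo\tBar", "timestamp": "2021090100", "views": 3}]}
    new = {"items": [{"article": "Foo\tBar", "timestamp": "2021090100", "views": 3}]}
    print(compare_payloads(old, new, ("article", "timestamp")))
```

Keying rows on a tuple of fields rather than on their position keeps the comparison independent of result ordering, and titles containing tabs are compared as ordinary strings.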

Event Timeline

JAllemandou updated the task description.
JAllemandou moved this task from Next Up to In Progress on the Analytics-Kanban board.
JAllemandou edited projects, added Analytics; removed Reading Epics (Analytics).

Change 717174 had a related patch set uploaded (by Joal; author: Joal):

[analytics/refinery@master] Fix mediarequest top cassandra3 loading jobs

https://gerrit.wikimedia.org/r/717174

Change 717174 merged by Joal:

[analytics/refinery@master] Fix mediarequest top cassandra3 loading jobs

https://gerrit.wikimedia.org/r/717174

Update after the Cassandra-2 data was copied to the cassandra-3 cluster (from 1 rack) for the small-ish tables (all except 2 tables): all tables are mostly complete, but a small number of rows are missing in some of them. We are investigating this with @BTullis. Our hypothesis is that some rows had replication errors and were therefore not present in the rack we used for the snapshot/copy. One way to verify this is to also load the data for one of the faulty tables from the other 2 racks (the data being small, the cost is low).

I am running the following commands sequentially on aqs1004.eqiad.wmnet and aqs1007.eqiad.wmnet:

sudo nodetool-a repair --full local_group_default_T_pageviews_per_project_v2 data
sudo nodetool-b repair --full local_group_default_T_pageviews_per_project_v2 data

Each repair operation takes around 10 minutes on this table.
I will then create a new snapshot on each of the four instances, then transfer and load this snapshot with sstableloader on the destination hosts.
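
To make the "four instances" explicit (two hosts, each with an a and a b instance), here is a small Python dry-run sketch that lists the repair commands per host. It only builds the command strings quoted above and runs nothing. The function name and the ordering (both instances of one host before moving to the next) are assumptions. The snapshot and sstableloader steps are left out on purpose, because their exact invocations are not given in this task.

```python
from itertools import product
from typing import List, Tuple

# Hosts, wrappers, keyspace and table as quoted in the comment above.
HOSTS = ["aqs1004.eqiad.wmnet", "aqs1007.eqiad.wmnet"]
NODETOOL_WRAPPERS = ["nodetool-a", "nodetool-b"]
KEYSPACE = "local_group_default_T_pageviews_per_project_v2"
TABLE = "data"


def repair_plan(
    hosts: List[str] = HOSTS,
    wrappers: List[str] = NODETOOL_WRAPPERS,
    keyspace: str = KEYSPACE,
    table: str = TABLE,
) -> List[Tuple[str, str]]:
    """Return (host, command) pairs in the order they would be run (assumed order)."""
    return [
        (host, f"sudo {wrapper} repair --full {keyspace} {table}")
        for host, wrapper in product(hosts, wrappers)
    ]


if __name__ == "__main__":
    for host, command in repair_plan():
        print(f"{host}: {command}")
```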

This comment was removed by BTullis.
JAllemandou updated the task description.