Check AQS with cassandra (serving + data)
Closed, Resolved, Public

Assigned To

Authored By

	JAllemandou
	Aug 31 2021, 11:45 AM

Description

To be checked:

Druid endpoints are functioning and return the same results as the old cluster
Data loaded using the oozie job is the same in both clusters (checked using AQS endpoints and direct cassandra calls for pageview-per-articles and mediarequest-per-file)
Data loaded from the original cluster is the same in both clusters (checked using AQS endpoints and direct cassandra calls for pageview-per-articles and mediarequest-per-file)
Data with special characters (tabs for pageview) is loaded and retrieved as expected in both clusters (manual AQS endpoint tests on specific examples)
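
The row-level comparison in the checklist above could be sketched as follows. This is a minimal illustration, not the script used for this task: `diff_items`, the field names and the sample rows are all hypothetical, and it assumes the rows have already been fetched from each cluster (via an AQS endpoint or a direct cassandra query) as lists of dictionaries.

```python
def diff_items(old_items, new_items, key_fields):
    """Compare rows from the old and new cluster, keyed on key_fields.

    Returns the keys missing from the new cluster, the keys only present
    in the new cluster, and the keys whose rows differ between clusters.
    """
    def index(items):
        # Build {key tuple: row} so rows can be matched across clusters.
        return {tuple(item[f] for f in key_fields): item for item in items}

    old_idx, new_idx = index(old_items), index(new_items)
    common = old_idx.keys() & new_idx.keys()
    return {
        "missing_in_new": sorted(old_idx.keys() - new_idx.keys()),
        "extra_in_new": sorted(new_idx.keys() - old_idx.keys()),
        "different": sorted(k for k in common if old_idx[k] != new_idx[k]),
    }


# Hypothetical sample rows; the second article name contains a tab character.
old = [
    {"article": "Main_Page", "timestamp": "2021083000", "views": 10},
    {"article": "Tab\there", "timestamp": "2021083000", "views": 3},
]
new = [
    {"article": "Main_Page", "timestamp": "2021083000", "views": 10},
]

report = diff_items(old, new, key_fields=("article", "timestamp"))
print(report)
```

Keying on a tuple of fields rather than on a formatted string keeps rows with special characters (such as tabs) distinguishable without any escaping.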

Details

	Subject	Repo	Branch	Lines +/-
	Fix mediarequest top cassandra3 loading jobs	analytics/refinery	master	+2 -2


Related Objects

Status	Assigned	Task
Resolved	BTullis	T249755 Cassandra3 migration for Analytics AQS
Resolved	JAllemandou	T290068 Check AQS with cassandra (serving + data)
Resolved	JAllemandou	T291469 Repair and reload all cassandra-2 data tables but the 2 big ones
Resolved	JAllemandou	T291470 Repair and reload cassandra2 mediarequest_per_file data table
Resolved	JAllemandou	T291473 Test snapshot-reload from all instances using pageview-top data table
Resolved	JAllemandou	T291472 Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances

Event Timeline

JAllemandou created this task. Aug 31 2021, 11:45 AM

Restricted Application added a subscriber: Aklapper. Aug 31 2021, 11:45 AM

JAllemandou claimed this task. Aug 31 2021, 11:45 AM

JAllemandou updated the task description.

JAllemandou moved this task from Next Up to In Progress on the Analytics-Kanban board.

JAllemandou edited projects, added Analytics; removed Reading Epics (Analytics).

JAllemandou updated the task description. Aug 31 2021, 1:52 PM

odimitrijevic moved this task from Incoming to Analytics Query Service on the Analytics board. Sep 2 2021, 4:55 PM

Change 717174 had a related patch set uploaded (by Joal; author: Joal):

[analytics/refinery@master] Fix mediarequest top cassandra3 loading jobs

https://gerrit.wikimedia.org/r/717174

Change 717174 merged by Joal:

[analytics/refinery@master] Fix mediarequest top cassandra3 loading jobs

https://gerrit.wikimedia.org/r/717174

Maintenance_bot removed a project: Patch-For-Review. Sep 3 2021, 9:10 AM

odimitrijevic triaged this task as High priority. Sep 9 2021, 4:48 PM

JAllemandou moved this task from In Progress to Paused on the Analytics-Kanban board. Sep 10 2021, 10:53 AM

JAllemandou moved this task from Paused to In Progress on the Analytics-Kanban board. Sep 14 2021, 4:17 PM

Update after Cassandra-2 data has been copied to the cassandra-3 cluster (1 rack) for the small-ish tables (all except 2 tables): all tables are mostly complete, except for a small number of rows missing in some tables. We are investigating this with @BTullis. Our hypothesis is that some rows had replication errors and are therefore not present in the rack we used to snapshot/copy. One way to verify this is to also load data for some of the faulty tables from the other 2 racks (the data being small, the cost is low).
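
The replication hypothesis above can be illustrated with plain set arithmetic. This is only a sketch under stated assumptions: `classify_missing` and the sample keys are hypothetical, and it assumes the row keys of a table have already been extracted from the source of truth and from each rack.

```python
def classify_missing(expected_keys, copied_rack_keys, other_racks_keys):
    """Split rows missing from the copied rack by whether another rack has them."""
    missing = set(expected_keys) - set(copied_rack_keys)
    # Union of all keys found in the racks that were not used for the copy.
    elsewhere = set().union(*other_racks_keys)
    return {
        # Present in another rack: consistent with a replication error.
        "recoverable_from_other_racks": sorted(missing & elsewhere),
        # Absent everywhere: the replication hypothesis does not explain these.
        "missing_everywhere": sorted(missing - elsewhere),
    }


# Hypothetical row keys for one small table.
expected = {"a", "b", "c", "d"}
copied_rack = {"a", "b"}
other_racks = [{"a", "c"}, {"b"}]

result = classify_missing(expected, copied_rack, other_racks)
print(result)
```

If every missing row turns out to be recoverable from another rack, loading snapshots from all racks should close the gap.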

I am running the following commands sequentially on aqs1004.eqiad.wmnet and aqs1007.eqiad.wmnet:

sudo nodetool-a repair --full local_group_default_T_pageviews_per_project_v2 data
sudo nodetool-b repair --full local_group_default_T_pageviews_per_project_v2 data

Each repair operation takes around 10 minutes on this table.
I will then create a new snapshot on each of the four instances, then transfer and load this snapshot with sstableloader on the destination hosts.
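
The repair-then-snapshot sequence described above could be generated as a dry-run plan like this. It is a sketch, not the procedure actually run: the snapshot tag is hypothetical, and it assumes the `nodetool-a` / `nodetool-b` wrappers pass `snapshot -t <tag> <keyspace>` through to `nodetool` the same way they pass `repair`. The transfer and `sstableloader` steps are left out because their paths and destination hosts are not given in this task. The code only prints commands and executes nothing.

```python
import shlex

KEYSPACE = "local_group_default_T_pageviews_per_project_v2"
TABLE = "data"
HOSTS = ["aqs1004.eqiad.wmnet", "aqs1007.eqiad.wmnet"]
INSTANCES = ["a", "b"]  # two cassandra instances per host, as in the commands above


def repair_and_snapshot_plan(hosts, instances, keyspace, table, tag):
    """Return (host, argv) pairs: all full repairs first, then all snapshots."""
    targets = [(host, f"nodetool-{inst}") for host in hosts for inst in instances]
    plan = []
    for host, nodetool in targets:
        plan.append((host, ["sudo", nodetool, "repair", "--full", keyspace, table]))
    for host, nodetool in targets:
        # The tag is a hypothetical name chosen for this illustration.
        plan.append((host, ["sudo", nodetool, "snapshot", "-t", tag, keyspace]))
    return plan


plan = repair_and_snapshot_plan(HOSTS, INSTANCES, KEYSPACE, TABLE, tag="T290068-reload")
for host, argv in plan:
    print(f"{host}: {shlex.join(argv)}")
```

Listing the repairs before any snapshot mirrors the order in the comment: the snapshots are only taken once all four instances have been repaired.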

JAllemandou added a parent task: T249755: Cassandra3 migration for Analytics AQS. Sep 20 2021, 4:20 PM

JAllemandou updated the task description. Sep 21 2021, 8:46 AM

BTullis changed the status of subtask T291473: Test snapshot-reload from all instances using pageview-top data table from Open to In Progress. Sep 21 2021, 9:27 AM

BTullis changed the status of subtask T291469: Repair and reload all cassandra-2 data tables but the 2 big ones from Open to In Progress. Sep 21 2021, 10:04 AM

BTullis added a comment. Sep 22 2021, 9:21 AM

This comment was removed by BTullis.

BTullis changed the status of subtask T291470: Repair and reload cassandra2 mediarequest_per_file data table from Open to In Progress. Sep 22 2021, 4:04 PM

JAllemandou closed subtask T291473: Test snapshot-reload from all instances using pageview-top data table as Resolved. Sep 23 2021, 1:36 PM

BTullis changed the status of subtask T291472: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances from Open to In Progress. Sep 23 2021, 3:20 PM

JAllemandou moved this task from In Progress to Parent Tasks on the Analytics-Kanban board. Sep 28 2021, 12:58 PM

JAllemandou closed subtask T291469: Repair and reload all cassandra-2 data tables but the 2 big ones as Resolved. Oct 21 2021, 5:22 PM

JAllemandou added a project: Data-Engineering.