Load data and ensure proper compaction/compression is used.
Description
Details
Status | Assigned | Task
---|---|---
Duplicate | mobrovac | T125345 Many error 500 from pageviews API "Error in Cassandra table storage backend"
Resolved | JAllemandou | T124314 Better response times on AQS (Pageview API mostly) {melc}
Resolved | Nuria | T140866 Continue New AQS Loading
Resolved | JAllemandou | T145089 Load top article data into new AQS cluster
Event Timeline
After changing compression from lz4 to deflate and reloading a month of data (January 2016), we are down to about 120 GB per instance, which is way better than 250 GB. Proceeding with loading February 2016.
For my fellow analytics engineers covering for me while I'm on holiday: https://etherpad.wikimedia.org/p/backfilling_aqs
I tested a bit how we are doing consistency-wise, and so far things check out. I found one issue; see the repro below.
Request against the new cluster:
http://localhost:7232/analytics.wikimedia.org/v1/pageviews/per-article/wikidata.org/all-access/user/Q604141/daily/20160601/20160630
In the new cluster we are storing:
{"project":"wikidata","article":"Q604141","granularity":"daily","timestamp":"2016060100","access":"all-access","agent":"user","views":null}
In the old cluster we have zeroes rather than nulls to represent a lack of views:
{"project":"wikidata","article":"Q604141","granularity":"daily","timestamp":"2016060100","access":"all-access","agent":"user","views":0}
I think the new cluster is correct storage-wise, but the API should not return null; it should map null to zero.
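To make the proposed mapping concrete, here is a minimal Python sketch of the null-to-zero step applied to items shaped like the responses above. It is only an illustration: the function name `map_null_views_to_zero` is hypothetical, and this is not the actual AQS code, which may handle the mapping differently and in another language.

```python
from typing import Any


def map_null_views_to_zero(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the items with a null (None) or missing 'views' set to 0."""
    normalized = []
    for item in items:
        fixed = dict(item)  # shallow copy so the input list is left untouched
        if fixed.get("views") is None:
            fixed["views"] = 0
        normalized.append(fixed)
    return normalized


# Item as stored in the new cluster (JSON null becomes None in Python).
new_cluster_items = [
    {"project": "wikidata", "article": "Q604141", "granularity": "daily",
     "timestamp": "2016060100", "access": "all-access", "agent": "user",
     "views": None},
]

print(map_null_views_to_zero(new_cluster_items)[0]["views"])  # prints 0
```

Doing the mapping at the API layer would keep storage as it is in the new cluster while clients keep seeing zeroes, as they do with the old cluster.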
Change 309602 had a related patch set uploaded (by Nuria):
Change default compression scheme
Change 309602 merged by Nuria:
Update per-article compression scheme to default (LCS)
Change 315283 had a related patch set uploaded (by Elukey):
Update per-article compression scheme to default (LCS)
Change 315283 merged by Nuria:
Update per-article compression scheme to default (LCS)