
Continue New AQS Loading
Closed, Resolved | Public | 21 Estimated Story Points

Description

Load data and ensure proper compaction/compression is used.

Event Timeline

JAllemandou set the point value for this task to 21.

After changing compression from lz4 to deflate and reloading a month of data (January 2016), we are down to about 120 GB per instance, which is much better than the previous 250 GB. Proceeding with loading February 2016.

Milimetric triaged this task as Medium priority. (Aug 8 2016, 4:52 PM)

For my fellow an-engineers to replace me while I'm on holidays: https://etherpad.wikimedia.org/p/backfilling_aqs

I tested how we are doing consistency-wise, and so far things check out. I found one issue; see the repro below.

Current API:
http://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/wikidata.org/all-access/user/Q604141/daily/20160601/20160630

In new cluster:
http://localhost:7232/analytics.wikimedia.org/v1/pageviews/per-article/wikidata.org/all-access/user/Q604141/daily/20160601/20160630

In the new cluster we are storing {"project":"wikidata","article":"Q604141","granularity":"daily","timestamp":"2016060100","access":"all-access","agent":"user","views":null}, whereas the old cluster uses zeroes rather than nulls to represent a lack of views: {"project":"wikidata","article":"Q604141","granularity":"daily","timestamp":"2016060100","access":"all-access","agent":"user","views":0}

I think that storage-wise the new cluster is correct, but the API should not return null; it should map null to zero.
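To make the proposed mapping concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not the actual AQS code; the function name `null_views_to_zero` is hypothetical, and the two records are the ones quoted in the repro above.

```python
from typing import Any


def null_views_to_zero(item: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of one pageview record with an explicit null `views` mapped to 0.

    Hypothetical helper for illustration; the input record is left unmodified.
    """
    normalized = dict(item)
    # Only an explicit null is rewritten; a record without a `views` key is left alone.
    if "views" in normalized and normalized["views"] is None:
        normalized["views"] = 0
    return normalized


# Record as stored in the new cluster (JSON null becomes Python None).
new_cluster_item = {
    "project": "wikidata",
    "article": "Q604141",
    "granularity": "daily",
    "timestamp": "2016060100",
    "access": "all-access",
    "agent": "user",
    "views": None,
}

# Record as returned by the old cluster.
old_cluster_item = dict(new_cluster_item, views=0)

# After the mapping, the two records should compare equal.
records_match = null_views_to_zero(new_cluster_item) == old_cluster_item
print(records_match)
```

Doing the mapping at the API layer, as suggested, would keep storage unchanged (null still means "no row of views recorded") while clients keep seeing the zeroes they get from the old cluster.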

We need to load data for all endpoints, including unique devices and top data.

Change 309602 had a related patch set uploaded (by Nuria):
Change default compression scheme

https://gerrit.wikimedia.org/r/309602

Data is loaded for all endpoints except daily-top, which is currently finishing.

Change 309602 merged by Nuria:
Update per-article compression scheme to default (LCS)

https://gerrit.wikimedia.org/r/309602

Nuria moved this task from Ready to Deploy to Done on the Analytics-Kanban board.

Change 315283 had a related patch set uploaded (by Elukey):
Update per-article compression scheme to default (LCS)

https://gerrit.wikimedia.org/r/315283

Change 315283 merged by Nuria:
Update per-article compression scheme to default (LCS)

https://gerrit.wikimedia.org/r/315283