
Create new mediarequests table
Closed, Resolved | Public | 3 Story Points

Description

As decided by the team, we're creating a new dataset with a project dimension instead of retrofitting the old mediacounts table with one. The schema of the table will be:

CREATE EXTERNAL TABLE IF NOT EXISTS `mediarequests` (
    `base_name`                string COMMENT 'Base name of media file',
    `file_classification`      string COMMENT 'General classification of file (image, video, audio, data, document or other)',
    `file_type`                string COMMENT 'Extension or suffix of the file (e.g. jpg, wav, pdf)',
    `total_response_size`      bigint COMMENT 'Total number of bytes',
    `request_count`            bigint COMMENT 'Total number of requests',
    `transcoding`              string COMMENT 'Transcoding that the file was requested with, e.g. resized photo or image preview of a video',
    `agent_type`               string COMMENT 'Agent accessing the media files, can be spider or user',
    `referer`                  string COMMENT 'Wiki project that the request was referred from. If project is not available, it will be either internal, external, or unknown')
PARTITIONED BY (
    `year`                int    COMMENT 'Unpadded year',
    `month`               int    COMMENT 'Unpadded month',
    `day`                 int    COMMENT 'Unpadded day',
    `hour`                int    COMMENT 'Unpadded hour')
CLUSTERED BY(base_name) INTO 64 BUCKETS
STORED AS PARQUETFILE
LOCATION '/wmf/data/wmf/mediarequests'
;
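
To illustrate how the schema above might be queried, here is a minimal sketch that lists the ten most-requested files for a single hourly partition. The date values and the unqualified table name are assumptions chosen for illustration; the task does not say which database the table lives in, so prefix the name as appropriate.

```sql
-- Hypothetical usage sketch, not taken from the refinery repository.
-- Top 10 media files requested by users during one hour.
-- Partition columns are unpadded ints, as described in the schema.
SELECT
    base_name,
    file_type,
    SUM(request_count)       AS requests,
    SUM(total_response_size) AS bytes_served
FROM mediarequests
WHERE year = 2019 AND month = 9 AND day = 10 AND hour = 16  -- illustrative hour
  AND agent_type = 'user'
GROUP BY base_name, file_type
ORDER BY requests DESC
LIMIT 10;
```

Filtering on all four partition columns keeps the scan limited to one hour of data; dropping the `hour` predicate would aggregate over the whole day instead.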

Event Timeline

fdans created this task. Aug 5 2019, 1:35 PM

Change 528134 had a related patch set uploaded (by Fdans; owner: Fdans):
[analytics/refinery@master] Add creation query for new nediarequests dataset

https://gerrit.wikimedia.org/r/528134

Ottomata triaged this task as High priority. Aug 5 2019, 3:34 PM
Ottomata added a project: Analytics-Kanban.
Ottomata moved this task from Incoming to Analytics Query Service on the Analytics board.

Change 529911 had a related patch set uploaded (by Fdans; owner: Fdans):
[analytics/refinery@master] [wip] Add mediarequests hourly oozie job

https://gerrit.wikimedia.org/r/529911

Change 528134 merged by Mforns:
[analytics/refinery@master] Add creation query for new nediarequests dataset

https://gerrit.wikimedia.org/r/528134

Change 529911 merged by Milimetric:
[analytics/refinery@master] Add mediarequests hourly oozie job

https://gerrit.wikimedia.org/r/529911

Change 532725 had a related patch set uploaded (by Fdans; owner: Fdans):
[analytics/refinery@master] Changes related to timestamp partitioning for mediarequests

https://gerrit.wikimedia.org/r/532725

Change 532725 merged by Joal:
[analytics/refinery@master] Change partition structure to year/month/day/hour.

https://gerrit.wikimedia.org/r/532725

Krinkle removed a subscriber: Krinkle. Fri, Aug 30, 4:03 PM
Nuria removed subscribers: Ramsey-WMF, Abit, Ian_Furst and 14 others.

ping @fdans: it seems that if the oozie job is working, we can move this ticket to done, right?

fdans moved this task from Next Up to Done on the Analytics-Kanban board. Tue, Sep 10, 4:09 PM
Nuria closed this task as Resolved. Wed, Sep 11, 9:48 PM
Nuria set the point value for this task to 3.