Clickstream datasets only reference 'other' link type, no 'link'
Open, Needs Triage, Public

Description

The clickstream datasets generated for 2024-04 have a line count consistent with previous months, but every link between pages is flagged 'other' when it should be flagged 'link'.
This is a data-dependency issue: the job does not wait for the linktarget table to be present in Hive before starting.
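
To illustrate the missing dependency check, here is a minimal sketch in plain Python of gating a job on upstream data being ready. It is not the actual Airflow DAG or sensor used for clickstream: `wait_for_dependency`, `run_clickstream`, and the `is_linktarget_ready` callback are hypothetical names, and the readiness check is left to the caller (for example, something that verifies the expected Hive data exists).

```python
import time
from typing import Callable


def wait_for_dependency(
    is_ready: Callable[[], bool],
    poke_interval_s: float = 60.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until is_ready() returns True; raise TimeoutError otherwise."""
    waited = 0.0
    while not is_ready():
        if waited >= timeout_s:
            raise TimeoutError(f"dependency not ready after {waited:.0f}s")
        sleep(poke_interval_s)
        waited += poke_interval_s


def run_clickstream(
    month: str,
    is_linktarget_ready: Callable[[], bool],
    **wait_kwargs,
) -> str:
    """Hypothetical job entry point: compute only once linktarget is ready."""
    # Without this gate the job could start before linktarget data exists,
    # which is the failure described in this task.
    wait_for_dependency(is_linktarget_ready, **wait_kwargs)
    return f"clickstream computed for {month}"
```

In an Airflow DAG the same idea would normally be expressed as a sensor task placed upstream of the compute task, so the scheduler handles the polling instead of the job itself.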

Details

Title: Fix analytics clickstream job missing a sensor
Reference: repos/data-engineering/airflow-dags!709
Author: joal
Source Branch: fix_clickstream
Dest Branch: main

Event Timeline

Mentioned in SAL (#wikimedia-analytics) [2024-05-28T07:53:01Z] <joal> manually rerun clickstream job for 2024-04 to pick up linktarget data that was not present at the moment it ran automatically (T366042)

Change #1037370 had a related patch set uploaded (by Joal; author: Joal):

[analytics/refinery/source@master] Update clickstream job - better joins

https://gerrit.wikimedia.org/r/1037370

Change #1037370 merged by jenkins-bot:

[analytics/refinery/source@master] Update clickstream job - better joins

https://gerrit.wikimedia.org/r/1037370