The clickstream datasets generated for month 2024-04 have a line count consistent with previous months, but every link between pages is flagged "other" when it should be flagged "link".
This is a data-dependency issue: the job does not wait for the linktarget table to be present in Hive before starting.
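The symptom described above can be spotted with a quick check on a generated file. The sketch below is only an illustration, not part of the actual job: it assumes the tab-separated layout of the published clickstream dumps (prev, curr, type, n), and the function names and sample rows are hypothetical.

```python
from collections import Counter
import io


def link_type_counts(lines):
    """Count rows per type in tab-separated rows assumed to be: prev, curr, type, n."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip malformed rows
        counts[fields[2]] += 1
    return counts


def looks_like_missing_linktarget(counts):
    """Return True when there are 'other' rows but not a single 'link' row."""
    return counts.get("link", 0) == 0 and counts.get("other", 0) > 0


# Hypothetical sample rows mimicking the 2024-04 symptom:
# page-to-page referrals are all typed 'other'.
sample = io.StringIO(
    "Page_A\tPage_B\tother\t120\n"
    "Page_C\tPage_B\tother\t45\n"
    "other-search\tPage_B\texternal\t300\n"
)
counts = link_type_counts(sample)
suspicious = looks_like_missing_linktarget(counts)
```

A healthy month would be expected to contain a substantial share of "link" rows, so a file with none is a strong hint that the link data was not available when the job ran.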
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Update clickstream job - better joins | analytics/refinery/source | master | +23 -9
Event Timeline
Mentioned in SAL (#wikimedia-analytics) [2024-05-28T07:53:01Z] <joal> manually rerun clickstream job for 2024-04 to pick up linktarget data that was not present at the moment it ran automatically (T366042)
joal opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/709
Fix analytics clickstream job missing a sensor
Change #1037370 had a related patch set uploaded (by Joal; author: Joal):
[analytics/refinery/source@master] Update clickstream job - better joins
Change #1037370 merged by jenkins-bot:
[analytics/refinery/source@master] Update clickstream job - better joins
joal merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/709
Fix analytics clickstream job missing a sensor