External referrer & WDQS metrics stopped updating on 2021-04-25
Closed, Declined (Public)

Description

Reported by Laurence "GreenReaper" Parry from the Wikibase Community User Group.

There was a previous outage that was tracked and resolved in T279443.

Interestingly:

$ tail -n 1 external_traffic/referer_*
==> external_traffic/referer_data.tsv <==
2021-04-25	TRUE	external (search engine)	Ecosia	mobile web	498982

==> external_traffic/referer_data.tsv.tmp <==
Scaling row group sizes to 93.16% for 1 writers	access_method	date	is_search	pageviews	referer_class	search_engine

==> external_traffic/referer_nonbot_data.tsv <==
2021-04-25	TRUE	external (search engine)	Ecosia	mobile web	498872

==> external_traffic/referer_nonbot_data.tsv.tmp <==
Scaling row group sizes to 93.16% for 1 writers	access_method	date	is_search	pageviews	referer_class	search_engine

and

$ tail -n 1 wdqs/basic_usage*
==> wdqs/basic_usage.tsv <==
2021-04-25	/bigdata/namespace/wdq/sparql	TRUE	FALSE	950887

==> wdqs/basic_usage.tsv.tmp <==
Scaling row group sizes to 92.77% for 1 writers	date	events	http_success	is_automata	path

It seems that Reportupdater was updated in April to enable hive as a report type. Might be useful to switch to that, but still wouldn't explain the error here.


Unfortunately, as of right now, 90 days ago is already April 27th, so there's going to be a gap in the data no matter what. It's just a question of how big a gap at this point.


Looking at those .tmp files, it would appear I probably need to add some more greps to https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/discovery/golden/+/refs/heads/master/modules/metrics/external_traffic/referer_data and https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/discovery/golden/+/refs/heads/master/modules/metrics/wdqs/basic_usage
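
For illustration only, here is a minimal sketch of that kind of filter. It is not taken from the golden repository: the function name `strip_hive_noise` is hypothetical, and the message pattern is inferred solely from the `tail` output above. Because that output shows the "Scaling row group sizes" message sharing a line with the column header, the sketch uses `sed` to strip just the message instead of a line-based `grep -v`, which would discard the header as well. A trailing `grep -v` then drops any line left empty.

```shell
# Hypothetical helper (name and pattern are assumptions, not from the golden repo).
# Strips the Parquet writer message from the start of a line, along with any
# tab characters that follow it, then drops lines that end up empty.
strip_hive_noise() {
  tab=$(printf '\t')
  sed "s/^Scaling row group sizes to [0-9.]*% for [0-9]* writers${tab}*//" \
    | grep -v '^$'
}

# Made-up input resembling the last line of basic_usage.tsv.tmp:
# the message is removed and the tab-separated header is kept.
printf 'Scaling row group sizes to 92.77%% for 1 writers\tdate\tevents\n' \
  | strip_hive_noise
```

Whether this belongs in the query scripts themselves or in a later step of the pipeline is a separate question that the sketch does not answer.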

Event Timeline

mpopov updated the task description.

After consulting with Isaac (Research) about the external traffic one and Mike (Search Platform) about the WDQS one, I'm making the call to finally decommission the dashboards (T227782). The data that has already been collected will remain available for historical/archival reasons, but no further work will be done to get new data. As such, I'm declining this task.