
WDCM Semantic and Geo dashboards do not respond
Open, High, Public

Description

Event Timeline

GoranSMilovanovic renamed this task from WDCM Semantic Dashboard does not respond to WDCM Semantic and Geo dashboards do not respond. Tue, Nov 26, 8:41 AM
GoranSMilovanovic updated the task description.
GoranSMilovanovic moved this task from Technical Wishlist to WDCM on the User-GoranSMilovanovic board.
GoranSMilovanovic triaged this task as High priority. Mon, Dec 2, 10:38 AM
  • This is more serious than my initial assessment suggested.
  • Possible cause: change in R packages used on the dashboard, possibly {curl}.
  • Inspecting now.

WDCM public data set:

  • wdcm_project.csv
  • Update timestamp: 2019-12-01 11:44

has no data on dewiki and many other wikis.
Inspecting now.
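
A minimal sketch of how missing projects could be checked in a file like wdcm_project.csv. The column name `project` and the sample rows are assumptions for illustration only; the real file's layout is not shown in this task.

```python
import csv
import io


def missing_projects(csv_text, expected, column="project"):
    """Return the expected projects that never appear in the given CSV column.

    `column` is a hypothetical column name; adjust it to the real header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    present = {row[column].strip() for row in reader if row.get(column)}
    return sorted(set(expected) - present)


# Made-up sample, not real WDCM data.
sample = "project,usage\nenwiki,100\nfrwiki,50\n"
print(missing_projects(sample, ["enwiki", "dewiki", "frwiki"]))
```

Running a check like this against each new update would flag a missing dewiki before the dashboards pick up the file.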

GoranSMilovanovic added a comment. Edited Mon, Dec 2, 10:59 AM
  • Moreover, dewiki and potentially other wikis are not found in the 2019-12-01 11:44 update at all;
  • However, dewiki was sqooped by WDCM_Sqoop_Clients.R from stat1004:

824 s5-analytics-replica.eqiad.wmnet -P 3315 dewiki 2019-12-01 12:56:40 2019-12-01 12:57:21

So it could be a pyspark ETL thing? Inspecting the issue.

Finding:

  • WDCM_Sqoop_Clients.R unexpectedly took ~13h to update;
  • wdcmModule_Orchestra.R was thus run (at 10:00 UTC) out of sync with the wdcm_clients_wb_entity_usage table;
  • and this is the possible cause of the observed missing data.
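
The out-of-sync condition described above could be detected with a check along these lines. This is a sketch under stated assumptions, not the actual WDCM code: the log layout is inferred from the single sqoop line quoted earlier (project name followed by start and end timestamps), and the orchestra start of 2019-12-01 10:00 is assumed to be in the same time zone as the log.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def parse_sqoop_line(line):
    """Split one log line into (project, start, end).

    Assumes the last four tokens are start date, start time, end date, end time,
    and that the token before them is the project name.
    """
    parts = line.split()
    project = parts[-5]
    start = datetime.strptime(" ".join(parts[-4:-2]), FMT)
    end = datetime.strptime(" ".join(parts[-2:]), FMT)
    return project, start, end


def late_projects(lines, orchestra_start):
    """Return projects whose sqoop finished after the orchestra run started."""
    late = []
    for line in lines:
        if not line.strip():
            continue
        project, _start, end = parse_sqoop_line(line)
        if end > orchestra_start:
            late.append(project)
    return late


line = ("824 s5-analytics-replica.eqiad.wmnet -P 3315 dewiki "
        "2019-12-01 12:56:40 2019-12-01 12:57:21")
orchestra_start = datetime(2019, 12, 1, 10, 0, 0)  # assumed date and time zone
print(late_projects([line], orchestra_start))
```

A non-empty result would mean the orchestra run should be delayed or repeated once the sqoop step has finished.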

Next step:

  • re-run wdcmModule_Orchestra.R (full WDCM update).
  • Data sets are now complete following a re-run of wdcmModule_Orchestra.R;
  • Next steps: figure out {curl} (possibly) related problems on WDCM Semantics and WDCM Geo.