WDCM did not update in September
Closed, Resolved (Public)

Description

For reasons unknown, the WDCM main update (Overview, Usage, and Semantics dashboards) has failed for September.

  • This is critical; inspecting now.

Event Timeline

  • The eu_touched field is not present in the wbc_entity_usage table for enwiki anymore; this has probably led to the WDCM R engine update exiting.
  • However, no notice of this has been posted to T144010 (Drop eu_touched in Production).
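
A dropped column like this can be caught before the extraction step runs. The following is only a minimal sketch of such a pre-flight check, written in Python for illustration rather than taken from the WDCM R code; the function name `missing_columns` and both column lists are assumptions, not the actual contents of the script or the table.

```python
# Hypothetical pre-flight check; names and column lists are illustrative only.

def missing_columns(required, available):
    """Return the required columns that are absent from the table, in order."""
    present = set(available)
    return [col for col in required if col not in present]


# Columns the extraction step is assumed to select (assumption, not from the script).
required = ["eu_entity_id", "eu_aspect", "eu_page_id", "eu_touched"]
# Columns assumed to remain in wbc_entity_usage after eu_touched was dropped.
available = ["eu_row_id", "eu_entity_id", "eu_aspect", "eu_page_id"]

missing = missing_columns(required, available)
if missing:
    # Report loudly instead of letting the update exit without explanation.
    print("wbc_entity_usage is missing columns: " + ", ".join(missing))
```

In a real run the `available` list would come from the table's schema rather than being hard-coded.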

Everything is done except the s4 (commons) master, as specified in the task description.

Adapting the Sqoop WDCM script now.

  • Additional clean-ups added to the WDCM_Sqoop_Clients.R WDCM module;
  • WDCM_Sqoop_Clients.R WDCM module adapted for changes in the wbc_entity_usage tables;
  • Running WDCM_Sqoop_Clients.R manually from stat1004 now.
  • WDCM_Sqoop_Clients.R completed in approx. 8 hours;
  • Running ETL procedures for WDCM Sitelinks (Wikipedia Semantics now) to be able to continue the development of the respective dashboard, then
  • running manually a full main WDCM update (Semantics, Usage, Overview dashboards will be served).
  • Most recent: Error in fread(lFitems[i]) : File is empty: Gene_ItemIDs.csv;
  • inspecting now.
  • Running a full WDCM update manually now, while dropping Gene(Q7187) from the dataset just to get the dashboards back up and updated ASAP;
  • Implement a different approach to collecting items for WDCM based on T202988 (thanks @Smalyshev @Lydia_Pintscher).
  • This is going to take some time; we will have to live with the August update until I figure out exactly how to solve these problems once and for all.
  • If that means dump processing from R, then so be it. Hopefully, service: gas will be able to provide.
  • WDCM semantic category Gene(Q7187) is back;
  • This is all probably solved in the related T203234 ticket.
  • The pressure at the Analytics Cluster is currently high;
  • wait until the most massive jobs there have finished, then
  • run the WDCM main update manually; inspect all issues (if any; everything seems fine now).
  • 17:35 CEST (approx): running WDCM main update engine manually now.
  • WDCM main update completed.
  • {maptpx} runs for the main update are now parallelized.
  • Waiting for the CloudVPS component to sync, then evaluate.
  • In case of positive evaluation, put the main update back on cron.
  • Sync complete; WDCM main dashboards updated;
  • Changes introduced in WDCM_Sqoop_Clients.R;
  • Soon to re-run the update.
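
For the empty-file failure reported in the timeline (fread stopping on Gene_ItemIDs.csv), one defensive option is to check each file before reading it and to skip empty ones with a note, so a single empty category does not stop the whole update. This is only a sketch of that idea in Python, not the WDCM implementation; `read_item_files` and the file name Human_ItemIDs.csv are hypothetical.

```python
import csv
import os
import tempfile


def read_item_files(paths):
    """Read each CSV into a list of rows; collect empty or missing files separately."""
    tables, skipped = {}, []
    for path in paths:
        # A zero-byte file is what triggers the "File is empty" error on read.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            skipped.append(path)
            continue
        with open(path, newline="") as handle:
            tables[path] = list(csv.reader(handle))
    return tables, skipped


# Demonstration with temporary files (contents are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    full = os.path.join(tmp, "Human_ItemIDs.csv")  # hypothetical file name
    empty = os.path.join(tmp, "Gene_ItemIDs.csv")
    with open(full, "w", newline="") as handle:
        handle.write("item\nQ5\n")
    open(empty, "w").close()  # create a zero-byte file

    tables, skipped = read_item_files([full, empty])
    skipped_names = [os.path.basename(p) for p in skipped]
    loaded_rows = tables[full]

print("Skipped empty files:", skipped_names)
```

Skipping keeps the remaining categories updating, but the skipped list should still be reported, since an empty item file usually points to a problem upstream.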

Closing.