WDCM Regular Updates
Closed, Resolved (Public)

Description

  • Following the puppetization of WDCM, run the whole system on regular monthly updates.
  • Add a timestamp somewhere in the WDCM dashboards.

Event Timeline

@Lydia_Pintscher

Status:

  • Over the weekend: another (final) manual WDCM update before officially putting the WDCM Engine into production. Puppetization itself is almost done (see T171258#3761115).
  • Standardizing the update reporting code.
  • Putting the whole thing on cron on stat1005 in production should not be problematic.

This is stalled until T180902 is resolved and T171258 is fully resolved. Putting the WDCM, split across stat1004 and stat1005, on cron and keeping it in sync with labs should not, in itself, present a problem. Well, hopefully not.

GoranSMilovanovic changed the task status from Open to Stalled. Nov 26 2017, 2:33 PM
  • WDCM is now running across stat1004 and stat1005 on cron, in sync with the wikidataconcepts.eqiad.wmflabs instance that serves the front-end.
  • Regular productionized updates are stalled until T180902 and T171258 are resolved.

@Lydia_Pintscher

  • Following the developments on T210147:
  • The WDCM main update engine will run on a weekly basis,
  • synced to start 10 hours after the onset of the Sqoop procedure (i.e. the transfer from MariaDB to HDFS),
  • on the 1st, 7th, 14th, 20th, and 27th of each month, so we will have five updates per month.
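
For illustration only, the schedule described above could be sketched roughly as follows. This is not the actual WDCM configuration. The thread does not say at what hour the Sqoop transfer begins, so `sqoop_start_hour` is a hypothetical parameter that defaults to midnight, and the sketch assumes the update starts on the same listed day as the Sqoop onset. The function names and the command name in the usage note are likewise made up for this example.

```python
from datetime import datetime, timedelta

UPDATE_DAYS = (1, 7, 14, 20, 27)  # days of the month named in the comment
OFFSET_HOURS = 10                 # delay after the onset of the Sqoop procedure


def wdcm_update_times(year, month, sqoop_start_hour=0):
    """Return the datetimes at which the update would start in a given month.

    sqoop_start_hour is an assumption: the thread does not state when the
    Sqoop transfer begins, so midnight is used as a placeholder.
    """
    runs = []
    for day in UPDATE_DAYS:
        sqoop_onset = datetime(year, month, day, sqoop_start_hour)
        runs.append(sqoop_onset + timedelta(hours=OFFSET_HOURS))
    return runs


def cron_line(command, sqoop_start_hour=0):
    """Build a crontab entry (minute hour day-of-month month day-of-week command).

    Only valid while the 10-hour offset stays within the same calendar day;
    otherwise the days of the month would have to be shifted as well.
    """
    hour = sqoop_start_hour + OFFSET_HOURS
    if hour >= 24:
        raise ValueError("offset crosses midnight; shift the days of month too")
    days = ",".join(str(d) for d in UPDATE_DAYS)
    return f"0 {hour} {days} * * {command}"
```

Under these assumptions, `cron_line("wdcm_update.sh")` (a hypothetical script name) yields a single crontab entry that fires five times a month, and `wdcm_update_times(2018, 12)` lists the corresponding start times for December 2018.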

With the new procedures, and following its scaling to Apache Spark, we could run WDCM on a daily basis with no trouble at all, except that the Sqoop transfer from MariaDB (the client wbc_entity_usage tables) to HDFS takes hours to complete. I will see if there is anything I can do to speed it up, but I think the chances are slim.