
Refactor CloudVPS WMDE dashboards update engine
Closed, Resolved (Public)

Description

  • We currently run a single update engine from CloudVPS to update all WMDE dashboards that are not fully client-side dependent;
  • Refactor the code so that each dashboard has its own update engine, and update all repos accordingly.
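The target layout described above, one update engine per dashboard instead of a single shared one, could be sketched roughly as follows. This is only an illustration: `UpdateEngine`, `build_registry`, and `run_one` are hypothetical names and do not come from the actual WMDE code base.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class UpdateEngine:
    """One independent update engine for a single dashboard (hypothetical structure)."""
    dashboard: str
    steps: List[Callable[[], None]]

    def run(self) -> List[str]:
        """Run every step in order and return the names of the completed steps."""
        done = []
        for step in self.steps:
            step()
            done.append(step.__name__)
        return done


def build_registry(engines: List[UpdateEngine]) -> Dict[str, UpdateEngine]:
    """Index engines by dashboard name, rejecting duplicate registrations."""
    registry: Dict[str, UpdateEngine] = {}
    for engine in engines:
        if engine.dashboard in registry:
            raise ValueError(f"duplicate engine for {engine.dashboard}")
        registry[engine.dashboard] = engine
    return registry


def run_one(registry: Dict[str, UpdateEngine], dashboard: str) -> List[str]:
    """Update a single dashboard without touching any of the others."""
    return registry[dashboard].run()
```

With this shape, a failure in one dashboard's steps does not block the others, since each engine can be scheduled and run on its own.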

Event Timeline

  • WDCM Usage and Overview dashboards are (partially) refactored now;
  • still running a single update engine for all WMDE CloudVPS dashboards.
  • WDCM Semantics dashboard is partially refactored.
  • Wiktionary Cognate Dashboard now has an independent update engine in CloudVPS;
  • testing now.
  • next step: TW Advanced Search Extension update engine: refactor it out of the WDCM update engine.
  • TW Advanced Search Extension update engine is now refactored out of the WDCM CloudVPS update engine;
  • Next step: XML config and clean-up for the WDCM CloudVPS update engine.
  • The central component of the WDCM CloudVPS update is now fully refactored, and the XML configuration file is ready;
  • this change affects:
    • WDCM Overview Dashboard
    • WDCM Semantics Dashboard
    • WDCM Usage Dashboard;
  • next steps: WDCM_Geo, WDCM_Biases, WDCM (S)itelinks, and WDCM (T)itles.
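For illustration only, reading per-dashboard settings from an XML configuration file might look like the sketch below. The element and attribute names (`config`, `dashboard`, `name`, `dataDir`) and the paths are assumptions; the real layout of the WDCM configuration file is not shown in this task.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical configuration; the real file's structure is not known here.
EXAMPLE_CONFIG = """<config>
  <dashboard name="WDCM Overview Dashboard">
    <dataDir>/hypothetical/overview</dataDir>
  </dashboard>
  <dashboard name="WDCM Semantics Dashboard">
    <dataDir>/hypothetical/semantics</dataDir>
  </dashboard>
  <dashboard name="WDCM Usage Dashboard">
    <dataDir>/hypothetical/usage</dataDir>
  </dashboard>
</config>"""


def load_dashboard_dirs(xml_text: str) -> Dict[str, str]:
    """Map each dashboard name to its configured data directory."""
    root = ET.fromstring(xml_text)
    dirs: Dict[str, str] = {}
    for node in root.findall("dashboard"):
        name = node.get("name")
        data_dir = node.findtext("dataDir")
        if name is None or data_dir is None:
            raise ValueError("each dashboard needs a name and a dataDir")
        dirs[name] = data_dir.strip()
    return dirs
```

Keeping paths and parameters in one file like this means the engine code itself does not need to change when a dashboard's data location does.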
  • In order to make WDCM dashboards exclusively client-side dependent, the fetch_label procedure (Wikidata API calls) must migrate from CloudVPS (the WDCM_Process.R module will be deprecated) to production (wdcmModule_ETL.py).
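As a rough sketch of what a label-fetching step involves, the code below batches item IDs and reads English labels through the Wikidata `wbgetentities` API module. It is not the actual fetch_label procedure: the function names, the User-Agent string, and the choice of `urllib` are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

API_URL = "https://www.wikidata.org/w/api.php"
BATCH_SIZE = 50  # wbgetentities accepts up to 50 ids per request for regular clients


def batch_ids(ids: List[str], size: int = BATCH_SIZE) -> List[List[str]]:
    """Split the id list into consecutive chunks of at most `size` ids."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def build_url(ids: List[str], language: str = "en") -> str:
    """Build one wbgetentities request URL that asks for labels only."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),
        "props": "labels",
        "languages": language,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_labels(response: dict, language: str = "en") -> Dict[str, str]:
    """Extract {entity id: label}; entities without a label are skipped."""
    labels: Dict[str, str] = {}
    for entity_id, entity in response.get("entities", {}).items():
        label = entity.get("labels", {}).get(language)
        if label is not None:
            labels[entity_id] = label["value"]
    return labels


def fetch_labels(ids: List[str], language: str = "en") -> Dict[str, str]:
    """Fetch labels for all ids, one HTTP request per batch (needs network access)."""
    labels: Dict[str, str] = {}
    for batch in batch_ids(ids):
        request = urllib.request.Request(
            build_url(batch, language),
            # Hypothetical client name; Wikimedia asks clients to identify themselves.
            headers={"User-Agent": "wdcm-label-sketch/0.1 (hypothetical)"},
        )
        with urllib.request.urlopen(request) as reply:
            labels.update(parse_labels(json.load(reply), language))
    return labels
```

Separating URL construction and response parsing from the network call keeps the logic testable offline, which matters when the same step has to run in a different environment.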
  • test run of the new WDCM_Process.R (now implemented in the system's back-end, pyspark) successful;
  • 02/26: deploy in production as pyspark code; completely remove WDCM_Process.R from the loop;
  • 02/27-28: deploy the client-side dependent version of the WDCM Usage Dashboard in production from CloudVPS.
  • running WDCM update engine manually to account for changes related to T217156
  • WDCM_CollectItems.R module modified and delivers as expected;
  • WDCM_ETL.py was a bit harder to crack but it is now being tested and it seems to run smoothly.
  • WDCM_ML.R machine-learning module adapted for new data sets;
  • running tests.
  • WDCM_ML.R machine-learning module tested;
  • new WDCM public data sets produced;
  • WDCM front-end updates are now stalled;
  • start switching to fully client-side dependent dashboards.
  • wdcm_category table too large (171 MB);
  • debugging now; affected module is WDCM_ETL.py
  • bug fix for wdcm_category in WDCM_ETL.py;
  • testing now.
  • bug for wdcm_category solved;
  • proceed with dashboard re-engineering (implement client-side dependencies).
  • bug for wdcm_category_item aggregate fixed;
  • proceed with dashboard re-engineering (implement client-side dependencies): WDCM Usage dashboard.