
Production-ready Code for WD Identifiers Dashboard
Closed, Resolved, Public


  • Productionize the ETL and statistical procedures for the WD Identifiers Dashboard.
  • Explore alternatives to the sampling approach of T214897; sampling may be avoidable after all.

Event Timeline

  • The sampling approach previously implemented (see T214897) has been phased out.
  • The Jaccard similarity matrix is now computed from the full dataset with {text2vec}.
  • Test run of the production-ready ETL/ML code: starting now.
  • Test successful.
  • The code will be deployed on stat1007, but it will not run from crontab until the WD dump dataset in the WMF Data Lake is production-ready; until then, updates will be scheduled and run manually.
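Computing Jaccard similarity over the full dataset, rather than a sample, can be sketched as follows. This is a minimal, stdlib-only Python illustration, not the {text2vec} code used in this task. It assumes the data is a binary identifier-by-item matrix, stored sparsely as a mapping from each identifier to the set of items it appears on. The function name `jaccard_matrix` and the identifiers and items in the example are hypothetical. Only pairs with a non-zero overlap are stored, which is one reason an unsampled computation can stay feasible when the matrix is sparse.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_matrix(usage):
    """Pairwise Jaccard similarity between rows of a sparse binary matrix.

    usage: dict mapping a row key (hypothetically, a WD identifier property)
    to the set of column keys (hypothetically, the items using it).
    Returns {(a, b): J(a, b)} with a < b, for every pair whose overlap is non-zero;
    pairs that are absent have similarity 0.
    """
    # Inverted index (the transpose of the matrix): column -> rows present in it.
    by_column = defaultdict(set)
    for row, cols in usage.items():
        for col in cols:
            by_column[col].add(row)

    # Co-occurrence counts |A & B|: the off-diagonal part of the sparse product X @ X.T.
    overlap = defaultdict(int)
    for rows in by_column.values():
        for a, b in combinations(sorted(rows), 2):
            overlap[(a, b)] += 1

    # J(a, b) = |A & B| / (|A| + |B| - |A & B|)
    return {
        (a, b): n / (len(usage[a]) + len(usage[b]) - n)
        for (a, b), n in overlap.items()
    }


if __name__ == "__main__":
    # Hypothetical toy data: three identifiers and the items they appear on.
    usage = {
        "P214": {"Q1", "Q2", "Q3"},
        "P227": {"Q2", "Q3", "Q4"},
        "P244": {"Q5"},
    }
    print(jaccard_matrix(usage))  # {('P214', 'P227'): 0.5}
```

Because the overlap counts are accumulated per column, the cost grows with the number of co-occurring pairs rather than with the square of the number of identifiers. A sparse-matrix library would express the same idea as a single matrix product.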