WDCM Dashboards Maintenance
Closed, Resolved (Public)

Description

This is an umbrella ticket for all WDCM Dashboards

  • improvements,
  • bug fixes,
  • new features.

Event Timeline

  • Fix the update back-end engine for the [[ URL | WDCM Biases Dashboard ]]. @RazShuty
  • ! - Advanced Search Extension Dashboard update is broken: inspect, fix.
  • Status: ! - the Advanced Search Extension Dashboard update has *probably* failed on one day only,
  • leading to a discontinuity in the time series;
  • running updates manually now to prevent data loss;
  • inspect the issue and then put the engine back on automated updates.
GoranSMilovanovic lowered the priority of this task from High to Medium. Apr 2 2019, 9:50 AM
GoranSMilovanovic lowered the priority of this task from Medium to Low. Apr 2 2019, 7:02 PM
  • Advanced Search Extension dashboard update fixed.
GoranSMilovanovic raised the priority of this task from Low to High. May 27 2019, 8:05 AM
  • the WDCM_Sqoop_Clients.R procedure fails for some databases;
  • however, this seems to happen only occasionally;
  • inspect and solve.

@Lydia_Pintscher

  • Correction: it is not the Apache Sqoop procedure in WDCM_Sqoop_Clients.R that fails (luckily!);
  • explanation: I stumbled upon an unfinished log file while checking this, and some databases were naturally missing because the procedure had not yet completed all of its passes.
  • Continue with: inspect why the usage number reported on WD_percentUsageDashboard 'oscillates'.

@Lydia_Pintscher I got it:

  • the dashboard updated before the Apache Sqoop run - which produces the essential data for the WDCM and this dashboard as well - was completed;
  • since we need the product of the Apache Sqoop procedure - the one and the only re-use Hive table that we use for everything - to be available in an asynchronous manner to various statistical systems,
  • the only solution is to correct the crontab run of the dashboard update from stat1004 to a later time (currently it is 08:00 UTC; the Apache Sqoop run starts at midnight and it seems to be taking more than nine hours to complete - and that is the root of the problem).

Implementing change; testing as of tomorrow.
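
The timing conflict described above comes down to simple arithmetic, which can be sketched as follows. This is a minimal illustration, not the actual crontab change: the function name `update_is_safe`, the `margin_hours` parameter, and the 12:00 UTC alternative are hypothetical. Only the midnight start, the run time of at least nine hours, and the 08:00 UTC update time come from the notes above.

```python
def update_is_safe(sqoop_start_hour: float,
                   sqoop_duration_hours: float,
                   update_hour: float,
                   margin_hours: float = 0.0) -> bool:
    """Return True if the dashboard update starts only after the Sqoop run
    is expected to have finished (all hours are UTC, within the same day)."""
    expected_finish = sqoop_start_hour + sqoop_duration_hours + margin_hours
    return update_hour >= expected_finish


# Values from the ticket: Sqoop starts at 00:00 and needs at least nine hours,
# while the dashboard update was scheduled for 08:00.
print(update_is_safe(0, 9, 8))                    # False: update reads incomplete data
# Hypothetical later slot with a one-hour safety margin.
print(update_is_safe(0, 9, 12, margin_hours=1))   # True
```

Because the Sqoop run takes more than nine hours, any slot chosen this way should include a margin rather than sit exactly at the nine-hour mark.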

GoranSMilovanovic lowered the priority of this task from High to Medium. May 27 2019, 11:10 AM
  • WDCM Geo is back on updates, re-designed, fully client-side dependent, and does not rely on the wdcm_maintable anymore (see T214586):

http://wmdeanalytics.wmflabs.org/WDCM_GeoDashboard/

  • Next step: do the same for the WDCM Biases dashboard.

@RazShuty @Lea_Lacroix_WMDE

  • The WDCM Biases Dashboard's back-end has to switch to Apache Spark;
  • it maintains large data sets and depends heavily on WDQS, sending queries that time out every now and then; beyond that,
  • it relies on geo-coordinates which, when fetched together with items from WDQS, produce long vectors that cannot be converted from raw to character() in R (and which then need to be parsed by {jsonlite} into readable data.frame types for processing).

We will deal with this dashboard once and for all.
@Lea_Lacroix_WMDE I know we said this dashboard is on maintenance only, but this is exactly it: if the changes do not take place, it will never update again.

  • PySpark ETL procedures for WDCM Biases completed;
  • next step: refactor the R code to work with the Spark outputs.

@Lea_WMDE

  • archive the Advanced Search Extension dashboard (tracking).
  • Goodbye Advanced Search Extension dashboard :(

@RazShuty @Lea_Lacroix_WMDE

The WDCM Biases dashboard is now back on updates:

  • it will be updated once a month, and
  • its update depends on the most recent version of the WD dump copy in HDFS (see T209655);
  • the dashboard is now fully client-side dependent,
  • and because it now uses Spark as its ETL back-end we can finally remove
  • the huge wdcm_maintable from HDFS that previously supported our ETL procedures for the WDCM system (Hive; see T214586).
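
Since the monthly update depends on the most recent copy of the WD dump, the selection step might look like the sketch below. It is only an illustration under stated assumptions: the `snapshot=YYYY-MM-DD` directory naming and the function name `latest_snapshot` are hypothetical and are not taken from T209655 or from the actual ETL code. The listing is passed in as plain strings so that the sketch does not depend on any HDFS client.

```python
from datetime import date, datetime
from typing import Iterable, Optional


def latest_snapshot(names: Iterable[str],
                    prefix: str = "snapshot=") -> Optional[date]:
    """Return the most recent snapshot date found in a directory listing.

    Entries that do not match the assumed `<prefix>YYYY-MM-DD` pattern
    are ignored; None is returned if no entry matches.
    """
    dates = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            parsed = datetime.strptime(name[len(prefix):], "%Y-%m-%d")
        except ValueError:
            continue  # not a dated snapshot directory
        dates.append(parsed.date())
    return max(dates, default=None)


# Hypothetical listing of the dump directory.
listing = ["snapshot=2019-07-01", "snapshot=2019-08-05", "_SUCCESS"]
print(latest_snapshot(listing))  # 2019-08-05
```

Comparing the returned date with the date of the last processed snapshot would tell the update job whether there is anything new to process.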

@Lea_Lacroix_WMDE This is the only WDCM dashboard whose design is not consistent with the others; take a look under "Wikidata Concepts Monitor" on the WMDE Analytics Portal.
I did not invest any work into its re-design since the decision was to keep the dashboard on maintenance only. However, for reasons of consistency, I think it should be done. Please let me know if you agree.

GoranSMilovanovic raised the priority of this task from Medium to High. Sep 6 2019, 7:39 PM

Check out what is happening with:

  • Russian Wikisource, Chechen Wikipedia, Russian Wikipedia and the Catalan Wikipedia in WDCM.
  • Following the latest WDCM update ruwiki is back;
  • still cannot pinpoint what exactly happened to the previous update.
GoranSMilovanovic lowered the priority of this task from High to Medium. Sep 9 2019, 11:10 PM

Not updated 2019/11/08:

  • WDCM (T)itles,
  • WDCM (S)itelinks.

Note. This seems to have been only a temporary problem; the November updates are running smoothly. Monitoring.

Wikidata Pageviews per Namespace Dashboard not responding following the changes in T239199 (Kerberos Auth for all WMDE Analytics):

  • figured it out,
  • fixed the bug,
  • re-running initial data intake now.
  • WDCM Geo Dashboard not updated:
  • a problem is identified in the new WDCM directory structure on the stat100* machines;
  • fixed; running update procedures now;
  • also: the category Museum apparently has no data on the dashboard; inspecting the issue now.
  • update procedure completed; all public datasets in place;
  • monitoring for the Museum class related bug on the WDCM Geo Dashboard.
  • WDCM Navigation HTML centralized;
  • broken links across the WDCM dashboards (thus) fixed.

In focus until the end of 2020:

  • T261905 Productionize the WMDE Analytics Front-End
  • T239206 WDCM dashboards UI
GoranSMilovanovic lowered the priority of this task from Medium to Low. Mar 14 2021, 1:52 PM