
Productionize the WMDE Analytics Front-End
Open, Needs Triage, Public

Description

The WMDE analytics front-end, whether related to Wikidata or not, needs serious re-engineering to put all of its Shiny dashboards and R Markdown reports into production properly. The whole WMDE analytics front-end currently runs on Open Source Shiny Server in the CloudVPS Wmde-dashboards project. A Docker container encompassing the whole WMDE analytics front-end (or almost all of it) is currently running on a test server. Very soon, the front-end will be containerized within the same CloudVPS project where it runs now. However, more thorough systems engineering is needed to keep everything in order, manageable, and served more efficiently: the Open Source Shiny Server does not help us overcome the single-threaded nature of R, while some of our dashboards are quite resource-demanding.

In the first step, as soon as we receive an increased quota for our CloudVPS analytics project (see: T261743) that allows us to spin up three XL instances there, the following separation is planned:

  • Instance A: WDCM (see below), Wikidata Analytics, Wikidata Structural Systems, and Qurator Projects
  • Instance B: Wiktionary Cognate Dashboard
  • Instance C: WMDE New Editors Team

On all three instances we will switch from Open Source Shiny Server to ShinyProxy, so that each new user connection spins up its own Docker container serving the requested product.
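To make the per-instance ShinyProxy setup more concrete, the following is a minimal Python sketch that assembles one ShinyProxy configuration per instance from the product-to-instance assignment given in this task. The `proxy.port`, `specs`, `id`, `display-name`, `container-image`, and `container-cmd` keys follow ShinyProxy's `application.yml` layout; the spec ids, Docker image names (`wmde/...`), and R package names are hypothetical placeholders, and the start command assumes each app is packaged with {golem} and exposes the usual `run_app()` function. YAML 1.2 is a superset of JSON, so the printed JSON could serve as a starting point for an `application.yml`; it is not a tested deployment config.

```python
from __future__ import annotations

import json

# Product-to-instance assignment as listed in this task.
# Spec ids, display names, and R package names are HYPOTHETICAL placeholders.
PRODUCTS: dict[str, list[tuple[str, str, str]]] = {
    "A": [
        ("wdcm", "Wikidata Concepts Monitor", "wdcmDashboard"),
        ("wikidata-analytics", "Wikidata Analytics", "wikidataAnalytics"),
        ("wikidata-structural-systems", "Wikidata Structural Systems", "wdStructuralSystems"),
        ("qurator", "Qurator Projects", "quratorDashboard"),
    ],
    "B": [("wiktionary-cognate", "Wiktionary Cognate Dashboard", "wiktionaryCognate")],
    "C": [("new-editors", "New Editors dashboard", "newEditorsDashboard")],
}

# By default, ShinyProxy connects to port 3838 inside each app container.
CONTAINER_PORT = 3838


def golem_cmd(package: str, port: int = CONTAINER_PORT) -> list[str]:
    """Container start command for a {golem}-packaged Shiny app (assumes run_app())."""
    r_expr = f"options(shiny.host='0.0.0.0', shiny.port={port}); {package}::run_app()"
    return ["R", "-e", r_expr]


def build_config(instance: str) -> dict:
    """Build the `proxy` section of a ShinyProxy application.yml for one instance."""
    specs = [
        {
            "id": spec_id,
            "display-name": display_name,
            # Hypothetical image naming scheme: one Docker image per data product.
            "container-image": f"wmde/{spec_id}:latest",
            "container-cmd": golem_cmd(package),
        }
        for spec_id, display_name, package in PRODUCTS[instance]
    ]
    # 8080 is ShinyProxy's default listening port.
    return {"proxy": {"port": 8080, "specs": specs}}


if __name__ == "__main__":
    for instance in PRODUCTS:
        print(f"# Instance {instance}")
        print(json.dumps(build_config(instance), indent=2))
```

Under this sketch, each of the three instances would run its own ShinyProxy process with its own configuration, and ShinyProxy would start a fresh container from the listed image whenever a user opens one of the apps.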

The following analytics RStudio Shiny dashboards and standardized R Markdown notebooks have been developed (or are in development) at WMDE and run from the CloudVPS Wmde-dashboards project; all of them will be redistributed across the three new virtual instances:

Wikidata Concepts Monitor (WDCM) dashboards - Instance A

Wikidata Analytics - Instance A

Wikidata Structural Systems - Instance A

Qurator Projects - Instance A

Wiktionary - Instance B

WMDE New Editors Team - Instance C

  • New Editors dashboard (development is currently postponed)
  • Many R Markdown WMDE Banner Campaign reports

With twenty data products currently served or under development, we can no longer rely on a single instance of Open Source Shiny Server and manage all dependencies manually. When WDCM was the only product, the situation was much simpler, and one Shiny Server instance running on one VM was acceptable; that is no longer the case.

The {golem} framework will be used across all listed (and all future) data products to ensure their robustness and to address production and reproducibility issues.

Proposed timeline: Q4; completion expected by the end of 2020.

Event Timeline

2020/09/15, status:

A new Wikidata Analytics Portal is now available on the test server: http://datakolektiv.org/app/WikidataAnalytics