Add graph for number of users in the unified stats dashboard
Closed, Resolved (Public)

Description

An initial version of the unified translation activity stats dashboard was completed in T310774. It provides a great overview of the basic translation metrics, which can be filtered to focus on the different types of translations supported.

Currently, the dashboard shows the number of translations split by user expertise. However, there is no overview of the number of users who have participated in translation activities. This ticket proposes adding an overview graph of the number of users. The user graph can follow the style of the "Translation last calendar month" graph and be placed next to "Monthly translations by user edit count". The table below shows the current dashboard and the proposed change.

Current: superset.wikimedia.org_superset_dashboard_119_.png
Proposed: Artboard.png

Event Timeline

@Pginer-WMF

I've updated the dashboard to include a "Users last calendar month" graph, which provides the number of distinct users that have participated in translation activities in the most recent month.

Load Time Issue
This new user graph is taking significantly longer to load than the other graphs on the dashboard (anywhere from 90 seconds to 2 minutes depending on available bandwidth) and sometimes results in a timeout error. This is because the user data needed for this graph requires querying mediawiki_history, which takes longer to run compared to the other graphs that come directly from the edits_hourly dataset available in Superset. The edits_hourly dataset does not currently include data on the number of unique users.

If we find that this load time frequently results in timeout errors, there are a few options we can take to obtain this data more efficiently:
(1) Add more conditions to limit the amount of data to be queried. For example, limiting the time range, languages, or namespaces that are queried. This could be done easily by adjusting the query; however, we would likely need to apply the same conditions to the other charts so the data is consistent.
(2) Once T307883 is resolved, update the dashboard to include user data from the new editors_daily dataset, which will run as quickly as the other charts on the dashboard. This task is currently deprioritized so it will likely not be available soon.
(3) Create a separate, smaller dataset of aggregated translation data to query, and set up a job to automatically rerun the query every month. This will require more time to set up but would greatly reduce the load time, and would also be useful for other planned additions to the dashboard such as newcomer retention (T226170). A task for this work has already been created (T287306).
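A minimal sketch of the aggregation step behind option (3), assuming the source rows can be reduced to a user identifier, a timestamp, and a flag marking translation edits. The field names `user_id`, `timestamp`, and `is_translation` and the function `monthly_translator_counts` are hypothetical and chosen for illustration; they are not the actual mediawiki_history columns, and the real job would run as a scheduled query rather than in-memory Python.

```python
from collections import defaultdict
from datetime import datetime

def monthly_translator_counts(rows):
    """Return {"YYYY-MM": distinct translating users} from user-level rows.

    Each row is a dict with hypothetical keys:
    "user_id", "timestamp" (ISO 8601 string), and "is_translation".
    """
    users_by_month = defaultdict(set)
    for row in rows:
        if not row["is_translation"]:
            continue  # keep only translation activity
        month = datetime.fromisoformat(row["timestamp"]).strftime("%Y-%m")
        users_by_month[month].add(row["user_id"])
    # The small result table is what the dashboard chart would query.
    return {month: len(users) for month, users in sorted(users_by_month.items())}

# Made-up sample rows, for illustration only.
sample_rows = [
    {"user_id": 1, "timestamp": "2023-01-05T12:00:00", "is_translation": True},
    {"user_id": 1, "timestamp": "2023-01-20T08:30:00", "is_translation": True},
    {"user_id": 2, "timestamp": "2023-01-21T09:00:00", "is_translation": True},
    {"user_id": 4, "timestamp": "2023-01-22T10:00:00", "is_translation": False},
    {"user_id": 3, "timestamp": "2023-02-02T17:45:00", "is_translation": True},
]

monthly_counts = monthly_translator_counts(sample_rows)
print(monthly_counts)
```

Because the output has one row per month, the chart would read a handful of rows instead of scanning the full history on every dashboard load.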

Dashboard Filter Update
The filters are now located on the right side of the dashboard (see screenshot below). The previous Filter Box was replaced with built-in dashboard filtering as part of the recent Superset update, but the filters should still work as expected.

Screen Shot 2023-02-21 at 6.29.14 PM.png (1×1 px, 214 KB)


Per discussions with @Pginer-WMF, we'll keep the user graph as is for now and eventually update the query based on a smaller aggregate dataset of Content Translation Data and ETL job to be created in T287306. I've updated T287306 to add updating this graph as part of the scope.

@Pginer-WMF - ok to resolve this task?

Sounds good. Thanks for the good work!