The Language team is interested in tracking a number of advanced metrics for Content Translation, such as the number of translators, the new translator retention rate (T226170, T194641), and overall deletion rates of translations (T286636).
However, most of these metrics cannot be reliably calculated over a large time range (such as a year) within the 180-second Superset query timeout. In addition, the new Content Translation data stream (T231316) will be even larger, making it even harder to compute any metrics (such as translation completion rate) within the timeout.
In the absence of architectural improvements to Superset such as asynchronous queries, the only way to dashboard these metrics in Superset is to create an ETL job that periodically calculates them and saves the results to the Data Lake.
The current tool for this is Oozie, but work is planned to replace Oozie with Airflow (T271429).
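To illustrate the kind of pre-aggregation such a job would perform, here is a minimal sketch in plain Python. It is not the actual job: the `Translation` record shape, its field names, and the `monthly_aggregates` function are hypothetical, and a real implementation would read from and write to Data Lake tables through whichever scheduler is in use.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Translation:
    # Hypothetical record shape; the real Content Translation schema differs.
    translator_id: int
    published_on: date
    deleted: bool


def monthly_aggregates(rows):
    """Collapse per-translation rows into one summary row per month."""
    translators = defaultdict(set)
    published = defaultdict(int)
    deleted = defaultdict(int)
    for row in rows:
        month = row.published_on.strftime("%Y-%m")
        translators[month].add(row.translator_id)
        published[month] += 1
        deleted[month] += int(row.deleted)
    # One small row per month: this is what a dashboard would query.
    return [
        {
            "month": month,
            "translators": len(translators[month]),
            "translations": published[month],
            "deletion_rate": deleted[month] / published[month],
        }
        for month in sorted(published)
    ]
```

Because the dashboard would then read one row per month instead of scanning every translation, chart queries should stay well inside the timeout regardless of the time range selected.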
Task Requirements:
- Create ETL Job
- Update the Unified Experience Dashboard, including translator data, to use this aggregate dataset. This will enable all the charts to run more efficiently.