This task is primarily intended for documenting how Movement-Metrics is affected by problems with Dumps-Generation and mediawiki_wikitext_history. For that reason, it is not tagged with Data-Platform or Dumps-Generation.
SDS 2.6.2 (FY2023-24) has been focused on improving the delivery of the movement metrics report. Our critical path is as follows:
- XML dumps generation
- loading XML dumps to HDFS (Python script, template for running the script, Puppet management of the systemd timers that run the script)
- mediawiki_wikitext_history
- research_article_quality (Airflow DAG, code)
- knowledge_gaps (Airflow DAG, code)
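Before T357859, knowledge gaps data took an average of 26 days to become available (data intervals 2023-09 through 2024-01). Afterward, the average has been 17 days (data intervals 2024-02 through 2024-05).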
data interval | days to availability of knowledge gaps | notes |
---|---|---|
2023-09 | 23.1 | 1 day delay due to T342911 |
2023-10 | 25.5 | |
2023-11 | 27.9 | 4 day delay due to T342911 |
2023-12 | 26.1 | 2 day delay due to T342911 |
2024-01 | 26.9 | 1 day delay due to T342911, knowledge gaps job issue (T358613) |
2024-02 | 10.6 | First run skipping Wikidata to save time (T357859), 1 day delay due to T342911 |
2024-03 | 18.7 | Dumps generation issue, ultimately resolved by skipping Commons (T362454), 1 day delay due to T342911 |
2024-04 | 14.1 | Dumps generation issue (T364391) |
2024-05 | 23.8 | Major dumps generation issue (T365155) |
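
The before and after averages can be rechecked from the table with a short script. This is only a sketch for verifying the arithmetic: the values are copied by hand from the table above, and the names `DAYS_TO_AVAILABILITY`, `FIRST_INTERVAL_AFTER_CHANGE`, and `split_averages` are made up for illustration and are not part of any pipeline code. It assumes 2024-02 is the first data interval affected by T357859, as the notes column indicates.

```python
from statistics import mean

# (data interval, days to availability of knowledge gaps),
# copied by hand from the table in this task.
DAYS_TO_AVAILABILITY = [
    ("2023-09", 23.1),
    ("2023-10", 25.5),
    ("2023-11", 27.9),
    ("2023-12", 26.1),
    ("2024-01", 26.9),
    ("2024-02", 10.6),
    ("2024-03", 18.7),
    ("2024-04", 14.1),
    ("2024-05", 23.8),
]

# Assumption: 2024-02 is the first interval run after T357859
# (the first run that skipped Wikidata).
FIRST_INTERVAL_AFTER_CHANGE = "2024-02"


def split_averages(rows, cutoff):
    """Return (mean before cutoff, mean from cutoff onward).

    Intervals are zero-padded YYYY-MM strings, so plain string
    comparison orders them chronologically.
    """
    before = [days for interval, days in rows if interval < cutoff]
    after = [days for interval, days in rows if interval >= cutoff]
    return mean(before), mean(after)


before_avg, after_avg = split_averages(
    DAYS_TO_AVAILABILITY, FIRST_INTERVAL_AFTER_CHANGE
)
print(f"before: {before_avg:.1f} days, after: {after_avg:.1f} days")
```

Rounded to whole days, the two means match the 26 and 17 day figures quoted above the table.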