//This task is primarily intended to document how #movement-metrics is affected by problems with #dumps-generation and [mediawiki_wikitext_history](https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Mediawiki_wikitext_history). For that reason, it is not tagged with #data-platform or #dumps-generation.//
[SDS 2.6.2](https://app.asana.com/0/1206332282349373/) (FY2023-24) has been focused on improving the delivery of the movement metrics report. Our [critical path](https://en.wikipedia.org/wiki/Critical_path_method) is as follows:
* **[XML dumps generation](https://meta.wikimedia.org/wiki/Data_dumps)**
* **loading XML dumps to HDFS** ([Python script](https://github.com/wikimedia/analytics-refinery/blob/master/bin/import-mediawiki-dumps), [template for running script](https://github.com/wikimedia/operations-puppet/blob/production/modules/profile/templates/analytics/refinery/job/refinery-import-mediawiki-dumps.sh.erb), [Puppet management of SystemD timers running script](https://github.com/wikimedia/operations-puppet/blob/production/modules/profile/manifests/analytics/refinery/job/import_mediawiki_dumps.pp))
* **[mediawiki_wikitext_history](https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Mediawiki_wikitext_history)**
* **research_article_quality** ([Airflow DAG](https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/research/dags/article_quality_dag.py), [code](https://gitlab.wikimedia.org/repos/research/article-quality/))
* **knowledge_gaps** ([Airflow DAG](https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/research/dags/knowledge_gaps_dag.py), [code](https://gitlab.wikimedia.org/repos/research/knowledge-gaps/))
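The [critical path method](https://en.wikipedia.org/wiki/Critical_path_method) finds the longest chain of dependent stages, and that chain sets the minimum end-to-end time. Each stage listed above waits on the one before it, so the chain is linear and its length is simply the sum of the stage durations. The following is a minimal Python sketch of the general longest-path computation. The snake_case stage identifiers and the `critical_path` helper are hypothetical. The task gives no per-stage durations, so the caller has to supply them.

```python
from graphlib import TopologicalSorter

# Hypothetical identifiers for the critical-path stages, in the order listed above.
STAGES = [
    "xml_dumps_generation",
    "load_xml_dumps_to_hdfs",
    "mediawiki_wikitext_history",
    "research_article_quality",
    "knowledge_gaps",
]
# Each stage depends on the stage listed immediately before it.
DEPENDS_ON = {stage: {prev} for prev, stage in zip(STAGES, STAGES[1:])}
DEPENDS_ON[STAGES[0]] = set()


def critical_path(durations, depends_on=DEPENDS_ON):
    """Return (total_days, path) for the longest chain of dependent stages.

    `durations` maps every stage to a duration in days (caller-supplied).
    `depends_on` maps each stage to the set of stages it waits on.
    """
    finish = {}     # earliest finish time of each stage
    best_pred = {}  # predecessor on the longest path into each stage
    # static_order() yields every stage after all of its predecessors.
    for stage in TopologicalSorter(depends_on).static_order():
        start, pred = 0.0, None
        for p in depends_on.get(stage, ()):
            # A stage can start only once its slowest predecessor finishes.
            if pred is None or finish[p] > start:
                start, pred = finish[p], p
        finish[stage] = start + durations[stage]
        best_pred[stage] = pred
    # Walk back from the stage that finishes last to recover the path.
    end = max(finish, key=finish.get)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return finish[end], path[::-1]
```

For the linear chain above, calling `critical_path(durations)` with a duration for every entry in `STAGES` returns the sum of those durations and the full `STAGES` list. This makes it explicit that a delay in dumps generation pushes back every downstream stage, including knowledge_gaps.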
Before T357859, it usually took 25-27 days for the knowledge gaps data for a given data interval to become available. Afterward, we expected this to drop to about 12 days. So far, however, only the 2024-02 run has met that expectation.
| data interval | days to availability of knowledge gaps | notes
| --- | --- | ----
| 2023-09 | 23.1 | 1-day delay due to T342911
| 2023-10 | 25.5 |
| 2023-11 | 27.9 | 4-day delay due to T342911
| 2023-12 | 26.1 | 2-day delay due to T342911
| 2024-01 | 26.9 | 1-day delay due to T342911; knowledge gaps job issue (T358613)
| 2024-02 | 10.6 | First run skipping Wikidata to save time (T357859); 1-day delay due to T342911
| 2024-03 | 18.7 | Dumps generation issue, ultimately resolved by skipping Commons (T362454); 1-day delay due to T342911
| 2024-04 | 14.1 | Dumps generation issue (T364391)
| 2024-05 | 23.8 | Major dumps generation issue (T365155)
([raw data in spreadsheet](https://docs.google.com/spreadsheets/d/1MNgjdRptrugoHWy0HDyL-eQk3B6SrxO9TqryBSvpdu8/))
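As a quick check on the table, the following sketch averages the durations before and after T357859 and lists the intervals that met the roughly 12-day expectation. The values are copied from the table above, not pulled from the linked spreadsheet. Treating 12 days as a hard cutoff (`TARGET_DAYS`) is an assumption made for illustration.

```python
from statistics import mean

# Days to availability of knowledge gaps, copied from the table above.
DAYS = {
    "2023-09": 23.1,
    "2023-10": 25.5,
    "2023-11": 27.9,
    "2023-12": 26.1,
    "2024-01": 26.9,
    "2024-02": 10.6,
    "2024-03": 18.7,
    "2024-04": 14.1,
    "2024-05": 23.8,
}
CUTOFF = "2024-02"  # first data interval run after T357859 (skipping Wikidata)
TARGET_DAYS = 12    # assumed hard threshold for "about 12 days"

# YYYY-MM strings sort chronologically, so plain string comparison works.
before = [d for m, d in DAYS.items() if m < CUTOFF]
after = [d for m, d in DAYS.items() if m >= CUTOFF]
met_target = [m for m, d in DAYS.items() if m >= CUTOFF and d <= TARGET_DAYS]

print(f"mean before T357859: {mean(before):.1f} days")  # → 25.9 days
print(f"mean after T357859: {mean(after):.1f} days")    # → 16.8 days
print(f"intervals within {TARGET_DAYS} days: {met_target}")  # → ['2024-02']
```

The post-T357859 average is well below the earlier one but still above the 12-day expectation, mostly because of the dumps generation issues noted in the table.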