Generate Airflow DAG for creating article-country SQLite DB
Open, Needs Triage, Public

Description

Per T385970, we'll need a SQLite DB dependency for the article-country model on LiftWing. Isaac produced the initial database but, per discussion with Fabian, we should build an Airflow DAG to make updating this dependency easier in the future. The current code is custom, but we should be able to build directly on top of the geography and cultural data in the content gaps metrics. The increased accuracy of, e.g., the coordinate-based data in the article-country model is not necessary for this DB, so we can use the more efficient approach taken with the content gaps.
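For illustration only, here is a rough sketch of what the final "write the SQLite DB" step could look like once the rows have been computed upstream. The table name `article_country`, its columns, the pipe-delimited `countries` field, and both function names are assumptions made for this example; they are not taken from the existing notebook or from the content gaps pipeline, and the real schema should follow whatever the LiftWing model expects.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical row shape: (wiki_db, page_id, pipe-delimited country names).
# The real schema is not specified in this task.
Row = Tuple[str, int, str]


def build_article_country_db(db_path: str, rows: Iterable[Row]) -> int:
    """Rebuild the lookup table from scratch and return the number of rows stored."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DROP TABLE IF EXISTS article_country")
            conn.execute(
                "CREATE TABLE article_country ("
                "wiki_db TEXT NOT NULL, "
                "page_id INTEGER NOT NULL, "
                "countries TEXT NOT NULL, "
                "PRIMARY KEY (wiki_db, page_id))"
            )
            # If the same (wiki_db, page_id) appears twice, the last row wins.
            conn.executemany(
                "INSERT OR REPLACE INTO article_country VALUES (?, ?, ?)", rows
            )
        (count,) = conn.execute("SELECT COUNT(*) FROM article_country").fetchone()
        return count
    finally:
        conn.close()


def lookup_countries(db_path: str, wiki_db: str, page_id: int) -> List[str]:
    """Return the countries stored for one article, or an empty list if absent."""
    conn = sqlite3.connect(db_path)
    try:
        result = conn.execute(
            "SELECT countries FROM article_country WHERE wiki_db = ? AND page_id = ?",
            (wiki_db, page_id),
        ).fetchone()
    finally:
        conn.close()
    if result is None or not result[0]:
        return []
    return result[0].split("|")
```

Rebuilding the table in a single transaction keeps each run idempotent, which is usually what a scheduled job wants. In an Airflow DAG, a function like `build_article_country_db` would presumably run as the last task, after the content gaps data has been collected into rows.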

Steps:

Event Timeline

Isaac removed Isaac as the assignee of this task. Feb 21 2025, 11:05 PM

@fkaelin I did my best with the initial code here, though I didn't fully understand what the right configuration for the SQLite DB piece should be and was getting errors, so I left that part out. I'm pretty sure the schema etc. matches, so hopefully it's a simple fix. Let me know if there are larger concerns, but I think it should be ready to add on top of the content gaps Airflow DAGs.

Code: https://gitlab.wikimedia.org/isaacj/miscellaneous-wikimedia/-/blob/master/article-country/article-country-sqlite-dependency.ipynb?ref_type=heads

Also, apologies, but I'm realizing that this is related to an old request (T371900) that covered other dependencies associated with the model. Tagging @XiaoXiao-WMF for visibility and to assist with prioritization. This isn't urgent (I've given ML Platform what they need to move forward), but it is important to the long-term maintainability of this model.

Thanks Isaac.

Edited the description with a link to T377267 for the "move geographic models to research-datasets" work.