1. Implement [[ https://www.wikidata.org/wiki/User:Envlh/Denelezh/Schema | new schema ]]:
- transformed KPI table (rename to "metric" or "indicator")
- pre-work: list all the metrics we need and check how they would be stored in "long" format
- evaluate alternatives to Alembic for schema migrations?
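The "long" format above could be sketched as follows; table and column names are assumptions for illustration, not the final schema:

```python
import sqlite3

# Sketch of a long-format metric table: one row per (dump date, metric name,
# dimension values) instead of one column per KPI. All names are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metric (
        dump_date TEXT NOT NULL,   -- Wikidata dump the value was computed from
        name      TEXT NOT NULL,   -- e.g. 'humans_with_gender' (hypothetical)
        wiki      TEXT,            -- optional dimension: project
        gender    TEXT,            -- optional dimension: gender QID
        value     INTEGER NOT NULL,
        PRIMARY KEY (dump_date, name, wiki, gender)
    )
""")
conn.executemany(
    "INSERT INTO metric VALUES (?, ?, ?, ?, ?)",
    [
        ("2020-01-06", "humans_with_gender", "enwiki", "Q6581097", 1200),
        ("2020-01-06", "humans_with_gender", "enwiki", "Q6581072", 300),
    ],
)
# Adding a new KPI is a new 'name' value, not a schema migration.
total = conn.execute(
    "SELECT SUM(value) FROM metric WHERE name = 'humans_with_gender'"
).fetchone()[0]
print(total)  # 1500
```

The upside of the long format is exactly the comment above: new metrics are rows, so the pre-work list of metrics does not have to be complete before the table is created.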
2. WDTK layer:
- Start from denelezh-import, include all current properties
- include the properties needed for WHGI
3. Backfiller
- WHGI from old index files
- Denelezh from old db dumps
- investigate how many old index files / db dumps are still available
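The backfill inventory step might start by recovering dump dates from the old file names; the naming pattern below is hypothetical and needs to be checked against the actual WHGI/Denelezh archives:

```python
import re
from datetime import date
from pathlib import Path

# Hypothetical file name convention for old WHGI index files; adjust the
# regex once the real archive layout is known.
INDEX_RE = re.compile(r"index-(\d{4})-(\d{2})-(\d{2})\.csv$")

def backfill_dates(paths):
    """Return the dump dates recoverable from a list of old files, sorted."""
    dates = []
    for p in paths:
        m = INDEX_RE.search(Path(p).name)
        if m:
            dates.append(date(*map(int, m.groups())))
    return sorted(dates)

print(backfill_dates(["index-2017-01-02.csv", "notes.txt", "index-2016-11-14.csv"]))
# [datetime.date(2016, 11, 14), datetime.date(2017, 1, 2)]
```

Counting the result answers "how many old snapshots can we backfill" before writing any loader.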
4. Orchestration
- Use Airflow, with a DAG:
- allow for both orderings:
  1. Run WDTK → create CSVs → load data into DB → make aggregations
  2. Run WDTK → create CSVs → make aggregations in memory → load data into DB
- allow for parallelized transformations (replace SQL permutations in loop)
- see if Wikimedia Cloud has a Spark cluster already
5. DevOps
- set up a MySQL 8.0 db, configure with Envel
- on a separate or new server on WMF Cloud