
Gather potential requirements for next version of denelezh-import
Closed, Resolved | Public | 21 Estimated Story Points

Description

  1. Implement new schema:
    • transformed KPI table (rename: metric or indicator)
    • pre-work is to list all metrics we need and see how they would be stored in "long" format
    • alternatives to alembic?
  2. WDTK layer:
    • Start from denelezh-import, include all current properties
    • include the properties needed for WHGI
  3. Backfiller
    • WHGI from old index files
    • Denelezh from old db dumps
      • investigate how many old dumps are available
  4. Orchestration
    • Use Airflow, with a DAG:
      • allow for both orderings:
        1. Run WDTK → create CSVs → load data into DB → make aggregations
        2. Run WDTK → create CSVs → make aggregations in memory → load data into DB
      • allow for parallelized transformations (replace SQL permutations in loop)
    • see if Wikimedia Cloud has a Spark cluster already?
  5. Dev-ops
    • set up a MySQL 8.0 db, configure with envel
    • on a separate or new server on wmf-cloud
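
As a rough illustration of the "replace SQL permutations in loop" item under Orchestration, here is a minimal sketch of computing one aggregation per combination of dimensions and submitting them to a pool instead of looping. The dimension names (`gender`, `birth_decade`, `project`), the function names, and the sample rows are all hypothetical; the real dimensions depend on which properties the WDTK layer extracts.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical dimensions; the real set depends on the properties WDTK extracts.
DIMENSIONS = ("gender", "birth_decade", "project")


def dimension_subsets(dimensions):
    """Yield every subset of the dimensions, from () up to the full tuple."""
    for size in range(len(dimensions) + 1):
        yield from combinations(dimensions, size)


def aggregate(rows, subset):
    """Count rows per distinct value combination of the given dimensions."""
    counts = Counter(tuple(row[dim] for dim in subset) for row in rows)
    return subset, dict(counts)


def aggregate_all(rows, dimensions=DIMENSIONS, max_workers=4):
    """Run one aggregation per dimension subset via a pool rather than a loop."""
    subsets = list(dimension_subsets(dimensions))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda subset: aggregate(rows, subset), subsets)
        return dict(results)


# Made-up sample rows, for illustration only.
humans = [
    {"gender": "female", "birth_decade": 1950, "project": "enwiki"},
    {"gender": "female", "birth_decade": 1960, "project": "frwiki"},
    {"gender": "male", "birth_decade": 1950, "project": "enwiki"},
]
totals = aggregate_all(humans)
print(totals[("gender",)])
```

A thread pool only shows the shape of the idea: pure-Python counting will not speed up under threads, so a process pool, Spark, or parallel Airflow tasks would be the realistic choices. The same `aggregate_all` step could sit before or after the DB load, which is what the two orderings listed under Orchestration differ on.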

Event Timeline

notconfusing updated the task description. Sep 10 2020, 8:15 PM
notconfusing renamed this task from "Requirements gathering for next version of denelezh-import" to "Gathering potential requirements for next version of denelezh-import". Sep 11 2020, 8:26 PM
notconfusing updated the task description. Sep 11 2020, 8:28 PM
notconfusing added a comment. Edited Sep 11 2020, 8:33 PM

Definite requirement: include WHGI-needed data

WHGI's GenderIndexProcessor: https://github.com/notconfusing/Wikidata-Toolkit/blob/wigi/wdtk-examples/src/main/java/org/wikidata/wdtk/examples/GenderIndexProcessor.java

Definite requirement: Backfiller
Whatever the future schema is, allow previous data from WHGI and Denelezh to be filled in retroactively.
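
One way to make retroactive filling safe to re-run is to key every row on its snapshot date and source, so that loading the same old file twice changes nothing. The sketch below assumes a hypothetical long-format `metric` table and uses the standard-library `sqlite3` module as a stand-in for the planned MySQL database; the table, column names, date, and values are illustrative only.

```python
import sqlite3

# Hypothetical long-format table; column names are illustrative only.
SCHEMA = """
CREATE TABLE metric (
    snapshot_date TEXT NOT NULL,
    source        TEXT NOT NULL,
    dimension_key TEXT NOT NULL,
    name          TEXT NOT NULL,
    value         INTEGER NOT NULL,
    UNIQUE (snapshot_date, source, dimension_key, name)
)
"""


def row_count(conn):
    """Return the number of rows currently stored in the metric table."""
    return conn.execute("SELECT COUNT(*) FROM metric").fetchone()[0]


def backfill(conn, source, snapshot_date, rows):
    """Insert historical rows and return how many were actually new."""
    before = row_count(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO metric VALUES (?, ?, ?, ?, ?)",
        [(snapshot_date, source, key, name, value) for key, name, value in rows],
    )
    conn.commit()
    return row_count(conn) - before


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Made-up rows standing in for one parsed old WHGI index file.
old_rows = [("gender=female", "humans", 120), ("gender=male", "humans", 480)]
first = backfill(conn, "whgi", "2016-01-04", old_rows)
second = backfill(conn, "whgi", "2016-01-04", old_rows)  # re-run adds nothing
print(first, second)
```

MySQL spells the same idea `INSERT IGNORE`, relying on the unique key in the same way. How the old WHGI index files and Denelezh dumps are parsed into `rows` is left open here, since that depends on their actual formats.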

notconfusing updated the task description. Sep 11 2020, 9:26 PM
notconfusing updated the task description.

Definite requirement: transform KPI table from long to wide
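
To make the two table shapes concrete, here is a minimal sketch of pivoting long rows (one metric per row) into wide rows (one column per metric). The field names (`snapshot`, `project`, `name`, `value`) and the sample values are assumptions for illustration, not the actual Denelezh or WHGI schema.

```python
from collections import defaultdict


def long_to_wide(long_rows, key_fields, name_field="name", value_field="value"):
    """Pivot long rows (one metric per row) into wide rows (one column per metric)."""
    wide = defaultdict(dict)
    for row in long_rows:
        key = tuple(row[field] for field in key_fields)
        wide[key][row[name_field]] = row[value_field]
    return [
        {**dict(zip(key_fields, key)), **metrics}
        for key, metrics in wide.items()
    ]


# Made-up long-format rows, for illustration only.
long_rows = [
    {"snapshot": "s1", "project": "enwiki", "name": "female", "value": 2},
    {"snapshot": "s1", "project": "enwiki", "name": "male", "value": 5},
    {"snapshot": "s1", "project": "frwiki", "name": "female", "value": 1},
]
wide_rows = long_to_wide(long_rows, key_fields=("snapshot", "project"))
print(wide_rows)
```

Note that a metric missing for a given key (here `male` for `frwiki`) simply has no column in that wide row; a real implementation would need to decide between NULL and zero.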

notconfusing renamed this task from "Gathering potential requirements for next version of denelezh-import" to "Gather potential requirements for next version of denelezh-import". Sep 22 2020, 1:16 AM
notconfusing updated the task description.
notconfusing set the point value for this task to 21.
notconfusing updated the task description. Sep 22 2020, 7:39 PM
notconfusing closed this task as Resolved. Sep 25 2020, 5:52 PM