Some ideas, including those that came up during our knowledge transfer with Connie on high-level metrics calculation. We can break these out into separate tasks if required.
**Q1 2021-Q4 2022**
Instrument
[x] ETL readers notebooks
[x] ETL editors notebooks
[] ETL remaining readers
Backfill
[] T287420
Tables
[] Enable movement metric calculation as analytics-product user: T295332, T288677, T291957, T291956
[] Also need to look at Connie’s tables cchen.repo_active_editors and cchen.new_editors --> when these are moved, we also need to update Superset (T287284) and the [[ https://github.com/wikimedia-research/wiki-comparison/blob/master/data-collection/data-collection.ipynb | wiki comparison ]] notebook (T294653)
**Q1 2022**
Refactoring
[] Refactor: combine the different inserts for active_editors
Calculations
[] Add functions in editing-movement notebook (03.report) to calculate net new content (non-wikidata)
[] Make the [[ https://docs.google.com/spreadsheets/d/1mK-R8qWzKjSeHMBBek9sJsbecdic9s3r28OIW7QkqrE/edit#gid=476321462 | readers ]] and [[ https://docs.google.com/spreadsheets/d/1wfTtHjQP9Kj0WME15ESJ-4dSMGMpbtY8qOuDVcwZovQ/edit#gid=1862467345 | editors ]] Google Sheets more reader-friendly: remove the mobile-heavy metrics tab and add new columns such as YoY, FY average, quarterly average, and FY YTD average ([[ https://docs.google.com/spreadsheets/d/1D7aJxhhA4apxRUVjKf_Vs6M8fleIlH-kVY1hefRP9Nw/edit#gid=781717421 | reference document ]])
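A minimal sketch of how the quarterly and fiscal-year average columns could be derived from a monthly series. Everything here is illustrative: the function names, the input shape, and the July fiscal-year start are assumptions, not taken from the existing notebooks. For a fiscal year that is still in progress, the FY average is the same thing as the FY YTD average.

```python
from collections import defaultdict
from statistics import mean

# Assumption: the fiscal year starts in July; adjust if the sheets use another convention.
FY_START_MONTH = 7


def fiscal_year(year, month, fy_start_month=FY_START_MONTH):
    """Label a month with its fiscal year (the calendar year in which that FY ends)."""
    return year + 1 if month >= fy_start_month else year


def fiscal_quarter(month, fy_start_month=FY_START_MONTH):
    """Return the fiscal quarter (1-4) that a calendar month falls in."""
    return ((month - fy_start_month) % 12) // 3 + 1


def summarize(monthly):
    """monthly: dict mapping (year, month) -> metric value.

    Returns (quarterly_avg, fy_avg), keyed by (fiscal_year, quarter) and fiscal_year.
    """
    by_quarter = defaultdict(list)
    by_fy = defaultdict(list)
    for (year, month), value in sorted(monthly.items()):
        fy = fiscal_year(year, month)
        by_quarter[(fy, fiscal_quarter(month))].append(value)
        by_fy[fy].append(value)
    quarterly_avg = {key: mean(values) for key, values in by_quarter.items()}
    fy_avg = {key: mean(values) for key, values in by_fy.items()}
    return quarterly_avg, fy_avg


# Hypothetical sample data, for illustration only.
sample = {(2021, 7): 100, (2021, 8): 110, (2021, 9): 120, (2021, 10): 130}
quarterly_avg, fy_avg = summarize(sample)
```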
Diversity calculations
- Adding diversity (see T295332) in repo_active_editors
[] Add diversity for new and returning active editors
- Diversity sheet
[] net new non-wikidata content = net new content MINUS new wikidata content
[] YoY - editors sheet (the Jupyter notebook has all the data) - sum rows for the previous year (net new content MINUS new wikidata content)
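The two diversity-sheet items above can be sketched as follows. The subtraction comes straight from the task description. The row layout, the function names, and the reading of YoY as a relative change between two yearly totals are assumptions made for illustration.

```python
def net_new_non_wikidata(net_new_content, new_wikidata_content):
    """Net new non-wikidata content = net new content MINUS new wikidata content."""
    return net_new_content - new_wikidata_content


def yearly_totals(rows):
    """rows: iterable of (year, net_new_content, new_wikidata_content), one per month.

    Returns a dict mapping year -> summed net new non-wikidata content.
    """
    totals = {}
    for year, net_new, wikidata in rows:
        totals[year] = totals.get(year, 0) + net_new_non_wikidata(net_new, wikidata)
    return totals


def yoy_change(totals, year):
    """Relative change of a year's total against the previous year's total (assumed YoY definition)."""
    previous = totals[year - 1]
    return (totals[year] - previous) / previous


# Hypothetical sample rows, for illustration only.
rows = [
    (2020, 1000, 600),
    (2020, 1200, 700),
    (2021, 1500, 900),
    (2021, 1300, 800),
]
totals = yearly_totals(rows)
change = yoy_change(totals, 2021)
```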
Platform Evolution sheet calculations
[] % from baseline = status column
[] % of wikidata items ---> see wikidata items being reused (status)
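One possible reading of "% from baseline", sketched under the assumption that the status column reports the relative change of the current value against a fixed baseline value. The function name and the sample numbers are hypothetical.

```python
def pct_from_baseline(current, baseline):
    """Percentage change of the current value relative to the baseline value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) * 100 / baseline


# Hypothetical values, for illustration only.
status = pct_from_baseline(120, 100)
```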
No longer required as we will not be using MMTP sheets
~~[] Update the rpt repos to calculate data for the Movement metrics tables preparation sheets file~~
~~[] Work on consolidating the Movement Metrics Tables Preparation (MMTP) sheets and making them more reader-friendly (tables stay in MMTP only...keep YoY)~~
Viz
[] Fix: for some months now, some of the editor global north and global south charts haven't been showing up. For example, the global north charts are missing in the March 2020 version of the report: https://github.com/wikimedia-research/Editing-movement-metrics/blob/b64d7ffee70a4482a84787ae78343e0136d7ae90/03-report.ipynb
Platform Evolution R Viz
[] Starting y axis at 0
[] Ticks: Major breaks: 50 million
[] Ticks: Minor breaks: 10 million
[] Remove x lines
[] Geom point: mark only the latest point and any other points we especially want to highlight
[] Colors: Switch to gray for previous year and only use blue for current year
Discuss
~~[] Use a Google Sheets macro to update metrics in the correct format (instead of manual updates)~~
We are no longer going to update the [[ https://docs.google.com/presentation/d/1D_MuQ4Cf23Agn1o_ausJtH5rrJysqtGIYzmK8xxEX7M/edit#slide=id.g4463d16142_0_0 | board deck ]] with detailed analysis. Only summary slides will be added every month.
~~[] Add notes and analysis to the [[ https://docs.google.com/presentation/d/1D_MuQ4Cf23Agn1o_ausJtH5rrJysqtGIYzmK8xxEX7M/edit#slide=id.g4463d16142_0_0 | board staging deck ]]~~
[x] Tuning Session: automate the calculations and output calculated for editors (for TM) in the [[ https://docs.google.com/spreadsheets/d/1dMXlbE9zOlYbjaMsvp4h6f2wHIhqRh0zCA2GhZOtDY8/edit#gid=686067554 | TM sheet in the Tuning Session notebook ]]
[] Tuning Session: automate the calculations and output calculated for readers
[] Tuning Session: automate the calculations and output calculated for platform