== Request ==
Copying the task for this milestone from the [[ https://phabricator.wikimedia.org/T336426 | epic task ]]:
Milestone 2: Report for metrics NOT requiring `webrequest` aggregation
- First test report (in the format specified in the [[ https://docs.google.com/document/d/1SH_XbgxGpS7oHZqllPJoWEE459PgRBBzYMhzM4lV7hM/edit?usp=sharing | reporting template ]])
- number of [[ https://www.mediawiki.org/wiki/Analytics/Metric_definitions#Active_editor | active editors ]] (5 edits per month)
- number of admins (interface-admin)
- number of bureaucrats
- number of property creators
- number of items per active editor
- The presented information should be ready for "copy & paste" to be used in the respective reports.
- For the full report we'd also like the % change from the previous year and all data from the previous quarter
=== Assignee Planning ===
**Information below this point is filled out by WMDE Analytics and specifically the assignee of this task.**
=== Sub Tasks ===
Full breakdown of the steps to complete this task:
[x] Define all tables needed and add them to the ticket
[x] Make baseline connections to the given tables above and explore
- Including this as this is an initial task for me
[x] Define the following:
- Methods used:
- Total sum in the period
- Percentage change (yearly)
- Conditions for each group:
- How do we do retroactively check if the user had a certain permission at a given time? `snapshot`?
- Prior data exists in a preaggregated form, so we'd use that
- Aside from this the logs can be used to determine user rights/right changes at a given time
- Are admins and bureaucrats also active? When do they not count anymore?
- As of now they are still counted as an admin, but we might want to think about this for product metrics
- How to exactly differentiate property creations and the edits we want?
- Are we including edits to user pages?
- As of now including everything
- How to define an active editor over a quarter? Do we switch to 5 edits/30 days?
- We need to look into this further, but right now it's on a given day based on 5 edits over the last 30 days
- Are there conditions to consider for when an item is counted? Like for Wikipedia we might not count stubs?
- For Wikidata there's no distinction at all, and for Wikipedia we'd also just count them
- Time frame for aggregation:
- Current time that notebook is ran
- When would the notebook normally be ran by in case we're doing a cron job (for later planning)?
- Ideality would be self service at time of stakeholders choosing
- Current consideration is Grafana depending on data access constraints
[x] Define output and check with stakeholders (@Manuel)
- Aggregates for each group/value at the end of the time frame in a dataframe
- Rows:
- One row per quarter
- Columns:
- One column per group/value
- After each column, one column for yearly retrospective percentage change
- Are we including plots?
- For right now no
=== Data to be used ===
See [[ https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake | Analytics/Data_Lake]] for the breakdown of the data lake databases and tables.
The following tables will be referenced in this task:
- We need a `WHERE` clause of `wiki_db = 'wikidatawiki'`
- For user permissions (admins, bureaucrats): [[ https://www.mediawiki.org/wiki/Manual:User_groups_table | wmf_raw.mediawiki_user_groups ]]
- For edit data and property creations: [[ https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/MediaWiki_history | wmf.mediawiki_history ]]
- For the total number of items: [[ https://www.mediawiki.org/wiki/Manual:Page_table | wmf_raw.mediawiki_page ]]
=== Notes and Questions ===
Things that came up during the completion of this task, questions to be answered and follow up tasks:
- Note