This is the parent task for work on the Better Use of Data Program, which started in FY2018/19.
# Roadmap FY2019-2020
#### Q1 (July - September)
##### Data Engineering
[x] Event Platform Client Libraries prototypes
[x] Develop Event Platform Client specification T228177
[x] Plan stream and schema usage T228656
[x] Prototype clients
[x] Prototype Android client
[x] Prototype iOS client
[x] Prototype JS browser client
##### Data Access
[x] Automated dashboard for Product Core Metrics (Readers)
[x] Internal production release of edits_hourly Druid datasets (for use in Superset and Turnilo)
##### Data Training
[x] Start product team trainings: best practices for working with data in the product development lifecycle
[x] Start product team trainings for core metrics: data exploration and reports
##### Tracking
- MEP (Modern Event Platform) stream configuration service planning
- MEP schema registry deployment
- Client-side error logging working group
---
#### Q2 (October - December)
##### Data Engineering
[x] Provide test-ready Modern Event Platform clients for MediaWiki, the Android Wikipedia app, and the iOS Wikipedia app
[x] Develop Android client T228179
[x] Develop iOS client T228180
[x] Develop JS browser client T228181
[ ] MEP stream configuration service has been deployed for analytics events T233634
[ ] MEP EventGate instance has been deployed for analytics events T236386, T233629
[x] MEP client for MediaWiki has been tested with Vagrant T238544
[x] Cross-platform client-side error logging T229442
##### Data Quality
[x] Ensure Data Quality is considered as part of MEP T228228
[x] Document plan for technical changes needed to improve data T236504
[x] Document plan for process changes needed to improve data and present to Product Analytics team T235802
##### Data Access
[ ] Automated dashboard for Product Core Metrics (Edits)
##### Data Training
[x] Group training on core metrics: data exploration and reports
[x] Office hours with members from product teams: data exploration and reports
##### Tracking
- MEP engineering sync
- Client-side error logging working group
---
#### Q3 (January - March)
##### Data Engineering
[ ] Finish EPC for production
[ ] Develop production version of the Sampling Controller
[ ] Document usage guidelines on-wiki
[ ] Recommendation for porting old schemas to the new system
[ ] Document A/B testing procedures
[ ] Document funnel analysis procedures
[ ] **Integration**: EPC is available for use on the 3 major platforms
[ ] One (1) web team is able to use EPC for analytics
[ ] Android team is able to use EPC for analytics
[ ] iOS team is able to use EPC for analytics
[ ] Deploy error logging instrumentation to production T238544
##### Data Quality
[ ] Ensure Data Quality is considered while piloting the MEP Client Libraries
[ ] Pilot process changes needed to improve data with at least 1 Product Team T235802
##### Tracking
- TBD
---
#### Q4 (April - June)
##### Data Engineering
[ ] **Usage**: EPC is used on the 3 major platforms
[ ] One (1) web team is using EPC for analytics
[ ] Android team is using EPC for analytics
[ ] iOS team is using EPC for analytics
[ ] Advise that all newly created schemas use EventGate-style JSONSchema
[ ] Port selected EventLogging schemas to EventGate-style JSONSchema
[x] Evaluate feasibility of cross-schema joins (EPC provides this automatically)
[ ] Develop Event Platform Client test suite T228178
[ ] Research and architect "session length" dataset (see https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/SessionLength for ideas)
##### Tracking
- TBD
##### Data Quality
[ ] [UNDER CONSIDERATION] Pilot technical changes needed to improve data T236504
# Stretch Goals
[ ] Evaluate analytics system capacity ("are we going to break our whole system by pinging from a lot of clients on a second-by-second basis?")
[ ] **MEP for Product**
[ ] Develop schema registry UI
[ ] Develop stream configuration service UI
[ ] Develop CI and commit hooks
[ ] Research and architect "unique devices" dataset
[ ] Develop automated ingestion pipeline and dashboard defaults