Better Use of Data
Open, Medium, Public

Description

This is a parent task for the work to be done for the Better Use of Data Program, which was started in FY2018/19.

Roadmap FY2019-2020

Q1 (July - September)
Data Engineering
  • Event Platform Client Libraries prototypes
    • Develop Event Platform Client specification T228177
    • Planning Stream & Schema usage T228656
  • Prototype clients
    • Prototype Android client
    • Prototype iOS client
    • Prototype JS browser client
Data Access
  • Automated dashboard for Product Core Metrics (Readers)
  • Internal production release of edits_hourly Druid datasets (for use in Superset and Turnilo)
Data Training
  • Start product team trainings: best practices for working with data in the product development lifecycle
  • Start product team trainings for core metrics: data exploration and reports
Tracking
  • MEP stream configuration service planning
  • MEP schema registry deployment
  • Client-side error logging working group

Q2 (October - December)
Data Engineering
  • Provide test-ready Modern Event Platform (MEP) clients for MediaWiki, the Android Wikipedia app, and the iOS Wikipedia app
  • MEP stream configuration service has been deployed for analytics events T233634
  • MEP EventGate instance has been deployed for analytics events T236386, T233629
  • MEP client for MediaWiki has been tested with Vagrant T238544
  • Cross-platform client-side error logging T229442
Data Quality
  • Ensure Data Quality is considered as part of MEP T228228
  • Document a plan for the technical changes needed to improve data quality T236504
  • Document a plan for the process changes needed to improve data quality, and present it to the Product Analytics team T235802
Data Access
  • Automated dashboard for Product Core Metrics (Edits)
Data Training
  • Group training on core metrics: data exploration and reports
  • Office hours with members from product teams: data exploration and reports
Tracking
  • MEP engineering sync
  • Client-side error logging working group

Q3 (January - March)
Data Engineering
  • Finish the Event Platform Client (EPC) for production
    • Develop production version of Sampling Controller
  • Document usage guidelines on-wiki
    • Recommendation for porting old schemas to new system
    • Document A/B testing procedures
    • Document funnel analysis procedures
  • Integration: EPC is available for use on the 3 major platforms
    • One (1) web team is able to use EPC for analytics
    • Android team is able to use EPC for analytics
    • iOS team is able to use EPC for analytics
  • Deploy error logging instrumentation to production T238544
Data Quality
  • Ensure Data Quality is considered during piloting of MEP Client Libraries
  • Pilot the process changes needed to improve data quality with at least 1 Product Team T235802
Tracking
  • TBD

Q4 (April - June)
Data Engineering
  • Usage: EPC is used on the 3 major platforms
    • One (1) web team is using EPC for analytics
    • Android team is using EPC for analytics
    • iOS team is using EPC for analytics
  • Advise that all newly created schemas use EventGate-style JSONSchema
  • Port selected EventLogging schemas to EventGate-style JSONSchema
  • Evaluate the feasibility of cross-schema joins (expected to come automatically with EPC)
  • Develop Event Platform Client test suite T228178
  • Research and architect "session length" dataset (see https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/SessionLength for ideas)
Tracking
  • TBD
Data Quality
  • [UNDER CONSIDERATION] Pilot the technical changes needed to improve data quality T236504

Stretch Goals

  • Evaluate analytics systems capacity ("are we going to break our whole system by pinging from a lot of clients on a second-by-second basis?")
  • MEP for Product
    • Develop schema registry UI
    • Develop stream configuration service UI
    • Develop CI and commit hooks
  • Research and architect "unique devices" dataset
  • Develop automated ingestion pipeline and dashboard defaults

Event Timeline

kzimmerman updated the task description (Jun 6 2019, 5:31 PM).
kzimmerman updated the task description.
kzimmerman updated the task description.
jlinehan updated the task description (Jun 6 2019, 6:34 PM).
jlinehan updated the task description.
jlinehan updated the task description (Jun 6 2019, 6:46 PM).
jlinehan updated the task description.
jlinehan updated the task description (Jun 6 2019, 8:22 PM).
jlinehan updated the task description.
jlinehan updated the task description (Jun 6 2019, 8:26 PM).
kzimmerman updated the task description (Jun 7 2019, 4:47 AM).
kzimmerman updated the task description (Jun 8 2019, 12:34 AM).
This comment was removed by Aklapper.
kzimmerman moved this task from Triage to MEP on the Better Use Of Data board (Jun 19 2019, 10:26 PM).
kzimmerman moved this task from Triage to Backlog on the Product-Analytics board (Jun 19 2019, 10:42 PM).
phuedx added a subscriber: phuedx (Jun 20 2019, 2:04 PM).
kzimmerman updated the task description (Jun 20 2019, 8:49 PM).
kzimmerman updated the task description (Jun 21 2019, 11:18 PM).
kzimmerman triaged this task as Medium priority (Jun 26 2019, 12:46 AM).
kzimmerman updated the task description (Jul 8 2019, 8:27 PM).
jlinehan updated the task description (Jul 16 2019, 2:32 PM).
jlinehan moved this task from MEP to To Do on the Better Use Of Data board (Jul 16 2019, 5:15 PM).
jlinehan moved this task from To Do to MEP on the Better Use Of Data board.
kzimmerman updated the task description (Aug 15 2019, 8:43 PM).
kzimmerman updated the task description (Aug 15 2019, 9:29 PM).
jlinehan updated the task description (Aug 15 2019, 10:00 PM).
kzimmerman moved this task from MEP to Epics on the Better Use Of Data board (Aug 26 2019, 6:41 PM).
kzimmerman moved this task from Backlog to Epics on the Product-Analytics board (Sep 4 2019, 1:58 AM).
jlinehan moved this task from Epics to Triage on the Better Use Of Data board (Oct 11 2019, 1:34 PM).
kzimmerman edited subscribers, added: cchen; removed: Hannahjo1983 (Oct 23 2019, 6:43 PM).
jlinehan moved this task from Triage to Epics on the Better Use Of Data board (Nov 19 2019, 5:26 PM).
cchen updated the task description (Dec 12 2019, 5:16 PM).
kzimmerman updated the task description (Dec 12 2019, 11:16 PM).
kzimmerman updated the task description (Dec 12 2019, 11:22 PM).
kzimmerman updated the task description.
kzimmerman updated the task description (Dec 13 2019, 7:09 PM).
kzimmerman updated the task description (Dec 13 2019, 7:11 PM).
kzimmerman updated the task description (Dec 23 2019, 7:34 PM).
kzimmerman updated the task description.
mpopov updated the task description (Jan 10 2020, 3:44 PM).
Jhernandez removed a subscriber: Jhernandez (Apr 2 2020, 6:46 PM).