Keep more data longer (dashboard or otherwise)
Closed, Resolved · Public

Description

While preparing the quarterly metrics, we realized that we didn't always have data for the whole quarter under review, because some data is only kept for 60 or 90 days.
Let's figure out some wonderful ways to keep this data longer for tracking purposes:

  • Wikipedia portal pageviews by device (desktop vs mobile)
  • Wikipedia portal clickthrough rate by device (desktop vs mobile)
  • proportion of Wikipedia portal on mobile devices in the US vs elsewhere
  • regions of Wikipedia portal users
  • pageviews from full-text search (desktop vs mobile)
  • we're already working on some enhanced maps tracking
  • anything else?
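
To make the retention gap concrete, here is a minimal sketch of how much of a quarter is still available under a rolling 60- or 90-day window. The review date and the function name `days_covered` are assumptions chosen for illustration; the thread does not state when the quarterly review was run or exactly how the window is applied.

```python
from datetime import date, timedelta


def days_covered(quarter_start, quarter_end, review_date, retention_days):
    """Count the days of a quarter that still fall inside a rolling retention window.

    Assumes the window ends on the review date (inclusive) and reaches back
    `retention_days` days in total.
    """
    oldest_kept = review_date - timedelta(days=retention_days - 1)
    first = max(quarter_start, oldest_kept)
    last = min(quarter_end, review_date)
    return max(0, (last - first).days + 1)


# Hypothetical example: reviewing April-June 2017 on Aug 3 2017.
q_start, q_end = date(2017, 4, 1), date(2017, 6, 30)
review = date(2017, 8, 3)  # assumed review date, for illustration only
total = (q_end - q_start).days + 1

for retention in (60, 90):
    kept = days_covered(q_start, q_end, review, retention)
    print(f"{retention}-day retention: {kept} of {total} quarter days available")
```

Under these assumptions, neither window covers the full quarter, which is the gap this task is about.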

Event Timeline

debt created this task. Aug 3 2017, 10:11 PM
Restricted Application added a subscriber: Aklapper. Aug 3 2017, 10:11 PM

Can we use golden to collect the data that are not on the dashboards and keep them in /srv/published-datasets/discovery?
For the data that are on the dashboards but have a max_data_points limit, can we just create extra reports without the max_data_points limit?
@mpopov Any other ideas? ;)

Without patching Reportupdater to output two reports (a limited one and an unlimited one), that's what I was thinking too. This means calculating the same thing twice, but it's the simplest solution.
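
As a rough sketch of the "extra reports" idea, the snippet below takes a mapping of report settings and adds an uncapped copy of every report that has a max_data_points limit. The dictionary layout, the report names, the limit value, and the `_unlimited` suffix are all assumptions for illustration; they are not taken from golden's actual config.yaml files or from Reportupdater itself.

```python
import copy


def add_unlimited_copies(reports, suffix="_unlimited"):
    """Return a new mapping that also contains an uncapped copy of each capped report.

    `reports` maps a report name to its settings. The suffix used for the
    copies is a hypothetical naming choice.
    """
    result = copy.deepcopy(reports)
    for name, settings in reports.items():
        if "max_data_points" in settings:
            unlimited = copy.deepcopy(settings)
            del unlimited["max_data_points"]  # the copy keeps every data point
            result[name + suffix] = unlimited
    return result


# Hypothetical report settings, for illustration only.
reports = {
    "example_capped_report": {"granularity": "days", "max_data_points": 60},
    "example_uncapped_report": {"granularity": "days"},
}

expanded = add_unlimited_copies(reports)
for name in sorted(expanded):
    print(name, expanded[name])
```

The capped originals stay as they are, so anything reading them is unaffected; the cost, as noted above, is that each duplicated report is computed twice.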

chelsyx added a comment. Edited Aug 30 2017, 7:24 PM

According to the config.yaml files in golden, the following reports have a max_data_points limit, and we are going to create duplicate reports with max_data_points removed for tracking purposes:

Search:

  • app_event_counts_langproj_breakdown
  • mobile_event_counts_langproj_breakdown
  • desktop_event_counts_langproj_breakdown
  • paulscore_approximations_fulltext_langproj_breakdown
  • search_threshold_pass_rate_langproj_breakdown
  • cirrus_langproj_breakdown_no_automata
  • cirrus_langproj_breakdown_with_automata

Portal:

  • all_country_data
  • last_action_country
  • most_common_country
  • first_visits_country

The following metrics haven't been tracked by our dashboards and we are going to create new reports for them in golden:

  • Wikipedia portal pageviews by device (desktop vs mobile)
  • Wikipedia portal clickthrough rate by device (desktop vs mobile)
  • proportion of Wikipedia portal on mobile devices in the US vs elsewhere
  • pageviews from full-text search (desktop vs mobile)
  • search return rate
  • SERPs by access method

Change 374900 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/golden@master] Duplicate reports without max data points limit to keep data longer

https://gerrit.wikimedia.org/r/374900

Change 374900 merged by Bearloga:
[wikimedia/discovery/golden@master] Duplicate reports without max data points limit to keep data longer

https://gerrit.wikimedia.org/r/374900

Change 375408 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/golden@master] Add new datasets in search and portal

https://gerrit.wikimedia.org/r/375408

Change 375408 merged by Bearloga:
[wikimedia/discovery/golden@master] Add new datasets in search and portal

https://gerrit.wikimedia.org/r/375408

mpopov removed a project: Patch-For-Review.

Deployed to prod. Good job, @chelsyx!

debt closed this task as Resolved. Sep 22 2017, 8:35 PM

🎉