
Priority metric improvements: Programs metrics by geo
Closed, Resolved (Public)

Event Timeline

After some initial exploring of data from WikiEd and Event Metrics, here are outstanding questions/issues that need discussion:

  1. Organizers and participants of events with different start and end years
    1. currently, only the start year is considered
  2. Users who logged locations in multiple countries
    1. currently, only one (the latest) is being considered
  3. How can we account for unavailability of data in final outputs (if at all)?
  4. EventMetrics usage seems to extend well beyond events/programs
  5. 90-day availability of geo data will be helpful!
  6. To improve the match %, we will need to explore storing individual editors' geo data for a longer period.

Sharing my thoughts briefly here:

On points 1 & 2, my recommendation is to be inclusive: count usernames in every country and/or year in which they appear. Similarly, if a username is a campaign organizer as well as a participant, course organizer, etc., I would include it for each role, in all relevant bins.
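To make the inclusive approach concrete, here is a minimal Python sketch. The record layout (username, role, country, start year, end year), the sample rows, and the function names are all hypothetical and are not taken from the WikiEd or Event Metrics data; the sketch only illustrates counting a username once in every bin it touches.

```python
from collections import defaultdict

# Hypothetical records: (username, role, country, start_year, end_year).
# These rows are invented for illustration only.
records = [
    ("UserA", "organizer", "KE", 2021, 2022),    # event spans two years
    ("UserA", "participant", "KE", 2021, 2021),  # same user, second role
    ("UserA", "participant", "NG", 2022, 2022),  # same user, second country
    ("UserB", "participant", "NG", 2022, 2022),
    ("UserB", "participant", "NG", 2022, 2022),  # duplicate row, counted once
]


def inclusive_bins(rows):
    """Map each (role, country, year) bin to the set of usernames in it.

    A username is added to every bin it touches: each role, each country,
    and each year from the event's start year through its end year.
    """
    bins = defaultdict(set)
    for username, role, country, start_year, end_year in rows:
        for year in range(start_year, end_year + 1):
            bins[(role, country, year)].add(username)
    return bins


def bin_counts(rows):
    """Count unique usernames per (role, country, year) bin."""
    return {key: len(users) for key, users in inclusive_bins(rows).items()}


counts = bin_counts(records)
for key in sorted(counts):
    print(key, counts[key])
```

Because a user can land in several bins, the bin counts are not meant to sum to the number of unique users; totals across bins would need a separate de-duplicated count.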

Regarding point 3 - it would seem that, until we better understand the data through repeat measures, we should continue to triangulate these metrics with other signals as we work to understand and develop the new data pipeline.

Regarding point 4 and the very low hit rate from Event Metrics usage data: this feels like too varied a use space. Partly because the tool offers a non-public way of tracking cohorts, without the public transparency that the dashboards offer, it contains a wide variety of "events", many of which are not events at all but data mapping efforts related to the tool's functionality. Further, with much lower hit rates, it does not appear to offer a representative pulse point. I recommend we focus on the P&E dashboards for our metric use case instead.

Regarding point 5 - yes, agreed. We also discussed that, in the meantime, the hit rates for the P&E dashboard usage data are quite high. We should explore the existing data to understand timing and source differences in hit rates, and to consider potential avenues for capturing more robust signal metrics within the current data pipelines. These include pulling more recent data and querying repeatedly within a year to calculate an average hit rate for the year, to limit the influence of timing and seasonality on representativeness.
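As a rough illustration of averaging the hit rate over repeated pulls in a year, here is a small Python sketch. It assumes "hit rate" means the share of usernames in a pull that have a geo match; the function names and the sample numbers are hypothetical and do not come from the actual pipeline.

```python
from statistics import mean


def hit_rate(matched, total):
    """Share of usernames with a geo match in one pull, or None if empty."""
    if total == 0:
        return None
    return matched / total


def average_hit_rate(pulls):
    """Unweighted mean of per-pull hit rates, skipping empty pulls.

    `pulls` is a list of (matched, total) pairs, one per query in the year.
    """
    rates = [hit_rate(matched, total) for matched, total in pulls]
    rates = [rate for rate in rates if rate is not None]
    return mean(rates) if rates else None


# Hypothetical (matched, total) counts for four pulls within one year.
pulls = [(80, 100), (45, 50), (0, 0), (70, 100)]
yearly_average = average_hit_rate(pulls)
print(round(yearly_average, 3))
```

This version gives every pull equal weight regardless of size, which matches the idea of smoothing timing and seasonality; a pooled rate (total matched divided by total usernames across pulls) would instead weight larger pulls more heavily, so the choice between the two is worth discussing.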

Regarding point 6 - yes, and there are at least two potential routes to consider that I can think of ... which are each at a completely different scale - maybe we can have a brainstorming session about this after we explore the crosstabs of the hits further?

  • Retaining the data pulled from WikiEd
  • Frequency of retrieval and averaging the percent ranks
  • Eliminating erroneous data from geo matches
    • 2 countries: 13.95%
    • 3 countries: 7.57%
    • 4 countries: 4.96%
    • 5 countries: 3.37%
JAnstee_WMF changed the task status from Open to In Progress. Jun 23 2022, 12:50 AM
JAnstee_WMF triaged this task as Medium priority.

We have finalized the considerations and metrics to be developed for Build 1.