
New Data Pipeline for New and Returning Active Editor metrics by Geo (Country & Region) and wiki
Open, Medium, Public

Description

Background/Goal

Current Situation:

Currently, we have no way to associate new editors with geolocation. This is because fingerprint-based queries are time-limited under our privacy policy. However, we have built aggregate tables that do not link usernames with geolocation, such as:

  • wmf.geoeditors_monthly
  • geoeditors_edits_monthly
  • unique_editors_by_country_monthly

Purpose:
It would be useful to create similarly aggregated tables that count new active editors (editors who made more than 4 edits in the month they registered) and returning active editors by geolocation and wiki. Ideally, these would be two separate tables: one with aggregated counts of new active editors and one with aggregated counts of returning active editors.

The benefit would be more granular data: linking new editors with geolocation would let us identify regional trends that could explain changes in our editor counts (e.g. T351759). There have been substantial year-over-year (YoY) drops in new active editors since October 2023, and without last year's data it has become difficult to perform geo analysis on possible causes for these drops.

Data Engineering and GDI previously did some work on this (https://codeshare.io/1YDQzj), but it was not productionized. Updated sample code is included in the design spec document linked below.
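The new/returning classification and per-geo, per-wiki aggregation described in the Purpose section can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the implementation in the linked design spec: the input row shape (`EditorMonth` and its fields) is hypothetical; "active" is taken to mean 5 or more edits (i.e. more than 4) on a wiki in a month; a returning active editor is assumed to be an active editor who registered in an earlier month; and an editor who appears under several countries is simply counted once per country. Only distinct-editor counts are emitted, so user IDs are never linked to geolocation in the output.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple

# Assumption: "active" means more than 4 edits (5+) on one wiki in one month.
ACTIVE_EDIT_THRESHOLD = 5

# Output key: (month, wiki, country). No user identifiers are included.
Key = Tuple[str, str, str]


@dataclass(frozen=True)
class EditorMonth:
    """Hypothetical input row: one editor's activity on one wiki, from one country, in one month."""
    user_id: int
    wiki_db: str
    country_code: str
    month: str               # activity month, "YYYY-MM"
    registration_month: str  # month the account was registered, "YYYY-MM"
    edit_count: int          # assumed: the editor's edits on this wiki in this month


def classify(row: EditorMonth) -> Optional[str]:
    """Return "new_active", "returning_active", or None (not active)."""
    if row.edit_count < ACTIVE_EDIT_THRESHOLD:
        return None
    if row.registration_month == row.month:
        return "new_active"
    # Zero-padded "YYYY-MM" strings compare correctly as text.
    if row.registration_month < row.month:
        return "returning_active"  # assumption: registered in an earlier month
    return None


def aggregate(rows: Iterable[EditorMonth]) -> Tuple[Dict[Key, int], Dict[Key, int]]:
    """Count distinct new and returning active editors per (month, wiki, country)."""
    new_users: Dict[Key, Set[int]] = defaultdict(set)
    returning_users: Dict[Key, Set[int]] = defaultdict(set)
    for row in rows:
        key = (row.month, row.wiki_db, row.country_code)
        kind = classify(row)
        if kind == "new_active":
            new_users[key].add(row.user_id)
        elif kind == "returning_active":
            returning_users[key].add(row.user_id)
    # Emit only counts, so the two output tables never contain user IDs.
    new_counts = {k: len(v) for k, v in new_users.items()}
    returning_counts = {k: len(v) for k, v in returning_users.items()}
    return new_counts, returning_counts
```

Keeping the two outputs as separate dictionaries mirrors the request for two separate tables; in a production pipeline the same grouping would more likely be expressed as a query over the existing editor data.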

Ideal Delivery Date: June 2024
Stakeholders: Foundation-level Metrics committee, Regional Learning Sessions, Foundation and Movement leadership

KR/Hypothesis (Initiative)

Success metrics

  • How we will measure success

Example areas:

  • Deadlines
  • User satisfaction
  • Performance
  • Accessibility
  • Maintenance
  • Movement impact
  • Scalability
  • Data Quality
  • Integration
  • Compliance

In scope

  • known scope

Out of Scope

  • known boundaries

Artifacts & Resources

Link to diagrams

Link to specifications, architecture and design docs
https://docs.google.com/document/d/1d_EXAiuSjviidNVru0XNhlg6BhEBWA9Wg79l1BTTlNo/edit

Link to product one pagers

Event Timeline

@Mayakp.wiki, @JAnstee_WMF: I think this is the type of thing that should go through the governance process. Do we have a data steward, a technical steward, sign-off on the metric, testing, etc.?

Do we plan to do that here? Is there a different queue this work should go to before implementation?

nshahquinn-wmf moved this task from Incoming to Waiting on others on the Movement-Insights board.

Hi @lbowmaker, since the Contributors metric in SDS1.1.1 was rejected and the hypothesis was disproven, we didn't go ahead with implementing data stewardship, even though data stewards had been selected.

I would request that we treat this as a regular data request, as we don't plan to implement data governance on the Contributors metric area this FY (we are focusing on unique devices and page views in SDS1.1.2). For this request, please consider the Movement-Insights team as the business users and collaborators.

Thanks @Mayakp.wiki! Does this support current or upcoming OKR work?

Hi @VirginiaPoundstone, this work supports our Essential Work of providing movement metrics each month, both to inform internal staff and to share externally.
As part of the monthly movement metrics, we need to analyze trends and investigate anomalies in order to inform executive leadership and to engage various teams in resolving potential issues. This view of the editor data will help us do that.

Final update: we, the requestor (Movement Insights), will implement this ourselves, with consultation and/or code review from Data Platform Engineering.