
Wikistats "Active Editors by Country" does not follow definition for active editors
Open, Medium, Public, BUG REPORT

Description

Data Platform Engineering Bug Report or Data Problem Form.

Please fill out the following
Please ensure you set priority

What kind of problem are you reporting?

  • Access related problem
  • Service related problem
  • Data related problem
For a data related problem:
  • Is this a data quality issue?
  • What datasets and/or dashboards are affected?
    • Wikistats Contributing Metrics, "Active editors by country"
  • What are the observed vs expected results? Please include information such as location of data, any initial assessments, sql statements, screenshots.

Looking at January 2024: why is the total number of active editors for enwiki 41k here, while the country-level figures, summed, come to ~62k (the total for editors with 5-99 edits, 56k, plus the total for editors with 100+ edits, 6k)?
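The report does not identify the cause of the gap between 41k and ~62k. One hypothetical mechanism, offered only as an illustration and not confirmed anywhere in this task, is that the country table counts editors per country, so an editor attributed to more than one country in a month is counted once per country when the rows are summed. The sketch below uses made-up toy data; the `EDITS` mapping and the helper names are hypothetical. It applies the 5-edit activity threshold and the 100+ bucket used in the report, and compares a global distinct count with a sum of per-country bucket counts.

```python
from collections import defaultdict

# Hypothetical toy data: (editor, country) -> edits in one month.
# Editor "a" is attributed to two countries; this is an assumption made
# purely for illustration, not data from Wikistats.
EDITS = {
    ("a", "US"): 10,
    ("a", "GB"): 10,
    ("b", "US"): 50,
    ("c", "US"): 120,
}

ACTIVE_MIN = 5    # active editor: 5 or more edits in the month
HEAVY_MIN = 100   # the "100+ edits" bucket


def bucket(n_edits):
    """Return the activity bucket for an edit count, or None if not active."""
    if n_edits >= HEAVY_MIN:
        return "100+"
    if n_edits >= ACTIVE_MIN:
        return "5-99"
    return None


def global_active(edits):
    """Distinct active editors, counting each editor once across all countries."""
    totals = defaultdict(int)
    for (editor, _country), n in edits.items():
        totals[editor] += n
    return sum(1 for n in totals.values() if bucket(n) is not None)


def summed_country_buckets(edits):
    """Per-country bucket counts added together, as when summing a country table."""
    counts = defaultdict(int)
    for (_editor, _country), n in edits.items():
        b = bucket(n)
        if b is not None:
            counts[b] += 1
    return dict(counts)


if __name__ == "__main__":
    by_bucket = summed_country_buckets(EDITS)
    print(global_active(EDITS))                 # 3
    print(by_bucket, sum(by_bucket.values()))   # {'5-99': 3, '100+': 1} 4
```

In this toy case the summed country table reports 4 active editors while only 3 distinct editors are active. Whether anything like this explains the enwiki numbers would need to be checked against the actual dataset definitions.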

For the DE Team to fill out
Which systems does this affect?
  • Hive
  • Druid
  • Superset
  • Turnilo
  • WikiDumps
  • Wikistats
  • Airflow
  • HDFS
  • Gobblin
  • Sqoop
  • Dashiki
  • DataHub
  • Spark
  • Jupyter
  • Modern Event Platform
  • Event Logging
  • Other
Impact Assessment:

Does this problem qualify as an incident?

  • Yes
  • No

Does this violate an SLO?

  • Yes
  • No
Value Calculator | Rank
Will this improve the efficiency of a team's workflow? | 1-3
Does this have an effect on our Core Metrics? | 1-3
Does this align with our strategic goals? | 1-3
Is this a blocker for another team? | 1-3

Event Timeline

kzimmerman triaged this task as Medium priority.
Pppery renamed this task from NEW BUG REPORT Wikistats "Active Editors by Country" does not follow definition for active editors to Wikistats "Active Editors by Country" does not follow definition for active editors. (Mar 14 2024, 4:27 AM)
VirginiaPoundstone added a subscriber: Htriedman.

@kzimmerman, thank you for filing this task. I have a few follow-up questions.

In order to prioritize this, we need to know:

  • Does this relate to a current OKR?
  • Does this metric have a business owner?
  • Does this relate to the newly forming contributor metric?

We are going to remove the Wikistats tag, because this issue is about the data itself, not the UI.

The differential privacy work that @Htriedman has been working on may also be worth evaluating as an alternative.

I believe this already-published dataset is strictly better and should replace the current active editors by country data: https://analytics.wikimedia.org/published/datasets/geoeditors_weekly/ (there is also a monthly version).

@kzimmerman @Milimetric, happy to set up a meeting next week to discuss the differences between the DP and non-DP versions of the geoeditors monthly/weekly datasets.

@Milimetric, FWIW, about the weekly dataset: folks from Product Analytics told me that maintaining and publishing the monthly dataset is important for continuity with the existing dataset.

Also, the DP geoeditors dataset only goes back to July 2023, but it would be relatively straightforward to extend it (at least for editor counts) back to January 2018.

Hi @VirginiaPoundstone, my responses are below:

  • Does this relate to a current OKR?

I'm not sure if this relates to a current OKR - @CMyrick-WMF could answer that.

  • Does this metric have a business owner?

It does not (yet), but @OSefu-WMF / the Movement Insights team should be the point of contact.

  • Does this relate to the newly forming contributor metric?

Yes - and Omari is leading discussions about the contributor metric.

> Does this relate to a current OKR?
> I'm not sure if this relates to a current OKR - @CMyrick-WMF could answer that.

Not that I'm aware of. I was just using Wikistats as a way to QA the outputs of some queries I ran for my work related to language metrics.