
Weekly updates on editors
Closed, Resolved, Public

Description

Editor data is NOT available weekly in the refined dataset we usually use for reporting on editors (mediawiki_history); it is only available monthly.

  • Weekly edit count
  • Editor count (ideally in content namespaces, if possible)
    • Logged-in
    • IP addresses
  • Identify & explain differences between weekly estimates and monthly refined data (provide equivalent calculations for monthly, so people can understand the difference)

Extract data from the events database (edits/deletions/page creations)

Timeline: initial draft on Friday

Details

Due Date
Apr 4 2020, 1:00 AM

Event Timeline

kzimmerman updated the task description.

I have published the weekly editing dashboard and want to clarify two differences.

The data sources for 2020 and 2019 are different because of a data limitation. Edit data for 2020 is extracted from the table event.mediawiki_revision_create, which records the latest edit events. However, I found that the data between 2019-09-23 and 2019-09-29 is missing from event.mediawiki_revision_create (this is being discussed in another ticket, T233718). So for 2019 I used wmf.mediawiki_history as the data source, which is also the data source for our monthly editing metrics.

The definition of weekly editors differs from that of monthly editors. Monthly active editors are defined as the number of registered users who made at least 5 content edits across all projects in the given month (Product data dictionary). A threshold of 5 content edits per week would be too high, so weekly editors are defined as the number of registered users who made at least 1 content edit across all projects in the given week, which is approximately the same editing frequency as 5 edits per month. Because the weekly definition uses a lower threshold, it also captures less frequent editors, so the sum of weekly editors is higher than the monthly editor count for the same time period.
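To make the two thresholds concrete, here is a minimal Python sketch of both definitions. It is not the dashboard's actual code: the helper names (count_editors, iso_week, calendar_month) and the sample data are hypothetical, the input is assumed to be already filtered to content edits by registered users, and weeks are assumed to be ISO weeks (Monday to Sunday).

```python
from collections import Counter
from datetime import date


def count_editors(edits, period_of, min_edits):
    """Count users with at least `min_edits` edits in each period.

    `edits` is an iterable of (user, day) pairs; `period_of` maps a date
    to a hashable period key such as (year, week) or (year, month).
    """
    edits_per_user = Counter((user, period_of(day)) for user, day in edits)
    editors = Counter()
    for (user, period), n in edits_per_user.items():
        if n >= min_edits:
            editors[period] += 1
    return dict(editors)


def iso_week(day):
    iso = day.isocalendar()
    return (iso[0], iso[1])  # (ISO year, ISO week number)


def calendar_month(day):
    return (day.year, day.month)


# Hypothetical sample data: content edits by registered users in March 2020.
sample_edits = [
    ("alice", date(2020, 3, 2)),
    ("alice", date(2020, 3, 3)),
    ("alice", date(2020, 3, 10)),
    ("alice", date(2020, 3, 17)),
    ("alice", date(2020, 3, 24)),
    ("bob", date(2020, 3, 4)),
    ("carol", date(2020, 3, 11)),
    ("carol", date(2020, 3, 25)),
]

# Weekly editors: at least 1 content edit in the week.
weekly = count_editors(sample_edits, iso_week, min_edits=1)
# Monthly active editors: at least 5 content edits in the month.
monthly = count_editors(sample_edits, calendar_month, min_edits=5)

print(weekly)
print(monthly)
```

In this made-up sample the weekly counts sum to 7, while only one user reaches the monthly threshold of 5 edits, which illustrates why summed weekly editors exceed monthly active editors.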

@jwang @kzimmerman -- thank you for working on this. Is this weekly edits data available in Turnilo/Superset? Or accessible just through the dashboard you posted?

@MMiller_WMF, it's only available through the dashboard I posted. The database that Turnilo/Superset can access is sqooped (imported) on a monthly basis, so the latest weekly data is not available in these tools.

@MMiller_WMF, the tables @jwang is using are available in Superset, so you can run selects like this one through Superset SQL Lab:

SELECT month, COUNT(*) AS edits
FROM event.mediawiki_revision_create
WHERE year='{YEAR_YYYY}'
  AND (rev_timestamp >= '{START_YYYY_MM_DD}' AND rev_timestamp < '{END_YYYY_MM_DD}')
GROUP BY month

Fancier work done in Python in the notebook will of course be hard to do with just SQL.
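As an illustration of how the placeholders in the query above might be filled for a single week, here is a small Python sketch. The function name query_for_week is hypothetical, and the sketch assumes the year partition lines up with the calendar year of rev_timestamp; it only builds the query text and does not run it.

```python
from datetime import date, timedelta

# The query from the comment above, with its original placeholders.
QUERY_TEMPLATE = """\
SELECT month, COUNT(*) AS edits
FROM event.mediawiki_revision_create
WHERE year='{YEAR_YYYY}'
  AND (rev_timestamp >= '{START_YYYY_MM_DD}' AND rev_timestamp < '{END_YYYY_MM_DD}')
GROUP BY month"""


def query_for_week(week_start: date) -> str:
    """Fill the template for the 7 days starting at `week_start`.

    The end date is exclusive, matching the `<` comparison in the query.
    """
    week_end = week_start + timedelta(days=7)
    last_day = week_end - timedelta(days=1)
    if last_day.year != week_start.year:
        # The template filters on a single year partition, so a week that
        # crosses New Year would need two queries or a different filter.
        raise ValueError("week spans two calendar years")
    return QUERY_TEMPLATE.format(
        YEAR_YYYY=week_start.year,
        START_YYYY_MM_DD=week_start.isoformat(),
        END_YYYY_MM_DD=week_end.isoformat(),
    )


print(query_for_week(date(2020, 3, 2)))
```

Because the query groups by month, a week that straddles two months returns two rows, which would need to be summed to get the weekly total.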

Please keep in mind that we plan to offer a more frequently updated version of edits_hourly next year.