[SPIKE] Generate a list of the usernames of people who are editing Wikipedia from Sub-Saharan Africa
Closed, ResolvedPublic

Description

This task involves the work of generating a list of the usernames of people who have either a) attempted to publish an edit to a Wikipedia project in the previous 90 days and/or b) successfully published an edit to a Wikipedia project in the same period.

This information will help us:

  • Understand the kinds of edits people within our target audience are arriving at the wikis attempting to make (T313071, T313566)
  • Understand what people within our target audiences are experiencing after publishing an edit to Wikipedia
    • Are other volunteers posting messages on their user talk pages? Are the edits they are making getting reverted? Are the edits they are making getting "Thanked"?
  • Contact people within our target audience (T313847)

So that we can:

  • Decide on an initial edit check to implement (T312180)
  • Ensure people within our target audience are experiencing the check(s) we will be implementing in ways that align with the design principles we will have established in T313852

Requirements

  • A list of the usernames of people who have published and/or attempted to publish an edit to any Wikipedia project in the past 90 days from a country within Sub-Saharan Africa
  • The number of edits each person has attempted to publish and/or successfully published within the past 90 days, grouped by project
  • The number of cumulative edits each person has made
  • The specific country within which they have been editing Wikipedia
WARNING: the usernames of people editing from within any of the countries listed here should be EXCLUDED from the list this task will produce.

Open questions

  • What – if any – privacy/security precautions will we need to adhere to as part of the completion of this task? @ppelberg to consult with the Legal Team before work on this task begins.

Event Timeline

ppelberg renamed this task from {SPIKE] Generate a list of the usernames of people who are editing Wikipedia from Sub-Saharan Africa to [SPIKE] Generate a list of the usernames of people who are editing Wikipedia from Sub-Saharan Africa. Aug 16 2022, 8:55 PM
ppelberg updated the task description.

Update: I've asked the legal team about what – if any – privacy protocols we might need to adhere to as part of gathering the data this task describes.

MNeisler triaged this task as Medium priority.

@ppelberg
Note: Pending the resolution of T314178, we do not currently know the geolocation information (country and region) of editors that attempt but never complete an edit. In the meantime, I can plan to review editors from Sub-Saharan Africa that published an edit.

MNeisler moved this task from Doing to Done on the Product-Analytics (Kanban) board.

@ppelberg
I've shared a spreadsheet with the requested data. Further use and sharing is still pending legal review.

The usernames identified in the spreadsheet currently meet the following conditions:

  • Under 100 cumulative edits on the specified wiki
  • Registered
  • Published[i] an edit in the past 60 days[ii] to the main namespace of the Wikipedia project
  • Made an edit from one of the countries identified within the Sub-Saharan Africa region

[i] Pending the resolution of T314178, we do not currently know the geolocation information (country and region) of editors that attempt but never complete an edit.
[ii] Data is based on the editors_daily dataset, which is only kept for 60 days. See data retention details on the wiki page.
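
As a rough illustration of the four conditions listed above, here is a minimal Python sketch. It is not the query that was actually run: the `EditorRow` shape, its field names, the `meets_conditions` helper, and the placeholder country codes are all assumptions made for this example, and they do not reflect the real columns of editors_daily.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical record shape; field names are illustrative only and are
# not the columns of the editors_daily dataset.
@dataclass
class EditorRow:
    username: str
    wiki: str
    cumulative_edits: int          # cumulative edits on this wiki
    is_registered: bool
    last_main_namespace_edit: date
    country_code: str


# Placeholder codes: the real Sub-Saharan Africa country list (and the
# exclusion list from the task description) is not reproduced here.
TARGET_COUNTRIES = {"XX", "YY"}


def meets_conditions(row: EditorRow, today: date, window_days: int = 60) -> bool:
    """Return True if the row satisfies all four listed conditions."""
    cutoff = today - timedelta(days=window_days)
    return (
        row.cumulative_edits < 100                  # under 100 cumulative edits
        and row.is_registered                       # registered account
        and row.last_main_namespace_edit >= cutoff  # recent main-namespace edit
        and row.country_code in TARGET_COUNTRIES    # edited from a target country
    )


# Made-up rows for illustration.
rows = [
    EditorRow("ExampleUserA", "examplewiki", 12, True, date(2022, 9, 1), "XX"),
    EditorRow("ExampleUserB", "examplewiki", 250, True, date(2022, 9, 1), "XX"),
    EditorRow("ExampleUserC", "examplewiki", 5, True, date(2022, 6, 1), "YY"),
]
selected = [r.username for r in rows if meets_conditions(r, today=date(2022, 9, 15))]
print(selected)
```

In this sketch only `ExampleUserA` passes: `ExampleUserB` has too many cumulative edits and `ExampleUserC` last edited outside the 60-day window.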

Data Retention Note:
This data will only be kept for 90 days pending review and approval by Legal to extend the data retention period.

This is looking good, @MNeisler.

As we talked about earlier today, the next steps are to:

  1. ADD a column that includes the cumulative edits each person has made at a given project, across namespaces
  2. ADD a column that includes the date a given account was created
  3. ADD a column that includes the date a given account published its first edit to the main namespace of a Wikipedia project. Note: we did NOT talk about adding this data. So, if you anticipate adding this data to be more difficult than "1." and "2.", please let me know.

@ppelberg

I've updated the list with all the columns identified in T314548#8237914.

A couple notes and clarifications regarding the data:

  • Each row reflects a distinct editor by country and Wikipedia project. If a user has edited more than one project or from more than one country, they will appear multiple times in the list. If you sort by username, you can find all of a user's edit history grouped together.
  • There are about 9 instances where the account creation date is after the date of the editor's first edit to the main namespace. This is a known issue that occurs when edits were made on a different wiki and later imported to the current wiki to be translated. See the Slack thread if you're interested in more details.
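
To make the two notes above concrete, here is a small Python sketch of how one might group rows by username and flag rows where the account creation date is later than the first main-namespace edit. The tuple layout, usernames, wiki names, and dates are all made up for illustration (the country dimension is left out), so this is only an assumption about the list's shape, not its actual schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (username, wiki, account_created, first_main_namespace_edit).
rows = [
    ("ExampleUserA", "wiki_one", date(2022, 5, 1), date(2022, 5, 3)),
    ("ExampleUserB", "wiki_one", date(2022, 6, 10), date(2022, 6, 10)),
    ("ExampleUserA", "wiki_two", date(2022, 7, 1), date(2022, 6, 20)),
]

# Sorting by username (then wiki) keeps each user's rows together,
# mirroring "sort by username" in the spreadsheet.
by_user = defaultdict(list)
for username, wiki, created, first_edit in sorted(rows):
    by_user[username].append(wiki)

# Rows where the account was created after its first main-namespace edit,
# which the thread attributes to edits imported from another wiki.
created_after_first_edit = [
    (username, wiki)
    for username, wiki, created, first_edit in rows
    if created > first_edit
]

print(dict(by_user))
print(created_after_first_edit)
```

Here `ExampleUserA` appears twice because they edited two projects, and their `wiki_two` row is the one flagged for the date anomaly.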

Let me know if you have any further questions.

Update

  • Megan and I are awaiting guidance from WMF-Legal about how/if we can produce the dataset this ticket is asking for.
ppelberg added a subscriber: LMixter.

Per the guidance @LMixter shared today offline, work on this task can resume provided we modify the scope to meet the following requirements:

  1. The query we run returns data that does not include any mention of the specific country people are editing within
  2. We delete the data this query will produce within 90 days of it being generated

...I've updated the task description to reflect the above.
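
The two requirements above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the actual query or tooling: the field names in `COUNTRY_FIELDS`, the `strip_country` and `deletion_deadline` helpers, the sample row, and the generation date are all hypothetical.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90  # "within 90 days of it being generated"

# Hypothetical field names; the real output columns may differ.
COUNTRY_FIELDS = {"country", "country_code"}


def strip_country(row: dict) -> dict:
    """Return a copy of the row without any country-identifying fields."""
    return {key: value for key, value in row.items() if key not in COUNTRY_FIELDS}


def deletion_deadline(generated_on: date) -> date:
    """Latest date by which the generated data must be deleted."""
    return generated_on + timedelta(days=RETENTION_DAYS)


# Made-up row and generation date, for illustration only.
raw = {"username": "ExampleUserA", "wiki": "examplewiki", "edits_90d": 4, "country_code": "XX"}
cleaned = strip_country(raw)
deadline = deletion_deadline(date(2023, 1, 1))

print(cleaned)
print(deadline.isoformat())
```

Dropping the fields before the data leaves the query step, rather than hiding a spreadsheet column afterwards, is one way to make sure no copy of the output ever carries the country information.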

ppelberg updated the task description.
MNeisler moved this task from Doing to Done on the Product-Analytics (Kanban) board.

@ppelberg

I've shared a spreadsheet with the requested data. I've rerun the query to provide data on editors who published at least 1 edit in the last 90 days, and modified the query and results to remove any mention of the specific country people are editing within. Please let me know if you have any questions or suggested updates.

Note we will need to delete this data on March 1, 2023 in accordance with data retention guidelines.

Noted. Thank you for making this explicit, @MNeisler.

Per the conversation we had offline today, we're going to delete the existing data and NOT generate a new list for now.

If/when a need for a list of this sort resurfaces, we'll re-use the query you wrote to generate this initial list.