Develop strategies and tools for segmenting wikis [2018-19 AP output 4.3]
Open, Normal, Public

Description

Objective

This project is output 4.3 of the Audiences 2018–19 annual plan:

Instead of implementing programs that attempt to affect all wikis at the same time, it is common for a given Audiences program to focus just on groups of wikis, such as mid-size wikis, or large wikis. Given that we focus our work on groups of wikis, we should be able to report out using those groupings. The outputs here are evolving sets of segmentations that classify different wikis into groupings relevant for the Audiences department's work. These will be used to align strategic planning, program focus, and reporting on Audiences department impact -- making it possible to report out using the same groupings as we use in our daily work.

See also the project brief [Wikimedia Foundation only].

Timeline / work plan

Time constraint: As this is an annual plan goal, it needs to be finished by June 2019 at the latest. However, it should be done much earlier than that: it's a strategic priority, and the sooner it's completed, the sooner people can incorporate it into their thinking.

Current timeline

Phase 1 ✓

Produce a big spreadsheet of data that can be sorted and filtered.

Phase 2 (T221563)

Deprioritized

Recommend a standard set of key dimensions with standard classes for each (for example, monthly active editors might be a dimension, with low being 0–49, medium being 50–499, and high being 500+). We don't want too many dimensions (about 6 is right) or too many classes per dimension (3–4 is about right).
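
As a rough illustration of what "standard classes" for one dimension could look like in practice, here is a minimal Python sketch. The thresholds are the example ones from the paragraph above (0–49, 50–499, 500+), not an agreed standard, and the function name, wiki names, and editor counts are all made up for the example.

```python
from bisect import bisect_right

# Example boundaries from the task description, NOT an agreed standard:
# low = 0-49, medium = 50-499, high = 500+.
THRESHOLDS = [50, 500]            # lower bounds of "medium" and "high"
LABELS = ["low", "medium", "high"]


def classify_dimension(value, thresholds=THRESHOLDS, labels=LABELS):
    """Map a non-negative metric value to its class label."""
    if value < 0:
        raise ValueError("metric values must be non-negative")
    # bisect_right counts how many lower bounds the value has reached.
    return labels[bisect_right(thresholds, value)]


# Hypothetical monthly active editor counts, for illustration only.
sample_wikis = {"wiki_a": 12, "wiki_b": 50, "wiki_c": 499, "wiki_d": 3200}
classes = {name: classify_dimension(n) for name, n in sample_wikis.items()}

for name, label in classes.items():
    print(f"{name}: {label}")
```

Keeping the boundaries in a single list per dimension means that changing a class definition later only touches one place, which seems useful if the segmentations are expected to evolve.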

Phase 3 (T203033)

Deprioritized

Use some unsupervised learning to try to cluster the wikis into meaningful groups which we can name, describe, and make the standard groups for understanding our wikis.

The results are uncertain, because it's hard to predict whether unsupervised learning will have meaningful results, but there's only one way to find out!
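
To make the idea concrete, here is a small sketch of one possible approach: plain k-means with a deterministic farthest-first initialization, written with only the Python standard library. Nothing in this task says k-means is the method that will be used. The feature choice (log10 of monthly active editors and of article count), the wiki names, and the numbers are all invented for illustration.

```python
import math


def farthest_first(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Plain k-means on a list of equal-length numeric tuples.
    Returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical wikis: (log10 monthly active editors, log10 article count).
wikis = {
    "tiny_a": (1.0, 2.0),
    "tiny_b": (1.2, 2.3),
    "tiny_c": (0.8, 1.9),
    "big_a": (3.5, 5.8),
    "big_b": (3.8, 6.1),
    "big_c": (3.6, 6.0),
}
labels, centroids = kmeans(list(wikis.values()), k=2)
for name, label in zip(wikis, labels):
    print(f"{name}: cluster {label}")
```

On real data the features would likely need scaling so that no single dimension dominates the distance, and the number of clusters would have to be chosen by inspection. Whether the resulting groups are meaningful enough to name is exactly the open question described above.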

Event Timeline

Restricted Application added a subscriber: Aklapper. · Feb 27 2018, 1:14 PM
Neil_P._Quinn_WMF triaged this task as High priority. · Mar 29 2018, 5:52 AM
Neil_P._Quinn_WMF updated the task description.
Neil_P._Quinn_WMF updated the task description.
Neil_P._Quinn_WMF added subscribers: MNovotny_WMF, dchen.
Neil_P._Quinn_WMF renamed this task from Develop basic segmentation strategies for wikis to Develop strategies and tools for segmenting wikis. · Apr 11 2018, 5:23 PM
Neil_P._Quinn_WMF added a project: Epic.
Neil_P._Quinn_WMF updated the task description.
MBinder_WMF moved this task from Triage to Doing on the Product-Analytics board. · Apr 12 2018, 8:28 PM
Elitre added a subscriber: Elitre. · Apr 27 2018, 5:19 PM
Neil_P._Quinn_WMF updated the task description.
leila added a subscriber: leila. · Jul 27 2018, 5:52 PM
Neil_P._Quinn_WMF added a comment. · Edited · Aug 10 2018, 7:57 PM

@MNovotny_WMF, @JKatzWMF, and I discussed the timeline for phase 3 today and came up with the following plan:

  • I'm expecting to start working half-time on the project at the start of September (possibly with initial exploration before that).
  • I should start off by trying out clustering strategies rather than by adding more dimensions, since that's the part of the process we lack experience with.
  • We will meet again midway through September with more knowledge of the project to set a firmer timeline.
  • We hope to have something shareable by the end of September.
JKatzWMF removed Neil_P._Quinn_WMF as the assignee of this task. · Oct 4 2018, 10:53 PM
JKatzWMF moved this task from Next Up to Backlog on the Product-Analytics board.
Neil_P._Quinn_WMF renamed this task from Develop strategies and tools for segmenting wikis to Develop strategies and tools for segmenting wikis [2018-19 AP output 4.3]. · Jan 17 2019, 11:41 PM
Neil_P._Quinn_WMF claimed this task.
Neil_P._Quinn_WMF updated the task description.
Neil_P._Quinn_WMF moved this task from Backlog to Next Up on the Product-Analytics board.
Neil_P._Quinn_WMF moved this task from Next Up to Doing on the Product-Analytics board.
kzimmerman lowered the priority of this task from High to Normal. · Apr 4 2019, 5:08 AM
Neil_P._Quinn_WMF updated the task description.

We have decided that, sadly, we don't have the capacity to finish phases 2 and 3 during this fiscal year (by June), so we have deprioritized them. We would still like to do them someday, so they live on as separate tasks (T221563 and T203033).

However, we do want to keep supporting the wiki segmentation dataset, so we will be making some fixes and updates to it (T221566) before we close out this project.

jlinehan moved this task from Tracking to To Do on the Better Use Of Data board. · Jul 16 2019, 5:18 PM