
Develop strategies and tools for segmenting wikis [2018-19 AP output 4.3]
Closed, Resolved, Public

Description

Objective

This project is output 4.3 of the Audiences 2018–19 annual plan:

Instead of implementing programs that attempt to affect all wikis at the same time, it is common for a given Audiences program to focus just on groups of wikis, such as mid-size wikis or large wikis. Given that we focus our work on groups of wikis, we should be able to report out using those groupings. The outputs here are evolving sets of segmentations that classify different wikis into groupings relevant to the Audiences department's work. These will be used to align strategic planning, program focus, and reporting on Audiences department impact -- making it possible to report out using the same groupings as we use in our daily work.

See also the project brief [Wikimedia Foundation only].

Timeline / work plan

Time constraint: As this is an annual plan goal, it needs to be finished by June 2019 at the latest. However, it should be done much earlier than that: it's a strategic priority, and the sooner it's completed, the sooner people can incorporate it into their thinking.

Current timeline

Phase 1 ✓

Produce a big spreadsheet of data that can be sorted and filtered.

Phase 2 (T221563)

Deprioritized

Recommend a standard set of key dimensions with standard classes for each (for example, monthly active editors might be a dimension, with low being 0–49, medium being 50–499, and high being 500+). We don't want too many dimensions (6 is about right) or too many classes per dimension (3–4 is about right).
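To make the idea of a dimension with standard classes concrete, here is a minimal sketch in Python. The class boundaries come from the monthly active editors example above; the function name `classify`, the constant names, and the sample wikis with their counts are hypothetical and only for illustration.

```python
from bisect import bisect_right

# Class boundaries taken from the example in the task description:
# low is 0-49, medium is 50-499, high is 500 and above.
ACTIVE_EDITOR_BOUNDS = [50, 500]
ACTIVE_EDITOR_CLASSES = ["low", "medium", "high"]


def classify(value, bounds, classes):
    """Return the class label whose range contains value."""
    if value < 0:
        raise ValueError("value must be non-negative")
    # bisect_right counts how many lower bounds the value has reached.
    return classes[bisect_right(bounds, value)]


# Hypothetical wikis and editor counts, for illustration only.
sample = {"wiki_a": 12, "wiki_b": 49, "wiki_c": 50, "wiki_d": 499, "wiki_e": 500}
segments = {
    wiki: classify(count, ACTIVE_EDITOR_BOUNDS, ACTIVE_EDITOR_CLASSES)
    for wiki, count in sample.items()
}
```

Keeping the boundaries in a plain list means that adding another dimension would only need another pair of bounds and labels, not a new function.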

Phase 3 (T203033)

Deprioritized

Use some unsupervised learning to try to cluster the wikis into meaningful groups which we can name, describe, and make the standard groups for understanding our wikis.

The results are uncertain, because it's hard to predict whether unsupervised learning will have meaningful results, but there's only one way to find out!
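As a rough illustration of what clustering wikis could look like, the sketch below runs plain k-means (Lloyd's algorithm) on two log-scaled features. This is an assumption for illustration, not the method chosen for this project: the task does not name an algorithm, and the wiki names, feature choices, and numbers are all hypothetical.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def kmeans(points, centroids, max_iterations=50):
    """Plain Lloyd's algorithm: assign points, then recompute cluster means."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical data: (monthly active editors, article count) per wiki.
wikis = {
    "wiki_a": (12, 3_000),
    "wiki_b": (30, 8_000),
    "wiki_c": (400, 150_000),
    "wiki_d": (700, 300_000),
}
# Log-scale the features so the largest wikis do not dominate the distances.
points = [tuple(math.log10(v) for v in row) for row in wikis.values()]
# Seed with the first and last points to keep the run deterministic.
labels, centroids = kmeans(points, [points[0], points[-1]])
clusters = dict(zip(wikis, labels))
```

Whether groups found this way are meaningful enough to name and describe is exactly the open question noted above; the sketch only shows the mechanics.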

Event Timeline

nshahquinn-wmf updated the task description.
nshahquinn-wmf renamed this task from Develop basic segmentation strategies for wikis to Develop strategies and tools for segmenting wikis. Apr 11 2018, 5:23 PM
nshahquinn-wmf added a project: Epic.
nshahquinn-wmf updated the task description.

@MNovotny_WMF, @JKatzWMF, and I discussed the timeline for phase 3 today and came up with the following plan:

  • I'm expecting to start working half-time on the project at the start of September (possibly with initial exploration before that).
  • I should start off by trying out clustering strategies rather than by adding more dimensions, since that's the part of the process we lack experience with.
  • We will meet again midway through September with more knowledge of the project to set a firmer timeline.
  • We hope to have something shareable by the end of September.
JKatzWMF moved this task from Next Up to Backlog on the Product-Analytics board.
nshahquinn-wmf renamed this task from Develop strategies and tools for segmenting wikis to Develop strategies and tools for segmenting wikis [2018-19 AP output 4.3]. Jan 17 2019, 11:41 PM
nshahquinn-wmf claimed this task.
nshahquinn-wmf updated the task description.
nshahquinn-wmf moved this task from Backlog to Next Up on the Product-Analytics board.
nshahquinn-wmf moved this task from Next Up to Doing on the Product-Analytics board.
kzimmerman lowered the priority of this task from High to Medium. Apr 4 2019, 5:08 AM

We have decided that, sadly, we don't have the capacity to finish phases 2 and 3 during this fiscal year (by June), so we have deprioritized them. We would still like to do them someday, so they live on as separate tasks (T221563 and T203033).

However, we do want to keep supporting the wiki segmentation dataset, so we will be making some fixes and updates to it (T221566) before we close out this project.

I don't think there's any value to this task anymore; in particular, the 2018-19 fiscal year is done so there's no big project anymore. T221566 lives on separately.