Newcomer tasks: article configurations for topics
In T231506, we are prototyping several methods for surfacing interesting articles to newcomers who have no editing history. One of the topic matching approaches we want to prototype would do this:

  1. Newcomers select topics of interest from a list (the same list of 27 topics from the welcome survey).
  2. For each of the 27 topics, we have a configured list of articles that represents that topic. For instance, if the newcomer selects "Music", the articles representing it might be "Music", "Concert", "Pop music", and "Singing".
  3. Then we feed those articles into the "morelike" algorithm to find articles similar to those, and return the ones that have maintenance templates on them.

Essentially, this method "pretends" the newcomer edited some of the core articles around their topics of interest to get suggestions for similar articles.
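
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the prototype's actual code: `TOPIC_ARTICLES`, `suggest_articles`, `find_similar`, and `has_maintenance_template` are hypothetical names, and the "morelike" lookup and the maintenance-template check are passed in as plain functions rather than real search calls.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical configuration: topic -> seed articles.
# Only the "Music" row comes from the task description; real lists
# would be filled in per wiki by the ambassadors.
TOPIC_ARTICLES: Dict[str, List[str]] = {
    "Music": ["Music", "Concert", "Pop music", "Singing"],
}


def suggest_articles(
    selected_topics: Iterable[str],
    topic_articles: Dict[str, List[str]],
    find_similar: Callable[[List[str]], Iterable[str]],
    has_maintenance_template: Callable[[str], bool],
) -> List[str]:
    """Return similar articles that carry a maintenance template.

    find_similar stands in for the "morelike" algorithm and
    has_maintenance_template stands in for the template check;
    both are assumptions supplied by the caller.
    """
    suggestions: List[str] = []
    seen = set()
    for topic in selected_topics:
        seeds = topic_articles.get(topic, [])
        if not seeds:
            # No configured articles for this topic, so nothing to match on.
            continue
        for title in find_similar(seeds):
            if title in seen:
                continue  # avoid suggesting the same article twice
            if has_maintenance_template(title):
                seen.add(title)
                suggestions.append(title)
    return suggestions
```

Keeping the similarity lookup and the template check as parameters makes it possible to try the flow with stub data before wiring it to a real search backend.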

We want to actually prototype this method in our target wikis and see how it works. Therefore, the task is for our team's ambassadors to fill in this spreadsheet with articles that map to the 27 topics. Each language has a tab, where up to five articles can be listed for each topic. There do not need to be five articles for each one; there could be three or four if those adequately represent the topic. The "English example" tab is meant to show what this should look like. That tab is only partially completed because we don't want the English list to bias the other lists too much.

We're not sure how well this will go, and we don't know the best way to choose articles for the list. That's why we want to try it out, and change if we need to.

Here are some thoughts:

  • We want to be conscious of cultural biases. Perhaps the articles that are important for "Arts" in the English-speaking world are different from the ones for "Arts" in the Arabic-speaking world. That's why we're asking ambassadors to do this for their own wikis, instead of making one English list and linking to articles in each language.
  • One useful tool may be the Vital Articles list, maintained on English Wikipedia. This helped me find important articles as I was creating examples for English. If other wikis have something similar, it may be helpful.
  • When listing articles, it is probably better to list general, overarching topics instead of specific examples. For instance, it is probably better to list "Painting" instead of "Vincent Van Gogh". This will help us not bias toward specific artists from specific regions. However, if there is something you think is particularly relevant to the culture of your wiki -- for instance, perhaps there is a specific type of art that is important to your culture -- it would be good to list that.

Event Timeline

Assigning to @Trizek-WMF to monitor this work. I made subtasks for each wiki with attached due dates. The due dates for Arabic and Czech are Sept 25 and the due date for Korean is Sept 27, since we have not yet talked about this task with @revi.

Do the article IDs in the Google Doc mean anything? In other words, would a different order of articles change anything?

@Urbanecm -- I think I answered this in chat, but just so that it's also here: yes, the article IDs are the priority level. So the ID #1 should be the article that is most closely related to the topic. But it's not very important that it be in the exact right order -- it's just in case we want to try sending fewer articles to the algorithm to see how it performs, and we want to know which ones to send. Does that make sense?
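
As a small illustration of how the priority IDs could be used when sending fewer articles to the algorithm, here is a minimal sketch. The function name `top_priority_articles` and the `(article_id, title)` tuple layout are assumptions made for this example, not the spreadsheet's actual format.

```python
from typing import List, Tuple


def top_priority_articles(entries: List[Tuple[int, str]], limit: int) -> List[str]:
    """Return the titles of the `limit` highest-priority entries.

    Each entry is an (article_id, title) pair, where ID 1 is the
    article most closely related to the topic.
    """
    ordered = sorted(entries, key=lambda entry: entry[0])
    return [title for _, title in ordered[: max(limit, 0)]]


# Example rows for one topic, deliberately listed out of order.
music_entries = [(3, "Pop music"), (1, "Music"), (4, "Singing"), (2, "Concert")]
two_seeds = top_priority_articles(music_entries, 2)
```

With the example rows above, `two_seeds` holds the two entries with the lowest IDs, so only those would be passed on as seed articles.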

Today, @Dyolf77_WMF, @Trizek-WMF, @Urbanecm and I discussed this work, and we have some notes from @Dyolf77_WMF's experience:

  • Filling in four or five entries for each of the 27 topics took about four hours.
  • Sometimes, five options did not feel like enough to fully capture the topic.
  • A good resource to find other good links about a topic is the "See also" section at the bottom of the first article chosen for that topic.

The ambassadors have all finished filling out the lists. This task is finished.

Also, when we are ready, we should take all the contents and publish them to each community wiki as we have done for task types, e.g. for Czech in addition to, we will also need to have

