Newcomer tasks: evaluate topic matching prototypes
Closed, Resolved, Public

Authored By

	MMiller_WMF
	Sep 30 2019, 10:22 PM

Description

In T231506, we explored several methods with which to surface articles to newcomers based on the topical interests of those newcomers. This is difficult because newcomers have no editing history with which to make recommendations.

This task is about evaluating three methods that we've put into interactive prototypes. Below, we describe each of the prototypes and how ambassadors can evaluate them.

1. Morelike

Prototype
How it works: Newcomer selects from a list of 27 broad topics. Each of those 27 topics has a corresponding list of articles that are pre-set by ambassadors in T233465 (the "seed" articles). For each of the topics that the newcomer selects, the prototype takes the seed articles and does a search for more articles that have a lot of the same words in common with the seed articles. It narrows the results to those that have a maintenance template and displays the results.
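As a rough illustration of the search described above, the sketch below builds a query string from seed articles and maintenance templates. It assumes CirrusSearch-style `morelikethis:` and `hastemplate:` keywords with `|`-separated titles. The function name, the seed titles, and the template name are hypothetical and are not taken from the prototype.

```python
def build_morelike_query(seed_titles, templates):
    """Build a search string for articles similar to the seeds that carry a maintenance template.

    Assumption: the search backend accepts CirrusSearch-style
    `hastemplate:` and `morelikethis:` keywords with `|`-separated titles.
    """
    if not seed_titles or not templates:
        raise ValueError("need at least one seed article and one template")
    template_term = 'hastemplate:"{}"'.format("|".join(templates))
    similar_term = 'morelikethis:"{}"'.format("|".join(seed_titles))
    return template_term + " " + similar_term


# Hypothetical seed articles for an "Engineering" topic.
query = build_morelike_query(["Elevator", "Bridge"], ["Copy edit"])
print(query)
```

The resulting string could be passed as the search term of an ordinary search request. Whether the prototype combines the two filters in exactly this way is not stated in this task.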
How to evaluate:
- Select your language.
- For each of the 27 topics...
  - Select the topic.
  - Select all the task type checkboxes.
  - Leave all other settings alone.
  - Look at the first ten articles that get returned and count how many of the ten are good results for that topic. For instance, the article on "Elevator" would be a good result for the topic "Engineering". But the article on "Shoes" would not.
  - Write down that score in this sheet.
  - If any topic has fewer than 10 results, indicate that by making a note in the cell.
Notes:
- You can click each result to see details about its templates, categories, and the search that was run.
- The prototype contains some additional algorithm settings. You are welcome to play with those and record some of your notes about what you notice, but we're evaluating them based on the default settings.
- Although the prototype allows you to select only certain maintenance templates, we think you should select all of them for this exercise, because we're really only evaluating the topic matching abilities here. We can separately count how many results show up for each maintenance template.
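The scoring steps above amount to counting good results among the first ten and flagging topics that return fewer than ten. A minimal sketch of that bookkeeping follows. The function name and the dictionary keys are hypothetical, and the actual scores are recorded by hand in the sheet.

```python
def score_topic(judgments, cutoff=10):
    """Count good results among the first `cutoff` and flag short result lists.

    `judgments` is a list of booleans, one per returned article, in result order.
    True means the evaluator considered the article a good match for the topic.
    """
    top = judgments[:cutoff]
    return {
        "good": sum(top),
        "reviewed": len(top),
        "fewer_than_cutoff": len(top) < cutoff,
    }


# A topic with 15 results, of which 7 of the first 10 are good.
full = score_topic([True] * 7 + [False] * 3 + [True] * 5)
# A topic that returned only 3 results.
short = score_topic([True, False, True])
print(full)
print(short)
```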

2. Free text

Prototype (same as morelike)
How it works: Newcomer types in some text in the search field, and it runs a normal search, just like the search bar in Wikipedia, but narrowed to articles with maintenance templates. This allows the user to search for more specific topics.
How to evaluate:
- Select your language.
- Select all the task type checkboxes.
- Leave all the other settings alone.
- Type one topic at a time into the free text field. Please try 15 different topics of your choice, which can be more or less specific. A more general one might be "Swimming" and a more specific one might be "Pokemon".
- Look at the first ten results for that search term, and count how many look like good results.
- Put the search term and the score in this sheet.

3. ORES

Prototype
How it works: There is a machine learning model in English Wikipedia that classifies any English article into a topic. The topics are made through the English WikiProject hierarchy and are not the same as the ones from the "morelike" list (but we could align them later if we like this approach). The method takes all the articles with maintenance templates in the target wiki, then finds the ones that also exist in English Wikipedia, then gets their ORES topic score from English and applies it to the target language's version. That means that the only articles that come up are the ones that exist in English, too. That's not optimal because it would mean that we don't recommend any local-language articles for editing, but we still want to try this method out to see how good it is.
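The lookup described above can be pictured as a join between three pieces of data. The sketch below is only an illustration with in-memory dictionaries: the function name and both mappings are hypothetical stand-ins for the interlanguage links and the English topic predictions, and the example titles are made up.

```python
def assign_topics(local_tasks, english_titles, english_topics):
    """Split local task articles into those with and without a borrowed topic.

    local_tasks:    local-language titles that carry a maintenance template
    english_titles: hypothetical map of local title -> English title
    english_topics: hypothetical map of English title -> topic of the English article
    """
    with_topic, without_topic = {}, []
    for title in local_tasks:
        english = english_titles.get(title)
        topic = english_topics.get(english) if english else None
        if topic is None:
            # No English counterpart (or no topic for it): cannot be matched.
            without_topic.append(title)
        else:
            with_topic[title] = topic
    return with_topic, without_topic


with_topic, without_topic = assign_topics(
    ["Bajkal", "Místní článek"],
    {"Bajkal": "Lake Baikal"},
    {"Lake Baikal": "Geography.Bodies of water"},
)
print(with_topic, without_topic)
```

Articles that end up in the second list are the local-language articles that this method cannot recommend.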
How to evaluate:
- Select your language.
- For each of the 42 topics...
  - Select the topic.
  - Select all the task type checkboxes.
  - Look at the first ten articles that get returned and count how many of the ten are good results for that topic.
  - Write down that score in this sheet.
  - If any topic has fewer than 10 results, indicate that by making a note in the cell.
Notes:
- We only have the English names of the topics, but if we like this method, we would figure out how to translate them to local languages.

Details

Due Date: Oct 16 2019, 12:00 PM

Related Objects

Status	Assigned	Task
Resolved	MMiller_WMF	T238608 [EPIC] Growth: Newcomer tasks 1.1.0 (topic matching)
Resolved	kostajh	T231506 Newcomer tasks: prototype topic matching
Resolved	MMiller_WMF	T234272 Newcomer tasks: evaluate topic matching prototypes
Resolved	Urbanecm_WMF	T234347 Newcomer tasks: evaluate topic matching prototypes (cs)
Resolved	Dyolf77_WMF	T234348 Newcomer tasks: evaluate topic matching prototypes (ar)
Resolved	revi	T234424 Newcomer tasks: evaluate topic matching prototypes (ko)

Event Timeline

@kostajh @Trizek-WMF -- this is the evaluation protocol I made for testing the prototypes. Once @kostajh says the prototypes are ready, I'll make a separate task for each ambassador to work on this. I think this will take a lot of time, so maybe the ambassadors will need a couple weeks with it.

In the meantime, please comment or change things if you think it could be better.

I've copied over all of the topic titles into the respective configuration files on MediaWiki.org. Because reading Korean and Arabic is difficult for me, it's hard to know if I copy/pasted everything correctly. Ambassadors, please feel free to glance over my lists. Please make any updates directly on MediaWiki.org at https://www.mediawiki.org/wiki/Growth/Personalized_first_day/Newcomer_tasks/Prototype/topics/{langCode}.json. Those files are read directly by the prototype, so any change you make on MediaWiki.org will show up in the UI of the morelike/keyword search prototype.

kostajh updated the task description. Oct 1 2019, 2:24 PM

Trizek-WMF triaged this task as High priority. Oct 1 2019, 3:58 PM

Trizek-WMF set Due Date to Oct 15 2019, 1:00 PM.

MMiller_WMF mentioned this in T234347: Newcomer tasks: evaluate topic matching prototypes (cs). Oct 1 2019, 4:32 PM

MMiller_WMF mentioned this in T234348: Newcomer tasks: evaluate topic matching prototypes (ar).

Mholloway unsubscribed. Oct 1 2019, 4:35 PM

Mholloway subscribed.

kostajh mentioned this in T231506: Newcomer tasks: prototype topic matching. Oct 2 2019, 6:40 AM

@MMiller_WMF the ORES prototype has been updated with datasets for Arabic, Czech and Korean.

Language	Tasks with topics	Tasks without topics
ko	15,511	6,496
cs	19,610	7,440
ar	23,626	10,816

It's also worth noting that these are the actual counts of potential tasks to show in suggested edits if we go with this approach. The exception is Arabic, which would have tens of thousands more potential tasks had I not temporarily removed the two most populated templates. (cc @nettrom_WMF and @RHo)
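For reference, the share of tasks that receive a topic can be derived from the counts in the table. The short sketch below does that arithmetic. The variable and function names are hypothetical, and the numbers are copied from the table as given.

```python
# (tasks with topics, tasks without topics), copied from the table above.
counts = {"ko": (15511, 6496), "cs": (19610, 7440), "ar": (23626, 10816)}


def topic_coverage(with_topics, without_topics):
    """Percentage of tasks that have a topic, rounded to one decimal place."""
    return round(100 * with_topics / (with_topics + without_topics), 1)


coverage = {lang: topic_coverage(*pair) for lang, pair in counts.items()}
print(coverage)
```

With these counts, roughly two thirds to three quarters of the tasks in each language get a topic.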

Trizek-WMF mentioned this in T234424: Newcomer tasks: evaluate topic matching prototypes (ko). Oct 2 2019, 10:30 AM

I am going to move this task to the Epic column on the work board. We are having discussion on it, but there is nothing actionable here because the actionable next steps are for the ambassadors in the subtasks. If that is incorrect, let me know!

Trizek-WMF changed Due Date from Oct 15 2019, 1:00 PM to Oct 15 2019, 12:00 PM. Oct 2 2019, 2:48 PM

Trizek-WMF changed Due Date from Oct 15 2019, 12:00 PM to Oct 16 2019, 12:00 PM.

Hello, I've noticed that, for cs/Arts at least, the prototype gives different results each time I submit a query. See the screencast at https://martin.urbanec.cz/files/screencasts/newcomer_tasks_prototypes_cs_arts_01.webm. @MMiller_WMF said in chat that this is a bug and asked me to put more information in this task.

Thanks, @Urbanecm. That's not the behavior I was expecting. @kostajh is out until Monday, and he'll be able to take a look then. Maybe it is randomizing results or something.

I can reproduce similar behavior for freetext. OTRS seems to work fine.

To clarify, do you mean ORES?

The requests to the search API are done asynchronously, so sometimes one finishes earlier than another. If the exact order is important (e.g. always show "Kopírovat úpravy" results then "Reference" then "Info" etc) I could change the prototype to do that. Or I could change it to completely randomize the result order, if that is preferable.
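To illustrate the fixed-order option, here is a minimal sketch in Python rather than the prototype's own code. It assumes one request per template, and `fake_search` is a hypothetical stand-in for the real search call. `asyncio.gather` returns results in the order the requests were passed in, regardless of which one finishes first.

```python
import asyncio
import random


async def fake_search(template):
    """Stand-in for one search API request; finishes after a random delay."""
    await asyncio.sleep(random.uniform(0, 0.05))
    return [f"{template} result {i}" for i in range(1, 3)]


async def search_in_fixed_order(templates):
    # gather() keeps the order of its arguments, so the combined list is
    # stable even though the requests complete in varying order.
    per_template = await asyncio.gather(*(fake_search(t) for t in templates))
    return [item for results in per_template for item in results]


templates = ["Kopírovat úpravy", "Reference", "Info"]
results = asyncio.run(search_in_fixed_order(templates))
print(results)
```

Appending each response to the list as soon as it arrives would instead reproduce the varying order seen in the screencast.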

In T234272#5550113, @kostajh wrote:

I can reproduce similar behavior for freetext. OTRS seems to work fine.

To clarify, do you mean ORES?

Ah, yes.

Leaving for @MMiller_WMF :).

My 2c is that having the exact order is important for someone who is searching for this reason, so that they do not think that there is a bug in the search query.

I can see that. OTOH, the instructions for evaluating are to look at the first ten articles returned, and it's likely that the first 10 will all belong to a single template (e.g. if results 1 through 31 are for "Kdo?" and results 32-40 are for "Kdy?", the evaluator looks only at the first 10 and therefore sees only results for "Kdo?"). So randomizing the result list might be fairer for a comparison, since we're asking the evaluators to look at the results once rather than multiple times.

I'm happy to do whatever makes sense to @RHo and @MMiller_WMF, just let me know.

Hi @kostajh, the list of topics in the spreadsheet does not match the 42 topics in the ORES tool. Can you have a look at it, please?

Dyolf77_WMF closed subtask T234348: Newcomer tasks: evaluate topic matching prototypes (ar) as Resolved. Oct 7 2019, 9:30 PM

@Urbanecm -- thanks for noticing that the results change on successive searches. I think that for the purposes of this evaluation, we shouldn't worry about it. I think you should just do one search and record the results from that search. When it comes time for implementation, we'll make sure to randomize correctly.

Dyolf77_WMF reopened subtask T234348: Newcomer tasks: evaluate topic matching prototypes (ar) as Open. Oct 8 2019, 8:13 AM

Dyolf77_WMF closed subtask T234348: Newcomer tasks: evaluate topic matching prototypes (ar) as Resolved. Oct 8 2019, 9:18 AM

@Dyolf77_WMF what does not match?

leila unsubscribed. Oct 8 2019, 7:33 PM

In the ORES sheet, the topic "Geography.Bodies of water" is missing even though it is in the tool, and the first four topics in the same spreadsheet do not match what is in the tool.

We've done a review today. Martin is going to finish it as soon as possible.

@Urbanecm and @revi, what is the status of your sub-tasks?

Sorry @Trizek-WMF, I thought I updated the subtask. Done from my side, moved to an appropriate column.

revi closed subtask T234424: Newcomer tasks: evaluate topic matching prototypes (ko) as Resolved. Oct 24 2019, 11:11 PM

Charlotte subscribed. Oct 25 2019, 8:59 PM

We are currently working on deciding exactly how to proceed based on these evaluations, which I will post on this task.

All subtasks done.

I will resolve this task once I post on it what our decision is from these evaluations.

Assigning to you then.

Following these evaluations, we have decided to proceed with the ORES drafttopic model. That is because:

- The ORES model performed the best in terms of accuracy. In other words, it had the highest number of results that looked correct for each topic.
- There is a dedicated team (the Scoring team) for supporting and improving ORES.
- It is a system already in production that we can scale to more wikis in the future.

MMiller_WMF mentioned this in T242400: Newcomer tasks: ambassadors test morelike. Jan 10 2020, 1:44 AM

MMiller_WMF mentioned this in T245368: Newcomer tasks: evaluate new ORES topic models. Feb 16 2020, 7:07 PM

Urbanecm edited subscribers, added: Urbanecm_WMF; removed: Urbanecm. Aug 26 2020, 2:10 PM

MMiller_WMF mentioned this in T266201: Link-based and text-based topic evaluations October 2020. Oct 21 2020, 11:40 PM
