
Add a link: rejection reasons
Closed, Resolved · Public

Description

As part of the information we use to make improvements to "add a link", we want to look at the rejection reasons given by users for the link suggestions they reject. These reasons may give us a sense of how we could improve our algorithm, or help us better understand the mindset of our users.

A simple way to do this analysis might be to count every rejection across all "add a link" sessions, regardless of whether it is the user's first session. We would want to restrict this to "newcomers", perhaps those whose accounts were created after May 29, when the feature was deployed. Then crosstab the rejection reasons (including "no response", since users can close the rejection dialog without selecting anything) against these attributes:

  • Wiki
  • Platform
  • User's number of "add a link" edits

We can also consider crosstabbing by whether the user went through onboarding, but that might be more challenging and can be a "nice to have".
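
A minimal sketch of the crosstab described above, in plain Python. It assumes each logged rejection is available as a dict with hypothetical keys `reason`, `wiki`, `platform`, and `user_edits` (these are not the real instrumentation field names), and it counts a missing reason as "no response". The tenure buckets in `edit_bucket` are likewise made up for illustration.

```python
from collections import Counter, defaultdict

NO_RESPONSE = "no response"  # user closed the dialog without picking a reason


def crosstab(rejections, key):
    """Count rejection reasons per group.

    `key` is either a field name (e.g. "wiki", "platform") or a function
    that maps a rejection to its group. Returns {group: Counter(reason -> n)}.
    """
    table = defaultdict(Counter)
    for rej in rejections:
        reason = rej.get("reason") or NO_RESPONSE
        group = key(rej) if callable(key) else rej[key]
        table[group][reason] += 1
    return dict(table)


def edit_bucket(rej):
    """Hypothetical tenure buckets for the user's number of "add a link" edits."""
    n = rej["user_edits"]
    if n == 0:
        return "0"
    if n < 5:
        return "1-4"
    if n < 20:
        return "5-19"
    return "20+"
```

`crosstab(rejections, "platform")` gives the platform split, and `crosstab(rejections, edit_bucket)` gives the split by number of "add a link" edits.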

Event Timeline

@nettrom_WMF -- I made this simple task for counting up rejection reasons, including some specifications of how it might be done. But you are more than welcome to alter the specifications if you see a better way to get to valuable info. Depending on what we find here, we might want to dig in deeper (e.g. perhaps by splitting up rejection reasons by whether a user went through onboarding).

I've completed a first pass at this analysis, and found that we don't appear to learn much from splitting the data in any of the described ways. The rejection reasons are quite stable across wikis, platforms, and edit counts.

I limited the data to rejections occurring within a given timespan since registration. I first set it to 7 days, then to 28 days, and found that the patterns generally hold. Let's look at the overall distribution of reasons:

| Platform | Rejection reason | N | % |
| -------- | ---------------- | - | - |
| Desktop | Almost everyone knows what it is | 2,732 | 53.0% |
| Desktop | Linking to wrong article | 1,377 | 26.7% |
| Desktop | Other | 688 | 13.0% |
| Desktop | Text should include more or fewer words | 378 | 7.3% |
| Mobile | Almost everyone knows what it is | 1,835 | 53.3% |
| Mobile | Linking to wrong article | 791 | 23.0% |
| Mobile | Other | 484 | 14.1% |
| Mobile | Text should include more or fewer words | 271 | 7.9% |
| Mobile | Undefined | 62 | 1.8% |
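
The per-platform percentages in the table could be computed along these lines: keep only rejections made within a fixed number of days after registration (7 or 28 above), then turn the counts into shares within each platform. This is a sketch; the `registered`, `timestamp`, `platform`, and `reason` fields are hypothetical names, not the actual schema.

```python
from collections import Counter, defaultdict
from datetime import timedelta


def within_window(rejections, days):
    """Keep rejections made at most `days` days after the user registered."""
    limit = timedelta(days=days)
    return [r for r in rejections if r["timestamp"] - r["registered"] <= limit]


def reason_shares(rejections):
    """Return {platform: [(reason, n, pct), ...]}, most frequent reason first."""
    counts = defaultdict(Counter)
    for r in rejections:
        counts[r["platform"]][r["reason"]] += 1
    shares = {}
    for platform, c in counts.items():
        total = sum(c.values())
        shares[platform] = [
            (reason, n, round(100 * n / total, 1)) for reason, n in c.most_common()
        ]
    return shares
```

Calling `reason_shares(within_window(rejections, 7))`, and then the same with `28`, gives the two windows that were compared.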

The distribution of these reasons is generally the same across all wikis, platforms, and tenure levels (number of Add a Link edits made). For some combinations of these attributes, we have few data points available (e.g. some wikis lean strongly toward one platform), which can produce a somewhat different distribution (e.g. one where every rejection is labelled "Text should include more or fewer words"). Generally, we have a lot of data for users with few edits, since that describes most users.

One thing we do appear to see is that on some wikis, the usage of "Other" decreases as user tenure increases. This suggests that "Other" might be a catch-all/safe category for less experienced users, and that they become more confident in labelling a link with one of the other categories as they make more judgements.

Based on this, my conclusions are:

  • We'd want to consider ways to limit the number of links we suggest for terms where "almost everyone knows what it is".
  • We'd want to consider changing the workflow so users have a straightforward way to modify the link target, and maybe also the words used in the link.

I've added the above table, as well as the methodological notes and findings, as a new section called "Rejection Reasons" in the "Measurement" section of the Add a Link page on mediawiki.org. The section links back to this Phabricator task. The notebooks for this analysis have been pushed to GitHub: data gathering notebook, analysis and graphs notebook. Closing this task as resolved.

@nettrom_WMF, "Everyday" is for a term everyone knows, right? I'm asking as the rejection reasons you use are not documented.


Oh darn, thanks for catching that! And you're right in what "Everyday" refers to. I'll go update both the table in my comment above as well as the mediawiki.org page to use the phrases from the interface.

@nettrom_WMF -- I'm reopening this task because @RHo and I want an additional piece of data to help us decide whether to take action on T269648: Add a link: changing the link target (not for initial release). Would you be able to extract a list of target/suggestion pairs for suggestions that got the rejection reason of "linking to wrong article"? We want to look through some to see if the user would have had a good chance of finding a better target.

I'm thinking that we could pull the list of about 100 pairs (maybe from a random smattering of users) from French Wikipedia, Arabic Wikipedia, Czech Wikipedia, Spanish Wikipedia, and Bengali Wikipedia. We will then ask the ambassadors to look through them.

What do you think?

@MMiller_WMF : Yeah, we've got the link target and text logged in the instrumentation. It might also be useful to extract the probability score but leave it out of the reviews, so that we can later check for a correlation (e.g. whether suggestions where a better target could be found have a higher or lower probability).
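
If the reviewers' verdicts are later recorded as a yes/no "better target available" flag, that correlation could be checked with a point-biserial correlation, which is Pearson's r where one variable is binary. A minimal sketch; the flag and both input lists are hypothetical.

```python
from math import sqrt


def point_biserial(flags, scores):
    """Pearson correlation between a 0/1 flag and a numeric score.

    flags:  1 if reviewers found a better target, else 0 (hypothetical labels)
    scores: the model's probability for each suggestion
    """
    if len(flags) != len(scores) or len(flags) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    n = len(flags)
    mean_f = sum(flags) / n
    mean_s = sum(scores) / n
    cov = sum((f - mean_f) * (s - mean_s) for f, s in zip(flags, scores))
    var_f = sum((f - mean_f) ** 2 for f in flags)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    if var_f == 0 or var_s == 0:
        raise ValueError("correlation is undefined when either input is constant")
    return cov / sqrt(var_f * var_s)
```

A positive value would mean that suggestions with a findable better target tended to have higher probability scores; a value near zero would suggest no relationship.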

A couple of suggestions:

  1. I'd like to restrict sampling by user, if possible, so that highly active users don't end up with multiple samples in the dataset.
  2. I'll only use rejections that have "wrong target" as the one and only reason for rejection.

Then we'll sample 100 rejections from each wiki and put them into a spreadsheet.
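
One way to apply the "one and only reason" criterion and prepare the review spreadsheet, sketched with Python's standard `csv` module. It assumes hypothetical fields `id`, `reasons` (a list, since more than one reason can be selected), `link_text`, `link_target`, and `probability`; only `wrong-target` is the actual logged value for "Linking to wrong article". Following the suggestion above, the probability score is kept aside rather than shown to reviewers.

```python
import csv
import io

WRONG_TARGET = "wrong-target"  # logged value for "Linking to wrong article"


def eligible(rejections):
    """Keep rejections whose one and only selected reason is wrong-target."""
    return [r for r in rejections if list(r["reasons"]) == [WRONG_TARGET]]


def write_review_sheet(rows, out):
    """Write the reviewer-facing CSV; the probability score is deliberately omitted."""
    writer = csv.writer(out)
    writer.writerow(["id", "link_text", "link_target"])
    for r in rows:
        writer.writerow([r["id"], r["link_text"], r["link_target"]])


def probabilities(rows):
    """Keep the scores aside, keyed by id, for a later correlation check."""
    return {r["id"]: r["probability"] for r in rows}
```

Writing to an `io.StringIO` (or an open file) produces a CSV that can be imported into any spreadsheet tool, and the `id` column lets the reviewers' verdicts be joined back to the scores afterwards.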

@nettrom_WMF -- thank you, that sounds fine. Let's do it!

The request for a dataset of rejections labelled "Linking to wrong article" (logged in the system as "wrong-target") has been completed. The dataset was gathered from the five requested wikis. In addition to the suggested link target, link text, and probability score, we also captured the title of the article both at the time of editing and currently. We found three cases in the data and applied a different sampling strategy to each:

  1. Wikis with enough users and data points, in this case more than 100. We randomly sample 1 data point from each user to get a dataset where each user is represented once, then randomly sample 100 from those.
  2. Wikis with fewer users than requested samples, but with enough data points. In that case, we start by randomly sampling 1 data point from each user. We then remove those samples from the dataset, shuffle the users, then iterate over the users and randomly sample 1 data point (without replacement) from each until we've gotten enough samples.
  3. Wikis with fewer users and fewer data points than requested. In that case, we used the entire dataset.
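
The three sampling cases can be sketched as a single function. This is an illustration under assumptions, not the code from the notebooks: each rejection is a dict with a hypothetical `user` key, and in case 2 the users are shuffled once and then cycled through until enough samples are drawn.

```python
import random
from collections import defaultdict


def sample_rejections(rejections, n=100, seed=None):
    """Sample up to n rejections while spreading them across users.

    Case 3: no more than n rejections in total -> return all of them.
    Case 1: at least n users -> one random rejection per user, then n of those.
    Case 2: fewer than n users -> one per user first, then repeated passes over
            the shuffled users, taking one more rejection each time, until n.
    """
    rng = random.Random(seed)
    if len(rejections) <= n:
        return list(rejections)  # case 3

    # Shuffle each user's rejections so that popping is a random draw
    # without replacement.
    pools = defaultdict(list)
    for rej in rejections:
        pools[rej["user"]].append(rej)
    for user_rejections in pools.values():
        rng.shuffle(user_rejections)

    users = list(pools)
    first = [pools[user].pop() for user in users]  # one data point per user
    if len(first) >= n:
        return rng.sample(first, n)  # case 1

    # Case 2: the first-round picks have already been removed from the pools.
    sample = first
    rng.shuffle(users)
    while len(sample) < n:
        for user in users:
            if len(sample) == n:
                break
            if pools[user]:
                sample.append(pools[user].pop())
    return sample
```

Passing a fixed `seed` makes the sample reproducible, which helps if a spreadsheet has to be regenerated. The loop in case 2 always terminates because that case is only reached when there are more than `n` rejections in total.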

nettrom_WMF moved this task from Doing to Done on the Product-Analytics (Kanban) board.

I'm reassigning this to @MMiller_WMF and moving it to "Needs Sign-off" on the PA kanban board. The work on my end has been completed and once the ambassadors have completed their evaluations, I think this task can be resolved.

The ambassadors have completed their evaluations, with notes on that task. Thank you!