
Some autocomplete suggestions (some shown after typing just a single letter) in the search field are offensive
Open, High, Public

Description

Wiktionary provides offensive words as autocomplete suggestions in the main search box (and possibly elsewhere). It has been observed that the N-word is sometimes a suggestion when the letter "n" alone is typed. Similarly, "f" and "c" (and probably many other single letters) return offensive suggestions. This is, of course, offensive. While dictionaries must continue to include offensive words, largely to document their inherent offensiveness, there is no reason to suggest them. People who want to look them up can type the full words.

The N-word's page includes "offensive" 25 times, "vulgar" 11 times, and "slur" 3 times. It is absolutely clear that it is offensive and racist. There is no excuse to suggest it.

I know that some people might argue that these words are real words, so they should be suggested. No, that is an argument for why offensive words must remain in the dictionary, not for why they should be suggested. I can also imagine that people will argue that there are multiple pages for phrases that begin with the N-word and that, without suggestions, they will not be found. This is a specious argument. Wiktionary does not have a mission of promotion — its purpose is documentation. If someone wants the definition of a racist phrase that is in Wiktionary, they can find it. Any important offensive phrases could also be linked from other pages. Not only is there no need to promote such phrases to people who are not looking for them, it is offensive to do so. It implies a normalization of offensive words that is not true.

I imagine other people might argue that it's just the way the suggestion algorithm works. That argument is never valid. Algorithms can and should be changed.

Event Timeline

I am reopening this because @DannyS712 has misunderstood it. I am very aware of Wikimedia rules about censorship and I am not suggesting that Wiktionary be censored. Of course, offensive words need to be here in order to document their offensiveness. But, if you take the argument that "Wikimedia projects are not censored" too literally, then the page for the N-word should merely say "a Black person" and should not mention that it is offensive, vulgar, or a slur. Those (accurate) descriptions are far closer to censoring than fixing this bug would be. Yet they are needed to properly define and document the word.

Let's actually look at https://en.wikipedia.org/wiki/Wikipedia:Offensive_material#%22Not_censored%22 in detail:

"A cornerstone of Wikipedia policy is that the project is not censored. Wikipedia editors should not remove material solely because it may be offensive, unpleasant, or unsuitable for some readers. However, this does not mean that Wikipedia should include material simply because it is offensive, nor does it mean that offensive content is exempted from regular inclusion guidelines. Material that could be considered vulgar, obscene or offensive should not be included unless it is treated in an encyclopedic manner. Offensive material should be used only if its omission would cause the article to be less informative, relevant, or accurate, and no equally suitable alternative is available."

Including a page which is vulgar, obscene or offensive as an autocomplete suggestion or as a random page is not treating it in an encyclopedic manner and it fails the test "Offensive material should be used only if its omission would cause the article to be less informative, relevant, or accurate, and no equally suitable alternative is available". Also note this comment (about images):

Wikipedia is not censored, but Wikipedia also does not favor offensive images over non-offensive images.

The autocomplete suggestion feature promotes and therefore favors certain pages. It is not a core feature of Wiktionary. It is, in fact, not necessary at all. It is a feature that was created at some point to (hopefully) make Wiktionary more useful. It should not be suggesting — promoting and favoring — offensive words. There is no reason for it and no excuse for it. If autocomplete cannot be changed to make it not show offensive suggestions, then the feature itself should be removed.

I would further argue that T14596: Random pages may be offensive, T186179: [Bug] NSFW article appearing in Randomizer on Explore feed, T2682: Delay/wait for confirmation of likely porn images using algorithm detection and probably others have been closed erroneously. Including such words and not censoring them in the body of Wiktionary does not imply that an unnecessary feature (showing random pages) MUST show such offensive pages. Take the similar feature that Wikipedia has of featuring articles on the home page. Has the detailed Wikipedia article on the N-word ever been a featured article? Of course not. It is not favored over other articles which are not offensive, yet the argument made here would require that it be featured at some point.

Hi @RoyLeban, thanks for taking the time to report this and welcome to Wikimedia Phabricator!

Closing for the reasons already covered by DannyS712. Furthermore, please first discuss this in the community of the Wiktionary that you are referring to.
For the English Wikipedia there would be something like https://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not#Wikipedia_is_not_censored

For clarification: I'm declining this because the ticket says "Offensive suggestions should be removed".
That is out of scope. Making pages indiscoverable is the very opposite of a search function.

The autocomplete suggestion feature promotes and therefore favors certain pages.

Please provide specific examples with clear and complete steps to reproduce, step by step, so that someone else can reproduce the issue. Please follow https://www.mediawiki.org/wiki/How_to_report_a_bug - thanks a lot!

@Aklapper You are pointing to an article that I have already read (many years before I submitted this bug report) and that you and @DannyS712 are misinterpreting. You're also both misreading what I wrote, which is disappointing and a bit sad. I am not suggesting censorship. I am stating that autocomplete returning racist and offensive results is actually in violation of Wikimedia policies (as I cited), and that it should be fixed.

This is not an English issue. It is likely that the same problem exists in all languages.

I don't want to get into an Open/Close war, and I'll admit I am not familiar with bug reporting here. Could you point me to the appeal procedure? Thanks.

Please provide clear steps to reproduce an issue. See https://www.mediawiki.org/wiki/How_to_report_a_bug . Thanks!

RoyLeban renamed this task from Offensive suggestions should be removed to Offensive suggestions should be removed from autocomplete. Sep 25 2020, 7:58 AM

@Aklapper OK, let me also be clearer. First, I renamed the task to make clear that I'm not proposing that offensive words be excluded from the corpus. Second ...

REPRO STEPS that work right now for me:
Open www.wiktionary.org
Type "f" into search box
The F-word appears as an autocomplete suggestion

REPRO STEPS that don't work for me now but did earlier:
Open www.wiktionary.org
Type "n" into search box
The N-word appears as an autocomplete suggestion

If you want to look up the N-word, the F-word, or the C-word, you should be able to do so. If you type those words into the search box, it should work. And when you get to those pages, they will properly tell you that those words are offensive, vulgar, slurs, etc.

But if you type just an "n", "f", or "c", those words should not be offered as autocomplete suggestions. It is offensive and, in the case of the first word, racist. All three of those are real examples. It appears that these suggestions change with usage. Right now, "n" and "c" don't show the N-word and C-word, respectively, but they did earlier. "f" does show the F-word as the third autocomplete suggestion. The same should hold if you type two or three letters.

The autocomplete suggestion feature promotes and therefore favors certain pages.

This is precisely true. Wikimedia policies state that offensive content should not be favored (which includes promotion) when other alternatives are available, and that is certainly the case here, with many, many inoffensive and more likely alternatives for autocomplete.

Let's also talk about "random page" for a moment. The random page feature, which is a completely optional and auxiliary part of Wiktionary (and Wikipedia), also favors and promotes the pages that it shows. It should not favor offensive content, for the same reasons as above.

Aklapper renamed this task from Offensive suggestions should be removed from autocomplete to Some suggestions after typing a single letter in the search autocompletion could be seen as offensive. Sep 25 2020, 8:17 AM
Aklapper edited projects, added Discovery-Search; removed All-and-every-Wiktionary.

Thanks, this is helpful. You seem to be using the portal at https://www.wiktionary.org/ and the dropdown is set to "English"?

The call in the background for n is https://en.wiktionary.org/w/api.php?action=opensearch&limit=10&format=json&callback=portalOpensearchCallback&search=n
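
For reference, here is a minimal sketch of querying that endpoint outside the browser, assuming Python with the requests library (the JSONP callback parameter is only needed by the portal page and is omitted here so the API returns plain JSON):

```python
import requests

# Same opensearch endpoint the wiktionary.org portal calls in the background.
API = "https://en.wiktionary.org/w/api.php"
params = {
    "action": "opensearch",
    "search": "n",   # the single typed letter under discussion
    "limit": 10,
    "format": "json",
}

resp = requests.get(API, params=params, timeout=10)
resp.raise_for_status()

# The response is a four-element array:
# [query, [titles], [descriptions], [urls]]
query, titles, _descriptions, _urls = resp.json()
print(f"Suggestions for {query!r}:")
for title in titles:
    print(" -", title)
```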

https://www.mediawiki.org/wiki/Help:CirrusSearch provides some background info on page weighting and ranking.

Wikimedia policies state that offensive content should not be favored (which includes promotion) when other alternatives are available

Please provide links for statements, to make it more likely that people are talking about the same thing - thanks!

@Aklapper Thanks for the rename. Definitely better, though not quite right.

  • The single-letter examples just show how bad it is. Even if the offensive words were demoted so that they do not appear until more letters are typed, that would not make them non-offensive.
  • "could be seen as" isn't right. Wiktionary itself says these words are offensive and the three particular words I mentioned have no non-offensive alternate meanings.

Thanks for the CirrusSearch link. The prefer-recent option is probably the reason reproducibility changes, but boosting of other pages might make a difference as well.

Please provide links for statements

This was above so I didn't repeat. From https://en.wikipedia.org/wiki/Wikipedia:Offensive_material#%22Not_censored%22 in detail:

"Offensive material should be used only if its omission would cause the article to be less informative, relevant, or accurate, and no equally suitable alternative is available."

What I'm quoting refers to an article, but the principle is clear. In the case of autosuggest, at most 1 of the 10 suggestions can be what the person is looking for, so at least 9 of 10 will not be; in many instances, 10 of 10 will not be. So any inoffensive word that matches the initial letters typed is an equally suitable alternative.

Thanks for the rename. Definitely better, though not quite right.

Then I'm not sure what's wanted, I'm afraid... Completely delisting words that can be seen as offensive is clearly out of scope. Maintaining a manual list of "bad words" not to display is likely also out of scope. Tweaking the search algorithm and how it weights pages might be possible, but I am not sure how feasible that is.

CBogen triaged this task as High priority.

Reopening because the search team does consider this problem in scope, although the exact solution/approach is yet to be defined. We will likely wait to tackle the problem until a new Search PM, yet to be hired, is onboarded.

As a short-term fix, we did tweak the algorithm so that the N-word should no longer appear in the autosuggestions when typing the first few letters.

RoyLeban renamed this task from Some suggestions after typing a single letter in the search autocompletion could be seen as offensive to Some autocomplete suggestions (some shown after typing just a single letter) in the search field are offensive. Sep 26 2020, 4:49 AM

@CBogen Thank you. I had a feeling this would be acknowledged as an issue that should be fixed once the right people saw it. @Aklapper I have renamed this again. The key word is "are offensive" as defined by Wiktionary itself.

In terms of implementation, I have thought about this a little. How do we know when a word should not appear? Many (most?) offensive words are described as such within their pages, but this would require searching those pages, which is not practical, even if done in advance. I think a better approach is to use already existing categories. In English, we have these categories:

https://en.wiktionary.org/wiki/Category:English_offensive_terms
https://en.wiktionary.org/wiki/Category:English_ethnic_slurs
https://en.wiktionary.org/wiki/Category:English_vulgarities

I suggest that any word which appears in one of these categories not be shown as an autocomplete suggestion, while searching on that exact term continues to work (yes, it's not censored).
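
To illustrate, here is a rough client-side sketch of that check in Python. The real change would belong in the search backend; the function name filter_suggestions and the use of the categories API as a post-filter are my own illustration, not an existing feature:

```python
import requests

API = "https://en.wiktionary.org/w/api.php"

# The categories listed above, used here as the exclusion list.
EXCLUDED_CATEGORIES = [
    "Category:English offensive terms",
    "Category:English ethnic slurs",
    "Category:English vulgarities",
]

def filter_suggestions(titles):
    """Drop any suggested title whose entry is in an excluded category."""
    if not titles:
        return titles
    resp = requests.get(API, params={
        "action": "query",
        "prop": "categories",
        "titles": "|".join(titles),
        # Only report membership in the excluded categories.
        "clcategories": "|".join(EXCLUDED_CATEGORIES),
        "cllimit": "max",
        "format": "json",
    }, timeout=10)
    resp.raise_for_status()
    pages = resp.json()["query"]["pages"].values()
    # Pages that matched an excluded category come back with a
    # non-empty "categories" list; all other suggestions are kept.
    flagged = {p["title"] for p in pages if p.get("categories")}
    return [t for t in titles if t not in flagged]
```

A direct search on the excluded term itself would bypass this filter entirely, so nothing is delisted.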

The principle I am proposing (which I also suggest should be clarified in the policy) is that (a) no offensive information should be censored [already true], (b) offensive information is indicated as such [already true for the most part], (c) offensive information can be found when somebody is looking for it [already true], and (d) offensive information is never promoted or favored, and cannot be stumbled upon by somebody who is not looking for it [not true today].

It might be acceptable to provide an autocomplete suggestion for an entry typed in its entirety in the search box. I'm not sure this is necessary. If this is done, N----R would suggest the N-word among other entries, but it would not suggest N----RF----T or N----RB----H or any of the other words and phrases that begin with the N-word. Also, I think it would be reasonable to normalize spaces (and dashes, etc.), so that if somebody types, e.g., the full phrase "N----R B----H" (with a space), the entry without a space would be shown.
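
If that full-entry exception were adopted, the matching rule might look something like the following sketch (the helper names normalize and allow_as_suggestion are hypothetical):

```python
import re

def normalize(term: str) -> str:
    """Collapse whitespace, hyphens, and case so that variants like
    'foo bar', 'foo-bar', and 'foobar' compare equal."""
    return re.sub(r"[\s\-]+", "", term).casefold()

def allow_as_suggestion(typed: str, entry: str, is_excluded: bool) -> bool:
    """Excluded entries are suggested only when typed out in full
    (modulo spacing and hyphenation); all other entries behave as today."""
    if not is_excluded:
        return True
    return normalize(typed) == normalize(entry)
```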

Entries excluded from autocomplete should also be excluded from random pages, word of the day, etc.

Side note: Of course, this is the way the unabridged Wiktionary should work. Should there be an abridged version or a kids' version, then I would suggest that obscenities, vulgar terms, etc., be among the content abridged away. Unabridged dictionaries and encyclopedias should document even the abhorrent, but there is no requirement that it be presented in abridged versions.

This does not seem like a technical matter within the purview of technicians, but rather a policy matter. Moreover it seems as if it gets near the core of what the projects represent to the world.

But there might be ways of finessing the question. Most other online dictionaries don't give suggestions after a user types in a single letter. (Consult the dictionaries accessible through OneLook.com, https://www.onelook.com/?w=on&ls=a , to run your own experiments.) Almost all have suggestions once four letters are typed in. "Nigger", "penis", and "prick" show up on the suggestion lists at that point. I don't know whether users are attracted to Wiktionary because it offers suggestions that reflect the interests of other Wiktionary users, because they like to get to their search target more quickly, or because it is forgiving of spelling mistakes made by English-language learners. Nor do I know what performance and server-load advantages (or disadvantages) come from making suggestions after fewer than four letters. But it seems to me that offering suggestions only after three or four letters are typed would mean that fewer users would be offended, or outraged that Wiktionary doesn't censor autocompletion suggestions, at the cost of frustrating the word-search efforts of some English-language learners and slowing those of others.

Why should this be limited to English words offensive to (some) English speakers? What about English words offensive to (some) foreign-language ("FL") speakers? What about FL words offensive to (some) FL speakers? What about FL(i) words offensive to FL(j) speakers (j ≠ i)? What about those who find "God" blasphemous, preferring "G-d"?

Responding to comments by @DCDuring:

  1. This is a technical matter because it needs a technical solution. As explained above, the current implementation is in violation of policy by promoting (favoring) offensive content when other equally useful content is available. Note that I am suggesting a slight clarification, not a change, of policy: see above, (d) offensive information is never promoted or favored, and cannot be stumbled upon by somebody who is not looking for it [not true today]
  2. This is not an English-only problem. I use English as an example because (a) it's where I saw the problem, and (b) English is the language I know.
  3. Four letters (or any number of letters) is arbitrary. It might be appropriate for specific words in specific languages, and not for other words in other languages. There is no need or requirement that Wiktionary promote (favor) any offensive word or any particular word. As stated above, significantly greater than 90% of autocomplete suggestions are wrong. It is not an undue burden to make those people who want to look up offensive words type them in full.
  4. Wiktionary already has lists of words that are offensive, vulgar, slurs, etc. in, I believe, all languages. Leveraging these lists is a technical solution. Maintaining those lists is a policy matter, but they already exist.
  5. I do not have a problem if Wiktionary also does not suggest terms viewed as sacrilegious or blasphemous such as G-d. Again, this is not about censorship. This is about whether or not certain terms should be promoted. The long-standing policy says no.

Should somebody disagree with the policy and believe that Wiktionary should favor offensive words, then that is a policy discussion which should not happen here.

As a fellow en.wikt administrator, I find DCDuring's comments to be made in bad faith. The straw man about "God" is frankly absurd; he knows full well that "nigger" is broadly offensive in unmarked contexts and hurts WMF's credibility in a manner that "God" does not. And to suggest that we are "frustrating the word-search efforts of some English-language learners" by requiring them to type more letters in order to land on the entry for "nigger" is classic concern trolling.

Leban:

  1. Virtually all problems may have technical solutions. The issue is whether a broader population of interested parties should be involved in the decision making.
  2. Coming up with a practical resolution to this might involve basic practicality as well as consideration of what users may expect based on their experience on competing dictionary websites.
  3. The Wiktionary lists of offensive or vulgar terms are far from complete, accurate, or authoritative in any language.
  4. It is not a question of whether any one user, such as Leban, has a problem with broadening the coverage of the proposed special treatment of certain terms, but whether it retards users in general in their efforts to gain access to the possibly controversial (and therefore likely important) information they are looking for.

The claim that this proposal is not really censorship because the words are still in Wiktionary ignores the history of the evolution of censorship and other norms in society. Most harmful norms start small.

Metaknowledge:
The question at hand is whether this kind of putting a thumb on the scales of access to information is a good idea, which is a policy matter. The ways in which the proposal would slow access to desired information and the potential breadth of such slowing seem to me to be important considerations, not to be marginalized by name-calling. The algorithm that generates autocompletion suggestions reflects the actual recent searches of our actual users. Is it offensive that such users are looking up the words in question? I am not sure what the limits of application of the proposed policy would be, but I expect that a new Victorianism will lead to rather broad application of any means of suppressing certain autocompletion suggestions.

I realize that Phabricator is not the best place for discussion and debate, but I also feel it's useful to respond to comments by @DCDuring in this context.

  1. A broader population of interested parties has already weighed in to determine the policies on which this bug report is based. While I have suggested a clarification, it is just that, not a change in policy. There are standard ways to suggest changes to policies.
  2. Red herring. Most other dictionaries (web or otherwise) do not have either autocomplete suggestions or random pages, so there is no way to compare. In the implementation of these non-essential features, Wiktionary is favoring offensive words over other equally acceptable words (the reason why or the algorithm behind it does not matter).
  3. Red herring. The fact that the lists may be incomplete is not a reason to do nothing. Lists can be improved over time. We cannot let the perfect be the enemy of the good. WP:NOTPERFECT
  4. It is true that it is not a question of whether one user has a problem, but the rest is false. No user who wishes to look up the definition of an offensive term will be impeded from doing so. Zero. The change simply prevents people from being shown offensive information unexpectedly.

On other points:

  • This is not censorship. That false claim is often used as a means of repression, as is "It's just a word."
  • This is not putting the thumb on any scale of access to any information — it is changing the system so that offensive information is not favored and promoted.
  • The statement that this change, which brings the non-essential autocomplete and random-page features in line with policy, will slow down access to information is unfounded. As noted, more than 90% of autocomplete suggestions are incorrect. Random suggestions, by definition and by design, do not provide anything that somebody knows they are looking for.
  • Anyone's opinion on whether looking up certain words is objectionable or not is irrelevant. Nobody is prevented from looking up any entry.

If the criticisms are taken at face value, it is impossible to implement either autocomplete or random page in a way that conforms with policy. If that is indeed the case (and I believe it is not), then there would be no alternative other than removing both features.