
[Migrated] Fixing ambiguous typos
Open, Low, Public

Description

Quite a lot of typos have had to be rejected for the RETF page, either because the correction is ambiguous (e.g. 'distict' could be a typo for 'district' or 'distinct') or because the word is valid in one context but not in another (e.g. 'Valparaiso' is correct when referring to https://en.wikipedia.org/wiki/Valparaiso,_Florida, but should be corrected to https://en.wikipedia.org/wiki/Valparaíso when referring to the city in Chile).
I'd like to suggest an enhancement to AWB to help with situations like these. There would be a new 'Ambiguous Typos' list, much like the current 'Typos' list, with entries along the lines of

<AmbigTypo find="\b([Dd])istict\b" replaceOptions="$1istrict,$1istinct">

AWB would read this list and, on finding the RegEx value in an article, would present a panel much like the current link disambiguation panel, for the AWB user to select from the listed replace options. @Colonies_Chris 08:22, 19 September 2007 (UTC)
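To make the proposal concrete, here is a minimal sketch of how such an entry could be read and matched. It is not AWB code (AWB is written in C#). The self-closing form of the element, the function names, and the manual expansion of `$1` (needed because Python's `re` module writes backreferences as `\1`) are all assumptions made for illustration. Nothing is replaced automatically: each hit is reported with its candidate corrections so that a panel could offer them to the user.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entry in the proposed format, written as a self-closing
# element (an assumption) so that it parses as XML.
ENTRY = r'<AmbigTypo find="\b([Dd])istict\b" replaceOptions="$1istrict,$1istinct" />'


def load_entry(xml_text):
    """Return (compiled pattern, list of replacement templates)."""
    node = ET.fromstring(xml_text)
    pattern = re.compile(node.get("find"))
    options = node.get("replaceOptions").split(",")
    return pattern, options


def expand(template, match):
    """Expand $1, $2, ... references in a template against a match."""
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), template)


def find_ambiguous(text, pattern, options):
    """Yield (start, end, candidates) for each hit; the text is not changed."""
    for match in pattern.finditer(text):
        yield match.start(), match.end(), [expand(o, match) for o in options]


pattern, options = load_entry(ENTRY)
sample = "The distict court sat in a Distict building."
hits = list(find_ambiguous(sample, pattern, options))
for start, end, candidates in hits:
    print(sample[start:end], "->", candidates)
```

Because the capture group carries the original capitalisation, the candidates offered for 'Distict' are capitalised and those for 'distict' are not.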

Event Timeline

Reguyla raised the priority of this task from to Needs Triage.
Reguyla updated the task description.
Reguyla added a project: AutoWikiBrowser.
Reguyla moved this task to Interface on the AutoWikiBrowser board.
Reguyla added subscribers: Reguyla, Aklapper.

@Jogers 09:10, 19 September 2007 (UTC) wrote:

Sounds like an interesting idea.

@Rjwilmsi 11:15, 30 May 2008 (UTC) wrote:

This would be a useful feature, provided that users had an option to 'ignore ambiguous typos', i.e. AWB would not change a word matching an ambiguous typo and would not prompt the user to choose a correction. Otherwise I could envisage users being regularly pestered by message boxes ;)
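A rough sketch of how such an option might behave follows. The rule table, the `choose` callback (standing in for the selection panel) and the `ignore_ambiguous` flag are hypothetical names, not existing AWB settings.

```python
import re

# Hypothetical in-memory form of one ambiguous-typo rule, using Python's
# \1 backreference syntax in the replacement templates.
AMBIGUOUS = [(re.compile(r"\b([Dd])istict\b"), [r"\1istrict", r"\1istinct"])]


def fix_ambiguous(text, choose, ignore_ambiguous=False):
    """Apply ambiguous-typo rules, asking `choose` about each hit.

    `choose(match, candidates)` returns the chosen string, or None to
    leave the word alone. With ignore_ambiguous=True the text is
    returned unchanged and `choose` is never called.
    """
    if ignore_ambiguous:
        return text
    for pattern, templates in AMBIGUOUS:
        def repl(match):
            candidates = [match.expand(t) for t in templates]
            picked = choose(match, candidates)
            return match.group(0) if picked is None else picked
        text = pattern.sub(repl, text)
    return text
```

With the flag set, a user running a large batch would never see a prompt. With it unset, returning None from the callback still leaves an individual word untouched.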

@Rich_Farmbrough 14:39, 2 June 2010 (UTC) wrote:

Certainly would, but how about we smarten the regexer as well? "Distict of" is almost certainly "District of"; similarly "Business distict" and "congressional distict". I will put some data on https://en.wikipedia.org/wiki/Wikipedia:AutoWikiBrowser/Typos/distict.
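The idea of context-aware rules could look something like the sketch below. It covers only the three phrases mentioned in this comment; the rule format and the function name are assumptions, and a real rule set would need to be derived from usage data.

```python
import re

# Context rules built only from the three phrases mentioned above.
CONTEXT_RULES = [
    # "distict of" -> "district of"
    (re.compile(r"\b([Dd])istict(?= of\b)"), r"\1istrict"),
    # "business distict" / "congressional distict" -> "... district"
    (re.compile(r"\b((?:[Bb]usiness|[Cc]ongressional) )([Dd])istict\b"),
     r"\1\2istrict"),
]

# Anything still matching this after the context rules remains ambiguous.
STILL_AMBIGUOUS = re.compile(r"\b[Dd]istict\b")


def apply_context_rules(text):
    """Fix occurrences whose context decides the correction.

    Returns (new_text, needs_review), where needs_review is True if an
    ambiguous occurrence is left for the user to resolve.
    """
    for pattern, replacement in CONTEXT_RULES:
        text = pattern.sub(replacement, text)
    return text, bool(STILL_AMBIGUOUS.search(text))
```

Running the context rules first would reduce how often the user has to be asked, leaving only the cases where the surrounding words do not decide the correction.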

@Rich_Farmbrough 14:39, 2 June 2010 (UTC) wrote:

P.S. if someone will buy me Google's n-grams I will produce the rules based on them.

@Rich_Farmbrough 18:18, 2 June 2010 (UTC) wrote:

Yes, they are the ones.

Aklapper triaged this task as Low priority. Feb 10 2023, 12:06 PM