
Typo regex not working
Closed, Resolved (Public)

Description

Some weeks ago, on it.wiki, I added the following regex to the AWB typo list:
FIND:(k[mg])\.( [a-z])
REPLACE:$1$2
This was intended to remove the dots following km and kg, requiring a following lowercase letter to avoid false positives. I tested the regex in several regex testers and it worked perfectly. Then another user tested it with AWB and no dot correction was made, as you can see: https://it.wikipedia.org/w/index.php?title=Utente:Pracchia-78/AWB/Collaudi&diff=prev&oldid=90249441
So I tried to debug the regex and determined that this wasn't due to a conflict with another typo regex. I also tested it myself and it worked as expected: https://it.wikipedia.org/w/index.php?title=Utente:Daimona_Eaytoy/Sandbox&diff=90681324&oldid=90681227
I finally figured out that if the regex matches inside a wikilink on the page, such as [[A 30 milioni di km. dalla Terra]], no dot correction is made at all, regardless of where the link is, while other typos are still corrected as expected. There's more, though: if the above wikilink is reduced to something like [[3 km. d]], no typo is corrected at all.
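For reference, the intended behaviour of the rule can be reproduced outside AWB. This is only a minimal sketch in Python: it assumes the `$1$2` replacement maps to Python's `\1\2` syntax, the function name `fix_unit_dots` and the sample sentences are made up for illustration, and it says nothing about how AWB itself applies the rule.

```python
import re

# The FIND pattern from the typo list, unchanged.
DOT_RULE = re.compile(r"(k[mg])\.( [a-z])")


def fix_unit_dots(text: str) -> str:
    """Remove the dot after km/kg when a space and a lowercase letter follow."""
    # "\1\2" is Python's spelling of the "$1$2" replacement (assumption).
    return DOT_RULE.sub(r"\1\2", text)


# Dot followed by a lowercase word: the dot is removed.
print(fix_unit_dots("Dista 30 km. dalla costa e pesa 5 kg. circa."))
# Dot followed by an uppercase word (end of sentence): left untouched.
print(fix_unit_dots("Percorse 30 km. Poi si fermò."))
```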
So, summing up:

  • Match "in the middle" of a wikilink ==> Dot correction: NO, Other typos: YES
  • Match "at the end" of a wikilink ==> Dot correction: NO, Other typos: NO
  • No match within wikilinks ==> Everything is fine

Is this an AWB bug or something else?
Many thanks

Event Timeline

Daimona created this task. Sep 19 2017, 9:39 AM

AFAIK, AWB won't fix typos inside links at all...

That makes sense, especially in this particular case: that wikilink was the only example of a false positive, which should never be corrected. However, even if wikilinks are skipped, the other corrections should still be made as usual, without breaking the whole typo correction.

As documented at https://en.wikipedia.org/wiki/Wikipedia:AutoWikiBrowser/Typos#AutoWikiBrowser_.28AWB.29 it is by design that "If a typo rule matches a wikilink target, this rule will be ignored on the whole page". This was done as a way of avoiding false positives. So your first and third bullets are correct; the second one (a match at the end of a wikilink) is not correct (or at least should not be).
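The documented behaviour can be illustrated with a short Python sketch. It is not AWB's actual implementation: the way link targets are extracted, the function name `apply_typo_rules`, and the second rule (`perchè` to `perché`) are all assumptions made for the example.

```python
import re

# Assumption: a link target is the part of [[target|label]] before any pipe.
WIKILINK_TARGET = re.compile(r"\[\[([^\]|]+)")

# (pattern, replacement) pairs; the second rule is a hypothetical extra typo rule.
RULES = [
    (r"(k[mg])\.( [a-z])", r"\1\2"),
    (r"\bperchè\b", "perché"),
]


def apply_typo_rules(page: str, rules=RULES) -> str:
    """Apply each rule unless it matches some wikilink target on the page."""
    targets = WIKILINK_TARGET.findall(page)
    for pattern, replacement in rules:
        regex = re.compile(pattern)
        if any(regex.search(target) for target in targets):
            continue  # rule is ignored on the whole page
        page = regex.sub(replacement, page)
    return page


page = "Vedi [[A 30 milioni di km. dalla Terra]], perchè dista 5 km. da qui."
# The dot rule matches the link target, so "5 km." is left alone,
# while the unrelated rule is still applied.
print(apply_typo_rules(page))
```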

@Rjwilmsi Thanks, I will copy those specifications over to it.wiki, since I couldn't find them there. Ignoring the rule on the whole page to avoid false positives makes sense too, so the trouble is the second bullet. I couldn't find any previous diff to show it, so I made a new one.

  1. This is what happens without wikilink match: https://it.wikipedia.org/w/index.php?title=Utente:Daimona_Eaytoy/Sandbox&diff=91471606&oldid=91471597
  2. This is with an "end match": https://it.wikipedia.org/w/index.php?title=Utente:Daimona_Eaytoy/Sandbox&diff=91471590&oldid=91471570
  3. This is with a "middle match": https://it.wikipedia.org/w/index.php?title=Utente:Daimona_Eaytoy/Sandbox&diff=91471671&oldid=91471663

The first and third edits differ just as you said: in the third one the dot correction is skipped while other rules are still applied. In the second edit, however, another correction is missing: the non-breaking space isn't added after "47", even though some other corrections are still made. I suppose this is what I meant at the time, but now I see that the "nbsp rule" also matches inside the wikilink, so it's ignored as well. In the "middle match" it obviously doesn't match, because of the other words in between. So, if I'm right, there's nothing wrong with the "end match" and it works as expected. And the funny thing is, it took me about 20 minutes of writing and rewriting this message to figure it out.
Many thanks, and sorry for bothering.
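The difference between the "end match" and the "middle match" can be checked with a small Python sketch that lists which rules would be ignored for a given page. The "nbsp" pattern below is hypothetical (the real rule on it.wiki is not quoted in this task), and the link-target extraction and the name `suppressed_rules` are likewise assumptions.

```python
import re

# Assumption: a link target is the part of [[target|label]] before any pipe.
WIKILINK_TARGET = re.compile(r"\[\[([^\]|]+)")

RULES = {
    "dot": r"(k[mg])\.( [a-z])",
    # Hypothetical stand-in for the nbsp rule: a digit, a space, then km/kg.
    "nbsp": r"(\d) (k[mg])\b",
}


def suppressed_rules(page: str) -> list[str]:
    """Names of the rules that match inside at least one wikilink target."""
    targets = WIKILINK_TARGET.findall(page)
    return sorted(
        name
        for name, pattern in RULES.items()
        if any(re.search(pattern, target) for target in targets)
    )


# "End match": both rules match inside the link, so both are ignored.
print(suppressed_rules("[[3 km. d]] e 47 km. da qui"))
# "Middle match": only the dot rule matches inside the link.
print(suppressed_rules("[[A 30 milioni di km. dalla Terra]] e 47 km. da qui"))
# No wikilink: nothing is ignored.
print(suppressed_rules("Nessun link, 47 km. da qui"))
```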

Rjwilmsi closed this task as Resolved. Sep 22 2017, 1:25 PM
Rjwilmsi claimed this task.

OK, good that it's understood and resolved.