Page MenuHomePhabricator

Allow adding other attributes to <Typo ...>
Open, Needs TriagePublic


The code for reading the RegExTypoFix seems to depend on only known attributes in the XML definition of a regexp. Would it be possible to modify the regexp so that other attributes can be added, even if they're not used by AWB?

It would allow defining other attributes that other tools could use: for example, an identifier for the regexp, a flag indicating if the fix can be applied automatically... It would be ufesul for me to implement T258889 for disabling specific regexp in WPCleaner, or other features I've in mind.

private static readonly Regex TypoRegex = new Regex("<(?:Typo)?\\s+(?:word=\"(.*?)\"\\s+)?find=\"(.*?)\"\\s+replace=\"(.*?)\"\\s*/?>", RegexOptions.Compiled);

First proposition

Allow any extra attributes after the 3 existing attributes (word, find, replace). Just replace the existing regular expression by:

private static readonly Regex TypoRegex = new Regex("<(?:Typo)?\\s+(?:word=\"(.*?)\"\\s+)?find=\"(.*?)\"\\s+replace=\"(.*?)\"(?:\\s+[a-z]+=\".*?\")*\\s*/?>", RegexOptions.Compiled);

I just added a non capturing group repeated 0 to n times (?:\\s+[a-z]+=\".*?\")* for extra attributes.

Second proposition

Allow any attributes in any order. The regular expression can be simplified:

private static readonly Regex TypoRegex = new Regex("<(?:Typo)?(?:\\s+([a-b]+)=\"(.*?)\")+\\s*/?>", RegexOptions.Compiled);

The regular expression will capture a series of attribute name and attribute value, so the code will need to be modified to extract the values for word, find and replace, and check that find and replace do exist. A suggested patch for the loop on the Matches (not tested or compiled):

foreach (Match m in TypoRegex.Matches(text))
    string find = null;
    string replace = null;
    for (int i = 1; i < m.Groups.Length - 1; i += 2)
        if (string.Equals("find", m.Groups[i].Value))
            find = m.Groups[i + 1].Value;
        if (string.Equals("replace", m.Groups[i].Value))
            replace = m.Groups[i + 1].Value;
        if (!string.isNullOrEmpty(find) && !string.isNullOrEmpty(replace))
            typoStrings.Add(find, replace);
    catch (ArgumentException)
        RegExTypoFix.TypoError("Duplicate typo rule '" + find + "' found.");
        return new Dictionary<string, string>();

Event Timeline

NicoV created this task.Jul 26 2020, 5:27 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 26 2020, 5:27 PM
NicoV updated the task description. (Show Details)Dec 5 2020, 9:04 AM
NicoV updated the task description. (Show Details)Dec 7 2020, 10:34 AM
NicoV added a subscriber: Rjwilmsi.Dec 7 2020, 10:37 AM

Hello @Rjwilmsi (or other AWB developer?): what is the process to submit patches?

NicoV updated the task description. (Show Details)Dec 7 2020, 10:40 AM

@NicoV the info here is enough to understand the change wanted, however it needs to be coordinated with an AWB release as we can't make changes to typo list pages until a new AWB release with this change has been rolled out.

NicoV added a comment.Dec 16 2020, 6:41 AM

Thanks @Rjwilmsi

Yes, I understand that typo list pages can only be changed once a new AWB release has been rolled out (and probably after some time to be sure users have upgraded). That's why I'm pushing to include this patch, so that it won't be too long before modifications can be made in the typo list pages. At least, once I know this patch will be included, I can also start modifying WPCleaner to take advantage of this, even before we can add the new attributes to the typo list.