
[[AB|(AB]]) replaced with ([[A]])) instead of the expected ([[AB]])
Closed, Resolved (Public)

Description

Hello,

thanks a lot for the great job!

AWB replaces [[AB|(AB]]) with ([[A]])) instead of the expected ([[AB]]).

Event Timeline

Are you using a custom find and replace regex?

I am using an elementary find and replace, like Find = ttt, Replace with = tt.

Ok, so this probably isn't a bug in AWB, but a problem with your regex.

Unless you tell us what you're doing, we can't help.

Please read https://www.mediawiki.org/wiki/How_to_report_a_bug

I am sorry for wasting your (and my) time.
I can't reproduce this bug.

A.sav reopened this task as Open. (Edited Sep 10 2022, 1:50 PM)

How to reproduce.

  1. Start vanilla AutoWikiBrowser
  2. Connect to uk.wikipedia.org
  3. Source: Wiki search (text)
  4. Wiki search set to: insource:/Йоханннес/
  5. Options: check Find and replace
  6. Click Normal settings
  7. Add one rule: Find = ннн, Replace with = нн
  8. Click OK
  9. Click Make list
  10. Your list will contain one article: Галатея (Пігмаліон)
  11. Click Start on the Start tab
  12. Look at line 72:

AWB replaces [[Ермітаж|(Ермітаж]]) with ([[Ерміта]]))

Expected behaviour: ([[Ермітаж]])

One more, even more interesting case:

  4. Set Wiki search to Саїід

. . .

  7. Add one rule: Find = їі, Replace with = ії

. . .

  10. Your list is Месія (серіал)

AWB proposes replacing [[Нью-Мексико|(Нью-Мексико]], with ([[Нью-Мексик]]),

@Reedy, what kind of additional info do you need to confirm this issue?

So your reproduction can be simplified a lot...

How to reproduce.

  1. Start vanilla AutoWikiBrowser
  2. Connect to uk.wikipedia.org
  3. Add Галатея (Пігмаліон) to the list of articles to be processed
  4. On the Skip tab, turn off "Only genfixes"
  5. Click Start on the Start tab

Before:
[[Файл:Pygmalion and Galatea (Boucher).jpeg|ліворуч|міні|[[Франсуа Буше|Ф. Буше]] "Пігмаліон і Галатея « [[Ермітаж|(Ермітаж]])]]

After:
[[Файл:Pygmalion and Galatea (Boucher).jpeg|ліворуч|міні|[[Франсуа Буше|Ф. Буше]] "Пігмаліон і Галатея « ([[Ерміта]]))]]

We can create a simplified test case from that text (rather than using the whole page)... But at this point it's not clear whether the problem is caused by the link being embedded in an image link.

https://uk.wikipedia.org/w/index.php?title=%D0%9A%D0%BE%D1%80%D0%B8%D1%81%D1%82%D1%83%D0%B2%D0%B0%D1%87%3AReedy%2FSandbox&type=revision&diff=37167989&oldid=37167986

We can simplify further and see that it isn't: https://uk.wikipedia.org/w/index.php?title=%D0%9A%D0%BE%D1%80%D0%B8%D1%81%D1%82%D1%83%D0%B2%D0%B0%D1%87%3AReedy%2FSandbox&type=revision&diff=37168023&oldid=37168000

Same for the other example...

https://uk.wikipedia.org/w/index.php?title=%D0%9A%D0%BE%D1%80%D0%B8%D1%81%D1%82%D1%83%D0%B2%D0%B0%D1%87%3AReedy%2FSandbox&type=revision&diff=37168114&oldid=37168106

Before → after:
[[Ермітаж|(Ермітаж]]) → ([[Ерміта]])
[[Нью-Мексико|(Нью-Мексико]] → ([[Нью-Мексик]])

I guess some of the link simplification code in the General Fixes is getting rather confused.

It's cutting off the last letter too, which is odd. Some weird off-by-one error?

It's Parsers.SimplifyLinks at fault...

// [[dog|(dog)]] --> ([[dog]])
if(lb.StartsWith("(") && ("(" + Tools.TurnFirstToUpperNoProjectCheck(lb.Trim("()".ToCharArray())) + ")").Equals("(" + Tools.TurnFirstToUpperNoProjectCheck(la) + ")"))
    articleText = articleText.Replace(pipedlink, "([[" + b.Substring(1, b.Length-2) + "]])");

https://github.com/reedy/AutoWikiBrowser/blame/4cdcd796de960d58f7d86a1c63c66ae4fa513572/WikiFunctions/Parse/WikiLinks.cs#L336-L338

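It basically assumes the trailing ) is part of the link text: b.Substring(1, b.Length-2) drops both the first and the last character, so for (Ермітаж it cuts off the final letter instead of a closing parenthesis. The check still passes, because trimming ( and ) from (Ермітаж leaves Ермітаж, which matches the link target.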
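To see the off-by-one in isolation, here is a minimal C# sketch that re-creates the quoted condition and replacement outside AWB. It is not AWB code and rests on assumptions: la is taken to be the link target, b the piped text (also used where the original reads lb), and FirstToUpper is a hypothetical stand-in for Tools.TurnFirstToUpperNoProjectCheck. The Guarded method is likewise hypothetical, one possible tightening that only fires when the text really is wrapped in parentheses, not AWB's actual fix.

```csharp
using System;

// Hypothetical stand-alone re-creation of the "[[dog|(dog)]] --> ([[dog]])" rule
// quoted above; not AWB code. Assumptions: la = link target, b = piped text
// (also used where the original reads lb), and FirstToUpper stands in for
// Tools.TurnFirstToUpperNoProjectCheck.
public static class SimplifyParenSketch
{
    static string FirstToUpper(string s) =>
        s.Length == 0 ? s : char.ToUpperInvariant(s[0]) + s.Substring(1);

    // Same check and replacement as the quoted lines; returns null when the rule does not fire.
    public static string Original(string la, string b)
    {
        bool fires = b.StartsWith("(")
            && ("(" + FirstToUpper(b.Trim("()".ToCharArray())) + ")")
                .Equals("(" + FirstToUpper(la) + ")");
        if (!fires) return null;
        // Assumes b looks like "(dog)" and strips its first AND last character.
        // For "(Ермітаж" the last character is "ж", not ")".
        return "([[" + b.Substring(1, b.Length - 2) + "]])";
    }

    // Hypothetical tightening: only fire when b is really wrapped in parentheses.
    public static string Guarded(string la, string b)
    {
        if (b.Length < 2
            || !b.StartsWith("(", StringComparison.Ordinal)
            || !b.EndsWith(")", StringComparison.Ordinal))
            return null;
        string inner = b.Substring(1, b.Length - 2);
        return FirstToUpper(inner) == FirstToUpper(la) ? "([[" + inner + "]])" : null;
    }

    public static void Main()
    {
        Console.WriteLine(Original("Ермітаж", "(Ермітаж"));                 // ([[Ерміта]])
        Console.WriteLine(Original("dog", "(dog)"));                         // ([[dog]])
        Console.WriteLine(Guarded("Ермітаж", "(Ермітаж") ?? "(no change)"); // (no change)
    }
}
```

The first line of output reproduces the truncated ([[Ерміта]]) from the report, while the well-formed [[dog|(dog)]] case still works, which is why the bug only shows up when the closing parenthesis sits outside the link.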

Ok, so https://sourceforge.net/p/autowikibrowser/code/12540/ improves both cases: it no longer eats the trailing characters.

Original → AWB before the update → AWB now (fixed?):
[[Ермітаж|(Ермітаж]]) → ([[Ерміта]]) → ([[Ермітаж]])) (partially, not completely)
[[Нью-Мексико|(Нью-Мексико]] → ([[Нью-Мексик]]) → ([[Нью-Мексико]])

So we still end up with a double )) in the first example.
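One way to avoid the double )) is to re-emit a closing parenthesis only when it was inside the link text, leaving whatever follows the link untouched. The C# sketch below illustrates that idea with a regex over the article text. It is a hypothetical illustration, not AWB's Parsers.SimplifyLinks: the pattern, the ParenLinkSketch class and the FirstToUpper helper (again standing in for Tools.TurnFirstToUpperNoProjectCheck) are all assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch, not AWB's actual code: moves a leading "(" out of the
// piped text of a link whose text matches its target, keeping the rendered
// text identical. FirstToUpper stands in for Tools.TurnFirstToUpperNoProjectCheck.
public static class ParenLinkSketch
{
    // Matches [[target|(text]] or [[target|(text)]]; group 3 captures an optional ")"
    // that sits inside the link text, right before the closing ]].
    static readonly Regex PipedParenLink =
        new Regex(@"\[\[([^\[\]|]+)\|\(([^\[\]|]*?)(\)?)\]\]");

    static string FirstToUpper(string s) =>
        s.Length == 0 ? s : char.ToUpperInvariant(s[0]) + s.Substring(1);

    public static string Simplify(string articleText) =>
        PipedParenLink.Replace(articleText, m =>
        {
            string target = m.Groups[1].Value;
            string inner = m.Groups[2].Value;
            if (FirstToUpper(inner) != FirstToUpper(target))
                return m.Value; // text differs from target: leave the link alone
            // Re-emit ")" only if it was part of the link text, so an existing
            // ")" after the link is never duplicated.
            return "([[" + inner + "]]" + m.Groups[3].Value;
        });

    public static void Main()
    {
        Console.WriteLine(Simplify("[[Ермітаж|(Ермітаж]])"));          // ([[Ермітаж]])
        Console.WriteLine(Simplify("[[Нью-Мексико|(Нью-Мексико]],"));  // ([[Нью-Мексико]],
        Console.WriteLine(Simplify("[[dog|(dog)]]"));                   // ([[dog]])
    }
}
```

With this approach the rendered text never changes: [[Ермітаж|(Ермітаж]]) becomes the expected ([[Ермітаж]]), and [[Нью-Мексико|(Нью-Мексико]], becomes ([[Нью-Мексико]], without gaining a closing parenthesis that was not in the source.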