
"exceptinside" exceptions in replace.py are ignored when they overlap
Closed, Invalid | Public | BUG REPORT

Description

List of steps to reproduce:

  • I'm running pywikibot version 7.2.0 on Python 3.10.4 and Win11

What happens?:

pwb.py replace.py -regex -page:"Foot-Ball Club Juventus 1910-1911" "\W([nN])è\W" "né" -exceptinside:"{{sic\|[^{}]+}}" -exceptinside:"citazione *=[^\|\n]+" -lang:it

When I run this replacement on the article above, for example, it tries to replace "nè" with "né" inside "{{sic|nè}}".
Note that in this specific article the two exceptions overlap. Interestingly, if I remove either one of them, the replacement does not occur.

pwb.py replace.py -regex -page:"Foot-Ball Club Juventus 1910-1911" "\W([nN])è\W" "né" -exceptinside:"{{sic\|[^{}]+}}" -lang:it
pwb.py replace.py -regex -page:"Foot-Ball Club Juventus 1910-1911" "\W([nN])è\W" "né" -exceptinside:"citazione *=[^\|\n]+" -lang:it

It happens with exceptinside exceptions given both on the command line and in user-fixes.py.

The article I'm using as example is https://it.wikipedia.org/wiki/Foot-Ball_Club_Juventus_1910-1911 and it contains:

{{cita news|url=http://www.byterfly.eu/islandora/object/libria:41302/datastream/PDF/content/libria_41302.pdf|titolo=A Torino. Juventus F.C.-Piemonte F.C. fanno match pari, 1-1.|pubblicazione=[[La Stampa Sportiva]]|pp=6-7|data=4 dicembre 1910|accesso=27 novembre 2020|citazione=Il match, annunciato per le ore 15, ebbe una buona mezz'ora di ritardo per la semplice ragione che {{sic|nè}} giuocatori, {{sic|nè}} l'arbitro si diedero premura di essere puntuali. Il pubblico restò così trenta minuti a battere i piedi dal freddo e ad ammirare ogni altra cosa {{sic|fuorchè}} l'annunciata partita.}}

As you can see, in this case:

  • exceptinside:"citazione *=[^\|\n]+" matches "citazione=Il match, annunciato per le ore 15, ebbe una buona mezz'ora di ritardo per la semplice ragione che {{sic|nè}} giuocatori, {{sic|nè}} l'arbitro si diedero premura di essere puntuali. Il pubblico restò così trenta minuti a battere i piedi dal freddo e ad ammirare ogni altra cosa {{sic|fuorchè}} l'annunciata partita.}}"
  • exceptinside:"{{sic\|[^{}]+}}" matches "{{sic|nè}}"

What should have happened instead?:
Each exceptinside exception should work independently of the others, even when they overlap.

This problem is fairly critical because it can nullify exceptions in unpredictable cases, causing wrong edits and a significant amount of pain.

Event Timeline

Basilicofresco renamed this task from "exceptinside" exceptions with replace.py are ignored when they overlaps to "exceptinside" exceptions in replace.py are ignored when they overlaps. Jun 30 2022, 11:01 AM
Basilicofresco updated the task description.

This is not an overlap problem: -exceptinside:"citazione *=[^\|\n]+" does not work because the regex stops at the first | sign, so all text after it can still be replaced. Nested templates might become a problem if you expand the regex to citazione *=[^\n\}]+. Maybe citazione *=[^{]+({{[^}]+}}[^\|\n\{\}]+)* works as expected.
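
The point made in the comment above can be checked with Python's re module alone, without pywikibot. The following is a minimal sketch: the text is a shortened excerpt of the citation quoted in the report, and the variable names are only illustrative. It shows where the original citazione pattern stops and whether the pattern suggested in the comment covers the "nè" occurrences; it does not reproduce how replace.py itself applies exceptions.

```python
import re

# Shortened excerpt of the citation template quoted in the report
# (an assumption for illustration, not the full article text).
text = (
    "|accesso=27 novembre 2020|citazione=la semplice ragione che "
    "{{sic|nè}} giuocatori, {{sic|nè}} l'arbitro si diedero premura.}}"
)

# Pattern from the report and the alternative proposed in the comment.
original = re.compile(r"citazione *=[^\|\n]+")
suggested = re.compile(r"citazione *=[^{]+({{[^}]+}}[^\|\n\{\}]+)*")

first_ne = text.index("nè")   # position of the first "nè"
last_ne = text.rindex("nè")   # position of the last "nè"

m_orig = original.search(text)
m_sugg = suggested.search(text)

# The original pattern ends at the "|" inside the first {{sic|...}}.
print(repr(m_orig.group()))

# Is each "nè" inside the span matched by the pattern?
print("original covers first nè: ", m_orig.start() <= first_ne < m_orig.end())
print("suggested covers first nè:", m_sugg.start() <= first_ne < m_sugg.end())
print("suggested covers last nè: ", m_sugg.start() <= last_ne < m_sugg.end())
```

On this excerpt the original pattern's match ends right after "{{sic", so neither "nè" lies inside it, while the suggested pattern spans both nested templates.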

Holy cow. You are absolutely right.