
Investigate how to handle differating punctuation around ref tags
Open, Medium, Public

Description

Until now, AWB has changed

punct, ref, comma

to

punct, comma, ref

What should be done in cases such as the following?

full-stop, comma, ref

Event Timeline

These cases of conflicting punctuation probably need flagging for manual processing.

Samtar renamed this task from Investigate how to hanlde differating punctuation around ref tags to Investigate how to handle differating punctuation around ref tags. Jul 30 2016, 8:26 AM
Samtar added a subscriber: Samtar.

@Magioladitis handle in what context? Could you update the description to be a little more descriptive of the task/issue? What Tags do you think could be applied to this? Thank you! :)

@Samtar Sladen wants a bot or a tool to find these issues regularly and create a list of pages with this particular issue for manual processing. They asked for this report to be filed because, until now, AWB changes

punct, ref, comma

to

punct, comma, ref

but provides no actual fix for the issue.

Samtar triaged this task as Medium priority. Jul 30 2016, 8:34 AM
Samtar added a project: AutoWikiBrowser.
Samtar updated the task description.

@Magioladitis I see the issue now, thank you :)

Could AWB not do a regex search/replace for

\w+\.,(\{\{|<ref)

to find these instances of full-stop, comma, ref?
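For illustration only, here is a minimal Python sketch of such a search. It is not AWB code (AWB is a separate tool and its find/replace is not shown here). The names `CONFLICT` and `find_conflicts` are hypothetical. The alternation is grouped so that the `\w+\.,` prefix applies to both the `{{` and the `<ref` branch.

```python
import re

# Hypothetical pattern: a word, a full stop, a comma, then the start of
# either a template ({{) or a <ref> tag. The group keeps the prefix
# attached to both alternatives.
CONFLICT = re.compile(r"\w+\.,(?:\{\{|<ref)")

def find_conflicts(wikitext):
    """Return (offset, matched text) for each full-stop, comma, ref sequence."""
    return [(m.start(), m.group(0)) for m in CONFLICT.finditer(wikitext)]

sample = "A word.,<ref>x</ref> and a plain word,<ref/> too."
for offset, text in find_conflicts(sample):
    print(offset, text)
```

Only the first reference in the sample is reported; the second has a single comma before it and is not a conflict.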

Ideally AWB/Yobot should check the preceding character before relocating punctuation in the vicinity of <ref/> to before <ref/>, and flag ambiguous cases for manual processing.

What happens:

  1. "word.<ref/>, word"→"word.,<ref/> word"
  2. "word\n<ref/>. word"→"word\n.<ref/>"

What should happen:

  1. "word.<ref/>, word"→ (no action, flagged for manual resolution)
  2. "word\n<ref/>. word"→"word.<ref/>"
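The "should happen" behaviour above could be sketched roughly as follows. This is a hedged Python illustration, not how AWB or Yobot actually implement it. The function name `relocate_punctuation`, the set of punctuation characters, and the simplified `<ref>` pattern are all assumptions made for the example.

```python
import re

# Simplified, assumed notion of a reference: either self-closing
# (<ref ... />) or a paired <ref ...>...</ref>.
REF = r"(?:<ref\b[^>]*/>|<ref\b[^>]*>.*?</ref>)"

# The character before the reference(s), optional whitespace, one or
# more references, then the punctuation that follows them.
PATTERN = re.compile(
    r"(?P<prev>\S)(?P<ws>\s*)(?P<refs>(?:" + REF + r")+)(?P<punct>[.,;:])",
    re.DOTALL,
)

PUNCT = ".,;:!?"  # assumed set of characters that count as punctuation

def relocate_punctuation(text):
    """Move punctuation that trails a ref to before it, unless the text
    before the ref already ends in punctuation; those cases are left
    untouched and their offsets returned for manual processing."""
    flagged = []

    def fix(m):
        if m.group("prev") in PUNCT:
            # Conflicting punctuation: do nothing, flag for a human.
            flagged.append(m.start())
            return m.group(0)
        # Unambiguous: drop stray whitespace, put punctuation first.
        return m.group("prev") + m.group("punct") + m.group("refs")

    return PATTERN.sub(fix, text), flagged

print(relocate_punctuation("word.<ref/>, word"))
print(relocate_punctuation("word\n<ref/>. word"))
```

In this sketch the first example is returned unchanged with its offset flagged, and the second has the full stop moved to directly after "word" with the line break removed.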

The original issue was brought to Yobot's operator's attention on 18 July 2016; the report and examples can be found at: