Search and Replace is replacing an extra character for some words - Sinhala wiki
Closed, Resolved, Public

Assigned To: TrevorParscal

Authored By: bzimport, Oct 22 2009, 6:43 AM

Description

Author: wikibugs

Description:
Screen print of the error

Reporting against Babaco Release: r57957

Steps to Reproduce:
Link : http://prototype.wikimedia.org/si.wikipedia.org/%E0%B6%B8%E0%B7%94%E0%B6%BD%E0%B7%8A_%E0%B6%B4%E0%B7%92%E0%B6%A7%E0%B7%94%E0%B7%80

1) Select a random page.
2) Edit a section.
3) Select a word and choose a replacement word.
4) Replace.
Actual result: an extra character is added.
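The test link above is a percent-encoded UTF-8 page title. The short Python sketch below decodes it with the standard library's urllib.parse.unquote (the variable names are illustrative only). It shows which text was being edited and that every Sinhala code point in the title takes three bytes:

```python
from urllib.parse import unquote

# Page path copied from the "Link" in the steps to reproduce
path = "%E0%B6%B8%E0%B7%94%E0%B6%BD%E0%B7%8A_%E0%B6%B4%E0%B7%92%E0%B6%A7%E0%B7%94%E0%B7%80"

# unquote() decodes percent-escapes as UTF-8 by default
title = unquote(path)
print(title)                       # මුල්_පිටුව
print(len(title))                  # 10 code points: 9 Sinhala + "_"
print(len(title.encode("utf-8")))  # 28 bytes: 9 x 3 + 1
```

Each %XX%XX%XX triplet in the URL corresponds to one Sinhala code point. The underscore is the only single-byte character.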

Expected Outcome:
There should not be any extra character.

Test Environment:
Browser (User-Agent): Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/532.0 (KHTML, like Gecko)Chrome/3.0.195.27 Safari/532.0

Browser (User-Agent): Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)

Browser (User-Agent): Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3

Version: unspecified
Severity: major
Platform: PC

Attached:

Wiki_r57957_2009-10-22_SearchandReplace_Sin.pdf (69 KB)

Details

Reference: bz21228

Related Objects

Status	Assigned	Task
Resolved	None	T38111 "Babaco" release of the Usability Initiative (tracking)
Resolved	TrevorParscal	T23228 Search and Replace is replacing an extra character for some words - Sinhala wiki

Event Timeline

• bzimport raised the priority of this task to Medium. (Nov 21 2014, 10:51 PM)

• bzimport added a project: MediaWiki-extensions-UsabilityInitiative.

• bzimport set Reference to bz21228.

• bzimport created this task. (Oct 22 2009, 6:43 AM)

My gut says this is probably due to a bad interaction between regexes and multibyte strings; if that's the case, we can't do much about it.

Basically what I think is happening is that the [^ ] part of the regex is selecting one byte, but the character at that position is really two (or more) bytes long. That one byte will be matched and replaced, but the second (and any subsequent) bytes will stick around and be interpreted as a different character. I'll try to confirm this suspicion later.
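The following minimal Python sketch illustrates this hypothesis. It is not the extension's code, which runs as JavaScript in the browser. It shows what happens when a [^ ] pattern is applied to UTF-8 bytes rather than to characters. The sample text is the decoded title from the reproduction link, and all names are illustrative:

```python
import re

# "මුල් පිටුව" written with escapes; each Sinhala code point is 3 bytes in UTF-8
text = "\u0db8\u0dd4\u0dbd\u0dca \u0db4\u0dd2\u0da7\u0dd4\u0dc0"
raw = text.encode("utf-8")

# Byte-oriented matching: [^ ] consumes exactly one byte (0xE0 here),
# not the whole 3-byte sequence for U+0DB8.
byte_match = re.match(rb"[^ ]", raw).group()
print(byte_match)  # b'\xe0'

# Replacing that one byte leaves the continuation bytes 0xB6 0xB8 behind.
# They no longer decode to a character, so they appear as debris.
broken = re.sub(rb"^[^ ]", b"X", raw)
print(broken[:3])  # b'X\xb6\xb8'
print(broken.decode("utf-8", errors="replace")[:3] == "X\ufffd\ufffd")  # True

# Character-oriented matching on a str: [^ ] consumes the whole code point.
char_match = re.match(r"[^ ]", text).group()
print(char_match == "\u0db8")  # True
```

The leftover bytes in this sketch would match the "extra character" symptom. JavaScript strings, however, are indexed by UTF-16 code units rather than UTF-8 bytes, so this hypothesis only holds if some byte-level processing is involved somewhere.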

The suspicion in comment #1 doesn't seem to be right, so now I think this may have something to do with compound characters. Could you paste all the text from the PDF (textarea contents before, search regex, replace string, textarea contents after) in a bug comment?
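Sinhala words are built from base consonants plus dependent vowel signs and the al-lakuna (virama), so one visible letter can span two or more code points. The Python sketch below uses a deliberately simplified clustering rule: every code point in Unicode category M* is attached to the preceding base. Real segmentation follows UAX #29 and is more involved. The sketch shows how an edit that treats each code point as one "character" could leave a stray mark behind. The helper name clusters is hypothetical:

```python
import unicodedata

def clusters(text):
    """Roughly group code points into user-perceived letters: a base
    character followed by any combining marks (general category M*).
    A simplification of UAX #29 grapheme clusters."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch  # attach the mark to the previous base
        else:
            out.append(ch)
    return out

word = "\u0db8\u0dd4\u0dbd\u0dca"  # "මුල්": ම + vowel sign ු + ල + al-lakuna ්
print(len(word))            # 4 code points
print(len(clusters(word)))  # 2 visible letters

# Naive edit: replace the "first character" as a single code point.
naive = "X" + word[1:]
# The vowel sign U+0DD4 is left orphaned after the replacement.
print(naive.startswith("X\u0dd4"))  # True

# Cluster-aware edit: replace the whole first letter, mark included.
aware = "X" + "".join(clusters(word)[1:])
print(aware == "X\u0dbd\u0dca")  # True
```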

The underlying search and replace code is completely different now that we are using an iframe rather than a textarea.

(In reply to comment #3)

> The underlying search and replace code is completely different now that we are using an iframe rather than a textarea.

That doesn't necessarily mean that multibyte character handling is magically fixed. Reopening and asking Calcey to try and reproduce again; please close as FIXED or WORKSFORME if this can't be reproduced any more.

I've tested this with double-byte characters quite a bit now, and am sure it's fixed.

Note that Sinhala seems to be using three-byte characters.
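The byte counts are easy to check. In UTF-8, a Latin letter takes one byte, a Cyrillic letter two, and a Sinhala letter three. If "double-byte" above means two-byte UTF-8 sequences, that testing would not cover the three-byte case. The Python sketch below prints both the UTF-8 byte length and the number of UTF-16 code units, which is what JavaScript's String.length counts. The samples and helper names are illustrative:

```python
def utf8_len(s):
    """Number of bytes needed to store s as UTF-8."""
    return len(s.encode("utf-8"))

def utf16_units(s):
    """Number of UTF-16 code units (2 bytes each, no BOM with utf-16-le)."""
    return len(s.encode("utf-16-le")) // 2

samples = [
    ("Latin a", "a"),             # U+0061
    ("Cyrillic be", "\u0431"),    # U+0431
    ("Sinhala ma", "\u0db8"),     # U+0DB8
]
for name, ch in samples:
    print(f"{name}: {utf8_len(ch)} UTF-8 bytes, {utf16_units(ch)} UTF-16 unit(s)")
```

All three samples are a single UTF-16 code unit, so byte-width differences only matter where text is handled as UTF-8 bytes.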

wikibugs wrote:

Verified and closed
