
database side truncated comments not shown in history or diff
Closed, Resolved · Public

Description

The diff https://ru.wikipedia.org/w/index.php?title=Википедия:Именование_статей/Иноязычные_названия&diff=prev&oldid=65843747 has an edit summary (comment) that was truncated to 255 bytes on the database side, but the summary is not shown in the GUI at all.

With T85700, application-side truncation was added for new revisions, but there still needs to be a way to handle existing comments that were badly truncated.
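To illustrate the difference between the two kinds of truncation, here is a minimal Python sketch. It is not the code from T85700; the function names and the 255-byte default are assumptions made for illustration only. Cutting the encoded bytes at a fixed length can split a multibyte character, while cutting and then dropping the incomplete tail keeps the result valid UTF-8.

```python
def truncate_bytes_naive(text: str, limit: int = 255) -> bytes:
    """Cut the UTF-8 encoding at `limit` bytes, the way a byte-limited
    database column would. May end in the middle of a character."""
    return text.encode("utf-8")[:limit]


def truncate_utf8_safe(text: str, limit: int = 255) -> str:
    """Cut at `limit` bytes, then drop any incomplete trailing sequence.
    The input is valid UTF-8, so only the tail can be damaged."""
    raw = text.encode("utf-8")[:limit]
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    summary = "я" * 128  # 128 two-byte characters, 256 bytes in UTF-8
    print(truncate_bytes_naive(summary)[-1:])              # lone lead byte
    print(len(truncate_utf8_safe(summary).encode("utf-8")))  # whole characters only
```

With two-byte Cyrillic characters, an odd byte limit such as 255 always lands in the middle of a character once the text is long enough, which is why summaries in Russian are affected.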

The problem is that htmlspecialchars cannot process a string that ends in an incomplete multibyte UTF-8 sequence; it rejects the whole input, which is then shown as an empty string.

The API returns the Unicode replacement character \ufffd as a placeholder instead of nothing:
https://ru.wikipedia.org/w/api.php?action=query&prop=revisions&revids=65843747
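The two behaviours described above (the GUI losing the whole comment, the API substituting \ufffd) can be mimicked in Python. This is only an analogy, not MediaWiki's PHP code: `escape_strict` and `escape_lenient` are hypothetical names, and Python's `html.escape` and codec error handlers stand in for the PHP functions.

```python
import html
from typing import Optional


def escape_strict(raw: bytes) -> Optional[str]:
    """Escape for HTML, but give up on any invalid UTF-8.
    Mimics the GUI symptom: one bad byte loses the whole comment."""
    try:
        return html.escape(raw.decode("utf-8"))
    except UnicodeDecodeError:
        return None


def escape_lenient(raw: bytes) -> str:
    """Replace invalid sequences with U+FFFD first, then escape.
    Mimics the API behaviour: the readable part survives."""
    return html.escape(raw.decode("utf-8", errors="replace"))


if __name__ == "__main__":
    # Drop the last byte so the final character is cut in half.
    truncated = "правки".encode("utf-8")[:-1]
    print(escape_strict(truncated))
    print(escape_lenient(truncated))
```

Substituting before escaping keeps the readable part of the comment visible and marks the damaged tail explicitly.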

Event Timeline

Umherirrender raised the priority of this task from to Needs Triage.
Umherirrender updated the task description.
TTO added a subscriber: TTO.

You can reproduce this on your local MW installation by running the following SQL query:

UPDATE `revision` SET `rev_comment`=FROM_BASE64("0JfQsNGJ0LjRidC10L3QsCDQktC40LrQuNC/0LXQtNC40Y860JjQvNC10L3QvtCy0LDQvdC40LUg0YHRgtCw0YLQtdC5L9CY0L3QvtGP0LfRi9GH0L3Ri9C1INC90LDQt9Cy0LDQvdC40Y86INC/0L7QstGC0L7RgNGP0Y7RidC40LXRgdGPINC90LXQutC+0L3RgdC10L3RgdGD0YHQvdGL0LUg0L/RgNCw0LLQutC4IChb0KDQtdC00LDQutGC0LjRgNC+0LLQsNC90LjQtT3RgtC+0LvRjNC60L4g0LDQstGC0L7Qv9C+0LTRgtCy0LXRgNC20LTRkdC90L3R") WHERE `rev_id`=XXX
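To see why this payload triggers the bug, it helps to look at how it ends. The sketch below decodes only the last 12 Base64 characters of the string in the query above, on the assumption that the payload is aligned on 4-character Base64 groups; `TAIL_B64` and `describe_tail` are names introduced here for illustration.

```python
import base64
from typing import Tuple

# Last 12 Base64 characters of the payload in the SQL statement above.
TAIL_B64 = "0LTRkdC90L3R"


def describe_tail(b64_tail: str) -> Tuple[bytes, bool]:
    """Decode a Base64 fragment and report whether it is valid UTF-8."""
    raw = base64.b64decode(b64_tail)
    try:
        raw.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return raw, valid


if __name__ == "__main__":
    raw, valid = describe_tail(TAIL_B64)
    print(raw.hex(" "))  # the final byte is a lead byte with no continuation
    print(valid)
```

The fragment ends in the byte 0xD1, the first half of a two-byte Cyrillic character, so strict UTF-8 decoding fails on it.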

Change 329814 had a related patch set uploaded (by TTO):
Replace invalid UTF-8 sequences with U+FFFD in edit summaries

https://gerrit.wikimedia.org/r/329814

Change 329814 merged by jenkins-bot:
Replace invalid UTF-8 sequences with U+FFFD in edit summaries

https://gerrit.wikimedia.org/r/329814