
Erroneous and broken HTML entities (such as "&#39 ;") displayed for certain characters in the ProofreadPage edit box
Closed, Duplicate (Public)

Description

On the French Wikisource, several users have reported an extremely annoying bug.

When editing any new page of a book that has an OCR layer, a few characters are replaced by erroneous HTML entities:

  • apostrophe: &#39 ;
  • semicolon: &#59 ;
  • quotation mark: &#34 ;
  • maybe others?

These entities are wrong at several levels:

  • obviously, there should be no space before the semicolon;
  • in French, we use the typesetter's apostrophe (’), not the typewriter apostrophe (');
  • anyway, Unicode characters should be used, not HTML entities.
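
As a stopgap for affected text, the broken entities listed above can be converted back to plain characters. The following is only a minimal sketch of such a cleanup, not the ProofreadPage fix: the names `BROKEN_ENTITY` and `repair_entities` are hypothetical, and it assumes the damage always takes the form of a decimal entity with optional whitespace before the semicolon.

```python
import re

# Matches a decimal numeric character reference, tolerating stray
# whitespace before the semicolon: "&#39;" as well as "&#39 ;".
BROKEN_ENTITY = re.compile(r"&#(\d+)\s*;")


def _decode(match: re.Match) -> str:
    """Return the character for the matched code point, or the text unchanged."""
    code_point = int(match.group(1))
    if code_point > 0x10FFFF:  # outside the Unicode range: leave as-is
        return match.group(0)
    return chr(code_point)


def repair_entities(text: str) -> str:
    """Replace (possibly broken) decimal entities with the characters they encode."""
    return BROKEN_ENTITY.sub(_decode, text)


if __name__ == "__main__":
    sample = "l&#39 ;homme dit : &#34 ;oui&#34 ; &#59 ; non"
    print(repair_entities(sample))
```

This restores the typewriter apostrophe ('), since that is what &#39 encodes; replacing it with the typesetter's apostrophe (’) is a separate editorial step and is deliberately left out.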

For an example, see page 100 (or any page that has not been edited yet) in https://fr.wikisource.org/wiki/Livre:Conf%C3%A9rences_in%C3%A9dites_de_l%27Acad%C3%A9mie_royale_de_peinture_et_de_sculpture.djvu

This bug appeared a couple of days ago, maybe on Oct. 14th. It is very annoying since these characters are very common in French.

It occurs even when not logged in, so it does not depend on personal preferences. Browsers tested: Firefox, Internet Explorer, Chrome.

Thank you.

Event Timeline

Seudo updated the task description.
Aklapper renamed this task from Erroneous HTML entities appear in the edit box to Erroneous and broken HTML entities (such as "&#39 ;") displayed for certain characters in the ProofreadPage edit box. (Oct 16 2020, 9:10 AM)
Aklapper added a project: ProofreadPage.

@Xover: Ah, thanks! Feel free to Edit Related Tasks...Close As Duplicate in such cases. :)