Empty <text> elements created
Closed, Resolved (Public)

Description

After the latest deployment, the test wiki front page behaves unexpectedly.

As expected, bullet point lists are ignored, but they are followed by what look like empty <text> tags. On closer investigation, these actually contain a single line break.
The expected behaviour would probably be to remove text nodes between utterances that are empty or contain only whitespace.

Event Timeline

Lokal_Profil renamed this task from Empty `<text>` elements created to Empty <text> elements created. Mar 8 2017, 9:58 AM

Even if <text> elements will likely disappear with the implementation of T153841: Improve utterance storage, we would still like to get rid of these empty nodes in the future storage.

Change 345522 had a related patch set uploaded (by Sebastian Berlin (WMSE)):
[mediawiki/extensions/Wikispeech@master] Don't create empty <text> elements

https://gerrit.wikimedia.org/r/345522

Change 345522 merged by jenkins-bot:
[mediawiki/extensions/Wikispeech@master] Don't create empty <text> elements

https://gerrit.wikimedia.org/r/345522

Mentioned in SAL (#wikimedia-labs) [2017-03-30T12:09:27Z] <Sebastian-WMSE> Deploy latest from Git master: f79170e (T159669)

Lokal_Profil assigned this task to Sebastian_Berlin-WMSE.

This task has been resolved in that empty <text> nodes are removed, as are whitespace-only <text> nodes between utterances. Note, however, that the example given in the description is not resolved, because its whitespace-only text nodes occur inside an utterance. The example case is expected to be resolved with the implementation of T149091: Segment by tags.

Whitespace-only nodes inside an utterance cannot be removed; otherwise, for example, <i>hello</i> <b>world</b> would end up as "helloworld".
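
To illustrate the rule described above, here is a minimal sketch in Python. It is not the Wikispeech implementation: the node model (a tuple of owning utterance index, or None for a node sitting between utterances, plus the text content) and the function names are assumptions made up for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical model: (utterance index, or None if the node sits
# between utterances, text content of the node).
Node = Tuple[Optional[int], str]


def keep_node(node: Node) -> bool:
    """Decide whether a text node should survive the cleanup."""
    owner, content = node
    if content == "":
        # Empty nodes carry nothing and are always dropped.
        return False
    if owner is None and content.strip() == "":
        # Whitespace-only node between utterances: safe to drop.
        return False
    # Everything else is kept, including whitespace-only nodes
    # inside an utterance, which separate neighbouring words.
    return True


def remove_empty_text_nodes(nodes: List[Node]) -> List[Node]:
    """Return the nodes that pass keep_node, in their original order."""
    return [node for node in nodes if keep_node(node)]


def utterance_text(nodes: List[Node], index: int) -> str:
    """Join the content of all nodes belonging to one utterance."""
    return "".join(content for owner, content in nodes if owner == index)


nodes = [
    (0, "hello"),
    (0, " "),      # inside utterance 0: must be kept
    (0, "world"),
    (None, "\n"),  # between utterances: removed
    (1, ""),       # empty: removed
    (1, "Next"),
]
cleaned = remove_empty_text_nodes(nodes)
```

With this input, utterance_text(cleaned, 0) still yields "hello world", because the single-space node inside the utterance is preserved, while the line break between utterances and the empty node are gone.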