HTML-formatted ruby searches do not consider furigana word annotations in a user-friendly way
Open, Medium, Public

Description

Background info: Furigana; Ruby character#HTML markup.

Words annotated with HTML ruby are not searchable in a user-friendly way once the HTML formatting is stripped, because the reading ends up inline with the base text. For example, 東京 annotated character by character with the reading Tokyo becomes the inline text 東(To)京(kyo), so a search for neither 東京 nor Tokyo would match it.

(Example of searching for HTML-formatted ruby as proof that this is a problem; another example.)
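The effect can be reproduced with a minimal sketch, assuming a naive stripper that keeps every text node, including the contents of `<rp>` and `<rt>`. It uses Python's standard `html.parser`. The names `TagStripper` and `strip_tags` are hypothetical, and this is not the actual search indexing code. The markup below is also an assumption: per-character annotation with `<rp>` fallback parentheses.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Naive tag stripper: keeps every text node, including <rp> and <rt> text."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []  # collected text nodes, in document order

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html: str) -> str:
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


# Per-character ruby for 東京, with <rp> parentheses as a fallback.
markup = "<ruby>東<rp>(</rp><rt>To</rt><rp>)</rp>京<rp>(</rp><rt>kyo</rt><rp>)</rp></ruby>"

text = strip_tags(markup)
print(text)            # 東(To)京(kyo)
print("東京" in text)   # False: the reading splits the base text
print("Tokyo" in text)  # False: the base text splits the reading
```

Both queries fail for the same reason: the base characters and the reading are interleaved in one flat string.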

Event Timeline

Restricted Application added a subscriber: Aklapper.
Suzukaze-c renamed this task from "HTML-formatted ruby searches do not consider furigana" to "HTML-formatted ruby searches do not consider furigana word annotations in a user-friendly way". (Nov 6 2016, 12:33 AM)
Suzukaze-c updated the task description.

So, it looks like this is what happens. Say you have the HTML structure:

some
<ruby>
    <span>stuff</span>
    <rp>(</rp><rt>superscript</rt><rp>)</rp>
</ruby>
using ruby

To create the searchable content for the page, we strip out all the HTML tags and keep the remaining text. That converts the structure above into

some stuff(superscript) using ruby

instead of the expected

some stuff using ruby

This example isn't perfect, because the superscript (the <rt> reading) may sit over a single character rather than a whole word. In that case the word itself gets split up in the search text.

I don't have any great suggestions for a solution. The simple method would be to strip out everything inside the <rt> and <rp> tags, but that would of course make the readings themselves unsearchable.
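The simple method could look roughly like the sketch below, again using Python's standard `html.parser`. Text inside `<rp>` is dropped and text inside `<rt>` is kept out of the base text, so the base text reads as expected. As a hypothetical variant, not something decided in this task, the `<rt>` text is collected into a separate list (one entry per `<ruby>` element) instead of being thrown away, so the readings could in principle be indexed separately. The names `RubySplitter` and `split_ruby` are hypothetical. The sketch also assumes `<rt>` and `<rp>` are explicitly closed; HTML allows omitting those end tags, and this parser does not infer them.

```python
from __future__ import annotations

from html.parser import HTMLParser


class RubySplitter(HTMLParser):
    """Separates ruby base text from ruby readings (hypothetical sketch).

    Assumes <rt> and <rp> have explicit end tags.
    """

    def __init__(self) -> None:
        super().__init__()
        self.base: list[str] = []      # text outside <rt>/<rp>
        self.readings: list[str] = []  # one joined reading per <ruby> element
        self._current: list[str] = []  # <rt> text of the <ruby> being parsed
        self._open: str | None = None  # "rt" or "rp" while inside one

    def handle_starttag(self, tag, attrs):
        if tag == "ruby":
            self._current = []
        elif tag in ("rt", "rp"):
            self._open = tag

    def handle_endtag(self, tag):
        if tag == "ruby":
            self.readings.append("".join(self._current))
        elif tag == self._open:
            self._open = None

    def handle_data(self, data):
        if self._open is None:
            self.base.append(data)      # searchable body text
        elif self._open == "rt":
            self._current.append(data)  # reading, kept separately
        # text inside <rp> (the fallback parentheses) is dropped


def split_ruby(html: str) -> tuple[str, list[str]]:
    parser = RubySplitter()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.base), parser.readings


base, readings = split_ruby(
    "<ruby>東<rp>(</rp><rt>To</rt><rp>)</rp>京<rp>(</rp><rt>kyo</rt><rp>)</rp></ruby>"
)
print(base)      # 東京
print(readings)  # ['Tokyo']
```

Joining readings per `<ruby>` element keeps per-character annotations such as To + kyo together as one word, while readings from different `<ruby>` elements stay separate.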

Deskana moved this task from needs triage to search-icebox on the Discovery-Search board.