[8 hours] Investigate ways to handle text breaking in SVGs
Closed, Resolved, Public

Description

Problem:

SVG 1.1 does not support automatic line breaking, so multiline labels have to be broken up into separate text or tspan elements. The pre-existing SVG Translate tool appears to treat each of these elements as a separate label to translate:

image.png (849×822 px, 115 KB)
Note that "Collective Municipality" is split into two different elements. This is a problem because:

  • Translating two words separately and then putting them together does not translate the phrase accurately.
  • What may be two words in English may be one word in another language. If the user leaves some fields blank, it is hard to tell whether they have finished translating.

Proposed solutions:

  1. Give user separate labels for every broken chunk but visually cluster them so the user can see that they are a translation unit together.
    • Mock from the design prototype:
      image.png (475×493 px, 19 KB)
    • The user still has to figure out which part of the translation goes where, but with the help of the preview function it is still a lot easier than before.
  2. Give the user a multiline input with the text labels broken up, as in the image
    • This option gives the user control over how the translation splits back up
    • Maybe allow them to only have as many chunks as the original label does (as in, don't allow input beyond three lines if the original label is broken into three lines)
    • Rough design made by mangling Prateek's HTML prototype:
      image.png (107×474 px, 8 KB)
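
A minimal sketch of the line limit suggested in option 2, assuming the tool already holds the original label as a list of chunks. The function name `split_translation` and the sample strings are illustrative only and are not taken from any existing tool.

```python
from typing import List


def split_translation(translation: str, original_chunks: List[str]) -> List[str]:
    """Split a multiline translation into at most len(original_chunks) chunks.

    Blank input lines are ignored. Missing chunks are padded with empty
    strings, so a one-word translation of a two-chunk label is still complete.
    """
    lines = [line.strip() for line in translation.splitlines() if line.strip()]
    if len(lines) > len(original_chunks):
        raise ValueError(
            f"label has {len(original_chunks)} chunk(s), got {len(lines)} line(s)"
        )
    return lines + [""] * (len(original_chunks) - len(lines))


# Two chunks in English, one word in German: the second chunk stays empty.
chunks = split_translation("Samtgemeinde", ["Collective", "Municipality"])
```

Padding with empty strings is one way to tell "deliberately left blank" apart from "not yet translated", because the whole label is submitted as one unit.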

Investigation tasks:

  • Compare the technical aspects of the proposed solutions above - what's doable, what's not, how crazy do SVGs get etc.
  • What are the possible ways to handle the labels when they are broken up into text labels instead of tspan elements?

Event Timeline

Niharika triaged this task as Medium priority. Aug 24 2018, 9:03 PM
Niharika created this task.

Another option that some SVG editors use is to embed, via namespaces, an HTML div element, which supports word wrapping. There are probably significant downsides to doing this, but if we are listing options it probably bears mentioning.

That would not be easily possible. There would have to be something like a "max-width" value, i.e. a content area https://www.w3.org/TR/SVG2/text.html#TermContentArea, for the relevant text areas. The same considerations came up with flowed text, which did not make it into the final SVG 1.1 specification.
If we supported SVG 2 it would indeed be easy, as it provides "Auto-wrapped text" https://www.w3.org/TR/SVG2/text.html#TextLayoutAuto.

Niharika renamed this task from Investigate ways to handle text breaking in SVGs to [8 hours] Investigate ways to handle text breaking in SVGs. Aug 28 2018, 9:53 PM
Niharika added a project: Spike.

Yeah, this is going to be hard. This task is to figure out the simplest solution (using SVG 1.1). I understand that SVG 2.0 comes with some breaking changes that will not play nicely with many image editors.

Ignoring subscripts, font shifts, and other exotica:

  1. Don't do it.
  2. One-for-one text, as in SVGTranslate.
  3. Merge/split where the user breaks lines. Put the translation unit (TU) source and target into textboxes, with newlines where a tspan starts a new line. The translator puts newlines where they want line breaks. The tool converts the newlines back to tspans. The translation database is never told about line breaks; line breaks are only stored in the SVG.
  4. The tool breaks lines at trivial points. Learn inline-size. Put the source/target in a one-line text element, use the SVG DOM to find the text positions of whitespace characters (plus soft hyphens), and build the result into an SVG 1.1/2.0 compatible multi-line text element. Relevant methods: Text.getNumberOfChars(), .getStartPositionOfChar(), .getEndPositionOfChar(). Kerning, letter-spacing, and word spacing can be free. There is a small issue with text-anchor and whitespace.
  5. Put the linear text into an HTML p element, set font, set xml:lang, set dir, set left/mid/right, set CSS width, and copy HTML's result to an SVG text element.
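
The merge/split option (newlines in the textbox, tspans in the SVG) could look roughly like the sketch below. It makes several assumptions: the fragment is parsed without the SVG namespace, a tspan counts as starting a new line when it carries a `y` or `dy` attribute, and new lines are written with a fixed `x` and a `dy` line height. The function names and the default `line_height=20` are hypothetical.

```python
import xml.etree.ElementTree as ET


def tspans_to_lines(text_el: ET.Element) -> str:
    """Return the label as text with one line per tspan that starts a new line."""
    lines = []
    for tspan in text_el.iter("tspan"):
        chunk = tspan.text or ""
        # Assumption: a tspan with y or dy begins a new visual line.
        if not lines or "y" in tspan.attrib or "dy" in tspan.attrib:
            lines.append(chunk.strip())
        else:
            lines[-1] += chunk
    return "\n".join(lines)


def lines_to_tspans(text_el: ET.Element, translation: str,
                    x: str = "0", line_height: int = 20) -> None:
    """Replace the content of text_el with one tspan per line of translation."""
    for child in list(text_el):
        text_el.remove(child)
    text_el.text = None  # no stray whitespace before the first tspan
    for index, line in enumerate(translation.splitlines()):
        tspan = ET.SubElement(text_el, "tspan")
        tspan.text = line
        if index > 0:
            tspan.set("x", x)                  # reset the x-coordinate
            tspan.set("dy", str(line_height))  # move down one line


label = ET.fromstring(
    '<text><tspan>neuralni</tspan><tspan x="0" y="20"> greben</tspan></text>'
)
source = tspans_to_lines(label)           # what the translator sees
lines_to_tspans(label, "Neural\ncrest")   # what gets written back
```

Because the tspans are created without tails, no whitespace text nodes end up between them. The sketch does not add a separating space inside the later tspans, so text selection would run the lines together unless the tool adds one.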

Generally, the tool should only expect 1 to 3 lines; figures should not have complicated text. There's a vertical alignment issue akin to text-anchor.
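
For the option where the tool itself breaks lines at whitespace, the core is a greedy fit against a maximum width. The sketch below assumes a `measure` callback that returns the rendered width of a string. In a browser that number would come from SVG DOM measurements such as the methods named above; here it is a stand-in, and `wrap_label` is a hypothetical name.

```python
from typing import Callable, List


def wrap_label(text: str, max_width: float,
               measure: Callable[[str], float]) -> List[str]:
    """Greedily break text at whitespace so each line fits within max_width.

    A single word wider than max_width is kept on its own line unbroken.
    """
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


# Stand-in measurement: every character is 10 units wide.
fixed_width = lambda s: 10.0 * len(s)
wrapped = wrap_label("Neural crest cells", 120, fixed_width)
```

The sketch ignores soft hyphens and does not enforce the 1 to 3 line expectation; a caller could reject results with more than three lines.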

A typical translated label:

      <switch>
        <text systemLanguage="ru">
			<tspan>Нервный</tspan>
			<tspan x="-8" y="20">гребень</tspan>
		</text>
        <text systemLanguage="de">Neuralleiste</text>
        <text systemLanguage="hr">
			<tspan>neuralni</tspan>
			<tspan x="-6" y="20">greben</tspan>
		</text>
        <text>Neural crest</text>
      </switch>

With offsets for separate lines defined basically by hand, I don't see how we can implement this short of making a mini visual SVG editor.

The cited example is from File:Neural_crest.svg

https://commons.wikimedia.org/w/index.php?lang=hr&title=File%3ANeural_crest.svg

There's a bit more going on with that translation unit. Here's more context:

    <g transform="translate(106,446)" style="font-weight:bold;text-align:end;text-anchor:end">
      <switch>
        <text systemLanguage="ru">
			<tspan>Нервный</tspan>
			<tspan x="-8" y="20">гребень</tspan>
		</text>
        <text systemLanguage="de">Neuralleiste</text>
        <text systemLanguage="hr">
			<tspan>neuralni</tspan>
			<tspan x="-6" y="20">greben</tspan>
		</text>
        <text>Neural crest</text>
      </switch>
    </g>

The artist is correctly trying to right-align (text-anchor:end) the translations, but has been confused by xml:space rules and SVG's text block processing. The desired result without tweaking x is obtained by:

    <g transform="translate(106,446)" style="font-weight:bold;text-align:end;text-anchor:end">
      <switch>
        <text systemLanguage="ru">
			<tspan>Нервный</tspan><tspan x="0" y="20"> гребень</tspan>
		</text>
        <text systemLanguage="de">Neuralleiste</text>
        <text systemLanguage="hr">
			<tspan>neuralni</tspan><tspan x="0" y="20"> greben</tspan>
		</text>
        <text>Neural crest</text>
      </switch>
    </g>

The 'x' attribute is needed to reset the x-coordinate. Starting a new XML line (putting whitespace #text) between the two tspans puts a space at the end of the first tspan, and the fudging of the x attribute shifts the second line left to account for that extra trailing space (fudging makes the two lines end at about the same x-coordinate). You can see that on the file page because Russian and Croatian have more space between the end of their first line and the start of the leader than the other languages. Instead, the XML should follow xml:space rules and not have any whitespace between the two tspans. The space between the two words has been added inside the second tspan to make text selection give the correct result ("Нервный гребень" rather than "Нервныйгребень").

Even if xml:space rules are ignored, the end result is only off by one space (indented lines are much worse under xml:space="preserve" rules).
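
To make the whitespace effect concrete, here is a rough Python approximation of the SVG 1.1 xml:space="default" processing (remove newlines, turn tabs into spaces, strip leading and trailing spaces, collapse runs of spaces), applied to the flattened character data of a text element. It only shows which characters survive, not how a renderer positions them, and the helper names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET


def default_space(text: str) -> str:
    """Approximate xml:space="default" handling from SVG 1.1."""
    text = text.replace("\r", "").replace("\n", "")  # 1. drop newlines
    text = text.replace("\t", " ")                   # 2. tabs become spaces
    text = text.strip(" ")                           # 3. trim both ends
    return re.sub(r" {2,}", " ", text)               # 4. collapse runs


def selected_text(markup: str) -> str:
    """Character data of a <text> fragment after whitespace processing."""
    return default_space("".join(ET.fromstring(markup).itertext()))


# Indented tspans: the whitespace text node between them survives as a space.
indented = ('<text>\n\t\t\t<tspan>neuralni</tspan>\n\t\t\t'
            '<tspan x="-6" y="20">greben</tspan>\n\t\t</text>')
# Adjacent tspans with no space anywhere: the words run together.
adjacent = ('<text>\n\t\t\t<tspan>neuralni</tspan>'
            '<tspan x="0" y="20">greben</tspan>\n\t\t</text>')
# Adjacent tspans with the space moved inside the second tspan.
fixed = ('<text>\n\t\t\t<tspan>neuralni</tspan>'
         '<tspan x="0" y="20"> greben</tspan>\n\t\t</text>')

print(selected_text(indented))  # neuralni greben
print(selected_text(adjacent))  # neuralnigreben
print(selected_text(fixed))     # neuralni greben
```

The first and third cases select the same text; the difference is where the space lives. In the indented version it trails the first line, which is what shifts a right-aligned line.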

My analysis indicates that there's no particularly good way to do wrapping in SVG 1.1. The standard itself only suggests embedding XHTML. This kind of works (you need to remove <body> from their example, otherwise MW's upload filter rejects it); however, it raises questions about how well it's supported and whether it will open the door to exciting new security holes. The punchline is that you still need to specify the size of the <foreignObject> somehow for wrapping to happen. So unless the creator has written some particularly weird XML for the labels by hand, it would be hard to achieve wrapping. I think we should just document this as a known problem and move forward without it.

So you're saying that out of the three (well, two) proposed solutions in the task description, neither is doable?

You're concentrating on UI of this while I'm explaining that trying to wrap text in SVG is problematic in principle.

I know it's problematic. I'm trying to understand the extent of it. What are the ways we can make this less of a problem? I'm not looking to fix every single problem case out there. I'm looking for what we can do from a technical+UI perspective to make this easier on the users, if that makes sense.

I think we discussed that we can also use this tool to educate the user. Maybe wrapping is another place where we can alert the user to the limitations of the tool and point them to something like Inkscape if they have that particular need?

It would allow us to provide a tool for the simplest of cases and, if it provides value, build on it in the future for more difficult situations.

@MaxSem - Do we know how either of the existing tools would handle the example you provide? Would they just ignore the tspans or what?

They understand tspans and offer to let you translate them; however, they can neither create new tspans (e.g. if a label in Russian is longer than the English one and needs two lines instead of one) nor position tspans so that the text looks properly aligned.

It seems like a reasonable compromise that we would do something similar.

That is, we'll let you translate existing tspans but alignment or creation of new tspans is outside the use case for the tool.
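
A sketch of what "translate existing tspans, nothing more" could mean for a switch element like the one quoted earlier. It assumes the fallback text (the child without systemLanguage) is the source, copies its structure, and refuses translations with a different number of chunks. The name `add_translation` is hypothetical and the fragment is parsed without the SVG namespace.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List


def add_translation(switch_el: ET.Element, lang: str,
                    chunks: List[str]) -> ET.Element:
    """Add a one-for-one translation of the fallback label to a <switch>."""
    children = list(switch_el)
    # The child without systemLanguage is the fallback and the source text.
    fallback = next(el for el in children if "systemLanguage" not in el.attrib)
    translated = copy.deepcopy(fallback)
    translated.set("systemLanguage", lang)
    # Translate the tspans if there are any, otherwise the text element itself.
    targets = list(translated.iter("tspan")) or [translated]
    if len(chunks) != len(targets):
        raise ValueError(
            f"source has {len(targets)} chunk(s), translation has {len(chunks)}"
        )
    for target, chunk in zip(targets, chunks):
        target.text = chunk
    # switch renders the first matching child, so the fallback must stay last.
    switch_el.insert(children.index(fallback), translated)
    return translated


switch = ET.fromstring(
    '<switch><text systemLanguage="de">Neuralleiste</text>'
    '<text>Neural crest</text></switch>'
)
add_translation(switch, "fr", ["Crête neurale"])
```

Positions and attributes are inherited from the source label unchanged, which is exactly the limitation described: a translation that needs an extra line is rejected rather than laid out.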

Is that acceptable, @Niharika?

I think this investigation has digressed into a different problem than the one I described in the task description.
Let me try to explain that more clearly: The problem is that sometimes a text label is broken up in the UI while it is meant to be a single phrase. This makes it harder to translate it properly. See the example I give in the task description ("Collective Municipality").
What I was hoping this investigation would do is to come up with a suggestion for the best way to mitigate this problem from the user's perspective. I proposed a couple solutions in the description. Are those plausible? Are there better ways to do this that I haven't thought of?

If a user adds a translation that creates alignment issues - that is okay for now. The user can make use of the Preview feature to see if a label is longer than it should be and doesn't wrap and they will fix it.

Thanks for the clarification.

Based on my understanding, I think the first option is easier and maybe indicates to the user more clearly that we aren't going to merge multiple lines. I think it makes it easier to keep track of what text goes into what container in the SVG code.

@Niharika It looks like we settled on a solution to this in T206712. Can we close this one?

Niharika moved this task from Needs Review/Feedback to Q2 2018-19 on the Community-Tech-Sprint board.

I went with what you suggested in your last comment. We should reopen this ticket if it turns out that that's not as feasible as we thought.