SVG Translate: Provide Error Message (no support for tspan without coordinates)
Open, Needs Triage | Public | 5 Estimated Story Points | BUG REPORT

Description

I dummy-translated a file from a German dialect (langtag pdc) to High German (langtag de):

Original language:
https://commons.wikimedia.org/w/index.php?lang=pdc&title=File%3ADouble-slit.svg

Updated language:
https://commons.wikimedia.org/w/index.php?lang=de&title=File%3ADouble-slit.svg

In the SVG below, the relevant text elements have systemLanguage="pdc" and systemLanguage="de".

<switch>
<text x="340.9375" y="25.84375" xml:space="preserve" id="trsvg14-pdc" systemLanguage="pdc"><tspan id="trsvg15">Beobachtungs-</tspan><tspan x="341" y="42" id="trsvg8-pdc">schirm</tspan></text>
<text x="340.9375" y="25.84375" xml:space="preserve" id="trsvg14-ta" systemLanguage="ta"><tspan x="341" y="42" id="trsvg8-ta">திரை</tspan></text>
<text x="340.9375" y="25.84375" xml:space="preserve" id="trsvg14-de" systemLanguage="de"><tspan x="341" y="42" id="trsvg8-de">Beobachtungs-</tspan></text>
<text x="340.9375" y="25.84375" xml:space="preserve" id="trsvg14"><tspan x="341" y="42" id="trsvg8">screen</tspan></text>
</switch>

expected result:

<switch>
<text x="340.9375" y="25.84375" xml:space="preserve" id="trsvg14-pdc" systemLanguage="pdc"><tspan id="trsvg15">Beobachtungs-</tspan><tspan x="341" y="42" id="trsvg8-pdc">schirm</tspan></text>
<text x="340.9375" y="25.84375" xml:space="preserve" id="trsvg14-ta" systemLanguage="ta"><tspan x="341" y="42" id="trsvg8-ta">திரை</tspan></text>
<text x="340.9375" y="25.84375" xml:space="preserve" id="trsvg14-de" systemLanguage="de"><tspan id="trsvg15-de">Beobachtungs-</tspan><tspan x="341" y="42" id="trsvg8-de">schirm</tspan></text>
<text x="340.9375" y="25.84375" xml:space="preserve" id="trsvg14"><tspan x="341" y="42" id="trsvg8">screen</tspan></text>
</switch>

Screenshot of SVG Translate:
https://tools.wmflabs.org/svgtranslate/File:Double-slit.svg
(compare it with the original file)

Screenshot from 2019-08-12 17-17-08.png (1×1 px, 95 KB)

The word "schirm" (from systemLanguage="pdc" id="trsvg8-pdc") is missing in the SVG Translate wordlist.

The word "Beobachtungs-" (from systemLanguage="pdc" id="trsvg15") is translated to systemLanguage="de" id="trsvg8-de" instead of id="trsvg15-de", so it ends up on the lower tspan (where "schirm" should be). The tspan with id="trsvg15-de" is missing.

Note: There was a product decision to not handle these cases, but we can include a message to users so that they're informed of this lack of support.

Event Timeline

The original file has a two-line label of "Observation" "Screen".

On 6 August 2017, the default version's first word was removed by changing <tspan>Observation</tspan> to just <tspan/>. The default English translation was now just "Screen" instead of "Observation Screen".

That empty tspan element survived until 22 March 2019, when an invocation of SVG Translate successfully added the two-line pdc translation but apparently removed the empty tspan element from the default translation. The removal seems to be the origin of the problem. The switch default went from a 2-line translation to a 1-line translation. The first line has disappeared (RIP trsvg15).

A subsequent Tamil translation using SVG Translate on 25 March 2019 succeeded, but put the one-line Tamil translation where the second line ("screen") was.

The subsequent High German "translation" of this report confronts the same issue. The switch element's default clause has one line, so SVG Translate may just grab the first text from the pdc clause.

SVG Translate should not have removed the empty tspan element.

A workaround might be to reinsert the tspan element but with the content &nbsp;.

Might this be another example of the issue discussed in this bug T216283#5133746?
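
The workaround suggested above (reinserting a placeholder tspan into the default text) could be sketched roughly as follows. This is only an illustration using Python's standard `xml.etree.ElementTree`, not SVG Translate code. The helper name `pad_default_text` and the id `trsvg15-default` are hypothetical, and a literal U+00A0 character is used because `&nbsp;` is not a predefined XML entity. Whether SVG Translate would then treat the label as two lines has not been tested here.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
NBSP = "\u00a0"  # the character that &nbsp; stands for in HTML

SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg">
<switch>
<text id="trsvg14-pdc" systemLanguage="pdc"><tspan id="trsvg15">Beobachtungs-</tspan><tspan x="341" y="42" id="trsvg8-pdc">schirm</tspan></text>
<text id="trsvg14"><tspan x="341" y="42" id="trsvg8">screen</tspan></text>
</switch>
</svg>"""


def pad_default_text(switch, placeholder_id):
    """Prepend one NBSP-only <tspan> to the default <text> if any
    translation in the same <switch> has more tspans than the default.
    Hypothetical helper; returns True when a placeholder was added."""
    tag_text = f"{{{SVG_NS}}}text"
    tag_tspan = f"{{{SVG_NS}}}tspan"
    texts = switch.findall(tag_text)
    # The default (fallback) <text> has no systemLanguage attribute.
    default = next((t for t in texts if t.get("systemLanguage") is None), None)
    if default is None:
        return False
    most = max(len(t.findall(tag_tspan)) for t in texts)
    if len(default.findall(tag_tspan)) >= most:
        return False
    filler = ET.Element(tag_tspan, {"id": placeholder_id})
    filler.text = NBSP
    default.insert(0, filler)  # becomes the first line of the label
    return True


root = ET.fromstring(SAMPLE)
switch = root.find(f"{{{SVG_NS}}}switch")
changed = pad_default_text(switch, "trsvg15-default")  # id is an assumption
```

The sketch adds at most one placeholder per call, which covers the one-line versus two-line case in this file; a label whose translation has three or more lines would need repeated calls with distinct ids.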

We only support the number of <tspan>s that are in the default language. If a translation has more, the "extra" ones will be omitted.

SVG Translate should not have removed the empty tspan element.

Might be reasonable.
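
For anyone who wants to check a file for the tspan-count limitation described above before translating it, here is a minimal sketch. It assumes the text elements and their tspans are direct children, as in the snippet from the task description, and uses only Python's standard `xml.etree.ElementTree`. The function name `tspan_mismatches` and the wording of the warning are made up for illustration; this is not how SVG Translate itself is implemented.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg">
<switch>
<text id="trsvg14-pdc" systemLanguage="pdc"><tspan id="trsvg15">Beobachtungs-</tspan><tspan x="341" y="42" id="trsvg8-pdc">schirm</tspan></text>
<text id="trsvg14-ta" systemLanguage="ta"><tspan x="341" y="42" id="trsvg8-ta">திரை</tspan></text>
<text id="trsvg14"><tspan x="341" y="42" id="trsvg8">screen</tspan></text>
</switch>
</svg>"""


def tspan_mismatches(svg_source):
    """Return one warning string per translation whose <tspan> count
    differs from that of the default <text> in the same <switch>."""
    tag_text = f"{{{SVG_NS}}}text"
    tag_tspan = f"{{{SVG_NS}}}tspan"
    root = ET.fromstring(svg_source)
    warnings = []
    for switch in root.iter(f"{{{SVG_NS}}}switch"):
        texts = switch.findall(tag_text)
        # The default (fallback) <text> has no systemLanguage attribute.
        defaults = [t for t in texts if t.get("systemLanguage") is None]
        if not defaults:
            continue
        expected = len(defaults[0].findall(tag_tspan))
        for text in texts:
            lang = text.get("systemLanguage")
            if lang is None:
                continue
            found = len(text.findall(tag_tspan))
            if found != expected:
                warnings.append(
                    f"{text.get('id')} ({lang}): {found} tspan(s), "
                    f"default has {expected}"
                )
    return warnings


for warning in tspan_mismatches(SAMPLE):
    print(warning)
```

For the sample above, only the pdc translation is reported, since it has two tspans while the default has one.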

ifried renamed this task from SVG Translate does not handle tspan without coordinates to SVG Translate: Provide Error Message (no support for tspan without coordinates). Sep 3 2019, 11:41 PM
ifried updated the task description.
ifried set the point value for this task to 5. Sep 3 2019, 11:47 PM
ifried moved this task from Needs Discussion to Up Next (May 6-17) on the Community-Tech board.

This file contains a non-standard SVG format that is not currently supported. We did take a look at the code to see if we could add a message to let users know about the lack of support. The system is not able to recognize SVG files that are not formatted as expected, so it would take a great effort to add the logic to identify this particular use case.

HMonroy subscribed.

This file contains a non-standard SVG format that it is not currently supported

@HMonroy: What does "standard" mean here exactly, apart from specs? https://tools.wmflabs.org/svgcheck/index.php states the file is valid, and it's harder to contribute a patch without knowing what the problem is, and it seems like you investigated the problem?

Looks like the problem is described in the task summary, however I'm still surprised by the use of the word "standard" here.

This file contains a non-standard SVG format that it is not currently supported .

@HMonroy

  • What do you mean by "This file contains a non-standard SVG format"?
    • Which file do you mean?
    • What do you mean by a file containing a format? (Maybe I don't know English that well)
    • What is a non-standard SVG format? (SVG 1.0 and SVG 1.1 are IMHO standard SVG formats, and SVG 1.2 and SVG 2.0 are drafts, which are "often" used)

To clarify (not sure what you meant): The code of https://commons.wikimedia.org/wiki/File:Double-slit.svg is not only XML-valid, it also strictly follows the SVG 1.1 document type definition, and it is rendered correctly by all common renderers (librsvg, Inkscape, Chrome, Firefox, Internet Explorer, and, I assume, Batik, ImageMagick, resvg).
The only slightly complex parts are IMHO

  • <switch> (but handling this should IMHO be the aim of SVG Translate)
  • everything in <defs></defs> (but these are IMHO not relevant for SVG Translate)

My apologies, I did not mean the actual standard SVG file format. This file is valid, but SVG Translate does not recognize when a translation has a different number of tspans. It assumes that every other language has the same number of tspans as the default language. For instance, in https://commons.wikimedia.org/wiki/File:Double-slit.svg, the default text in the `switch` for the word "screen" contains only 1 tspan, so SVG Translate ignores any other tspans.

<switch>
<text x="340.9375" y="25.84375" xml:space="preserve" id="trsvg14-pdc" systemLanguage="pdc"><tspan id="trsvg15">Beobachtungs-</tspan><tspan x="341" y="42" id="trsvg8-pdc">schirm</tspan></text>
<text x="340.9375" y="25.84375" xml:space="preserve" id="trsvg14-ta" systemLanguage="ta"><tspan x="341" y="42" id="trsvg8-ta">திரை</tspan></text>
<text x="340.9375" y="25.84375" xml:space="preserve" id="trsvg14-de" systemLanguage="de"><tspan x="341" y="42" id="trsvg8-de">Beobachtungs-</tspan></text>
<text x="340.9375" y="25.84375" xml:space="preserve" id="trsvg14-lfn" systemLanguage="lfn"><tspan x="341" y="42" id="trsvg8-lfn">fs</tspan></text>
<text x="340.9375" y="25.84375" xml:space="preserve" id="trsvg14"><tspan x="341" y="42" id="trsvg8">screen</tspan></text>
</switch>

Please note "This diagram was created with Inkscape, and then manually edited." under Summary in https://commons.wikimedia.org/wiki/File:Double-slit.svg so this case is not very common. We can take another look if this becomes a bigger problem.

Inkscape (without its XML editor, which amounts to manual editing) cannot create <switch> tags, cannot add tags inside switch elements, and displays only the GUI language set in the preferences (at program start). So every file containing <switch> elements that I know of was either created manually or created in Inkscape and edited afterwards. I don't know whether Adobe Illustrator supports switch, but there are hardly any files with switch tags that were created in Adobe Illustrator, so I assume it does not support them.
I checked 20 "randomly" chosen files in Category:Translation_possible_-_SVG_(switch):

  • 10 were created in Inkscape (I,Im,H) and later manually edited (Inkscape switch files without manual editing, before or afterwards, simply cannot exist)
  • 4 were created in a text editor
  • 4 unknown (this means (a) uncommon software, (b) manually edited after software use, or (c) created in a text editor)
  • 1 Adobe Illustrator and later manually edited (can be seen in the history)
  • 1 Adobe Illustrator (and, I assume, manually edited later; the tool that inserts the "created with" template cannot recognize manual edits, it just searches for keywords such as Illustrator or Inkscape)

@HMonroy "This diagram was created with Inkscape, and then manually edited." seems to apply to about 50% of the switch files (I assume even more, because of the unknown files).

So what do you consider as common for switch-files on commons?

SVG Translate does not support translations whose number of tspans differs from that of the default language. It does not matter how the file was generated, as long as it has the same number of tspans for each translation; otherwise, any extra tspans are omitted. I consider it uncommon for a translation to have a different number of tspans than the default language.