In SVG files larger than 256kB with <switch> elements, the translations are not recognized
Open, Lowest | Public | BUG REPORT

Description

Steps to Reproduce:
Take any SVG file with the first <switch> tag appearing after $wgSVGMetadataCutoff (256kB).
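
A test file for this step can be generated rather than hunted for. The following is a minimal sketch, assuming the 256 kB cutoff quoted above; the function name `make_test_svg` and the filler element are illustrative choices, not part of MediaWiki.

```python
CUTOFF = 256 * 1024  # the 256 kB $wgSVGMetadataCutoff value quoted in this task


def make_test_svg(cutoff: int = CUTOFF) -> bytes:
    """Build a well-formed SVG whose first <switch> starts beyond `cutoff` bytes."""
    head = (
        b'<?xml version="1.0" encoding="UTF-8"?>\n'
        b'<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">\n'
    )
    # Harmless filler; repeated often enough that the filler alone exceeds the cutoff.
    filler_line = b'<path d="M0 0L1 1"/>\n'
    filler = filler_line * (cutoff // len(filler_line) + 1)
    tail = (
        b'<switch>\n'
        b'<text x="10" y="50" systemLanguage="de">Deutsch</text>\n'
        b'<text x="10" y="50">English</text>\n'
        b'</switch>\n'
        b'</svg>\n'
    )
    return head + filler + tail
```

Writing the returned bytes to a file (for example `big-switch.svg`, a made-up name) gives an upload candidate that should trigger the behaviour described below.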

Actual Results:
no translations dropdown to choose

Expected Results:
translations dropdown to choose

Event Timeline

JoKalliauer created this task.

For performance reasons, it might be the expected result that large SVGs are not checked to the end for <switch> tags.

Aklapper renamed this task from SVGs larger than 265kB wich switch-elements the translations are not regogniced to In SVG files larger than 256kB with <switch> elements, the translations are not recognized. Dec 29 2020, 9:54 AM
Aklapper updated the task description.

Two proposals: increase the number of bytes read or shift multilingual testing to upload time (when the file is read anyway).

In T40010, Ponor looked at 30 SVG files and stated the mean file size was 700 kB. JoKalliauer stated that only about 500 SVG files are being uploaded every day. Johannes also says that SVGs are 2.8 percent of uploads.

SVG illustrations usually place text on top of a drawing, so most text elements are at the end of the file.

At one point, SVG uploads were limited to 10 MB. I do not know if that limit is still in effect.

I do not know how long it takes for MW to parse an XML file.

  1. We might change $wgSVGMetadataCutoff to be 3 times the average SVG file size, that is, 2 MB. That should allow most SVG files to be read completely and therefore processed correctly. It means that reading humongous SVG files may take up to 8 times longer, but the average case should only take about 3 times longer. (It will also take up to 1.7 MB more process memory, which may be a more stringent limitation.)
  2. As I understand it, the SVG file is parsed every time a page is built. I also believe the file must be read completely when it is uploaded. At upload, the SVG file could be scanned for systemLanguage attributes, and an entry could then be made in the database recording whether it is multilingual. If there were no language attributes, then a page build need not scan the file at all (it could get image width and height from the imageinfo database). If there were systemLanguage attributes, then it could scan the first 2 MB of the file (or even the entire file). Having such a flag may even decrease the SVG processing time if only a small percentage of SVG files are multilingual.
  3. Alternatively, the database could include all the langtags discovered at file upload, so the SVG file would not have to be reread to build a page.
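
The upload-time scan in the second and third proposals could look roughly like the sketch below. It is written in Python for illustration only (MediaWiki itself is PHP), the function name `collect_lang_tags` is made up, and it assumes systemLanguage values are comma-separated lists of language tags, as in SVG 1.1. It streams the whole file once and returns every tag found; an empty result would correspond to a "not multilingual" flag.

```python
import io
import xml.etree.ElementTree as ET


def collect_lang_tags(svg_bytes: bytes) -> set:
    """Return all language tags used in systemLanguage attributes of the SVG."""
    tags = set()
    stream = io.BytesIO(svg_bytes)
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            # Attributes are already available when the start tag has been read.
            value = elem.get("systemLanguage")
            if value:
                tags.update(t.strip() for t in value.split(",") if t.strip())
        else:
            # Drop finished elements so a large file does not pile up in memory.
            elem.clear()
    return tags
```

The returned set could be stored as is (third proposal) or reduced to a boolean with `bool(tags)` (second proposal); either way a page build would not need to reread the file to learn which languages exist.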

When I've run into this problem, I've used two workarounds.

One is to add a hidden switch near the top of the file:

<switch visibility="hidden">
  <text systemLanguage="en">English</text>
  <text systemLanguage="de">Deutsch</text>
  <text>English</text>
</switch>

The second is to add a similar switch to the defs element:

<defs>
  <g id="legend">
    <switch>
      <text systemLanguage="en">English</text>
      <text systemLanguage="de">Deutsch</text>
      <text>English</text>
    </switch>
  </g>
</defs>
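
To check whether either workaround actually lands inside the part of the file that gets read, the first 256 kB can be inspected directly. This is only an approximation: it uses a plain pattern search on the raw bytes, whereas the real reader is an XML parser, and the helper name `languages_within_cutoff` is invented for this sketch.

```python
import re

CUTOFF = 256 * 1024  # the 256 kB $wgSVGMetadataCutoff value quoted in this task

# Matches systemLanguage="..." or systemLanguage='...' in raw bytes.
_LANG_ATTR = re.compile(rb'systemLanguage\s*=\s*["\']([^"\']*)["\']')


def languages_within_cutoff(svg_bytes: bytes, cutoff: int = CUTOFF) -> list:
    """List the language tags that appear in the first `cutoff` bytes."""
    prefix = svg_bytes[:cutoff]
    tags = set()
    for value in _LANG_ATTR.findall(prefix):
        for tag in value.split(b","):
            tag = tag.strip()
            if tag:
                tags.add(tag.decode("ascii", errors="replace"))
    return sorted(tags)
```

An empty list for a file that is known to contain translations suggests the first <switch> sits beyond the cutoff and one of the workarounds is needed.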

SVG Translate offers to translate the text, and if the users add a translation, then it will show up on the File page.

SVG Translate could always add such an element near the front of the file. A trick would be to set the id to an SVG Translate GUID. Then SVG Translate could always add the language to that element without offering it to the user.
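
A rough sketch of that idea follows. It is not SVG Translate's code: the id value `svgtranslate-marker`, the function name, and the placeholder text contents are all hypothetical, and reserialising with ElementTree would drop comments and any doctype, which a real tool would have to handle more carefully. The marker is a hidden <switch> inserted as the first child of the root element, and the entry without systemLanguage is kept last so that it remains the fallback.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
MARKER_ID = "svgtranslate-marker"  # hypothetical id, stands in for a real GUID


def add_language_marker(svg_text: str, langs: list) -> str:
    """Ensure a hidden marker <switch> at the front lists every tag in `langs`."""
    ET.register_namespace("", SVG_NS)  # serialise SVG as the default namespace
    root = ET.fromstring(svg_text)
    switch_tag = "{%s}switch" % SVG_NS
    text_tag = "{%s}text" % SVG_NS

    # Reuse the marker if an earlier run already created it.
    marker = next(
        (el for el in root.iter(switch_tag) if el.get("id") == MARKER_ID), None
    )
    if marker is None:
        marker = ET.Element(switch_tag, {"id": MARKER_ID, "visibility": "hidden"})
        fallback = ET.SubElement(marker, text_tag)  # no systemLanguage: fallback
        fallback.text = "fallback"
        root.insert(0, marker)

    present = {child.get("systemLanguage") for child in marker}
    for lang in langs:
        if lang not in present:
            entry = ET.Element(text_tag, {"systemLanguage": lang})
            entry.text = lang
            marker.insert(len(marker) - 1, entry)  # keep the fallback last
            present.add(lang)
    return ET.tostring(root, encoding="unicode")
```

Because the marker is found by its id, the tool can recognise its own element on later edits and skip it when listing translatable text.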