Page MenuHomePhabricator

Icon display fails at runtime when user submitted value stored in database is URL encoded
Open, MediumPublicBUG REPORT

Assigned To
None
Authored By
Nux
Aug 6 2023, 7:53 PM
Referenced Files
F37327740: obraz.png
Aug 6 2023, 7:53 PM
F37327738: obraz.png
Aug 6 2023, 7:53 PM
Subscribers

Description

Steps to replicate the issue (include links if applicable):

What happens?:

The icon doesn't work (not displayed). It has polish letters in the name, maybe that is the problem?

What should have happened instead?:

Display the icon.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

obraz.png (476×1 px, 42 KB)

obraz.png (410×1 px, 31 KB)

Event Timeline

Toolhub's frontend javascript makes an Action API call to commons to find the right image files to load. In this case the API call seems to be returning an error response:

{
  "batchcomplete":true,
  "query":{
    "pages":[
      {
        "title":"File:Logo_Dzie%C5%84_Nowego_Artyku%C5%82u_Orem_version.svg",
        "invalidreason":"The requested page title contains invalid characters: \"%C5\".",
        "invalid":true
      }
    ]
  }
}

The problem seems to be double URL encoding of the filename. The URL you added (https://commons.wikimedia.org/wiki/File:Logo_Dzie%C5%84_Nowego_Artyku%C5%82u_Orem_version.svg) is URL encoded already, but the Toolhub software doesn't know this so it applies another level of URL encoding. This results in Toolhub sending the filename as Logo_Dzie%25C5%2584_Nowego_Artyku%25C5%2582u_Orem_version.svg which then is decoded to the still URL encoded Logo_Dzie%C5%84_Nowego_Artyku%C5%82u_Orem_version.svg value by MediaWiki instead of the desired Logo Dzień Nowego Artykułu Orem version.svg value.

The quick fix for this particular case is to update the URL in Toolhub to the un-encoded value of https://commons.wikimedia.org/wiki/File:Logo_Dzień_Nowego_Artykułu_Orem_version.svg. I have done this at https://toolhub.wikimedia.org/tools/dna/history/revision/46196/diff/48064 and the desired icon is now rendering.

The larger question is how can we detect and correct this so of thing automatically in the future? Is it more correct to URL decode all submitted values before storing or to assume that user submitted values are already encoded and not add URL encoding when sending to the action api? I think I would lean towards the first solution (decoding on input so that "raw" values are stored in the database and transmitted by the API) because assuming that stored data is properly escaped for use in HTML or a URL is a classic web security vulnerability.

bd808 renamed this task from SVG icon with Polish letters is not displayed to Icon display fails at runtime when user submitted value stored in database is URL encoded.Nov 9 2023, 7:06 PM

The larger question is how can we detect and correct this so of thing automatically in the future? Is it more correct to URL decode all submitted values before storing or to assume that user submitted values are already encoded and not add URL encoding when sending to the action api? I think I would lean towards the first solution (decoding on input so that "raw" values are stored in the database and transmitted by the API) because assuming that stored data is properly escaped for use in HTML or a URL is a classic web security vulnerability.

The real tricky bit of this is figuring out if the input URL actually is URL encoded or not. One heuristic is to decode and then compare to see if the decoded URL is shorter than the original input or not. That decode also needs to guard against hard errors which can be raised by attempting to decode input that is not encoded but also contains characters that look like encoded input. See https://stackoverflow.com/questions/2295223/how-to-find-out-if-string-has-already-been-url-encoded for more discussion and ideas. I don't think there is actually a definitive answer for this, but we may be able to find a reasonably stable partial solution, especially for the icon case which we can also test by submitting to the Action API as part of the validation procedure.

bd808 triaged this task as Medium priority.Nov 9 2023, 7:18 PM
bd808 moved this task from Backlog to Groomed/Ready on the Toolhub board.

The quick fix for this particular case is to update the URL in Toolhub to the un-encoded value of https://commons.wikimedia.org/wiki/File:Logo_Dzień_Nowego_Artykułu_Orem_version.svg. I have done this at https://toolhub.wikimedia.org/tools/dna/history/revision/46196/diff/48064 and the desired icon is now rendering.

Thanks!

The problem seems to be double URL encoding of the filename.

Interesting. I think most of the time it would be safe to assume the url is encoded. When you copy url from a browser (be it Firefox or Chrome) you should get an encoded URL.

The real tricky bit of this is figuring out if the input URL actually is URL encoded or not. One heuristic is to decode and then compare to see if the decoded URL is shorter than the original input or not.

As all urls on commons are Unicode I think it would safe to assume anything with double % is encoded. E.g.:

function isEncoded(url) {
  testRe = /%[0-9A-F]{2}%[0-9A-F]{2}/;
  return (url.search(testRe) > 0);
}
function test(url) {
  is = isEncoded(url);
  console.log({is, url});
}
test('https://commons.wikimedia.org/wiki/File:Logo_Dzie%C5%84_Nowego_Artyku%C5%82u_Orem_version.svg')

test('https://commons.wikimedia.org/wiki/File:Logo_Dzień_Nowego_Artykułu_Orem_version.svg')

test('https://commons.wikimedia.org/wiki/File:1872%E3%80%8A%E7%9B%A7%E5%85%AC%E6%98%8E%E8%8B%B1%E8%8F%AF%E8%90%83%E6%9E%97%E9%9F%BB%E5%BA%9C%E3%80%8B%E9%99%84%E9%8C%84.png')

As mentioned before, most of the time the url would be encoded (if users copy it from the browser).