The LicenseUrl element has a trailing '\n' character, making it an invalid URL.
Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=65573
Looking around further, '\n' is added to several values:
"Credit": {
    "value": "\nSelf-photographed",
    "source": "commons-desc-page",
    "hidden": ""
},
"LicenseUrl": {
    "value": "http://creativecommons.org/licenses/by-sa/3.0\n",
    "source": "commons-desc-page",
    "hidden": ""
},
"LicenseShortName": {
    "value": "CC-BY-SA-3.0\n",
    "source": "commons-desc-page",
    "hidden": ""
},
"UsageTerms": {
    "value": "Creative Commons Attribution-Share Alike 3.0\n",
    "source": "commons-desc-page",
    "hidden": ""
},
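To check which fields are affected in a given response, a small script along these lines may help. It is only a sketch: the helper name `untrimmed_fields` is made up for illustration, and the sample is the fragment quoted above wrapped in braces so that it parses as JSON.

```python
import json

# Fragment from the report above, wrapped in braces to make it valid JSON.
# The raw string keeps "\n" as a JSON escape, so json.loads turns it into a newline.
SAMPLE = r'''
{
  "Credit": {"value": "\nSelf-photographed", "source": "commons-desc-page", "hidden": ""},
  "LicenseUrl": {"value": "http://creativecommons.org/licenses/by-sa/3.0\n", "source": "commons-desc-page", "hidden": ""},
  "LicenseShortName": {"value": "CC-BY-SA-3.0\n", "source": "commons-desc-page", "hidden": ""},
  "UsageTerms": {"value": "Creative Commons Attribution-Share Alike 3.0\n", "source": "commons-desc-page", "hidden": ""}
}
'''


def untrimmed_fields(extmetadata):
    """Return the sorted names of fields whose value has leading or trailing whitespace.

    Hypothetical helper, not part of any MediaWiki code.
    """
    return sorted(
        name
        for name, entry in extmetadata.items()
        if isinstance(entry.get("value"), str)
        and entry["value"] != entry["value"].strip()
    )


if __name__ == "__main__":
    print(untrimmed_fields(json.loads(SAMPLE)))
```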
Change 97743 had a related patch set uploaded by Gergő Tisza:
Trim HTML-based metadata values
Change 97743 abandoned by Gergő Tisza:
Trim HTML-based metadata values
Reason:
Abandoning this change since InformationParser has been completely rewritten in the meantime.
This issue is occurring again. See e.g. https://commons.wikimedia.org/w/api.php?action=query&prop=imageinfo&format=json&iiprop=commonmetadata|extmetadata&iilimit=1&titles=File%3ALandsort%20Lighthouse%20August%202013%2009.jpg
where
"LicenseShortName": {
"value": "CC-BY-SA-3.0\n", "source": "commons-desc-page", "hidden": ""
},
"UsageTerms": {
"value": "Creative Commons Attribution-Share Alike 3.0\n", "source": "commons-desc-page", "hidden": ""
},
"LicenseUrl": {
"value": "http://creativecommons.org/licenses/by-sa/3.0\n", "source": "commons-desc-page", "hidden": ""
},
Looking at the HTML source of the example above [1], there is no trace of these newline characters. So this might not be a cleaning/trimming failure in the TemplateParser; could the parser itself be inserting them?
[1] https://commons.wikimedia.org/wiki/File:Landsort_Lighthouse_August_2013_09.jpg
As stated in bug 69497, these newlines are in the license template, and the code doing the HTML scraping there had better remove them.
The code that removes them is in https://gerrit.wikimedia.org/r/#/c/120948/1/TemplateParser.php which at a glance seems correct to me. Also, Lokal_Profil is right that the newline is not always present in the HTML code. I'll test locally with the examples mentioned here.
This code does _not_ look good. '/^\s+(.*)\s+$/' is wrong. It fails to trim if there are no leading blanks (or no trailing blanks). And watch out for the greedy (.*), that also looks wrong.
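Both failure modes can be reproduced with a short sketch. It uses Python's `re` module rather than PHP, and it assumes the pattern is used to replace the whole match with the first capture group. The names `broken_trim` and `fixed_trim` are illustrative, and the fixed pattern is just one possible correction, not necessarily what the eventual patch does.

```python
import re

# The pattern quoted above. Assumption: the original code substitutes
# the whole match with capture group 1.
BROKEN = re.compile(r'^\s+(.*)\s+$')


def broken_trim(value: str) -> str:
    # Matches only when there is whitespace on BOTH ends; otherwise the
    # input is returned unchanged.
    return BROKEN.sub(r'\1', value)


def fixed_trim(value: str) -> str:
    # Strip leading and trailing whitespace independently of each other.
    return re.sub(r'\A\s+|\s+\Z', '', value)


if __name__ == "__main__":
    for sample in ["CC-BY-SA-3.0\n", "\nSelf-photographed", "  foo bar  "]:
        print(repr(sample), repr(broken_trim(sample)), repr(fixed_trim(sample)))
```

With only a trailing newline or only a leading one, the broken pattern does not match at all, so nothing is trimmed. With whitespace on both ends, the greedy `(.*)` swallows all but the last trailing whitespace character, so `"  foo bar  "` becomes `"foo bar "` instead of `"foo bar"`.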
(In reply to Tisza Gergő from comment #10)
> Also, Lokal_Profil is right that the newline is
> not always present in the HTML code. I'll test locally with the examples
> mentioned here.
Not correct. See
Returns the same trailing newlines for UsageTerms and LicenseUrl.
Change 155901 had a related patch set uploaded by TheDJ:
TemplateParser: Fix whitespace trim
(In reply to Lupo from comment #11)
> This code does _not_ look good. '/^\s+(.*)\s+$/' is wrong. It fails to trim
> if there are no leading blanks (or no trailing blanks). And watch out for
> the greedy (.*), that also looks wrong.
D'oh, that was stupid. Thanks for fixing, Lupo & TheDJ!