Page MenuHomePhabricator

Page titles are being double-encoded
Open, Needs TriagePublic

Description

It looks like there's some issues with double-encoding in some codes. For example, the centre plaque in
https://commons.wikimedia.org/wiki/File:Toodyaypedia_plates_stg_1_gnangarra_fs-1.jpg
resolves to
http://en.qrwp.org/St_John_the_Baptist_Church,_Toodyay_%25281963-_%2529
instead of
http://en.qrwp.org/St_John_the_Baptist_Church,_Toodyay_%281963-_%29
(i.e. the % was URL-encoded a second time).

Did a prior version of the QRpedia software handle this double-encoding, or was this code always incorrect?

(Initially reported at T209019#4761418)

Details

Related Gerrit Patches:

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptNov 21 2018, 7:45 AM

@ErrantX do you have any update on this?

@ErrantX
as this hasnt been resolved, would it be possible to incorporate the coding https://meta.wikimedia.org/wiki/Free_Knowledge_Portal which points to the WikiData identifier as a solution

@ErrantX: Where is the QRpedia source code? The rGQRP repository that's linked from the project's page on Meta has not been updated since Sep 2 2015. Is that the most recent? (Also pinging @Kelson @Addshore as you were involved back then it looks like.)

@Samwilson yes the qrpedia repo on Wikimedia source forge is the good one.

Jkbr added a subscriber: Jkbr.Mar 12 2019, 5:12 PM

@Samwilson yes the qrpedia repo on Wikimedia source forge is the good one.

Do you have a link to this?

I'm also working on this issue and looking to resolve the immediate issues on the site and creating a canonical codebase.

@Jkbr the repository is: rGQRP (it's hosted here in Phabricator).

Jkbr added a comment.Mar 13 2019, 11:49 AM

@Jkbr the repository is: rGQRP (it's hosted here in Phabricator).

Thanks. I'm trying to find the most recent version and this just looked like it hadn't been changed for a while.

Thanks. I'm trying to find the most recent version and this just looked like it hadn't been changed for a while.

I think that's just because it's been a while. :)

I'm happy to help review any patches.

Quarz added a subscriber: Quarz.Jun 7 2019, 7:33 PM

Because I am told that I wrote at a wrong place, I try once more and paste it here:

The issues still persist:
a) Double slashes in de.Wikipedia: https://de.qrwp.org/Berlin
The solution should be known at WMUK, because the bug has been fixed for other language versions.
b) Incorrect URL resolution for articles with non-letter character:
The issue already existed once in 2014, but was quickly fixed by a simple method at qrwp.org: With the incoming URL, do a regex_replace or string_replace, find '%25' replace with '%'.
https://en.wikipedia.org/w/index.php?title=User_talk:Victuallers&oldid=592634542#QRpedia_bug_-_Apostrophes._HELP!
I would like to know why the fix was undone.

Jkbr added a comment.Jun 10 2019, 12:54 PM

Because I am told that I wrote at a wrong place, I try once more and paste it here:

The issues still persist:
a) Double slashes in de.Wikipedia: https://de.qrwp.org/Berlin
The solution should be known at WMUK, because the bug has been fixed for other language versions.
b) Incorrect URL resolution for articles with non-letter character:
The issue already existed once in 2014, but was quickly fixed by a simple method at qrwp.org: With the incoming URL, do a regex_replace or string_replace, find '%25' replace with '%'.
https://en.wikipedia.org/w/index.php?title=User_talk:Victuallers&oldid=592634542#QRpedia_bug_-_Apostrophes._HELP!
I would like to know why the fix was undone.

Thanks. WMUK is now actively reviewing the QRPedia project including these issues and others reported here.

Gnangarra added a comment.EditedJul 8 2019, 11:54 AM

as at 5 July codes like;

https://commons.wikimedia.org/wiki/File:Connor%27s_Cottage_QRpedia.png looking for the en article Connor%2527s Cottage instead of Connor%27s_Mill

to me its reading/adding/returning the hex code of % which is 25 in the url and is then causing it to be unable to resolve %2527 rather then just reading the hex code as %27

actually since the url is https://en.wikipedia.org/wiki/Connor%27s_Cottage it shouldnt be trying to anything with the hex code

Jo Brook, working for WMUK, who I don't think is active on Phabricator, sent the following email today:

Hi all,
Firstly, I've made changes on our server so that URLs with wrongly encoded apostrophes and parentheses as in the 37 Toodyay URLs are now handled and redirected.
This change was at the webserver config rather than in the PHP code, which is only involved in generating new codes.
There is one exception to this:
https://en.wikipedia.org/wiki/St_John_the_Baptist_Church,_Toodyay_(1963-_) mentioned in https://phabricator.wikimedia.org/T210050
which fails because the English Wikipedia page itself does not now exist (I think it's maybe now https://en.wikipedia.org/wiki/St_John_the_Baptist_Church_(Toodyay) ?)
Fixing this would need someone to create a redirect page on wikipedia.org
Thanks for your patience on this -- during the call I was speaking on the basis mainly of newly created codes (which was fixed earlier this year) rather than redirecting ones which are already in use, but this email clarified that.
I've also done some further work on making this a more general solution to handle other double-encoded codes involving other characters not in the Toodyay set, but have some remaining special cases to deal with like https://en.wikipedia.org/wiki/We_are_the_99%25 which break some assumptions about applying decoding which works for other cases.
I will test any changes against your 37 codes to ensure they are fixed up correctly.
thanks and all the best,
Jo

Quarz added a comment.Jul 9 2019, 6:10 AM

Firstly, I've made changes on our server so that URLs with wrongly encoded apostrophes and parentheses as in the 37 Toodyay URLs are now handled and redirected.

Thanks a lot! Wouldn't it be better not to handle every single special character but to generally run an urldecode? In German, for example, we also need äöüÄÖÜß. Accents are also needed (French, Danish...).

Firstly, I've made changes on our server so that URLs with wrongly encoded apostrophes and parentheses as in the 37 Toodyay URLs are now handled and redirected.

Thanks a lot! Wouldn't it be better not to handle every single special character but to generally run an urldecode? In German, for example, we also need äöüÄÖÜß. Accents are also needed (French, Danish...).

I've forwarded your question to Jo via email; I don't know if she's monitoring this ticket.

Quarz added a comment.Jul 18 2019, 6:46 PM

The problem with the double slashes is fixed, very good! Now many QRpedia codes work stable again. But unfortunately the issue with the double urlencoded addresses persists, although the solution is not difficult at all.
Above it was pointed out that the code is on rGQRP. Apparently index.php is the important file here.
In line 47 the page title is loaded:

$request = stripslashes( $_GET['title'] );

Because the input is urldecoded by $_GET, everything is fine with normal (simple) urlencoded strings.
The issue is the processing of double urlencoded strings, which qrpedia.org has built into the QRpedia codes and which are now physically in the world as info boards for example.
The solution: Give $request an additional urldecode.

Change 530236 had a related patch set uploaded (by Putnik; owner: Putnik):
[qrpedia@master] Fix double-encoded page titles

https://gerrit.wikimedia.org/r/530236

putnik added a subscriber: putnik.Aug 14 2019, 10:14 PM

Change 530331 had a related patch set uploaded (by Putnik; owner: Putnik):
[qrpedia@master] Fix double-encoded page titles

https://gerrit.wikimedia.org/r/530331

Change 530331 abandoned by Putnik:
Fix double-encoded page titles

Reason:
Duplicate of https://gerrit.wikimedia.org/r/#/c/qrpedia/ /530236/

https://gerrit.wikimedia.org/r/530331

I think progress here is waiting on feedback from WMUK.

The above patch may fix the issue, but I'm not sure where we're at with ensuring that this repo is actually the one in production.

I think progress here is waiting on feedback from WMUK.

Who, at WMUK?

MichaelMaggs removed ErrantX as the assignee of this task.Aug 25 2019, 5:54 PM
MichaelMaggs added a subscriber: ErrantX.

Jkbr I believe.

MichaelMaggs assigned this task to Jkbr.Aug 25 2019, 5:55 PM

A patch was uploaded three months ago. I'm curious how it will go on. Is it accepted that many QRpedia codes worldwide have no function - so they are garbage? Or does WMUK still want to repair the software - when?
We just have the situation in Bremen that 70 codes on public information boards are affected by the bug. Due to a faulty QRpedia generator, do we now have to look for a sponsor for the renewal of the codes?

Fae added a subscriber: Fae.Mon, Nov 11, 11:35 AM