site.unblockuser sends malformed/misencoded api request
Open, High, Public

Description

Test code:

import pywikibot
from pywikibot.page import User
from pywikibot.site import APISite

site = pywikibot.Site()
summary = 'בדיקת בוט'
username = u'בלה'

currentUser = User(site, username)
data = site.unblockuser(currentUser, summary)
print data

Family file:

from pywikibot import family


class Family(family.Family):
    def __init__(self):
        family.Family.__init__(self)
        self.name = 'kidipedia'
        self.langs = {
            'he': 'www.kidipedia.org.il',
        }

    def path(self, code):
        """Return the path to index.php for this family."""
        return '/index.php'

    def scriptpath(self, code):
        """Return the script path for this family."""
        return ''

    def apipath(self, code):
        """Return the path to api.php for this family."""
        return '/api.php'

The wiki correctly reports the encoding (utf-8), both in the content-type and in a <meta> tag.

The data sent over the wire to the wiki is actually

urlencode(u'בלה'.encode('utf-8').decode('latin-1').encode('utf-8'))

This always happens for the user parameter; the summary is correctly sent if provided as bytes, but not when provided as unicode (?!).
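
To make the reported corruption concrete, here is a minimal sketch in Python 3 (the report itself was made on Python 2) that compares the expected percent-encoding of the username with the double-encoded form described above. It uses only `urllib.parse.quote` from the standard library. The variable names are illustrative and are not taken from pywikibot.

```python
from urllib.parse import quote

username = 'בלה'

# Expected wire format: the UTF-8 bytes, percent-encoded once.
expected = quote(username.encode('utf-8'))

# Reported wire format: the UTF-8 bytes are misread as Latin-1 text
# and then encoded to UTF-8 a second time before percent-encoding.
double_encoded = quote(
    username.encode('utf-8').decode('latin-1').encode('utf-8')
)

print(expected)        # %D7%91%D7%9C%D7%94
print(double_encoded)  # %C3%97%C2%91%C3%97%C2%9C%C3%97%C2%94
```

Every original byte turns into two bytes in the second form, so a `user=` value that starts with `%C3%97` rather than `%D7` is a quick sign of this problem.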

Event Timeline

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Aug 25 2016, 8:49 PM
DekelE added a subscriber: Guycn2. Aug 25 2016, 8:51 PM
DekelE removed a subscriber: DekelE. Aug 25 2016, 9:33 PM
DekelE added a subscriber: DekelE.

Is there something new?

DekelE triaged this task as High priority. Aug 28 2016, 11:07 AM
DekelE added a subscriber: Xqt.
Mpaa added a subscriber: Mpaa. Aug 28 2016, 4:10 PM

How do you see that?
This is what I get in Python 2.

(Pdb) paramstring
'maxlag=5&format=json&assert=user&reason=%D7%91%D7%93%D7%99%D7%A7%D7%AA+%D7%91%D7%95%D7%98&user=%D7%91%D7%9C%D7%94&action=unblock

>>> username = u'בלה'
>>> username.encode('utf-8')
'\xd7\x91\xd7\x9c\xd7\x94'
>>> username.encode('utf-8').decode('latin-1').encode('utf-8')
'\xc3\x97\xc2\x91\xc3\x97\xc2\x9c\xc3\x97\xc2\x94'

And the same in Python 3:

(Pdb) paramstring
'reason=%D7%91%D7%93%D7%99%D7%A7%D7%AA+%D7%91%D7%95%D7%98&assert=user&user=%D7%91%D7%9C%D7%94&maxlag=5&format=json&action=unblock

So user=%D7%91%D7%9C%D7%94 looks correct.
Or what am I missing?
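
One way to double-check a captured parameter string is to decode it again and compare the result with the original values. The sketch below does this in Python 3 with `urllib.parse.parse_qs`, using the parameter string from the Python 2 session above. `looks_double_encoded` is a hypothetical helper written for this illustration, and it is only a heuristic.

```python
from urllib.parse import parse_qs

paramstring = (
    'maxlag=5&format=json&assert=user'
    '&reason=%D7%91%D7%93%D7%99%D7%A7%D7%AA+%D7%91%D7%95%D7%98'
    '&user=%D7%91%D7%9C%D7%94&action=unblock'
)

# parse_qs decodes percent-escapes as UTF-8 by default.
params = parse_qs(paramstring)
user = params['user'][0]
reason = params['reason'][0]


def looks_double_encoded(text):
    """Heuristic: True if text is UTF-8 that was misread as Latin-1."""
    try:
        repaired = text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return repaired != text


print(user == 'בלה', looks_double_encoded(user))
```

For the string above, `user` decodes back to the original username and the heuristic reports no double encoding, which agrees with the observation that this request looks correct.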

I assume that where the code ran a .decode('latin-1') for DekelE, it runs a .decode('utf-8') for you. It's not clear to me whether that's related to the wiki or to some environmental setting (e.g. the console/windows encoding?). Nevertheless, running .encode().decode().encode() is something we shouldn't be doing in the first place.
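
The effect of the intermediate decode step can be shown without pywikibot. The following Python 3 sketch assumes the chain really is encode, decode, encode, as suggested above. Both `roundtrip` and `to_text` are hypothetical helpers for illustration and are not part of pywikibot.

```python
def roundtrip(text, assumed_encoding):
    """Encode text as UTF-8, decode with assumed_encoding, re-encode as UTF-8."""
    return text.encode('utf-8').decode(assumed_encoding).encode('utf-8')


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode bytes exactly once, pass text through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


username = 'בלה'
utf8_bytes = username.encode('utf-8')

# Decoding with the matching codec is lossless, so the round trip is a no-op.
assert roundtrip(username, 'utf-8') == utf8_bytes

# Decoding with Latin-1 never raises, but silently doubles the encoding.
assert roundtrip(username, 'latin-1') != utf8_bytes

# Normalising once at the boundary gives the same bytes for both input types.
assert to_text(username).encode('utf-8') == to_text(utf8_bytes).encode('utf-8')
```

Because Latin-1 maps every byte to a character, the wrong decode never fails loudly, which would explain why the problem only shows up on the wire.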

Mpaa added a comment. Aug 28 2016, 9:14 PM

A lot of fallbacks ... :-)

@DekelE, what do you have as config.console_encoding (you need to import 'from pywikibot import config2 as config')?

Sorry, I didn't understand. Do you want me to import this library and then run the code again, or do you want me to print or look for something?

DekelE added a comment. Edited Aug 29 2016, 7:06 AM

Updating:

When I tried to run the following code, the bot worked fine:

username = 'בלה'
username.decode('utf-8')

and in the unblockuser function:
user=user.username.encode('utf-8'),

However, when I passed just username = u'בלה' and added the same code to the unblockuser function, I got an error message.

Mpaa added a comment. Aug 29 2016, 9:24 PM

I wanted you to put this in your script:

from pywikibot import config2 as config
print config.console_encoding
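
For comparison, it may also help to print the encodings that Python itself reports for the environment. The sketch below is written for Python 3 and uses only the standard library. `report_encodings` is a hypothetical helper, and how these values relate to pywikibot's `config.console_encoding` is an assumption that would need checking against the pywikibot source.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python reports for this environment."""
    return {
        # stdout/stdin may be replaced or missing, hence getattr.
        'stdout': getattr(sys.stdout, 'encoding', None),
        'stdin': getattr(sys.stdin, 'encoding', None),
        'filesystem': sys.getfilesystemencoding(),
        'default': sys.getdefaultencoding(),
        'locale': locale.getpreferredencoding(False),
    }


for name, value in report_encodings().items():
    print(name, value)
```

If the console or locale entries show a non-UTF-8 codec on the affected machine but not on the one where the request looks correct, that would support the environment-setting explanation.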