API does not strip bidi characters (or trim whitespace) when validating IPs for 'user'-type parameters
Closed, Resolved, Public

Description

Examples:

versus


Original report:

This started happening a few days ago, but not for all IP addresses.

kizule@kizule:~/development/pywikibot-core$ python3 pwb.py patrol -usercontribs:‎'213.149.159.237' -ask
/home/kizule/development/pywikibot-core/pywikibot/config2.py:1138: _ConfigurationDeprecationWarning: "sysopnames" present in our user-config.py is no longer a supported configuration variable. Please inform the maintainers if you depend on it.
  'depend on it.'.format(_key), _ConfigurationDeprecationWarning)
Processing user: ‎213.149.159.237
Newpages:
Loading Корисник:Zoranzoki21/patrol_whitelist
WARNING: API error baduser_rcuser: Invalid value "‎213.149.159.237" for user parameter "rcuser".
Traceback (most recent call last):
  File "pwb.py", line 321, in <module>
    if not main():
  File "pwb.py", line 316, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "pwb.py", line 101, in run_python_file
    main_mod.__dict__)
  File "./scripts/patrol.py", line 505, in <module>
    main()
  File "./scripts/patrol.py", line 490, in main
    bot.run(feed)
  File "./scripts/patrol.py", line 286, in run
    for page in feed:
  File "./scripts/patrol.py", line 410, in api_feed_repeater
    for page in generator:
  File "/home/kizule/development/pywikibot-core/pywikibot/site.py", line 6627, in newpages
    for pageitem in gen:
  File "/home/kizule/development/pywikibot-core/pywikibot/data/api.py", line 2808, in __iter__
    self.data = self.request.submit()
  File "/home/kizule/development/pywikibot-core/pywikibot/data/api.py", line 2080, in submit
    raise APIError(**result['error'])
pywikibot.data.api.APIError: baduser_rcuser: Invalid value "‎213.149.159.237" for user parameter "rcuser". [help:See https://sr.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.]
CRITICAL: Exiting due to uncaught exception <class 'pywikibot.data.api.APIError'>
kizule@kizule:~/development/pywikibot-core$

Version of pywikibot:

kizule@kizule:~/development/pywikibot-core$ python3 pwb.py version
/home/kizule/development/pywikibot-core/pywikibot/config2.py:1138: _ConfigurationDeprecationWarning: "sysopnames" present in our user-config.py is no longer a supported configuration variable. Please inform the maintainers if you depend on it.
  'depend on it.'.format(_key), _ConfigurationDeprecationWarning)
Pywikibot: [ssh] pywikibot-core (e78bbf3, g11512, 2019/09/10, 11:30:54, ok)
Release version: 3.1.dev0
requests version: 2.21.0
  cacerts: /etc/ssl/certs/ca-certificates.crt
    certificate test: ok
Python: 3.7.3 (default, Aug 20 2019, 17:04:43) 
[GCC 8.3.0]
PYWIKIBOT_DIR: Not set
PYWIKIBOT_DIR_PWB: 
PYWIKIBOT_NO_USER_CONFIG: Not set
Config base dir: /home/kizule/development/pywikibot-core
Usernames for family "wikipedia":
	sr: Zoranzoki21 (also sysop)
kizule@kizule:~/development/pywikibot-core$

Event Timeline

Anomie subscribed.
kizule@kizule:~/development/pywikibot-core$ python3 pwb.py patrol -usercontribs:‎'213.149.159.237' -ask

The value you're passing there is not just an IP address. It's an IP address prefixed with U+200E (LEFT-TO-RIGHT MARK), which makes it an invalid value.
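One way to confirm an invisible character like this is to print an escaped form of the value. The following is a minimal Python sketch, not part of pywikibot; the helper name `describe_invisible` is hypothetical, and the sample value is an assumption about what a copy-paste from a web page can produce.

```python
import unicodedata


def describe_invisible(value):
    """List (index, code point, Unicode name) for each format character (category Cf)."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(value)
        if unicodedata.category(ch) == "Cf"
    ]


# Assumed sample: an IP address prefixed with U+200E, as in the report.
value = "\u200e213.149.159.237"

# ascii() escapes non-ASCII characters, so the hidden mark becomes visible.
print(ascii(value))
print(describe_invisible(value))
```

Running the same check on a hand-typed address should return an empty list.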

The Unicode bidi override characters (U+200E, U+200F, and U+202A–U+202E) are stripped from actual usernames during canonicalization (as part of passing it through the page title parser), but the API parameter validation does not strip them from IP addresses when validating those. At a glance it looks like other code in MediaWiki that handles IP usernames specially also generally doesn't strip these characters.

The API validation code for IP usernames also doesn't trim leading or trailing whitespace, as is done for page titles.

I'm undecided whether this code should be updated to do that stripping, or if the non-stripping is acceptable behavior.
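The stripping and trimming discussed above can be sketched in Python as follows. This only illustrates the idea and is not MediaWiki's implementation (which is PHP); the names `BIDI_CHARS`, `normalize_ip_candidate`, and `is_ip` are hypothetical, and the standard-library `ipaddress` module stands in for MediaWiki's IP checks.

```python
import ipaddress
import re

# The characters named above: U+200E, U+200F, and U+202A through U+202E.
BIDI_CHARS = re.compile("[\u200e\u200f\u202a-\u202e]")


def normalize_ip_candidate(value):
    """Remove bidi characters, then trim leading and trailing whitespace."""
    return BIDI_CHARS.sub("", value).strip()


def is_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


# Assumed sample: leading U+200E and a trailing space.
raw = "\u200e213.149.159.237 "
print(is_ip(raw))                          # validation without normalization
print(is_ip(normalize_ip_candidate(raw)))  # validation after normalization
```

Normalizing before validating means the same input is accepted whether it was typed by hand or copied from a page that embeds directional marks.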

Anomie renamed this task from API error baduser_rcuser in pywikibot (not for all IP addresses) to API does not strip bidi characters (or trim whitespace) when validating IPs for 'user'-type parameters. Sep 12 2019, 1:18 PM
Anomie updated the task description.
I tried with the IP address in quotes and without them, but the same thing happens.

I cannot follow. Can you explain your comment, please?

I mean that I tried these formats: '127.0.0.1', "127.0.0.1" and 127.0.0.1

But the effect was the same.

Thanks, @Anomie, for contributing to this task. I am on a bus right now; when I get home I will type the IP address by hand instead of copy-pasting it from srwiki recent changes.

OK, I typed the IP address by hand instead of copy-pasting it, and it works.

kizule@kizule:~/development/pywikibot-core$ python3 pwb.py patrol -usercontribs:109.122.123.39 -ask
/home/kizule/development/pywikibot-core/pywikibot/config2.py:1138: _ConfigurationDeprecationWarning: "sysopnames" present in our user-config.py is no longer a supported configuration variable. Please inform the maintainers if you depend on it.
  'depend on it.'.format(_key), _ConfigurationDeprecationWarning)
Processing user: 109.122.123.39
Newpages:
Loading Корисник:Zoranzoki21/patrol_whitelist
Recentchanges:
User 109.122.123.39 has created or modified page Односи Србије и Шведске
Do you want to mark page as patrolled? ([y]es, [n]o, [q]uit): # test
Do you want to mark page as patrolled? ([y]es, [n]o, [q]uit): q

User quit PatrolBot bot run.
0/0 patrolled
kizule@kizule:~/development/pywikibot-core$ python3 pwb.py version
/home/kizule/development/pywikibot-core/pywikibot/config2.py:1138: _ConfigurationDeprecationWarning: "sysopnames" present in our user-config.py is no longer a supported configuration variable. Please inform the maintainers if you depend on it.
  'depend on it.'.format(_key), _ConfigurationDeprecationWarning)
Pywikibot: [ssh] pywikibot-core (cce0d30, g11513, 2019/09/12, 16:13:38, ok)
Release version: 3.1.dev0
requests version: 2.21.0
  cacerts: /etc/ssl/certs/ca-certificates.crt
    certificate test: ok
Python: 3.7.3 (default, Aug 20 2019, 17:04:43) 
[GCC 8.3.0]
PYWIKIBOT_DIR: Not set
PYWIKIBOT_DIR_PWB: 
PYWIKIBOT_NO_USER_CONFIG: Not set
Config base dir: /home/kizule/development/pywikibot-core
Usernames for family "wikipedia":
	sr: Zoranzoki21 (also sysop)
kizule@kizule:~/development/pywikibot-core$

This is not pywikibot-related: there were invalid characters in the command line, so this task looks invalid from the pywikibot side. These invalid characters could probably be stripped during the API call, or the warning could be more informative. I don't think that we should strip all command-line arguments on the pywikibot side, but that would be an option too.
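As an illustration of the "more informative warning" option, a message could name any non-printable characters it finds in the rejected value. This is a hypothetical Python sketch, not existing pywikibot or MediaWiki behaviour; the function name `explain_bad_user` and the message wording are assumptions.

```python
def explain_bad_user(value, param="rcuser"):
    """Build an error message that points out hidden characters, if any."""
    # str.isprintable() is False for format characters such as U+200E.
    hidden = [f"U+{ord(ch):04X}" for ch in value if not ch.isprintable()]
    message = f'Invalid value "{value}" for user parameter "{param}".'
    if hidden:
        message += " The value contains non-printable character(s): " + ", ".join(hidden) + "."
    return message


# Assumed sample: the copy-pasted value from the report.
print(explain_bad_user("\u200e213.149.159.237"))
```

With a message like this, the reporter would have seen the code point U+200E directly instead of an apparently valid IP address.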

So probably the most consistent thing to do here is to pass it through Title::newFromText( $value, NS_USER ) (then get the IP back out with ->getText()), like User::getCanonicalName() does for registered user names, before running the regexes to determine if it's an IP.

Change 434718 merged by jenkins-bot:
[mediawiki/core@master] API: Use ParamValidator library

https://gerrit.wikimedia.org/r/434718