
Login retrieves token from incorrect location
Open, High / Public / BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Use a private wiki
  • Configure a families file, user-config, and user-password file
  • Use pwb login

What happens?:

This command will fail with a readapidenied error.

What should have happened instead?:

The login should have succeeded

Software version (skip for WMF-hosted wikis like Wikipedia):

Python 3.11.1, pywikibot 8.0.0, MediaWiki 1.37.4

Other information (browser name/version, screenshots, etc.):

The correction is found in login.py around line 430:

login_request[
    self.keyword('token')] = self.site.tokens['login']

should be changed to:

login_request[
    self.keyword('token')] = response['token']

Making this change allowed the login process to continue. The problem is that self.site.tokens is an empty dict at this point in the login process, while the response already contains the needed token.

Event Timeline

should be changed to:

login_request[
    self.keyword('token')] = response['token']

this proposal does not work because no token is given with the response:

{'code': 'badtoken',
 'info': 'Invalid CSRF token.',
 'other': {'help': 'See https://de.wikipedia.org/w/api.php for API usage. '
                   'Subscribe to the mediawiki-api-announce mailing list at '
                   '<https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> '
                   'for notice of API deprecations and breaking changes.',
           'servedby': 'mw1376'},
 'unicode': 'badtoken: Invalid CSRF token.\n'
            '[servedby: mw1376;\n'
            ' help: See https://de.wikipedia.org/w/api.php for API usage. '
            'Subscribe to the mediawiki-api-announce mailing list at '
            '<https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> '
            'for notice of API deprecations and breaking changes.]'}

The current code clears the token wallet in line 430. New tokens are loaded with Site.get_tokens() when self.site.tokens['login'] is used.
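To illustrate the load-on-access behaviour described here, the following is a minimal sketch, not Pywikibot's actual TokenWallet. The class name LazyTokenWallet and the fetch_tokens callable are hypothetical; the callable stands in for the API request that Site.get_tokens() would make.

```python
class LazyTokenWallet:
    """Hypothetical stand-in for a wallet that loads tokens on demand."""

    def __init__(self, fetch_tokens):
        # fetch_tokens is an assumed callable: list of token types -> dict
        self._fetch_tokens = fetch_tokens
        self._tokens = {}

    def clear(self):
        """Forget all cached tokens, as the login code does before a retry."""
        self._tokens.clear()

    def __getitem__(self, key):
        # Only ask the server when the token is not cached yet.
        if key not in self._tokens:
            self._tokens.update(self._fetch_tokens([key]))
        return self._tokens[key]


calls = []


def fake_fetch(types):
    """Record each simulated request and return placeholder tokens."""
    calls.append(list(types))
    return {t: f'{t}-token' for t in types}


wallet = LazyTokenWallet(fake_fetch)
first = wallet['login']    # triggers a fetch
second = wallet['login']   # served from the cache
wallet.clear()
third = wallet['login']    # triggers a second fetch after clear()
```

In this sketch a failing fetch surfaces at the wallet['login'] lookup itself, which matches where the exception is reported in this task.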

@chiefgeek157: Could you give the complete I/O during login for further investigation please. (Please redact private data).

Here is a hand-built "log" based on print statements I previously inserted in a local copy.

at line 406, here is the data:
    self.action = 'login'
    login_result = {
        'warnings': {
            'main': {
                '*': 'Subscribe to the mediawiki-api-announce mailing list at [deleted] for notice of API deprecations and breaking changes.'
            },
            'login': {
                '*': 'Fetching a token via "action=login" is deprecated. Use "action=query&meta=tokens&type=login" instead.'
            }
         },
        'login': {
            'result': 'NeedToken',
            'token': '2728dc3bff180ef48b90f3ee132e1cdc63dd8091+\\'
        }
    }

    and, significantly,

    self.site.tokens = {} (which I see is really a TokenWallet)

on line 430, this leads to an exception on my installation. I observed that the needed token is already in the response, which is why I made the change I did.
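As a rough sketch of the idea behind that change, the token can be read from a NeedToken reply as shown below. The helper name token_from_login_result is hypothetical and not part of Pywikibot; the sample data mirrors the login_result printed above. When the reply carries no token, as in the badtoken response quoted earlier in this task, the helper returns None.

```python
def token_from_login_result(login_result):
    """Return the token from a 'NeedToken' reply, or None if there is none.

    Hypothetical helper for illustration only.
    """
    login = login_result.get('login', {})
    if login.get('result') == 'NeedToken':
        return login.get('token')
    return None


# Shape taken from the hand-built log above.
sample = {
    'login': {
        'result': 'NeedToken',
        'token': '2728dc3bff180ef48b90f3ee132e1cdc63dd8091+\\',
    },
}
token = token_from_login_result(sample)

# A reply without a 'login' section yields None.
missing = token_from_login_result({'error': {'code': 'badtoken'}})
```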

To be fair, I did not instrument the TokenWallet to figure out why it did not load the login token and return it. It does fail with an exception, but it is an odd apireaderror exception. This seems redundant since the user is attempting to log in and so will not yet have any permissions.

I did see an older issue where the mediawiki version number was being accessed during login and this was causing a read permission error, but I thought that was fixed already.

Turning on -v -v on the command line did not return anything additional that would be useful. It is the apparent exception in tokens['login'] that causes the problem. I can dig more if needed.

More info. Lines marked >>>RJC>>> are my debug prints.

% pwb login
>>>RJC>>> APISite.version()
>>>RJC>>> SiteInfo._get_general(key=generator, expiry=1 day, 0:00:00
>>>RJC>>> general not in self._cache
>>>RJC>>> forcing
>>>RJC>>> props: ['namespaces', 'namespacealiases', 'general']
>>>RJC>>> SiteInfo._get_siteinfo(prop=['namespaces', 'namespacealiases', 'general'], expiry=1 day, 0:00:00)
>>>RJC>>> request: /w/api.php?action=query&meta=siteinfo&siprop=namespaces|namespacealiases|general&continue=
WARNING: API error readapidenied: You need read permission to use this module.
>>>RJC>>> Caught APIError: readapidenied: You need read permission to use this module.
[help: See https://MYWIKI/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.]
>>>RJC>>> No read persmissions
ERROR: You have no API read permissions. Seems you are not logged in.
Logging in to my_family:en as me@me_bot
WARNING: API warning (main): Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.
WARNING: API warning (login): Fetching a token via "action=login" is deprecated. Use "action=query&meta=tokens&type=login" instead.
>>>RJC>>> self.action: login
>>>RJC>>> login_result: {'warnings': {'main': {'*': 'Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.'}, 'login': {'*': 'Fetching a token via "action=login" is deprecated. Use "action=query&meta=tokens&type=login" instead.'}}, 'login': {'result': 'NeedToken', 'token': 'f71af5fc300dbd72908ede7648441f0263e120a0+\\'}}
ERROR: Received incorrect login token. Forcing re-login.
>>>RJC>>> TokenWallet.__getitem__(login)
>>>RJC>>> self.site.user(): None
>>>RJC>>> self._currentuser: None
>>>RJC>>> self._tokens: {}
>>>RJC>>> calling self.site.get_tokens([])
>>>RJC>>> APISite.get_tokens([])
>>>RJC>>> not type or load_all not false
>>>RJC>>> ParamInfo.parameter(module=query+tokens, param_name=type
>>>RJC>>> ParamInfo.fetch({'query+tokens'}
>>>RJC>>> self._paraminfo: {}
>>>RJC>>> calling self._init()
>>>RJC>>> ParamInfo._init()
>>>RJC>>> self._modules: {}
>>>RJC>>> APISite.version()
>>>RJC>>> SiteInfo._get_general(key=generator, expiry=1 day, 0:00:00
>>>RJC>>> general not in self._cache
>>>RJC>>> forcing
>>>RJC>>> props: ['namespaces', 'namespacealiases', 'general']
>>>RJC>>> SiteInfo._get_siteinfo(prop=['namespaces', 'namespacealiases', 'general'], expiry=1 day, 0:00:00)
>>>RJC>>> request: /w/api.php?action=query&meta=siteinfo&siprop=namespaces|namespacealiases|general&continue=
WARNING: API error readapidenied: You need read permission to use this module.
>>>RJC>>> Caught APIError: readapidenied: You need read permission to use this module.

The last two lines are from site/_siteinfo.py:171 (give or take since I inserted prints)

What this seems to say is that SiteInfo attempts to go get tokens, but the API request includes requests for info that are behind the login wall for the API. Rather than only getting the needed token, other info is also requested, and that info requires the user to already be logged in. So it will never succeed from what I can tell.
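For comparison, a request that asks only for the login token, using the parameters named in the API deprecation warning above, could be built roughly like this. The function name login_token_query is hypothetical, and whether a private wiki answers this request without read permission is an assumption, not something this task has confirmed.

```python
from urllib.parse import urlencode


def login_token_query(api_path='/w/api.php'):
    """Build a query that requests only a login token (hypothetical helper)."""
    params = {
        'action': 'query',
        'meta': 'tokens',   # no siteinfo or userinfo in this request
        'type': 'login',
        'format': 'json',
    }
    return f'{api_path}?{urlencode(params)}'


url = login_token_query()
```

Unlike the request shown in the debug prints, this one carries no siprop values, so nothing in it asks for namespaces or general site information.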

I can reproduce this on a WMF private wiki I have access to. Moreover, if you delete the .lwp file holding the cookie data, login.py seems unable to log the bot back in, repeatedly reporting a 'readapidenied' error.

What this seems to say is that SiteInfo attempts to go get tokens, but the API request includes requests for info that are behind the login wall for the API. Rather than only getting the needed token, other info is also requested, and that info requires the user to already be logged in. So it will never succeed from what I can tell.

That seems to be the case:

pwb.py login -v -debug
API Error: query=
("{'action': ['query'], 'meta': ['siteinfo', 'userinfo'], 'siprop': "
 "['namespaces', 'namespacealiases', 'general'], 'continue': [True], 'uiprop': "
 "['blockinfo', 'hasmsg'], 'maxlag': ['5'], 'format': ['json']}")
           response=
{'error': {'code': 'readapidenied', 'info': 'You need read permission to use this module.', 'servedby': 'mw1402', 'help': 'See https://xxxx.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/&gt; for notice of API deprecations and breaking changes.'}, 'servedby': 'mw1402'}
ERROR: You have no API read permissions. Seems you are not logged in.
ERROR: Username 'xxx@xxx' does not have read permissions on xxx:xxx
Not logged in on xxx:xxx.
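A small sketch of how such a reply could be recognised in code, assuming the response has the dict shape shown in the debug output above. The helper name is_read_denied is hypothetical and not a Pywikibot function.

```python
def is_read_denied(response):
    """Return True if an API reply reports the 'readapidenied' error.

    Hypothetical helper; expects the dict shape from the debug output.
    """
    return response.get('error', {}).get('code') == 'readapidenied'


# Trimmed copy of the response printed above.
denied_reply = {
    'error': {
        'code': 'readapidenied',
        'info': 'You need read permission to use this module.',
        'servedby': 'mw1402',
    },
    'servedby': 'mw1402',
}
denied = is_read_denied(denied_reply)
```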

@chiefgeek157: What is your Pywikibot version? Can you please run pwb version.

@chiefgeek157, @MarcoAurelio: What is the MediaWiki release of the private wiki?

Anyway, I guess a siteinfo or userinfo query cannot be made without being logged in. I'll try to find a better way.

Xqt triaged this task as High priority. Feb 26 2023, 5:13 PM

Hi you both,

can someone test login without the login script as follows:

import pywikibot
from pywikibot import config
config.debug_log = ['']
site = pywikibot.Site(<your private wiki site>)
site.login()  # or site.login(user=<account name>)

In T328814#8647053, @Xqt (partially) wrote:

@MarcoAurelio: What is the MediaWiki release of the private wiki?

Hello @Xqt:

In my case, it's a WMF private wiki which may make things easier. Right now it's using MediaWiki 1.40.0-wmf.24.

PWB version data:

$ python pwb.py version -v
Pywikibot: [https] r-pywikibot-core.git (d703ece, g17791, 2023/02/23, 13:17:51, stable)
Release version: 8.0.0
setuptools version: 67.3.2
mwparserfromhell version: 0.6.4
wikitextparser version: n/a
requests version: 2.28.2
  cacerts: (...)\cacert.pem
    certificate test: ok
Python: 3.11.2

Hope that this helps.

Hi you both,

can someone test login without the login script as follows:

import pywikibot
from pywikibot import config
config.debug_log = ['']
site = pywikibot.Site(<your private wiki site>)
site.login()  # or site.login(user=<account name>)

Sure, trying this using py pwb.py shell. What should I use for your private wiki site? I'm using an autofamily right now in user-config.py.

Self-answer: Site URL but without https:// otherwise Pywikibot complains that there's no https family.

@Xqt : Tested your code via pwb.py shell, with the following result: P44758.

Change 892495 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [bugfix] Get a token for private wiki

https://gerrit.wikimedia.org/r/892495

It is difficult to fix this issue without having a private wiki for tests. The patch above is more or less a guess using your api response. Are you able to test it?

It is difficult to fix this issue without having a private wiki for tests. The patch above is more or less a guess using your api response. Are you able to test it?

Tested, left some comments in the gerrit patch.