Page MenuHomePhabricator

Frequent "Invalid CSRF token" errors on Wikimedia projects using Pywikibot since August 2020
Open, HighPublic

Description

Since August 2020 I often seem to be getting "Invalid CSRF token" errors in Pywikibot. At first I thought I messed something up in my own code, but I get a whole bunch of them using touch.py. I've only seen this error on Commons, but that might be because I've not run a lot of stuff on Wikidata. I'm using the latest git version from this month. I'm pretty sure I haven't seen this error before or at least not as often as now. It seems to be coming in waves (intermittent problem). It's unclear to me if this is caused by the Mediawiki-api, Pywikibot or the two of them not properly communicating.

I wonder if it's visible in logstash.wikimedia.org/ if a user is hitting this error and if so, if an increase is visible.

Event Timeline

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. · View Herald Transcript

@Xqt what is the correct way to force the bot to get a new token? Just site.get_tokens() or something else? I noticed some of my bots now stuck with 100s of errors in a row.

Unfortunately I am not familiar with it. I haven't seen that problem for a long time. But we have no CI tests for commons as default site, I guess we should have some.

eprodromou added a subscriber: eprodromou.

Accepting this for clinic duty. Not sure what's going on here, but we have a couple of ideas.

Pinging @Krinkle in case it's a main stash thing.

What login method are you using?

Pinging @Krinkle in case it's a main stash thing.

The main stash is not involved with CSRF tokens, it's either a SessionManager thing or a Kask thing (assuming it's on our side).

What login method are you using?

Classic login

Pinging @Krinkle in case it's a main stash thing.

The main stash is not involved with CSRF tokens, it's either a SessionManager thing or a Kask thing (assuming it's on our side).

I was unable to find any relevant logging in logstash.

If the error is easy enough to reproduce, you could try setting the X-Wikimedia-Debug header to get some pretty detailed logs.

https://travis-ci.org/github/wikimedia/pywikibot/jobs/721977220 (large page) got this warning once and then on every next attempt:

WARNING: API error badtoken: Invalid CSRF token.
ERROR: Received incorrect login token. Forcing re-login.

If the error is easy enough to reproduce, you could try setting the X-Wikimedia-Debug header to get some pretty detailed logs.

No, it's not, it seems to be intermittent. I added a bit more debugging code to my bot. When it hits the exception, it will print the current token, grab a new one and print the new one. The new and old tokens are the same so either the bot is not actually getting a new token or the api is returning the same one. I'll investigate a bit more.

I think I found what is going wrong. site.get_tokens(['csrf']) gets you a new token, but doesn't update the internal token state:

>>> import pywikibot
>>> site = pywikibot.Site(u'commons', u'commons')
>>> print (site.tokens['csrf'])
1ade99bef6b895e188692438b12cbd9e5f4aacac+\
>>> site.get_tokens(['csrf'])
{'csrf': '113400e3b87ae1d1d53ec4ebee9bde105f4aacb8+\\'}
>>> print (site.tokens['csrf'])
1ade99bef6b895e188692438b12cbd9e5f4aacac+\

Looks like a site.tokens.load_tokens(['csrf']) does refresh the token, all quite confusing.

site.tokens property is the container object which holds tokens. It uses site.get_tokens() to load tokens.

I think this happens:

  • Bot fires up and gets some tokens
  • Bot does a lot of edits all with the same token
  • For some reason after a long time the token is not valid anymore
  • The bot continues to try to edit with the old invalid token instead of getting a new one

How long is a token supposed to be valid? Forever? The Pywikibot code seems to assume that it can use the same token for all edits in a one run. One run can take days or weeks. As a work around I now force the site to get a new token when I run into a problem. That seems to be stable.

So I think Pywikibot needs to be updated in the framework itself that if it runs into a token error, the a new token gets retrieved, updated in the token cache and the edit should be retried. Agree @Xqt ?

Most relevant code is in https://github.com/wikimedia/pywikibot/blob/master/pywikibot/site/__init__.py btw.

Tokens never expire, but the session that holds them can. That behavior changed recently with the switch to Kask (less recently than August, though): before we had theoretically-infinite sessions which could get evicted, now we have a set session lifetime (1 day, IIRC?).

The session also gets terminted when authentication fails, so possibly the bot actually gets logged out and the CSRF error is an effect of that.

@Multichill: do you have any tracebacks for investigation?

JJMC89 renamed this task from Frequent "Invalid CSRF token" errors on Wikimedia Commons using Pywikibot since August 2020 to Frequent "Invalid CSRF token" errors on Wikimedia projects using Pywikibot since August 2020.EditedSep 2 2020, 6:03 AM
JJMC89 removed a project: Commons.
JJMC89 added a subscriber: JJMC89.

This isn't just on Commons.

First Invalid CSRF token error from a job that started 2020-09-01 05:59:32.
2020-09-02 02:43:21             api.py, 2025 in             submit: WARNING  API error badtoken: Invalid CSRF token.
2020-09-02 02:43:21             api.py, 2074 in             submit: VERBOSE  API Error: query=
("{'assert': ['user'], 'nocreate': [True], 'text': ['{{possibly empty "
 'category}} {{Tracking category}} {{polluted category}}[[Category:Infobox '
 "road instances in Australia', 'Australian Capital Territory]]'], 'summary': "
 "['Removing [[:Category:Empty categories with no backlinks]] per "
 '[[Wikipedia:Categories for discussion/Log/2020 August 9#Category:Empty '
 "categories with no backlinks]]'], 'action': ['edit'], 'createonly': [False], "
 "'format': ['json'], 'minor': [True], 'token': "
 "['<token_goes_here>'], 'basetimestamp': "
 "[Timestamp(2020, 8, 22, 10, 26)], 'bot': [True], 'recreate': [True], "
 "'title': [Category('Category:Infobox road instances in the Australian "
 "Capital Territory')], 'notminor': [False], 'maxlag': ['5']}")
2020-09-02 02:43:21             api.py, 2076 in             submit: VERBOSE             response=
{'error': {'info': 'Invalid CSRF token.', 'code': 'badtoken', 'help': 'See https://en.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.'}, 'servedby': 'mw2368'}
2020-09-02 02:43:21        __init__.py,  108 in             handle: VERBOSE  Error saving page [[Category:Infobox road instances in the Australian Capital Territory]] (badtoken: Invalid CSRF token.
[help: See https://en.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.])
Traceback (most recent call last):
  File "/shared/pywikibot/core_stable/pywikibot/page/__init__.py", line 99, in handle
    func(self, *args, **kwargs)
  File "/shared/pywikibot/core_stable/pywikibot/page/__init__.py", line 1315, in _save
    watch=watch, bot=botflag, **kwargs)
  File "/shared/pywikibot/core_stable/pywikibot/site/_decorators.py", line 94, in callee
    return fn(self, *args, **kwargs)
  File "/shared/pywikibot/core_stable/pywikibot/site/__init__.py", line 5202, in editpage
    result = req.submit()
  File "/shared/pywikibot/core_stable/pywikibot/data/api.py", line 2078, in submit
    raise APIError(**result['error'])
pywikibot.data.api.APIError: badtoken: Invalid CSRF token.
[help: See https://en.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.]
2020-09-02 02:43:22             bot.py, 1251 in         _save_page: ERROR    Skipping Category:Infobox road instances in the Australian Capital Territory because of a save related error: Edit to page [[Category:Infobox road instances in the Australian Capital Territory]] failed:
badtoken: Invalid CSRF token.
[help: See https://en.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.]

Based on Multichill's analysis, I assume this is working as designed on the MediaWiki side, and needs to be fixed in pywikibot.

However, MediaWiki documentations should be more clear on session lifetime (and whether it's counted against periods of inactivity, or in total). Documentation for the token API should also be improved in that respect.

I'm dropping this off the clinic duty board, since I don't see anything for PET to do here.

I can confirm that badtoken error is not handled by pywikibot.data.api.py like other API exceptions.

Multichill raised the priority of this task from Medium to High.Sep 7 2020, 4:14 PM
Multichill added a subscriber: Schlurcher.

Bumping this to high because this makes longer pywikibot bot runs crash.

However, MediaWiki documentations should be more clear on session lifetime (and whether it's counted against periods of inactivity, or in total). Documentation for the token API should also be improved in that respect.

Do you know which pieces of documentation should be updated?

Bumping this to high because this makes longer pywikibot bot runs crash.

Very strange! My bots are working since weeks without any problem.

@Multichill: In front of this raising API exception

File "/shared/pywikibot/core_stable/pywikibot/data/api.py", line 2078, in submit
  raise APIError(**result['error'])

a log entry should be made in api.py line 2040:

pywikibot.log('API Error: query=\n%s'
              % pprint.pformat(param_repr))
pywikibot.log('           response=\n%s'
              % result)

Anyway there is a badtoken handling in line 2014/1871. No idea why this fails but there is also some hints witten to the log file (lines 1886 and 1904). This would be very helpfull to investigate in this matter.

Change 625720 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [IMPR] Print login status to log fil if 'badtoken' API error occures

https://gerrit.wikimedia.org/r/625720

Looks like LoginStatus is IN_PROGRESS during that 'badtoken' error.

Change 625720 merged by jenkins-bot:
[pywikibot/core@master] [IMPR] Print login status to log file if 'badtoken' API error occures

https://gerrit.wikimedia.org/r/625720

I guess after T263010 is merged this error should write 'Login status: IN_PROGRESS' into the logfile every time. @Multichill could you confirm?

Is this bug related to T261066?

The problematic loop is inside api.LoginManager.getCookie():
Status in line 3112 is 'badtoken', the error is printed in line 3122. After that session cookies are deleted in line 3125/51. After that a new 'login' token is retrieved in line 3090/3150

See also: T224712

Just noting that I was experiencing this issue while running touch.py, so I updated my PWB installation to the latest version (6.4.0) and no longer experience it. This may have been incidentally fixed at some point, if anyone knows what change that might have been and/or wants to confirm whether they are still experiencing this issue.

The error is still an issue. It appears to have decreased in frequency though. I am using OAuth if that matters.

2021-08-06 11:47:04             bot.py,  466 in     writelogheader: VERBOSE  === Pywikibot framework v6.5.0 -- Logging header ===
2021-08-06 11:47:04             bot.py,  469 in     writelogheader: VERBOSE  COMMAND: ['/data/project/jjmc89-bot/repos/JJMC89_bot/enwiki/cfdw.py', '-dir:.pywikibot_cfdwl', '-page:WP:Categories_for_discussion/Working/Large', '-page:WP:Categories_for_discussion/Working']
2021-08-06 11:47:04             bot.py,  472 in     writelogheader: VERBOSE  DATE: 2021-08-06 11:47:04.201385 UTC
2021-08-06 11:47:04             bot.py,  477 in     writelogheader: VERBOSE  VERSION: pywikibot/__init__.py (, -1 (unknown), 2021/08/05, 18:30:39, n/a)
2021-08-06 11:47:04             bot.py,  483 in     writelogheader: VERBOSE  SYSTEM: posix.uname_result(sysname='Linux', nodename='cfdwl-1628250420-dzr66', release='4.19.0-14-amd64', version='#1 SMP Debian 4.19.171-2 (2021-01-30)', machine='x86_64')
2021-08-06 11:55:17             api.py, 1815 in             submit: VERBOSE             headers=
{'Date': 'Fri, 06 Aug 2021 11:55:14 GMT', 'Server': 'mw2372.codfw.wmnet', 'X-Content-Type-Options': 'nosniff', 'Mediawiki-Api-Error': 'badtoken', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Disposition': 'inline; filename=api-result.json', 'Cache-Control': 'private, must-revalidate, max-age=0', 'Content-Type': 'application/json; charset=utf-8', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding', 'Age': '6', 'X-Cache': 'cp1077 pass, cp1087 pass', 'X-Cache-Status': 'pass', 'Server-Timing': 'cache;desc="pass", host;desc="cp1087"', 'Strict-Transport-Security': 'max-age=106384710; includeSubDomains; preload', 'Report-To': '{ "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }', 'NEL': '{ "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}', 'Permissions-Policy': 'interest-cohort=()', 'X-Client-IP': '172.16.1.189', 'Accept-Ranges': 'bytes', 'Content-Length': '242', 'Connection': 'keep-alive'}
2021-08-06 11:55:17             api.py, 1695 in         _bad_token: VERBOSE  Login status: IN_PROGRESS
2021-08-06 11:55:17             api.py, 1865 in             submit: VERBOSE  API Error: query=
("{'action': ['edit'], 'title': [Category('Category:Clean-up categories from "
 "January 2017')], 'text': ['{{Monthly clean-up category}}'], 'token': "
 "['<redacted>'], 'summary': "
 "['[[Wikipedia:Categories for discussion/Log/2021 June 17#Category:Clean-up "
 "categories]] closed as no consensus'], 'bot': [True], 'recreate': [True], "
 "'createonly': [False], 'nocreate': [False], 'minor': [True], 'notminor': "
 "[False], 'basetimestamp': [Timestamp(2021, 7, 30, 5, 0, 27)], 'assert': "
 "['user'], 'maxlag': ['5'], 'format': ['json']}")
2021-08-06 11:55:17             api.py, 1866 in             submit: VERBOSE             response=
{'error': {'code': 'badtoken', 'info': 'Invalid CSRF token.', 'help': 'See https://en.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/&gt; for notice of API deprecations and breaking changes.'}, 'servedby': 'mw2372'}
2021-08-06 11:55:17     _decorators.py,   42 in             handle: VERBOSE  Error saving page [[Category:Clean-up categories from January 2017]] (badtoken: Invalid CSRF token.
[help: See https://en.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/&gt; for notice of API deprecations and breaking changes.])
Traceback (most recent call last):
  File "/data/project/jjmc89-bot/.venvs/jjmc89-bot/lib/python3.7/site-packages/pywikibot/page/_decorators.py", line 33, in handle
    func(self, *args, **kwargs)
  File "/data/project/jjmc89-bot/.venvs/jjmc89-bot/lib/python3.7/site-packages/pywikibot/page/__init__.py", line 1224, in _save
    watch=watch, bot=botflag, **kwargs)
  File "/data/project/jjmc89-bot/.venvs/jjmc89-bot/lib/python3.7/site-packages/pywikibot/site/_decorators.py", line 92, in callee
    return fn(self, *args, **kwargs)
  File "/data/project/jjmc89-bot/.venvs/jjmc89-bot/lib/python3.7/site-packages/pywikibot/site/_apisite.py", line 1633, in editpage
    result = req.submit()
  File "/data/project/jjmc89-bot/.venvs/jjmc89-bot/lib/python3.7/site-packages/pywikibot/data/api.py", line 1868, in submit
    raise pywikibot.exceptions.APIError(**result['error'])
pywikibot.exceptions.APIError: badtoken: Invalid CSRF token.
[help: See https://en.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/&gt; for notice of API deprecations and breaking changes.]
2021-08-06 11:55:17        __init__.py, 1332 in             _flush: VERBOSE  Dropped throttle(s).
2021-08-06 11:55:17            http.py,   88 in              flush: VERBOSE  Closing network session.
2021-08-06 11:55:17            http.py,   92 in              flush: CRITICAL Exiting due to uncaught exception <class 'pywikibot.exceptions.OtherPageSaveError'>
2021-08-06 11:55:17            http.py,   94 in              flush: VERBOSE  Network session closed.

This error sometimes occurs when using page.save(), sometimes not

Here is the error log of one of my scripts:

ERROR: Logged in as '172.16.7.13' instead of 'Alertlivebot'. Forcing re-login.
WARNING: No user is logged in on site wikipedia:zh
Logging in to wikipedia:zh as Alertlivebot@alertlive
WARNING: API warning (login): Fetching a token via "action=login" is deprecated. Use "action=query&meta=tokens&type=login" instead.
WARNING: API warning (main): Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes. Use [[Special:ApiFeatureUsage]] to see usage of deprecated features by your application.
ERROR: Received incorrect login token. Forcing re-login.
WARNING: API error badtoken: Invalid CSRF token.
Sleeping for 19.8 seconds, 2021-10-15 02:37:31
Page [[User:Alertlivebot/Status]] saved
ERROR: Logged in as '172.16.7.13' instead of 'Alertlivebot'. Forcing re-login.
WARNING: No user is logged in on site wikipedia:zh
Logging in to wikipedia:zh as Alertlivebot@alertlive
WARNING: API warning (login): Fetching a token via "action=login" is deprecated. Use "action=query&meta=tokens&type=login" instead.
WARNING: API warning (main): Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes. Use [[Special:ApiFeatureUsage]] to see usage of deprecated features by your application.
ERROR: Received incorrect login token. Forcing re-login.
WARNING: API error badtoken: Invalid CSRF token.
Sleeping for 19.9 seconds, 2021-10-15 04:42:56
Following jobs do not exist: 
alertlive-beta
Page [[User:Alertlivebot/Status]] saved
ERROR: Logged in as '172.16.7.13' instead of 'Alertlivebot'. Forcing re-login.
WARNING: No user is logged in on site wikipedia:zh
Logging in to wikipedia:zh as Alertlivebot@alertlive
WARNING: API warning (login): Fetching a token via "action=login" is deprecated. Use "action=query&meta=tokens&type=login" instead.
WARNING: API warning (main): Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes. Use [[Special:ApiFeatureUsage]] to see usage of deprecated features by your application.
ERROR: Received incorrect login token. Forcing re-login.
WARNING: API error badtoken: Invalid CSRF token.
Sleeping for 9.9 seconds, 2021-10-15 05:58:21
Page [[User:Alertlivebot/Status]] saved
Page [[User:Alertlivebot/Status]] saved
Page [[User:Alertlivebot/Status]] saved
Page [[User:Alertlivebot/Status]] saved
Page [[User:Alertlivebot/Status]] saved
Page [[User:Alertlivebot/Status]] saved
Page [[User:Alertlivebot/Status]] saved
Page [[User:Alertlivebot/Status]] saved
Page [[User:Alertlivebot/Status]] saved
Page [[User:Alertlivebot/Status]] saved
Sleeping for 9.6 seconds, 2021-10-15 14:17:44
Page [[User:Alertlivebot/Status]] saved
Page [[User:Alertlivebot/Status]] saved
ERROR: Logged in as '172.16.1.211' instead of 'Alertlivebot'. Forcing re-login.
WARNING: No user is logged in on site wikipedia:zh
Logging in to wikipedia:zh as Alertlivebot@alertlive
WARNING: API warning (main): Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes. Use [[Special:ApiFeatureUsage]] to see usage of deprecated features by your application.
WARNING: API warning (login): Fetching a token via "action=login" is deprecated. Use "action=query&meta=tokens&type=login" instead.
ERROR: Received incorrect login token. Forcing re-login.
WARNING: API error badtoken: Invalid CSRF token.
Sleeping for 9.9 seconds, 2021-10-15 22:13:08
Page [[User:Alertlivebot/Status]] saved

My script: https://github.com/shizhao/alertlive/blob/main/alertstatus.py, the relevant code is in line 50-54

On Toolforge, I only have this script as continuous jobs, and found this error. The other scripts are one-off job, and this error did not occur