
Implement badtoken detection and recovery
Closed, ResolvedPublic

Description

Every once in a while I get a badtoken exception. This is probably because I have multiple bots running on the same site at the same time (race condition).

  • Bot A requests token -> 123
  • Bot B requests token -> 123
  • Bot A edits with token 123 -> ok
  • Bot B edits with token 123 -> poof

We could of course implement very difficult synchronization, but it doesn't happen very often, so it's probably better to handle it like a collision in Ethernet:

  • Detect the badtoken
  • Back off for a random number of seconds
  • Get a new token
  • Do the edit

Max tries should be respected so the bot can't get into an infinite retry loop.
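The collision-style recovery described above can be sketched as follows (a minimal sketch; `BadTokenError`, `get_token` and `do_edit` are placeholder names, not pywikibot's API):

```python
import random
import time


class BadTokenError(Exception):
    """Stand-in for the badtoken APIError."""


def edit_with_retry(get_token, do_edit, max_retries=5,
                    backoff=lambda attempt: random.uniform(1, 2 ** attempt)):
    """Detect badtoken, back off randomly, get a new token, redo the edit."""
    for attempt in range(max_retries):
        token = get_token()        # fetch (or re-fetch) an edit token
        try:
            return do_edit(token)  # attempt the edit
        except BadTokenError:
            time.sleep(backoff(attempt))  # random back-off, Ethernet-style
    # max tries respected: never loop forever
    raise RuntimeError('edit failed after %d badtoken retries' % max_retries)
```

The randomized back-off makes it unlikely that two colliding bots retry at the same moment again.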


Version: core-(2.0)
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=54311

Details

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 2:19 AM
bzimport set Reference to bz59678.
bzimport added a subscriber: Unknown Object (????).

Example:

badtoken

  • '''Sorry! We could not process your edit due to a loss of session data.'''

Please try again.
If it still does not work, try [[Special:UserLogout|logging out]] and logging back in.

  • There seems to be a problem with your login session;

this action has been canceled as a precaution against session hijacking.
Go back to the previous page, reload that page and then try again.

{u'messages': {u'1': {u'type': u'error', u'name': u'sessionfailure'},
               u'0': {u'type': u'error', u'name': u'session_fail_preview'},
               u'html': {u'*': u'<ul>\n<li><b>Sorry! We could not process your edit '
                               u'due to a loss of session data.</b>\n</li>\n</ul>\n'
                               u'<p>Please try again.\nIf it still does not work, try '
                               u'<a href="/wiki/Special:UserLogout" title="Special:UserLogout">'
                               u'logging out</a> and logging back in.\n</p>\n<ul>\n'
                               u'<li> There seems to be a problem with your login session;\n'
                               u'</li>\n</ul>\n<p>this action has been canceled as a '
                               u'precaution against session hijacking.\nGo back to the '
                               u'previous page, reload that page and then try again.\n</p>'}}}

OK, so this is slightly more complicated than it seems.

There are two obvious methods:

  • handle the BadToken error in data/api.py. We can just self.sleep() and then get a new edit token
  • handle the BadToken error in data/page.py, in editpage()

Both options have their problems.

data/api.py:
good: we can also handle other types of token problems
bad: edit tokens also serve to detect edit conflicts, and we cannot handle those at the data/api.py level...

data/page.py:
good: the logic for getting tokens & handling edit conflicts is already here!
bad: the retry logic is in the data/api.py layer, and it doesn't cover other token issues

Wikidata is very unstable today so I keep running into:

File "C:\pywikibot\coredev\pywikibot\data\api.py", line 458, in submit
  raise APIError(code, info, **result["error"])

pywikibot.data.api.APIError: badtoken: <strong>Sorry! We could not process your edit due to a loss of session data.</strong>
Please try again.
If it still does not work, try [[Special:UserLogout|logging out]] and logging back in.
<class 'pywikibot.data.api.APIError'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort

Marking this as a bug. The bot shouldn't crash on this.

We have a changeset pending to overhaul token management in site.py
https://gerrit.wikimedia.org/r/#/c/139372/

It adds caching of tokens so, with badtoken now appearing more regularly, the cache needs better management of how long these tokens are useful for.

data/api.py:
good: we can also handle other types of token problems
bad: edit tokens also serve to detect edit conflicts, and we cannot handle those at the data/api.py level...

Maybe I'm wrong, but edit conflicts aren't detected by tokens; they are detected via basetimestamp in MediaWiki [1], and if an edit conflict happens it raises an editconflict error, not a badtoken error. See the error table.
[1]: https://www.mediawiki.org/wiki/API:Edit

If we want to avoid undetected edit conflicts, the only thing we need to do is add basetimestamp to action=edit API calls.
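Concretely, an action=edit request can carry the timestamp of the revision that was loaded, so that MediaWiki answers with an editconflict error instead of silently overwriting a newer revision. The parameter names below come from the MediaWiki API docs; the helper itself is a hypothetical sketch that only builds the parameters, without sending anything:

```python
def build_edit_params(title, text, token, basetimestamp):
    """Parameters for an action=edit call that detects edit conflicts.

    If the page changed after `basetimestamp` (the timestamp of the
    revision we loaded), MediaWiki raises `editconflict`, which is a
    different error from `badtoken` and can be handled separately.
    """
    return {
        'action': 'edit',
        'title': title,
        'text': text,
        'token': token,
        'basetimestamp': basetimestamp,  # e.g. '2014-06-01T12:00:00Z'
        'format': 'json',
    }
```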

(In reply to John Mark Vandenberg from comment #4)

We have a changeset pending to overhaul token management in site.py
https://gerrit.wikimedia.org/r/#/c/139372/

It adds caching of tokens so, with badtoken now appearing more regularly, the cache needs better management of how long these tokens are useful for.

The entire problem is /caching/ the tokens. They are not valid for a fixed time; they are valid for /one edit/. Basically it's a race condition, so there are two options:

  1. the 'nice' way: implement locking. Requires some sort of interprocess communication.
  2. the 'hacky' way: reduce the prevalence of the condition (by reducing the time between getting a token and using it) and retrying -- effectively using the remote MW instance as the lock.
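The 'nice' option could look roughly like this file-based interprocess lock (a sketch only; the class and lock-file path are hypothetical, not pywikibot code):

```python
import os
import time


class TokenLock:
    """File-based interprocess lock: one bot at a time per site.

    Holding the lock from token fetch through edit submission would
    prevent two local bot processes from racing for the same token.
    """

    def __init__(self, path, poll=0.1):
        self.path = path
        self.poll = poll

    def __enter__(self):
        while True:
            try:
                # O_EXCL makes creation atomic: open fails if the file exists
                self.fd = os.open(self.path,
                                  os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                return self
            except FileExistsError:
                time.sleep(self.poll)  # another process holds the lock

    def __exit__(self, *exc):
        os.close(self.fd)
        os.unlink(self.path)  # release: delete the lock file
```

Usage would be `with TokenLock('enwiki.lock'): token = ...; edit(...)`. The downside, as noted above, is the extra coordination machinery for a condition that is rare in practice.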

(In reply to Amir Ladsgroup from comment #5)

Maybe I'm wrong, but edit conflicts aren't detected by tokens; they are detected via basetimestamp in MediaWiki [1], and if an edit conflict happens it raises an editconflict error, not a badtoken error. See the error table.
[1]: https://www.mediawiki.org/wiki/API:Edit

Yes, you are right. So we can just implement this at the data/api.py level.

(In reply to Merlijn van Deen from comment #6)

The entire problem is /caching/ the tokens. They are not valid for a fixed time; they are valid for /one edit/.

https://lists.wikimedia.org/pipermail/mediawiki-api-announce/2014-August/000063.html

«All tokens may be cached as long as the session is valid; none are
dependent on factors such as the page being edited or the user being
targeted.»

And some of them are always the same (e.g. editToken & protectToken). They will be merged with the change announced above.

However, since we want to be able to work with multiple accounts on the same wiki, we need better caching.
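One possible shape for such a cache, keyed by site and account with a conservative time-to-live (a sketch under the assumption that tokens stay valid roughly as long as the session; none of these names are pywikibot's actual API):

```python
import time


class TokenCache:
    """Per-(site, user, token type) cache with expiry.

    A conservative TTL approximates the session lifetime
    ($wgObjectCacheSessionExpiry, 1 hour by default on Wikimedia wikis).
    """

    def __init__(self, ttl=3600):
        self.ttl = ttl
        self._tokens = {}  # (site, user, tokentype) -> (token, fetched_at)

    def get(self, site, user, tokentype, fetch):
        key = (site, user, tokentype)
        entry = self._tokens.get(key)
        if entry and time.time() - entry[1] < self.ttl:
            return entry[0]            # still fresh: reuse cached token
        token = fetch()                # miss or expired: ask the API again
        self._tokens[key] = (token, time.time())
        return token

    def invalidate(self, site, user):
        """Drop all cached tokens for one account (e.g. after badtoken)."""
        for key in [k for k in self._tokens if k[:2] == (site, user)]:
            del self._tokens[key]
```

Keying by user is what allows multiple accounts on the same wiki; `invalidate()` is the hook a badtoken handler would call before retrying.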

There is another patch going through review, which will help organise the framework for this.

https://gerrit.wikimedia.org/r/#/c/159394/

Adding bug 35925 to properly track this issue (it is causing issues in a Wikisource-specific gadget. More on the issue: https://fr.wikisource.org/w/index.php?oldid=4780982#Match_.26_Split. Further info related to the gadget: https://en.wikisource.org/wiki/Help:Match_and_split).

This (today) is the first time I have remembered it appearing in travis builds:
https://travis-ci.org/wikimedia/pywikibot-core/jobs/40487338

Wikimedia will apply to Google Summer of Code and Outreachy on Tuesday, February 17. If you want this task to become a featured project idea, please follow these instructions.

Do you think this is a good candidate for the next GSoC/Outreachy round?

Also, is this really a High priority task?

Mentioning here for reference, the recent change seems to have caused T89702: Edits fail with "badtoken: Invalid token" after script runs for a while.
Village pump (technical)#How long does it take for a session timeout?:

The test fetching the edit token at the beginning and end of the run should certainly have not given you the same token each time; even fetching two tokens one second apart should give you two different tokens (but both valid) since the end of October. Since you got the same token hours apart, whatever you're using to fetch the tokens is apparently caching them rather than fetching a new one the second time.
Ideally, your bot should be able to attempt the edit, and if it gets a badtoken error it should automatically fetch a fresh token and retry the edit.
As for the time it takes, the configuration is currently using the default of 1 hour for $wgObjectCacheSessionExpiry. I don't see any recent changes to the timeout, but I do see there was this recent change to session handling that probably made page views no longer reset the session timer. Anomie⚔ 13:05, 3 March 2015 (UTC)

I'm wondering if we should ditch the token cache. Tokens are only valid for a certain time (which we probably can't determine from the outside), and we also don't handle separate sysop and non-sysop tokens (though if we ditch that difference as well, this is not such a big concern).

This is also a considerable problem for me. My bot operates with a relatively slow edit rate (once every few hours) and it's hitting the badtoken exception on every edit. I can work around it with a wrapper to check for a badtoken error and retry the edit, but it's silly that the application should have to implement that.

I'm still digging through the pywikibot code to find the best place to fix it, but I'm inclined to agree that if it's too difficult to cache tokens reliably then they shouldn't be cached at all. Better to take an efficiency hit than to compromise functionality.

One quick workaround would be to do my_site.tokens._tokens.clear(), which removes all cached tokens. It's not pretty, but until there is a sensible fix, that should work.
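Wrapped in a helper, that workaround might look like this (`APIError` below is a simplified stand-in for pywikibot.data.api.APIError, defined here so the sketch is self-contained; `page.save()` and the private `site.tokens._tokens` mapping are the pywikibot names mentioned in this thread):

```python
class APIError(Exception):
    """Simplified stand-in for pywikibot.data.api.APIError."""

    def __init__(self, code, info=''):
        super().__init__(code, info)
        self.code = code


def save_with_token_recovery(page, max_retries=3):
    """Retry page.save() after badtoken, clearing the site's token cache.

    Any other APIError, or a badtoken on the last attempt, is re-raised
    unchanged so real failures still surface.
    """
    for attempt in range(max_retries):
        try:
            return page.save()
        except APIError as err:
            if err.code != 'badtoken' or attempt == max_retries - 1:
                raise
            page.site.tokens._tokens.clear()  # drop all cached tokens
```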

Maybe @Ricordisamoa with T78393: Load token types needed for each API module from the API can help here. When a badtoken error happens, their patch can determine which tokens need to be invalidated. As long as T78393 is not fixed, the handler could just invalidate all tokens, or it could invalidate all tokens which appear as values in the parameters (which would work as well as the patch for T78393).

Change 201967 had a related patch set uploaded (by XZise):
[FEAT] Request: Recover from bad tokens

https://gerrit.wikimedia.org/r/201967

fwiw, badtoken moved from ApiDelete to ApiBase in v1.12 (70b5fdd2), and the keyword to search the source for is 'sessionfailure'. The checking of tokens moved to ApiMain in 1.24 (fdddf945).

A question to people who got this error: after you got such a "badtoken" error, have you ever tried to just resubmit the edit using the same edittoken? Because in my code (which does not use the Pywikibot framework) I also encounter the "badtoken" error once in a while. If I then just resubmit the same edit again (or another edit) _w/o getting a new edit token_, the edit suddenly gets accepted.

Change 201967 merged by jenkins-bot:
[FEAT] Request: Recover from bad tokens

https://gerrit.wikimedia.org/r/201967

This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Outreachy-Round-11 is around the corner. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

This is a message sent to all Possible-Tech-Projects. The new round of Wikimedia Individual Engagement Grants is open until 29 Sep. For the first time, technical projects are within scope, thanks to the feedback received at Wikimania 2015, before, and after (T105414). If someone is interested in obtaining funds to push this task, this might be a good way.

jayvdb assigned this task to XZise.

There may be some corner cases still causing badtoken, but the main cases were solved in April 2015.