Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session`
Closed, Resolved, Public

Description

On 2020-01-18 (build 525), selenium-daily-beta-MediaWiki (targeting https://en.wikipedia.beta.wmflabs.org) started failing with error message:

There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Please resubmit the form.

On 2020-01-20 (build 199), selenium-daily-betacommons-MediaWiki (targeting https://commons.wikimedia.beta.wmflabs.org) started failing too.

@Cparle and I have tried to log in to https://en.wikipedia.beta.wmflabs.org manually several times. Sometimes it works, and sometimes we get the above error message.

Event Timeline

zeljkofilipin renamed this task from selenium-daily-beta-MediaWiki fails with `There seems to be a problem with your login sesion` to Login to en.wikipedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login sesion`. Jan 20 2020, 10:35 AM
zeljkofilipin updated the task description.

I'm not sure if it's related, but I just got the error message `The provided authentication token is either expired or invalid.` after logging in.

I've tried to log in/out a few more times, and now I got: `No active login attempt is in progress for your session.`

> I'm not sure if it's related, but I just got the error message `The provided authentication token is either expired or invalid.` after logging in.

+1 - also just ran into this one

zeljkofilipin renamed this task from Login to en.wikipedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login sesion` to Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login sesion`. Jan 22 2020, 10:41 AM
zeljkofilipin updated the task description.

> I'm not sure if it's related, but I just got the error message `The provided authentication token is either expired or invalid.` after logging in.

+1, selenium-daily-beta-TwoColConflict also seems to have been failing because of this since 2020-01-20 (at least, I get these errors when I try to reproduce the issue against the beta cluster).

https://integration.wikimedia.org/ci/job/selenium-daily-beta-TwoColConflict/114/

zeljkofilipin renamed this task from Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login sesion` to Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session`. Jan 22 2020, 5:08 PM
Eevans lowered the priority of this task from Unbreak Now! to Medium. Jan 22 2020, 11:23 PM

It looks like Cassandra queries from Kask have been intermittently timing out. Kask and Cassandra are co-located on the same VM, which is pretty resource constrained, but AFAIK it has been working OK up to this point. We can probably begin with a restart and go from there.

In the meantime, I'm going to use this as an opportunity to see if Kask can be made more resilient to these sorts of timeouts, and/or better report on them.

It seems to be working OK at the moment so I'm going to downgrade the priority. Feel free to bump it back up if something changes.

The default timeouts in the Cassandra Go driver, both Timeout and ConnectTimeout, are 600ms. This seems quite low; by comparison, the Java and NodeJS drivers both use 12s and 5s, respectively. I propose we make these values configurable in Kask (with defaults of 12s and 5s).

Change 566912 had a related patch set uploaded (by Eevans; owner: Eevans):
[mediawiki/services/kask@master] Configurable query and connect timeouts

https://gerrit.wikimedia.org/r/566912

Change 566912 merged by jenkins-bot:
[mediawiki/services/kask@master] Configurable query and connect timeouts

https://gerrit.wikimedia.org/r/566912

Kask has been updated with higher (default) Cassandra timeouts, and deployment-prep has been updated. I'm going to close this; feel free to re-open if this happens again.

The job is failing again (535) with the same error message (png, mp4).

So this issue has been causing the Two-Column-Edit-Conflict-Merge daily selenium test to fail every day for some time (125). The build artifacts are not very helpful, but when I run the tests from my local machine against beta I still get a couple of login errors and lost sessions at random points during the test run. Is anyone working on this, or at least feels responsible for it?

Looking at the tests linked by @zeljkofilipin that do not fail every day, I could add that the Two-Column-Edit-Conflict-Merge tests create an additional account in the background that uses the MWBot to log in and edit a page. The tests also need to log in once for every spec to enable the beta feature of the extension.

With a successfully logged-in user, @Jakob_WMDE and I found that we were intermittently but regularly getting CSRF token failures for freshly minted tokens that should have been valid. Sounds like it could be directly related to this (session storage) too.
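
For anyone trying to tell these CSRF failures apart from other errors in test output, here is a minimal diagnostic sketch. It assumes the failure surfaces as a MediaWiki Action API error response with code `badtoken`; the helper name `isBadToken` and the sample payloads are illustrative and not taken from any of the test suites mentioned here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError mirrors the "error" object of an Action API error response.
type apiError struct {
	Code string `json:"code"`
	Info string `json:"info"`
}

// apiResponse captures only the part of the response needed here.
type apiResponse struct {
	Error *apiError `json:"error"`
}

// isBadToken (hypothetical helper) reports whether body is an API error
// response whose error code is "badtoken".
func isBadToken(body []byte) (bool, error) {
	var resp apiResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, err
	}
	return resp.Error != nil && resp.Error.Code == "badtoken", nil
}

func main() {
	// Sample payload written by hand for illustration, not captured from beta.
	sample := []byte(`{"error":{"code":"badtoken","info":"Invalid CSRF token."}}`)
	bad, err := isBadToken(sample)
	fmt.Println(bad, err)
}
```

Logging how often a freshly fetched token is rejected this way might help correlate the failures with session storage problems, though retrying would only mask the underlying issue.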

I just got the error message while trying to log in to a local mediawiki-vagrant mediawiki instance.

FWIW, this is not happening on a local, non-vagrant, MediaWiki installation, using the current MW core master.

Not happening on my local vagrant either :/ At least not often enough that I've noticed

I don't think this is sessionstore (at least, it's not the timeout issue with Cassandra that we saw before).

> I just got the error message while trying to log in to a local mediawiki-vagrant mediawiki instance.

Is mediawiki-vagrant even set up to use sessionstore for sessions?

@Eevans I don't know much about MediaWiki-Vagrant internals. I've been using MediaWiki-Vagrant for years and I don't think I've ever seen that error. I assumed it's the same problem because the error message was the same. If nobody else has that problem, maybe it's something specific to my machine.

> @Eevans I don't know much about MediaWiki-Vagrant internals. I've been using MediaWiki-Vagrant for years and I don't think I've ever seen that error. I assumed it's the same problem because the error message was the same. If nobody else has that problem, maybe it's something specific to my machine.

It would probably be good to know which roles you had enabled in your local environment when this happened. It's probably an issue specific to a configuration.

I don't remember which roles I had enabled at the time, sorry. I had either no roles, or minerva, or templatewizard, because that's what I was working on yesterday. The error happened only once.

Pywikibot tests are failing due to this issue too. Since February 10, 2020 login to en.wikisource.beta.wmflabs.org fails all the time. The issue might be older and also might sometimes happen in production wikis, see T224712 for more details.

> With a successfully logged-in user, @Jakob_WMDE and I found that we were intermittently but regularly getting CSRF token failures for freshly minted tokens that should have been valid. Sounds like it could be directly related to this (session storage) too.

I think this might have something to do with CSRF/login tokens

This comment was removed by Dvorapa.

Is this correct? It doesn't seem so
en.wikipedia.beta.wmflabs.org


en.wikisource.beta.wmflabs.org

I think that did the trick. I just successfully logged out of Beta Commons.

Yes, Beta Wikisource login works again, either through API, or through web interface!

Yeah, logging in to Commons beta is possible using the Commons Android app (beta flavour). Thanks for fixing this 😄

I am still getting this issue, but while running with Cypress. Is this still an issue, or does it have something to do with protocols, since Selenium tests run fine?

I think it's broken again, I can't log in anywhere on the beta cluster...

Never mind, fixed by restarting Cassandra.