Can not log in, log out, or save edits to the beta cluster (session failures)
Closed, ResolvedPublic

Description

There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Please resubmit the form.

beta.png (2×1 px, 1 MB)

Event Timeline

bd808 renamed this task from Can not log in to the beta cluster to Can not log in, log out, or save edits to the beta cluster (session failures). Nov 30 2022, 6:31 PM
bd808 added subscribers: Aklapper, Urbanecm, Daimona, Tigerzeng.

From deployment-mwlog01.deployment-prep.eqiad1.wikimedia.cloud:/srv/mw-logobjectcache.log

2022-11-30 18:37:45.509493 [157d7dc9746d33541496eca2] deployment-mwmaint02 jawiki 1.40.0-alpha objectcache ERROR: DBError: Cannot access the database: Access denied for user 'wikiadmin'@'172.16.%' to database 'mainstash' (172.16.0.238:3306)

There do not seem to be any MediaWiki user grants for the "mainstash" database. Is this a new thing that is only partially set up?

root@BETA[mysql]> select host, db from db where user = 'wikiadmin';
+----------+-------------+
| host     | db          |
+----------+-------------+
| 10.%     | %wik%       |
| 10.%     | centralauth |
| 172.16.% | %wik%       |
| 172.16.% | centralauth |
+----------+-------------+
4 rows in set (0.001 sec)

root@BETA[mysql]> select host, db from db where user = 'wikiuser';
+----------+-------------+
| host     | db          |
+----------+-------------+
| 10.%     | %a%         |
| 10.%     | %wik%       |
| 10.%     | centralauth |
| 172.16.% | %a%         |
| 172.16.% | %wik%       |
+----------+-------------+
5 rows in set (0.001 sec)

The GRANT SELECT, INSERT, UPDATE, DELETE ON '%a%'.* TO 'wikiuser'@'172.16.%' grant covers "mainstash".
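To make the pattern matching concrete, here is a minimal Python sketch that checks which of the database patterns from the query output above would match "mainstash". It assumes the patterns behave like SQL LIKE patterns (`%` for any run of characters, `_` for one character), that matching is case-sensitive, and it does not handle escaped wildcards such as `\_`. The helper names `like_to_regex` and `matching_grants` are made up for this illustration and are not part of MySQL or MediaWiki.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style pattern (% and _ wildcards) into a compiled regex.

    Simplification: escaped wildcards (for example a backslash before _) are
    not handled; every % and _ is treated as a wildcard.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters, including none
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matching_grants(db_name, patterns):
    """Return the patterns that match the whole database name."""
    return [p for p in patterns if like_to_regex(p).fullmatch(db_name)]


# Database patterns copied from the query output above (host 172.16.% rows).
GRANTS = {
    "wikiadmin": ["%wik%", "centralauth"],
    "wikiuser": ["%a%", "%wik%"],
}

for user, patterns in GRANTS.items():
    print(user, matching_grants("mainstash", patterns))
```

Under these assumptions, 'wikiuser' matches "mainstash" through `%a%`, while none of the 'wikiadmin' patterns match, which is consistent with the log line above naming 'wikiadmin' in the access-denied error.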

Eevans claimed this task.
Eevans added a project: Cassandra.
Eevans subscribed.

This was my bad™: a misconfiguration of the sessionstore VM (profile::java::java_packages not set correctly) caused Cassandra to be down.

This is happening again.

Cassandra was down, OOM-killed by the kernel. I've resized the VM (from 2G to 4G memory) and the service is back up. The VM was (is!) very memory constrained, but nonetheless I wouldn't have expected this, so hopefully there isn't something else going on. I'll continue to monitor it.

Eevans triaged this task as High priority. Dec 6 2022, 7:34 PM

Looks good; (Re)closing.

Silvan_WMDE subscribed.

Looks like the issue is back since 13.01.2023, 18:03:00 - WikibaseLexeme selenium tests are failing due to this.

It seems as though Cassandra continues to be killed by the kernel (OOM), so I've opened T327521 to look into this more properly. What seems to be different this time is that after 17 instances of Cassandra being killed, and 17 subsequent restarts by puppet-agent (since December 6), Kask failed to recover due to errors associated with its connection pool. I've opened T327524 to track this.

It's back up now; I'll try to keep an eye on it, but if you notice any more failures, let me know.

noarave reopened this task as Open. Feb 13 2023, 9:40 AM
noarave subscribed.

This seems to be happening again: the WikibaseLexeme selenium tests are failing to log in again.
We also can't log in to Beta at the moment.

Logins are currently working, hence resolving this task.