Catchpoint tests failing under Toolforge availability product
Closed, Resolved (Public)

Description

Some were old and I removed them:

grid-start-precise
Labs puppetmaster eqiad
labsdb1002
labsdb1002rw
ToolLabs webservice - lighttpd on precise

Some are failing for unknown reasons:

labsdb1001rw
labsdb1003rw
webservice start and stop
labsdb1005

Event Timeline

chasemp added a subscriber: madhuvishy.

@madhuvishy is it possible the failing labsdb* tests relate back to the rewrite of account handling? I'm wondering if these checks use creds that got clobbered.

I deactivated the failing tests for now until we debug them.

Change 381885 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] toolschecker: Fix labsdb1005 test to use a current DB

https://gerrit.wikimedia.org/r/381885

For the labsdb1001 & 1003 tests, the error was:

root@tools-checker-01:~# curl localhost/labsdb/labsdb1001rw ; echo
Caught exception: (1030, u'Got error 176 "Read page with wrong checksum" from storage engine Aria')
root@tools-checker-01:~# curl localhost/labsdb/labsdb1003rw ; echo
Caught exception: (1030, u'Got error 176 "Read page with wrong checksum" from storage engine Aria')

It looks like the server crashed at some point, and with the Aria engine the affected tables have to be repaired manually. This wouldn't happen if the engine were InnoDB. I fixed these tests by running repair table test; and alter table test engine=InnoDB; on the toolschecker-specific databases on labsdb1001 and labsdb1003.
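
For context, a read/write check of the kind these endpoints appear to perform could look roughly like the sketch below. This is not the toolschecker code: it uses Python's built-in sqlite3 as a stand-in for the MariaDB connection, and the function name rw_check and the marker column are hypothetical. Only the table name test and the "Caught exception" output format come from this task.

```python
import sqlite3
import time


def rw_check(conn, table="test"):
    """Write a marker row, read it back, and report the result as text.

    Hypothetical helper: it mimics the "Caught exception: ..." output seen
    in the curl results, but is not the actual toolschecker implementation.
    """
    marker = str(time.time())
    try:
        cur = conn.cursor()
        # Assumed schema: a single text column holding the marker.
        cur.execute("CREATE TABLE IF NOT EXISTS %s (marker TEXT)" % table)
        cur.execute("DELETE FROM %s" % table)
        cur.execute("INSERT INTO %s (marker) VALUES (?)" % table, (marker,))
        conn.commit()
        cur.execute("SELECT marker FROM %s" % table)
        row = cur.fetchone()
    except Exception as exc:
        # Any database error (for example a storage engine failure)
        # is reported as text instead of crashing the check.
        return "Caught exception: %s" % (exc,)
    return "OK" if row and row[0] == marker else "FAIL: marker mismatch"


if __name__ == "__main__":
    print(rw_check(sqlite3.connect(":memory:")))
```

With a check shaped like this, a corrupted table surfaces through the except branch on the first statement that touches it, which would match the behaviour seen on labsdb1001rw and labsdb1003rw.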

Change 381885 merged by Madhuvishy:
[operations/puppet@production] toolschecker: Fix labsdb1005 test to use a current DB

https://gerrit.wikimedia.org/r/381885

Change 381887 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] toolschecker: Fix sudo options in webservice start command

https://gerrit.wikimedia.org/r/381887

Change 381887 merged by Madhuvishy:
[operations/puppet@production] toolschecker: Fix sudo options in webservice start command

https://gerrit.wikimedia.org/r/381887

Change 381889 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] toolschecker: Fix sudo options typo in more places

https://gerrit.wikimedia.org/r/381889

Change 381889 merged by Madhuvishy:
[operations/puppet@production] toolschecker: Fix sudo options typo in more places

https://gerrit.wikimedia.org/r/381889

Change 381891 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] toolschecker: Remove uwsgi-python check from grid webservice test

https://gerrit.wikimedia.org/r/381891

Change 381891 merged by Madhuvishy:
[operations/puppet@production] toolschecker: Remove uwsgi-python check from grid webservice test

https://gerrit.wikimedia.org/r/381891

The webservice tests should be fixed too! I'll let @chasemp verify and resolve this.

looks good now to me, thank you!