Make PHD run on the backup phabricator server (phab2001, currently)
Closed, Declined (Public)

Description

What needs to happen:

  • make sure that the database connection works
  • make sure outgoing email works
  • test run phd
  • make puppet run phd (currently disabled by check for active server via hiera setting)
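The first two checklist items can be approximated with a plain TCP reachability probe. The following is a minimal sketch, not the procedure actually used on this task: the helper name `can_connect` is hypothetical, and a successful TCP connection only shows that the port is reachable, not that database credentials or mail relaying work.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only proves network reachability. It does not authenticate
    against the database or send a test email.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

From the backup server one might call, for example, `can_connect(db_host, 3306)` and `can_connect(mail_host, 25)`, where `db_host` and `mail_host` are placeholders for the real endpoints and the ports are the conventional MariaDB and SMTP defaults.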

https://secure.phabricator.com/book/phabricator/article/cluster_daemons/

Event Timeline

Change 536669 had a related patch set uploaded (by 20after4; owner: 20after4):
[operations/puppet@production] Phabricator: Make a separate hiera option to ensure phd stopped/running

https://gerrit.wikimedia.org/r/536669

Change 536669 merged by Dzahn:
[operations/puppet@production] Phabricator: Make a separate hiera option to ensure phd stopped/running

https://gerrit.wikimedia.org/r/536669

I am removing the DBA tag from here as there is nothing for us to do. I will remain subscribed to the task though just in case you need me to check the database connection.

thcipriani triaged this task as Medium priority. Sep 18 2019, 2:28 PM
thcipriani subscribed.

Patches are moving, so this looks to be in progress. Updating the workboard accordingly.

mmodell reopened this task as Open.
mmodell updated the task description.

Change 549906 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] phabricator: enable phd service on phab2001 and monitor it

https://gerrit.wikimedia.org/r/549906

Whether phd is running is actually not directly controlled by the "active_server" setting, but monitoring for it is.

The former is controlled by "profile::phabricator::main::phd_service_ensure" in separate hosts files, one each for phab1001, phab1003 and phab2001.

The monitoring part, however, says "don't monitor if not active_server" (see the change above). So that change would enable phd on phab2001 but also cause an alert on phab1001.

Let's change the monitoring to also use "profile::phabricator::main::phd_service_ensure" to decide whether to monitor the process, and drop its dependency on the active_server setting.
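To make the difference between the two monitoring rules concrete, here is a small illustrative model in Python. It is not the Puppet code: the `Host` class and both function names are hypothetical, and the per-host values in the usage note are assumptions, not the actual Hiera data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    phd_service_ensure: str  # "running" or "stopped", mirroring the Hiera key


def monitor_by_active_server(host: Host, active_server: str) -> bool:
    """Old rule: monitor phd only on the host named as active_server."""
    return host.name == active_server


def monitor_by_ensure(host: Host) -> bool:
    """Proposed rule: monitor phd wherever it is set to be running."""
    return host.phd_service_ensure == "running"
```

With an assumed `Host("phab2001", "running")` and an active server other than phab2001, the old rule returns False (phd runs unmonitored) while the proposed rule returns True. Under the proposed rule the service state and the monitoring state cannot disagree, because both are derived from the same key.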

Change 549913 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] phabricator: monitor the PHD process if PHD is set to running in Hiera

https://gerrit.wikimedia.org/r/549913

Change 549913 merged by Dzahn:
[operations/puppet@production] phabricator: monitor the PHD process if PHD is set to running in Hiera

https://gerrit.wikimedia.org/r/549913

Change 549906 merged by Dzahn:
[operations/puppet@production] phabricator: enable phd service on phab2001

https://gerrit.wikimedia.org/r/549906

Change 551943 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] Revert "phabricator: enable phd service on phab2001"

https://gerrit.wikimedia.org/r/551943

Change 551943 merged by Dzahn:
[operations/puppet@production] Revert "phabricator: enable phd service on phab2001"

https://gerrit.wikimedia.org/r/551943

  • It's now possible to start the phd service with a change like this (separate from the "active_server" setting mentioned above).
  • When I merged and puppet tried to start phd, it got the error below with "The MariaDB server is running with the --read-only option ...", so I reverted.

Nov 19 22:54:38 phab2001 systemd[1]: Starting phabricator-phd...
Nov 19 22:54:38 phab2001 phd[10326]: Freeing active task leases...
Nov 19 22:54:38 phab2001 phd[10326]: [2019-11-19 22:54:38] EXCEPTION: (AphrontQueryException) #1290: The MariaDB server is running with the --read-only option so it cannot execute this statement at [<phabricator>/src/infrastructure/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:386]
Nov 19 22:54:38 phab2001 phd[10326]: arcanist(), ava(), phabricator(), phutil(), security(), sprint(), translations(), wmf-ext-misc()
Nov 19 22:54:38 phab2001 phd[10326]: #0 AphrontBaseMySQLDatabaseConnection::throwQueryCodeException(integer, string) called at [<phabricator>/src/infrastructure/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:320]
Nov 19 22:54:38 phab2001 phd[10326]: #1 AphrontBaseMySQLDatabaseConnection::throwQueryException(mysqli) called at [<phabricator>/src/infrastructure/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:216]
Nov 19 22:54:38 phab2001 phd[10326]: #2 AphrontBaseMySQLDatabaseConnection::executeQuery(PhutilQueryString) called at [<phabricator>/src/infrastructure/storage/xsprintf/queryfx.php:8]
Nov 19 22:54:38 phab2001 phd[10326]: #3 queryfx(AphrontMySQLiDatabaseConnection, string, string) called at [<phabricator>/src/applications/daemon/management/PhabricatorDaemonManagementWorkflow.php:522]
Nov 19 22:54:38 phab2001 phd[10326]: #4 PhabricatorDaemonManagementWorkflow::freeActiveLeases() called at [<phabricator>/src/applications/daemon/management/PhabricatorDaemonManagementWorkflow.php:306]
Nov 19 22:54:38 phab2001 phd[10326]: #5 PhabricatorDaemonManagementWorkflow::executeStartCommand(array) called at [<phabricator>/src/applications/daemon/management/PhabricatorDaemonManagementStartWorkflow.php:37]
Nov 19 22:54:38 phab2001 phd[10326]: #6 PhabricatorDaemonManagementStartWorkflow::execute(PhutilArgumentParser) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:457]
Nov 19 22:54:38 phab2001 phd[10326]: #7 PhutilArgumentParser::parseWorkflowsFull(array) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:349]
Nov 19 22:54:38 phab2001 phd[10326]: #8 PhutilArgumentParser::parseWorkflows(array) called at [<phabricator>/scripts/daemon/manage_daemons.php:24]
Nov 19 22:54:38 phab2001 systemd[1]: phd.service: Control process exited, code=exited, status=255/EXCEPTION
Nov 19 22:54:38 phab2001 systemd[1]: phd.service: Failed with result 'exit-code'.
Nov 19 22:54:38 phab2001 systemd[1]: Failed to start phabricator-phd.

Error: /Stage[main]/Phabricator/Systemd::Service[phd]/Service[phd]/ensure: change from 'stopped' to 'running' failed: Systemd start for phd failed!
journalctl log for phd:
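For triage, the key facts in a log like the one above are the exception class and the numeric database error code. The following Python sketch shows one way to pull them out of a journal line. The regular expression and the names `parse_exception` and `is_read_only_failure` are illustrative assumptions based only on the line format shown here, not part of Phabricator. Error 1290 is the MariaDB/MySQL code for a statement blocked by a server option, here --read-only.

```python
import re
from typing import NamedTuple, Optional

# Matches "EXCEPTION: (Class) #code: message at [" in a phd journal line.
# The pattern is inferred from the single log line shown in this task.
EXCEPTION_RE = re.compile(
    r"EXCEPTION: \((?P<cls>\w+)\) #(?P<code>\d+): (?P<msg>.+?) at \["
)

# Error code seen in the log above when the server runs with --read-only.
READ_ONLY_ERROR_CODE = 1290


class PhdException(NamedTuple):
    cls: str
    code: int
    message: str


def parse_exception(line: str) -> Optional[PhdException]:
    """Extract exception class, error code and message, or None if absent."""
    m = EXCEPTION_RE.search(line)
    if m is None:
        return None
    return PhdException(m["cls"], int(m["code"]), m["msg"])


def is_read_only_failure(line: str) -> bool:
    """True if the line reports error 1290 from the database layer."""
    exc = parse_exception(line)
    return exc is not None and exc.code == READ_ONLY_ERROR_CODE
```

Applied to the EXCEPTION line in the log, `parse_exception` yields the class `AphrontQueryException` and code 1290, while the systemd and stack-trace lines yield None.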

  • When I merged and puppet tried to start phd, it got the error below with "The MariaDB server is running with the --read-only option ...", so I reverted.

I mentioned in the patchset that the databases in codfw are in read-only (RO) mode, so that is sort of expected.

Yes, it was. But it was not expected that phd fails to start entirely with a read-only DB.

@20after4, per our recent talks: is this still our goal?

Dzahn changed the task status from Open to Stalled. Dec 20 2019, 4:20 AM

@Dzahn: As long as repositories are not clustered (using Phabricator's clustering), I think that running daemons on the backup host will be problematic. It's been tested and it works, which is the important part for disaster recovery / high availability.