The corto service fails to start after the alert hosts failover
Closed, Resolved (Public)

Description

While working on T372418 I noticed that after failing over to the new Alert* instances the corto systemd service is failing with the following message:

× corto.service - Assist SREs during incidents
     Loaded: loaded (/lib/systemd/system/corto.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Wed 2024-09-18 20:57:51 UTC; 19min ago
   Duration: 105ms
    Process: 2735204 ExecStart=/usr/bin/corto --config /etc/corto/config.yaml (code=exited, status=1/FAILURE)
   Main PID: 2735204 (code=exited, status=1/FAILURE)
        CPU: 83ms

Sep 18 20:57:51 alert1002 systemd[1]: Started corto.service - Assist SREs during incidents.
Sep 18 20:57:51 alert1002 corto[2735204]: Unable to create Phabricator client: unable to connect to https://phabricator.wmcloud.org: invalid character '<' looking for beginning of value
Sep 18 20:57:51 alert1002 systemd[1]: corto.service: Main process exited, code=exited, status=1/FAILURE
Sep 18 20:57:51 alert1002 systemd[1]: corto.service: Failed with result 'exit-code'.

Event Timeline

I'm guessing this is due to some sort of access control. Are there any IP restrictions set in Conduit?

Hi @BCornwall, I'm not familiar with the Corto service, and I was unable to find documentation about it or its usage on Wikitech.
What is Conduit?

I tried the following to debug the issue:

  1. Accessing Phabricator with curl: curl -I https://phabricator.wmcloud.org
HTTP/2 200
server: nginx/1.18.0
date: Mon, 23 Sep 2024 18:55:25 GMT
content-type: text/html; charset=UTF-8
x-powered-by: PHP/7.4.33
backend-timing: D=20526 t=1727117725412588
strict-transport-security: max-age=31622400
x-clacks-overhead: GNU Terry Pratchett
permissions-policy: browsing-topics=()
  2. Verifying that the Phabricator API token is valid and has the required permissions with curl -H "Authorization: Bearer {{TOKEN}}" https://phabricator.wmcloud.org/api/project.query:
<br />
<b>Warning</b>:  mysqli::__construct(): (HY000/1698): Access denied for user 'app_user'@'localhost' in <b>/srv/deployment/phabricator/deployment-cache/revs/89f50144d5cfc66257fc2d23e6dba189539d21ef/phabricator/support/preamble.php</b> on line <b>31</b><br />
  3. Testing the token against a different endpoint with curl "https://phabricator.wmcloud.org/api/user.whoami" -d api.token={{TOKEN}}:
<br />
<b>Warning</b>:  mysqli::__construct(): (HY000/1698): Access denied for user 'app_user'@'localhost' in <b>/srv/deployment/phabricator/deployment-cache/revs/89f50144d5cfc66257fc2d23e6dba189539d21ef/phabricator/support/preamble.php</b> on line <b>31</b><br />
  4. Accessing the Phabricator URL from my web browser:
Warning: mysqli::__construct(): (HY000/1698): Access denied for user 'app_user'@'localhost' in /srv/deployment/phabricator/deployment-cache/revs/89f50144d5cfc66257fc2d23e6dba189539d21ef/phabricator/support/preamble.php on line 31
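
Note that the HEAD request in the first check returned HTTP 200 even though the page body was a PHP warning, so the status code alone was not a reliable signal. A small probe along the lines of the checks above could look at the body instead. This is only a sketch: `checkBody`, `preview`, `probeWhoami`, and the `PHAB_URL` / `PHAB_TOKEN` environment variables are names made up for illustration and are not part of corto. It reuses the `user.whoami` call with `api.token` from the third check.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// checkBody reports an error when the response cannot be a Conduit answer:
// either the status is not 200, or the body is not valid JSON (for example
// an HTML/PHP warning page). It does not inspect Conduit's own error fields.
func checkBody(status int, body []byte) error {
	if status != http.StatusOK {
		return fmt.Errorf("unexpected HTTP status %d", status)
	}
	if !json.Valid(body) {
		return fmt.Errorf("HTTP %d but body is not JSON, starts with %q", status, preview(body, 60))
	}
	return nil
}

// preview returns at most the first n bytes of b as a string.
func preview(b []byte, n int) string {
	if len(b) > n {
		b = b[:n]
	}
	return string(b)
}

// probeWhoami (hypothetical helper) posts the token to user.whoami and
// checks that the answer is JSON.
func probeWhoami(client *http.Client, baseURL, token string) error {
	resp, err := client.PostForm(baseURL+"/api/user.whoami", url.Values{"api.token": {token}})
	if err != nil {
		return fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return fmt.Errorf("reading body: %w", err)
	}
	return checkBody(resp.StatusCode, body)
}

func main() {
	// PHAB_URL and PHAB_TOKEN are assumed names for this sketch.
	baseURL, token := os.Getenv("PHAB_URL"), os.Getenv("PHAB_TOKEN")
	if baseURL == "" || token == "" {
		fmt.Println("set PHAB_URL and PHAB_TOKEN to run the probe")
		return
	}
	client := &http.Client{Timeout: 10 * time.Second}
	if err := probeWhoami(client, baseURL, token); err != nil {
		fmt.Println("probe failed:", err)
		os.Exit(1)
	}
	fmt.Println("Conduit answered with JSON")
}
```

With the instance in the state shown above, a probe like this would be expected to fail on the JSON check rather than on the HTTP status.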

After those tests I'm wondering if the issue may reside in the WMCloud Phabricator instance and not in the corto service itself. What do you think?

The corto service is now operational. I was able to successfully access the Phabricator WMCloud instance via web browser, confirming that it is also functioning correctly.

This suggests that the issue was caused by the WMCloud Phabricator instance's inability to connect to its database, rather than any problem with Corto itself.
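
The wording of the original log line fits this reading. "invalid character '<' looking for beginning of value" is the message Go's `encoding/json` produces when it is handed something that starts with `<`, such as the HTML warning above. I'm assuming here that corto decodes the Conduit response with `encoding/json`; the thread does not confirm this. The `conduitResponse` struct and `decodeConduit` function below are simplified stand-ins for illustration, not corto's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// conduitResponse is a simplified view of the JSON envelope Conduit returns.
type conduitResponse struct {
	Result    json.RawMessage `json:"result"`
	ErrorCode *string         `json:"error_code"`
	ErrorInfo *string         `json:"error_info"`
}

// decodeConduit (hypothetical) parses a response body as a Conduit envelope.
func decodeConduit(body []byte) (*conduitResponse, error) {
	var r conduitResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	return &r, nil
}

func main() {
	// Beginning of the HTML warning the broken instance was serving.
	htmlBody := []byte("<br />\n<b>Warning</b>:  mysqli::__construct(): (HY000/1698): Access denied for user 'app_user'@'localhost'")

	_, err := decodeConduit(htmlBody)
	fmt.Println(err)
	// prints: invalid character '<' looking for beginning of value
}
```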

Marking this as resolved.