install a service on phab1005
Open, Medium, Public

Description

phab1005 (previously gerrit1004 T372817) is new hardware that was requested in T354688, procured in T368918 and racked in T369671.

But it's still just using the "insetup" role and needs a real service (likely phabricator:)) installed on it.

Details

Other Assignee
Dzahn
Related Changes in Gerrit:
Repo               Branch      Lines +/-
operations/puppet  production  +1 -2
operations/puppet  production  +3 -3
operations/puppet  production  +1 -1
operations/puppet  production  +4 -1
labs/private       master      +4 -0
operations/puppet  production  +13 -5
operations/puppet  production  +2 -1
operations/puppet  production  +85 -0
operations/puppet  production  +5 -0
operations/puppet  production  +19 -0
operations/puppet  production  +8 -0
operations/puppet  production  +7 -7
operations/puppet  production  +7 -7
operations/puppet  production  +1 -0
operations/puppet  production  +41 -4
operations/puppet  production  +1 -1
operations/puppet  production  +38 -1
operations/puppet  production  +2 -0
operations/puppet  production  +4 -0

Related Objects

Status    Subtype     Assigned
Resolved  Feature     Aklapper
Resolved  Feature     Aklapper
Resolved  Feature     Aklapper
Open                  None
Resolved              valerio.bozzolan
Resolved  BUG REPORT  valerio.bozzolan
Resolved              Aklapper
Resolved              Aklapper
Resolved              Aklapper
Resolved              Aklapper
Resolved  Feature     Aklapper
Resolved              Aklapper
Resolved  Feature     Aklapper
Resolved  BUG REPORT  Aklapper
Resolved              brennen
Stalled               None
Open                  None
Resolved              Jclark-ctr
Open                  None
Resolved              Dzahn
Resolved              Marostegui
Resolved              ABran-WMF
Resolved              Dzahn

Event Timeline

[phab1005:~] $ sudo /usr/local/bin/bootstrap-scap-target.sh deploy1003.eqiad.wmnet /var/lib/scap
ERROR: Scap distribution dir "/root/bookworm" is missing. Maybe this is a primary deploy server? Please check usage. Aborting
Dzahn changed the task status from Open to Stalled. Apr 21 2025, 5:31 PM

I found this comment in the code of the scap puppet class:

if $enable_bootstrapping and !$is_master {
    # This dir needs to match the home of the user defined in class scap::user
    $scap_home = '/var/lib/scap'

While the command that puppet tried to execute was:

sudo /usr/local/bin/bootstrap-scap-target.sh deploy1003.eqiad.wmnet /var/lib/scap

This was built from:

"/usr/local/bin/bootstrap-scap-target.sh ${deployment_server} ${scap_home}"

But the $scap_home here is NOT /var/lib/scap, the home of the deploy user is /var/lib/phab-deploy.

grep phab /etc/passwd
phab-deploy:x:498:498::/var/lib/phab-deploy:/bin/bash

So the summary is that scap hard codes the $scap_home and this breaks scap bootstrapping for any machine where the deploy user isn't using the standard home.

And the comment shows it is already known that the two have to match.

Deployment servers are apparently not affected, because of this comment in the same code: "# Deployment servers/masters are bootstrapped in profile::mediawiki::deployment::server."
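One way to avoid this kind of mismatch is to derive the home directory from the deploy user's passwd entry instead of hardcoding it. The following is a minimal shell sketch of that idea, not the change that was merged. The helper names passwd_home, user_home and bootstrap_cmd are hypothetical; only the bootstrap script path, the deploy server name and the phab-deploy user come from the output in this task.

```shell
#!/bin/sh
# Hypothetical helpers: look up a deploy user's home instead of hardcoding it.

# Print the home directory (6th colon-separated field) of a passwd-format line.
passwd_home() {
    printf '%s\n' "$1" | cut -d: -f6
}

# Print the home directory of an existing user; fail if the user is unknown.
user_home() {
    entry=$(getent passwd "$1") || return 1
    passwd_home "$entry"
}

# Print the bootstrap command line for a deploy server ($1) and deploy user ($2).
bootstrap_cmd() {
    home=$(user_home "$2") || return 1
    printf '/usr/local/bin/bootstrap-scap-target.sh %s %s\n' "$1" "$home"
}
```

With the passwd entry shown above, calling bootstrap_cmd deploy1003.eqiad.wmnet phab-deploy on phab1005 would be expected to print the command with /var/lib/phab-deploy as its last argument.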

I manually ran the command with the right scap home:

root@phab1005:/home/dzahn# sudo -u phab-deploy /usr/local/bin/bootstrap-scap-target.sh deploy1003.eqiad.wmnet /var/lib/phab-deploy
Processing ./bookworm/pip-25.0.1-py3-none-any.whl
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.0.1
    Uninstalling pip-23.0.1:
      Successfully uninstalled pip-23.0.1
Successfully installed pip-25.0.1
Processing ./bookworm/annotated_types-0.7.0-py3-none-any.whl
Processing ./bookworm/anyio-4.9.0-py3-none-any.whl
Processing ./bookworm/certifi-2022.9.24-py3-none-any.whl
Processing ./bookworm/chardet-5.2.0-py3-none-any.whl
Processing ./bookworm/charset_normalizer-2.1.1-py3-none-any.whl
Processing ./bookworm/click-8.1.8-py3-none-any.whl
Processing ./bookworm/dnspython-2.7.0-py3-none-any.whl
Processing ./bookworm/email_validator-2.2.0-py3-none-any.whl
Processing ./bookworm/exceptiongroup-1.2.2-py3-none-any.whl
Processing ./bookworm/fastapi-0.115.4-py3-none-any.whl
Processing ./bookworm/fastapi_cli-0.0.7-py3-none-any.whl
Processing ./bookworm/greenlet-3.0.3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Processing ./bookworm/h11-0.14.0-py3-none-any.whl
Processing ./bookworm/httpcore-1.0.7-py3-none-any.whl
Processing ./bookworm/httptools-0.6.4-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./bookworm/httpx-0.28.1-py3-none-any.whl
Processing ./bookworm/idna-2.8-py2.py3-none-any.whl
Processing ./bookworm/itsdangerous-2.2.0-py3-none-any.whl
Processing ./bookworm/Jinja2-2.11.2-py2.py3-none-any.whl
Processing ./bookworm/lxml-5.3.2-cp311-cp311-manylinux_2_28_x86_64.whl
Processing ./bookworm/markdown_it_py-3.0.0-py3-none-any.whl
Processing ./bookworm/markupsafe-1.1.0-py3-none-any.whl
Processing ./bookworm/mdurl-0.1.2-py3-none-any.whl
Processing ./bookworm/packaging-24.2-py3-none-any.whl
Processing ./bookworm/pip-25.0.1-py3-none-any.whl
Processing ./bookworm/prettytable-3.7.0-py3-none-any.whl
Processing ./bookworm/pydantic-2.10.6-py3-none-any.whl
Processing ./bookworm/pydantic_core-2.27.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./bookworm/pygments-2.17.2-py3-none-any.whl
Processing ./bookworm/PyJWT-2.10.1-py3-none-any.whl
Processing ./bookworm/pyotp-2.9.0-py3-none-any.whl
Processing ./bookworm/pyparsing-3.0.9-py3-none-any.whl
Processing ./bookworm/python_cas-1.6.0-py2.py3-none-any.whl
Processing ./bookworm/python_dotenv-1.1.0-py3-none-any.whl
Processing ./bookworm/python_multipart-0.0.20-py3-none-any.whl
Processing ./bookworm/pyyaml-5.1-cp311-cp311-linux_x86_64.whl
Processing ./bookworm/requests-2.31.0-py3-none-any.whl
Processing ./bookworm/rich-14.0.0-py3-none-any.whl
Processing ./bookworm/rich_toolkit-0.14.1-py3-none-any.whl
Processing ./bookworm/scap-4.153.0-py3-none-any.whl
Processing ./bookworm/setuptools-68.0.0-py3-none-any.whl
Processing ./bookworm/shellingham-1.5.4-py2.py3-none-any.whl
Processing ./bookworm/six-1.17.0-py2.py3-none-any.whl
Processing ./bookworm/sniffio-1.3.1-py3-none-any.whl
Processing ./bookworm/SQLAlchemy-2.0.32-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./bookworm/starlette-0.41.3-py3-none-any.whl
Processing ./bookworm/typer-0.15.2-py3-none-any.whl
Processing ./bookworm/typing_extensions-4.13.1-py3-none-any.whl
Processing ./bookworm/urllib3-2.0.4-py3-none-any.whl
Processing ./bookworm/uvicorn-0.34.0-py3-none-any.whl
Processing ./bookworm/uvloop-0.21.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./bookworm/watchfiles-1.0.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./bookworm/wcwidth-0.2.5-py2.py3-none-any.whl
Processing ./bookworm/websockets-15.0.1-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
pip is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
Installing collected packages: wcwidth, websockets, watchfiles, uvloop, uvicorn, urllib3, typing-extensions, typer, starlette, SQLAlchemy, sniffio, six, shellingham, setuptools, scap, rich-toolkit, rich, requests, pyyaml, python-multipart, python-dotenv, python-cas, pyparsing, pyotp, PyJWT, pygments, pydantic-core, pydantic, prettytable, packaging, mdurl, markupsafe, markdown-it-py, lxml, Jinja2, itsdangerous, idna, httpx, httptools, httpcore, h11, greenlet, fastapi-cli, fastapi, exceptiongroup, email-validator, dnspython, click, charset-normalizer, chardet, certifi, anyio, annotated-types
  Attempting uninstall: setuptools
    Found existing installation: setuptools 66.1.1
    Uninstalling setuptools-66.1.1:
      Successfully uninstalled setuptools-66.1.1
Successfully installed Jinja2-2.11.2 PyJWT-2.10.1 SQLAlchemy-2.0.32 annotated-types-0.7.0 anyio-4.9.0 certifi-2022.9.24 chardet-5.2.0 charset-normalizer-2.1.1 click-8.1.8 dnspython-2.7.0 email-validator-2.2.0 exceptiongroup-1.2.2 fastapi-0.115.4 fastapi-cli-0.0.7 greenlet-3.0.3 h11-0.14.0 httpcore-1.0.7 httptools-0.6.4 httpx-0.28.1 idna-2.8 itsdangerous-2.2.0 lxml-5.3.2 markdown-it-py-3.0.0 markupsafe-1.1.0 mdurl-0.1.2 packaging-24.2 prettytable-3.7.0 pydantic-2.10.6 pydantic-core-2.27.2 pygments-2.17.2 pyotp-2.9.0 pyparsing-3.0.9 python-cas-1.6.0 python-dotenv-1.1.0 python-multipart-0.0.20 pyyaml-5.1 requests-2.31.0 rich-14.0.0 rich-toolkit-0.14.1 scap-4.153.0 setuptools-68.0.0 shellingham-1.5.4 six-1.17.0 sniffio-1.3.1 starlette-0.41.3 typer-0.15.2 typing-extensions-4.13.1 urllib3-2.0.4 uvicorn-0.34.0 uvloop-0.21.0 watchfiles-1.0.5 wcwidth-0.2.5 websockets-15.0.1

INFO: Scap for "bookworm" successfully installed at /var/lib/phab-deploy/scap

STILL broken, now with:

Notice: /Stage[main]/Scap/Exec[bootstrap-scap-target]/returns: mv: cannot move '/var/lib/scap/scap' to '/tmp/scap.vht/scap': Permission denied
Error: '/usr/local/bin/bootstrap-scap-target.sh deploy1003.eqiad.wmnet /var/lib/scap' returned 1 instead of one of [0]
Error: /Stage[main]/Scap/Exec[bootstrap-scap-target]/returns: change from 'notrun' to ['0'] failed: '/usr/local/bin/bootstrap-scap-target.sh deploy1003.eqiad.wmnet /var/lib/scap' returned 1 instead of one of [0] (corrective)

There are a bunch of scap dirs in /tmp, all owned by scap:scap, with nobody else having any permissions.

Which tmp dir it tries to use changes with each puppet run.

4.0K drwx------  2 scap scap 4.0K Apr 21 19:21 scap.0YT
4.0K drwx------  2 scap scap 4.0K Apr 21 19:18 scap.844
4.0K drwx------  2 scap scap 4.0K Apr 21 17:26 scap.F2G
4.0K drwx------  2 scap scap 4.0K Apr 21 17:51 scap.KUV
4.0K drwx------  2 scap scap 4.0K Apr 21 18:22 scap.Trk
4.0K drwx------  2 scap scap 4.0K Apr 21 19:30 scap.vht
4.0K drwx------  2 scap scap 4.0K Apr 21 18:51 scap.WoE
4.0K drwx------  2 scap scap 4.0K Apr 21 17:22 scap.ZA5
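A directory with mode drwx------ is usable only by its owner, so any other user trying to move files into it gets "Permission denied". The sketch below shows one way to check this from a shell. It assumes GNU stat (as on Debian), and the function names dir_perms and is_private_dir are hypothetical. The directory names look like mktemp -d output, which would explain the 0700 mode, but that is an assumption and not confirmed in this task.

```shell
#!/bin/sh
# Print "MODE OWNER:GROUP" for a path (GNU stat syntax).
dir_perms() {
    stat -c '%a %U:%G' "$1"
}

# Succeed only if the path grants no permissions to group or others,
# i.e. its octal mode ends in "00".
is_private_dir() {
    mode=$(stat -c '%a' "$1") || return 1
    case "$mode" in
        *00) return 0 ;;
        *) return 1 ;;
    esac
}
```

For the listing above, dir_perms /tmp/scap.vht should print "700 scap:scap", and is_private_dir would succeed for each of those directories.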

Change #1137818 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] scap: stop hardcoding scap user home to fix puppet breakage

https://gerrit.wikimedia.org/r/1137818

Change #1137827 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator/scap: disable scap bootstrapping on phab1005

https://gerrit.wikimedia.org/r/1137827

Change #1137827 merged by Dzahn:

[operations/puppet@production] phabricator/scap: disable scap bootstrapping on phab1005

https://gerrit.wikimedia.org/r/1137827

Change #1137830 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator: comment out scap::target in migration class

https://gerrit.wikimedia.org/r/1137830

Change #1137830 merged by Dzahn:

[operations/puppet@production] phabricator: comment out scap::target in migration class

https://gerrit.wikimedia.org/r/1137830

So @thcipriani figured out what was broken with scap install here, we'll file a followup task to fix that.

scap is installed, we need the credentials for the test db in /etc/phabricator and then I think I should be able to:

  • scap deploy the wmf/stable branch
  • scap deploy the merged upstream and test db migration time

> So @thcipriani figured out what was broken with scap install here, we'll file a followup task to fix that.

Cool!:))

> we need the credentials for the test db in /etc/phabricator

The password for all databases is in /home/brennen/phab_db_test on phab1005 itself.

The hostname is db1176.eqiad.wmnet.

T390034#10718752

The user names and database names are the same as with prod Phabricator: phuser, phadmin, phstats, phabricatorphd, ..

T390034#10717117

> scap is installed, we need the credentials for the test db in /etc/phabricator and then I think I should be able to:

I looked at the code and it would have required quite a few big changes to make puppet do this. So I did it manually.

I created /etc/phabricator/config.yaml that is just like in production, with the same permissions (root:phab-deploy) and the same contents, except with a different DB host name and password.

Out of an abundance of caution I also replaced the gitlab_api_key, so that one is not going to be valid.

The base_uri, alternate_file_domain, redirects -> field_index, aphlict_host etc. are all original values also used in prod.

Wanna take a look as well if you see anything that seems critical to leave at prod values?

Dzahn changed the task status from Stalled to Open. May 28 2025, 10:47 PM
Dzahn removed Dzahn as the assignee of this task.

> I looked at the code and it would have required quite a few big changes to make puppet do this. So I did it manually.

Right on - makes sense. I'll see if I can fire off a deploy.

> Wanna take a look as well if you see anything that seems critical to leave at prod values?

Looked it over, think it all ought to be ok. Going to see what happens with a scap deploy.

Mentioned in SAL (#wikimedia-operations) [2025-05-28T22:59:31Z] <brennen@deploy1003> Started deploy [phabricator/deployment@99aa712]: test deploy to phab1005 for T377889

Mentioned in SAL (#wikimedia-operations) [2025-05-28T23:03:46Z] <brennen@deploy1003> deploy aborted: test deploy to phab1005 for T377889 (duration: 04m 14s)

Mentioned in SAL (#wikimedia-operations) [2025-05-28T23:03:59Z] <brennen@deploy1003> Started deploy [phabricator/deployment@99aa712]: test deploy to phab1005 for T377889

Mentioned in SAL (#wikimedia-operations) [2025-05-28T23:08:37Z] <brennen@deploy1003> Finished deploy [phabricator/deployment@99aa712]: test deploy to phab1005 for T377889 (duration: 04m 38s)

Well, SSH times out. I'm guessing this is something fairly trivial, although I don't remember what at the moment.

23:03:46 brennen@deploy1003 /srv/deployment/phabricator/deployment (wmf/stable *% u=) $ scap deploy -v -l phab1005.eqiad.wmnet 'test deploy to phab1005 for T377889'
23:03:58 Using key: /etc/keyholder.d/phabricator
23:03:58 Running ['git', 'rev-parse', '--show-prefix'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Running ['git', 'rev-parse', '--verify', 'HEAD'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Running ['git', 'merge-base', 'HEAD', 'origin'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Command exited with code 128
23:03:58 Running ['git', 'rev-parse', '--show-prefix'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Running ['git', 'rev-parse', '--verify', 'HEAD'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Running ['git', 'show', '-s', '--format=%ct', '99aa71250108713553c6c4942d042049271621b9'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Running ['git', 'ls-remote', '--get-url'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Started deploy [phabricator/deployment@99aa712]
23:03:58 Running ['git', 'rev-parse', '--show-prefix'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Running ['git', 'tag', '--list', '--sort=-version:refname', 'scap/sync/2025-05-28/*'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Running ['git', 'rev-parse', '--show-prefix'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Running ['git', 'rev-parse', '--verify', 'HEAD'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Deploying Rev: HEAD = 99aa71250108713553c6c4942d042049271621b9
23:03:58 Prepare config deploy
23:03:58 Config deploy file: /srv/deployment/phabricator/deployment/scap/config-files.yaml
23:03:58 Update DEPLOY_HEAD
23:03:58 Running ['git', 'rev-parse', '--show-prefix'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Creating /srv/deployment/phabricator/deployment/.git/DEPLOY_HEAD
23:03:58 Running ['git', 'tag', '-a', '-muser brennen', '-mtimestamp 2025-05-28T23:03:58.897601', '--', 'scap/sync/2025-05-28/0002', '99aa71250108713553c6c4942d042049271621b9'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Running ['git', 'rev-parse', '--show-prefix'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Running ['git', 'for-each-ref', '--sort=taggerdate', '--format=%(refname)', 'refs/tags'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Running ['git', 'tag', '-d', 'scap/sync/2024-12-17/0002'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Update server info
23:03:58 Running ['git', 'update-server-info'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:58 Running ['git', 'submodule', 'foreach', '--recursive', 'git update-server-info'] with {'cwd': '/srv/deployment/phabricator/deployment', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
23:03:59 Started deploy [phabricator/deployment@99aa712]: test deploy to phab1005 for T377889
23:03:59 
== DEFAULT ==
:* phab1005.eqiad.wmnet
23:03:59 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'phabricator/deployment', '-g', 'default', 'fetch', '--refresh-config']
23:03:59 phabricator/deployment: fetch stage(s):   0% (in-flight: 1; ok: 0; fail: 0; left: 0) \


23:08:21 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'phabricator/deployment', '-g', 'default', 'fetch', '--refresh-config'] (ran as phab-deploy@phab1005.eqiad.wmnet) returned [255]: OpenSSH_8.4p1 Debian-5+deb11u5, OpenSSL 1.1.1w  11 Sep 2023
debug1: Reading configuration data /dev/null
debug1: Connecting to phab1005.eqiad.wmnet [2620:0:861:102:10:64:16:125] port 22.
debug1: connect to address 2620:0:861:102:10:64:16:125 port 22: Connection timed out
debug1: Connecting to phab1005.eqiad.wmnet [10.64.16.125] port 22.
debug1: connect to address 10.64.16.125 port 22: Connection timed out
ssh: connect to host phab1005.eqiad.wmnet port 22: Connection timed out

Change #1160204 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] Revert "phabricator: comment out scap::target in migration class"

https://gerrit.wikimedia.org/r/1160204

> Well, SSH times out.

It's firewalling. scap::target includes scap::ferm which opens the firewall on port 22 for deployment servers.

But scap::target was commented out because it broke puppet, we knew the host wasn't in production yet and we knew about the previous scap bootstrap issues that still had to be solved.

Testing what happens if we include scap::target again at this point.
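A quick way to tell a firewall drop (timeout) from other SSH failures is to test plain TCP reachability of port 22 from the deployment server. This is a minimal sketch, assuming bash with /dev/tcp support and the coreutils timeout command are available; the function name port_open is hypothetical.

```shell
#!/bin/bash
# Succeed if a TCP connection to HOST ($1) PORT ($2) can be opened within
# TIMEOUT ($3, default 5) seconds. Uses bash's /dev/tcp redirection and the
# coreutils "timeout" command; no data is sent over the connection.
port_open() {
    host=$1 port=$2 secs=${3:-5}
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Run on deploy1003, port_open phab1005.eqiad.wmnet 22 would be expected to fail while the firewall rule from scap::ferm is absent and to succeed once it has been applied.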

Change #1160204 merged by Dzahn:

[operations/puppet@production] Revert "phabricator: comment out scap::target in migration class"

https://gerrit.wikimedia.org/r/1160204

Info: Applying configuration version '(53672bf49c) Dzahn - Revert "phabricator: comment out scap::target in migration class"'
Error: Execution of '/usr/bin/scap deploy-local --repo phabricator/deployment -D log_json:False' returned 1: 17:14:53 Fetch from: http://deploy1003.eqiad.wmnet/phabricator/deployment/.git
17:14:54 Update submodules
17:14:54 Updating .gitmodule: /srv/deployment/phabricator/deployment-cache/cache
17:15:12 Checkout rev: f8d7b38df39896e5e2792d70fa04b81f3b6450e3
17:15:12 Updating .gitmodule: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/local.json using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/local.json: [rendered]
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/mail.json using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/mail.json: [rendered]
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/phd.json using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/phd.json: [rendered]
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/vcs.json using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/vcs.json: [rendered]
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/www.json using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/www.json: [rendered]
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/support/preamble.php using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/support/preamble.php: [rendered]
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/support/redirect_config.json using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/support/redirect_config.json: [rendered]
17:15:31 Executing check 'config_deploy'
17:15:31 Check 'config_deploy' failed: /usr/local/sbin/phab_deploy_config_deploy: 8: .: cannot open /etc/phabricator/script-vars: No such file


Error: /Stage[main]/Profile::Phabricator::Migration/Scap::Target[phabricator/deployment]/Package[phabricator/deployment]/ensure: change from 'absent' to 'present' failed: Execution of '/usr/bin/scap deploy-local --repo phabricator/deployment -D log_json:False' returned 1: 17:14:53 Fetch from: http://deploy1003.eqiad.wmnet/phabricator/deployment/.git
17:14:54 Update submodules
17:14:54 Updating .gitmodule: /srv/deployment/phabricator/deployment-cache/cache
17:15:12 Checkout rev: f8d7b38df39896e5e2792d70fa04b81f3b6450e3
17:15:12 Updating .gitmodule: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/local.json using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/local.json: [rendered]
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/mail.json using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/mail.json: [rendered]
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/phd.json using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/phd.json: [rendered]
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/vcs.json using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/vcs.json: [rendered]
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/www.json using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/conf/local/www.json: [rendered]
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/support/preamble.php using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/support/preamble.php: [rendered]
17:15:31 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/support/redirect_config.json using /etc/phabricator/config.yaml
17:15:31 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/support/redirect_config.json: [rendered]
17:15:31 Executing check 'config_deploy'
17:15:31 Check 'config_deploy' failed: /usr/local/sbin/phab_deploy_config_deploy: 8: .: cannot open /etc/phabricator/script-vars: No such file

BUT also:

Notice: /Stage[main]/Profile::Phabricator::Migration/Scap::Target[phabricator/deployment]/Ssh::Userkey[phab-deploy]/File[/etc/ssh/userkeys/phab-deploy]/ensure: defined content as '{sha256}0965c405d7f4d3b0b470328a0d4cb9e8dc6b4440a6b11608a09caf10016c6d64'
Info: Scap::Target[phabricator/deployment]: Unscheduling all events on Scap::Target[phabricator/deployment]
Notice: /Stage[main]/Scap::Ferm/Firewall::Service[deployment-ssh]/Nftables::Service[deployment-ssh]/File[/etc/nftables/input/10_deployment-ssh.nft]/ensure: defined content as '{sha256}e686be75ea00532efa5deab346ddfb9f59905bac3c3cbdc2e0d84cc7d82ef475'
Info: /Stage[main]/Scap::Ferm/Firewall::Service[deployment-ssh]/Nftables::Service[deployment-ssh]/File[/etc/nftables/input/10_deployment-ssh.nft]: Scheduling refresh of Service[nftables]
Notice: /Stage[main]/Nftables/Systemd::Service[nftables]/Service[nftables]: Triggered 'refresh' from 1 event
Info: Stage[main]: Unscheduling all events on Stage[main]
Notice: Applied catalog in 59.19 seconds

On the second puppet run after that, scap::target was created, the phab-deploy SSH key was written, and the firewall hole was opened.

Still have this though:

17:24:46 Check 'config_deploy' failed: /usr/local/sbin/phab_deploy_config_deploy: 8: .: cannot open /etc/phabricator/script-vars: No such file
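The error comes from the shell's "." (source) builtin being pointed at a file that does not exist. The actual contents of phab_deploy_config_deploy are not shown in this task, so the following is only a sketch of how a script could guard such a step; the function name load_vars is hypothetical.

```shell
#!/bin/sh
# Source a vars file only if it is readable; otherwise report and fail.
# In POSIX sh, a failed "." can abort a non-interactive script outright,
# so checking first gives a clearer error and lets the caller decide.
load_vars() {
    if [ ! -r "$1" ]; then
        echo "missing vars file: $1" >&2
        return 1
    fi
    . "$1"
}
```

A caller could then write load_vars /etc/phabricator/script-vars || exit 1 near the top of the check script.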

Mentioned in SAL (#wikimedia-operations) [2025-06-17T17:27:48Z] <brennen@deploy1003> Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889

Mentioned in SAL (#wikimedia-operations) [2025-06-17T17:28:11Z] <brennen@deploy1003> Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (duration: 00m 23s)

This is like T378769 and T257317 once again, or similar at least.

Change #1160217 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: add /etc/phabricator/script-vars for scap

https://gerrit.wikimedia.org/r/1160217

Change #1160217 merged by Dzahn:

[operations/puppet@production] phabricator::migration: add /etc/phabricator/script-vars for scap

https://gerrit.wikimedia.org/r/1160217

After my latest merge, puppet has now created the missing /etc/phabricator/script-vars file.

Now we are at:

fatal: destination path '/srv/deployment/phabricator/deployment-cache/cache' already exists and is not an empty directory.

18:14:40 deploy-local failed: <FailedCommand> Command 'git clone --jobs 46 --config lfs.url=https://gitlab.wikimedia.org/repos/phabricator/deployment.git/info/lfs http://deploy1003.eqiad.wmnet/phabricator/deployment/.git /srv/deployment/phabricator/deployment-cache/cache' failed with exit code 128;

Did a manual rm -rf /srv/deployment/phabricator/deployment-cache/cache and am trying again.

After this, the "fatal: destination path .. already exists" part is gone.
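git clone only accepts a destination that does not exist yet or is an empty directory, which is why a leftover cache from a failed earlier run blocks the next attempt. The manual cleanup above can be sketched as a pair of shell helpers; the function names are hypothetical and this is not part of scap.

```shell
#!/bin/sh
# Succeed if the path does not exist or is an empty directory,
# i.e. if "git clone" would accept it as a destination.
clone_dest_ok() {
    [ ! -e "$1" ] && return 0
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Remove a stale, non-empty destination so a fresh clone can proceed.
# Destructive: only call this on a cache directory that can be rebuilt.
reset_clone_dest() {
    clone_dest_ok "$1" || rm -rf -- "$1"
}
```

Here that would be reset_clone_dest /srv/deployment/phabricator/deployment-cache/cache before re-running the deploy.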

Now at:

Error: Execution of '/usr/bin/scap deploy-local --repo phabricator/deployment -D log_json:False' returned 1: 18:17:29 Fetch from: http://deploy1003.eqiad.wmnet/phabricator/deployment/.git

When trying to run this manually, it's back to the previous issue.

Mentioned in SAL (#wikimedia-operations) [2025-06-17T18:24:13Z] <brennen@deploy1003> Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling)

Mentioned in SAL (#wikimedia-operations) [2025-06-17T18:24:31Z] <brennen@deploy1003> Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling) (duration: 00m 18s)

Deleted that cache dir one more time, then ran the command manually as the phab-deploy user before puppet ran again, and:

[phab1005:~] $ sudo -u phab-deploy /usr/bin/scap deploy-local --repo phabricator/deployment -D log_json:False
18:23:14 Fetch from: http://deploy1003.eqiad.wmnet/phabricator/deployment/.git
18:23:14 Update submodules
18:23:14 Updating .gitmodule: /srv/deployment/phabricator/deployment-cache/cache
..
18:23:33 Rendering config_file: /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/support/redirect_config.json using /etc/phabricator/config.yaml
18:23:33 /srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/phabricator/support/redirect_config.json: [unchanged]
18:23:33 Executing check 'config_deploy'
18:23:33 Check 'config_deploy' failed: $SCAP_REV_PATH is not defined.
Note: This script is only intended to run as a scap deploy check
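The message suggests the check script refuses to run unless scap has set SCAP_REV_PATH in its environment. The real script is not shown here, so the following is only a hypothetical sketch of such a guard, reusing the two message lines from the output above; the function name require_rev_path is made up.

```shell
#!/bin/sh
# Hypothetical guard: fail with a clear message when SCAP_REV_PATH is
# unset or empty, as it would be outside of a scap deploy check.
require_rev_path() {
    if [ -z "${SCAP_REV_PATH:-}" ]; then
        echo '$SCAP_REV_PATH is not defined.' >&2
        echo 'Note: This script is only intended to run as a scap deploy check' >&2
        return 1
    fi
}
```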

Looking much greener now :)

Change #1160231 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: add missing sudo_defaults file for scap

https://gerrit.wikimedia.org/r/1160231

Change #1160231 merged by Dzahn:

[operations/puppet@production] phabricator::migration: add missing sudo_defaults file for scap

https://gerrit.wikimedia.org/r/1160231

The missing scap_sudo_defaults file has been created:

Notice: /Stage[main]/Profile::Phabricator::Migration/File[/etc/sudoers.d/scap_sudo_defaults]/ensure

Now the puppet run finishes without any issues :)

@brennen Please try again now.

Mentioned in SAL (#wikimedia-operations) [2025-06-17T19:12:36Z] <brennen@deploy1003> Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling)

Mentioned in SAL (#wikimedia-operations) [2025-06-17T19:12:46Z] <brennen@deploy1003> Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling) (duration: 00m 10s)

Much closer:

19:12:46 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'phabricator/deployment', '-g', 'default', 'finalize', '--refresh-config'] (ran as phab-deploy@phab1005.eqiad.wmnet) returned [1]: Registering scripts in directory '/srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/scap/scripts'
registered script? '/srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/scap/scripts/deploy-perms.sh' True
Registering scripts in directory '/srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/scap/scripts'
registered script? '/srv/deployment/phabricator/deployment-cache/revs/f8d7b38df39896e5e2792d70fa04b81f3b6450e3/scap/scripts/deploy-perms.sh' True
Executing check 'finalize'
Check 'finalize' failed:
 ->Running puppet...
Notice: Skipping run of Puppet configuration client; administratively disabled (Reason: 'phabricator deployment - phab-deploy');
Use 'puppet agent --enable' to re-enable.

 ->Applying storage migrations
/usr/local/sbin/phab_deploy_finalize: line 24: /phabricator/bin/storage: No such file or directory

 ->Restarting PHD
Failed to start phd.service: Unit phd.service not found.

 ->Reloading apache
Failed to reload apache2.service: Unit apache2.service not found.

 ->Enabling puppet agent

 ->Verifying database status

<13>Jun 17 19:12:46 root: >>>ERROR: Phabricator storage is in a bad state.


19:12:46 phabricator/deployment: finalize stage(s): 100% (in-flight: 0; ok: 0; fail: 1; left: 0) |
19:12:46 1 targets had deploy errors
19:12:46 1 targets failed
19:12:46 default deploy successful
19:12:46 Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling) (duration: 00m 10s)
19:12:46 Finished deploy [phabricator/deployment@f8d7b38] (duration: 00m 10s)
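
The finalize failures in this log come down to three missing pieces: the storage tool under the Phabricator directory, phd.service, and apache2.service. Below is a rough preflight sketch that checks for them before a deploy. The function names are hypothetical and not part of scap or phab_deploy_finalize, and the Phabricator directory is passed in as an argument because its real value is not shown here. The empty-argument check is there because the log path /phabricator/bin/storage may point to a directory variable that was empty.

```shell
#!/bin/bash
# Hypothetical preflight helpers; names are illustrative only.

# Succeeds if the storage tool exists and is executable under the given directory.
has_storage_bin() {
    local phab_dir="$1"
    [ -n "$phab_dir" ] && [ -x "${phab_dir}/bin/storage" ]
}

# Succeeds if systemd knows the unit (assumes systemctl is available on the host).
has_unit() {
    systemctl cat "$1" >/dev/null 2>&1
}

# Reports every missing prerequisite and returns non-zero if any is missing.
preflight() {
    local phab_dir="$1" unit rc=0
    has_storage_bin "$phab_dir" || { echo "missing: ${phab_dir}/bin/storage"; rc=1; }
    for unit in phd.service apache2.service; do
        has_unit "$unit" || { echo "missing unit: $unit"; rc=1; }
    done
    return "$rc"
}
```

Calling `preflight` with the directory that contains `bin/storage` would list all missing prerequisites at once instead of surfacing them one deploy at a time.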

Mentioned in SAL (#wikimedia-operations) [2025-06-17T21:26:57Z] <brennen@deploy1003> Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling)

Mentioned in SAL (#wikimedia-operations) [2025-06-17T21:27:04Z] <brennen@deploy1003> Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling) (duration: 00m 07s)

Mentioned in SAL (#wikimedia-operations) [2025-06-17T21:29:29Z] <brennen@deploy1003> Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling)

Mentioned in SAL (#wikimedia-operations) [2025-06-17T21:29:36Z] <brennen@deploy1003> Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling) (duration: 00m 07s)

Change #1160270 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: include phabricator::httpd, add /srv/phab dir

https://gerrit.wikimedia.org/r/1160270

Change #1160270 merged by Dzahn:

[operations/puppet@production] phabricator::migration: include phabricator::httpd, add /srv/phab dir

https://gerrit.wikimedia.org/r/1160270

Change #1160280 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: install PHP

https://gerrit.wikimedia.org/r/1160280

Change #1160280 merged by Dzahn:

[operations/puppet@production] phabricator::migration: install PHP

https://gerrit.wikimedia.org/r/1160280

Mentioned in SAL (#wikimedia-operations) [2025-06-17T22:25:36Z] <brennen@deploy1003> Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling)

Mentioned in SAL (#wikimedia-operations) [2025-06-17T22:25:43Z] <brennen@deploy1003> Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling) (duration: 00m 07s)

Mentioned in SAL (#wikimedia-operations) [2025-06-17T22:26:59Z] <brennen@deploy1003> Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling)

Change #1160310 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: ensure /srv/phab is the correct symlink

https://gerrit.wikimedia.org/r/1160310

Mentioned in SAL (#wikimedia-operations) [2025-06-17T23:30:03Z] <brennen@deploy1003> Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling) (duration: 63m 03s)

Change #1160310 merged by Dzahn:

[operations/puppet@production] phabricator::migration: ensure /srv/phab is the correct symlink

https://gerrit.wikimedia.org/r/1160310

A scap phab deployment is working now! :)

I re-enabled puppet after we had applied some manual hacks (commenting out the apache restart before apache was installed, re-enabling puppet via the deployment script, creating the missing symlink).

The symlink is now puppetized; apache can be restarted, puppet can be enabled and disabled; just one thing is left: variable values in /etc/phabricator/script-vars are not expanded. The file itself is created by puppet from a template.
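
A quick way to spot unexpanded values in a templated vars file could look like the sketch below. It assumes the file holds shell-style KEY=value lines and that an unexpanded value shows up either as an empty value or as a leftover ERB tag; both are assumptions, since the actual contents of /etc/phabricator/script-vars are not shown here. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prints (with line numbers) the lines of a KEY=value file
# whose value is empty, is an empty quoted string, or still contains an ERB tag.
# Returns non-zero when nothing suspicious is found (grep's exit status).
find_unexpanded() {
    local pattern="^[A-Za-z_][A-Za-z0-9_]*=(\"\"|'')?[[:space:]]*\$|<%"
    grep -nE "$pattern" "$1"
}
```

Running `find_unexpanded /etc/phabricator/script-vars` after a puppet run would then print any lines that still need a value.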

Also: PHP 8.2 is installed, with all the same modules a prod phab host has. If this works, then Phorge is fine with the bookworm default version for T372619.

Change #1161042 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: add parameters phabdir,storage_user,deploy_user

https://gerrit.wikimedia.org/r/1161042

Change #1161042 merged by Dzahn:

[operations/puppet@production] phabricator::migration: add parameters phabdir,storage_user,deploy_user

https://gerrit.wikimedia.org/r/1161042

Change #1161048 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: puppetize password for testdb in script-vars

https://gerrit.wikimedia.org/r/1161048

Change #1161051 had a related patch set uploaded (by Dzahn; author: Dzahn):

[labs/private@master] add fake password for phab test db admin user

https://gerrit.wikimedia.org/r/1161051

Change #1161051 merged by Dzahn:

[labs/private@master] add fake password for phab test db admin user

https://gerrit.wikimedia.org/r/1161051

Change #1161048 merged by Dzahn:

[operations/puppet@production] phabricator::migration: puppetize password for testdb in script-vars

https://gerrit.wikimedia.org/r/1161048

Change #1161053 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: fix variable name used for testdb storage pass

https://gerrit.wikimedia.org/r/1161053

Change #1161053 merged by Dzahn:

[operations/puppet@production] phabricator::migration: fix variable name used for testdb storage pass

https://gerrit.wikimedia.org/r/1161053

Change #1161059 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: fix /srv/phab symlink, /srv/repos dir

https://gerrit.wikimedia.org/r/1161059

Change #1161059 merged by Dzahn:

[operations/puppet@production] phabricator::migration: fix /srv/phab symlink, /srv/repos dir

https://gerrit.wikimedia.org/r/1161059

The symlink is now puppetized; apache can be restarted, puppet can be enabled and disabled; just one thing is left: variable values in /etc/phabricator/script-vars are not expanded. The file itself is created by puppet from a template.

This is also fixed now. The password for the test db is now properly puppetized in the private repo in its own class, without touching prod phab db passwords.

And puppet ensures /srv/phab is the proper symlink to the deployment dir.
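
To verify the symlink by hand, a small check like the following might help. It is only a sketch: the function name is hypothetical, the expected target has to be supplied by the caller because the exact deployment directory is not stated here, and `readlink -f` assumes GNU coreutils (as on bookworm).

```shell
#!/bin/bash
# Hypothetical helper: succeeds if LINK is a symlink that resolves to the
# same canonical path as TARGET.
is_symlink_to() {
    local link="$1" target="$2"
    [ -L "$link" ] && [ "$(readlink -f "$link")" = "$(readlink -f "$target")" ]
}
```

For example, `is_symlink_to /srv/phab "$deploy_dir" && echo ok`, where `$deploy_dir` stands for whatever directory the puppet class points the link at.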

Change #1137818 merged by Dzahn:

[operations/puppet@production] scap: stop hardcoding scap user home to fix puppet breakage

https://gerrit.wikimedia.org/r/1137818