Trixie bacula-fd package incompatible with our bacula installation
Closed, Resolved, Public

Description

Client: bacula-fd (15.0.3-3)
Server: bacula-sd (9.6.7-7)

09-Sep 13:36 backup1014.eqiad.wmnet JobId 650230: Start Backup JobId 650230, Job=people2004.codfw.wmnet-Monthly-1st-Thu-productionEqiad-srv-org-wikimedia.2025-09-09_13.35.09_42
09-Sep 13:37 backup1014.eqiad.wmnet JobId 650230: Using Device "FileStorageProductionEqiad" to write.
09-Sep 13:37 people2004.codfw.wmnet-fd JobId 650230: Fatal error: hello.c:191 Bad caps from SD: auth cram-md5 <1671439558.1757425044@backup1009.eqiad.wmnet-fd> ssl=2
.
09-Sep 13:37 backup1014.eqiad.wmnet JobId 650230: Fatal error: Bad response to Storage command: wanted 2000 OK storage
, got 2800 End Job TermCode=102 JobFiles=0 ReadBytes=0 JobBytes=0 Errors=1 VSS=0 Encrypt=1 CommBytes=133 CompressCommBytes=133

09-Sep 13:37 backup1014.eqiad.wmnet JobId 650230: Fatal error: No Job status returned from FD.
09-Sep 13:37 backup1014.eqiad.wmnet JobId 650230: Error: Bacula backup1014.eqiad.wmnet 9.6.7 (10Dec20):
  Build OS:               x86_64-pc-linux-gnu debian bookworm/sid
  JobId:                  650230
  Job:                    people2004.codfw.wmnet-Monthly-1st-Thu-productionEqiad-srv-org-wikimedia.2025-09-09_13.35.09_42
  Backup Level:           Full
  Client:                 "people2004.codfw.wmnet-fd" 15.0.3 (25Mar25) x86_64-pc-linux-gnu,debian,trixie/sid;bpipe(2)
  FileSet:                "srv-org-wikimedia" 2013-08-27 22:09:41
  Pool:                   "productionEqiad" (From Job resource)
  Catalog:                "production" (From Client resource)
  Storage:                "backup1009-FileStorageProductionEqiad" (From Pool resource)
  Scheduled time:         09-Sep-2025 13:35:08
  Start time:             09-Sep-2025 13:36:53
  End time:               09-Sep-2025 13:37:24
  Elapsed time:           31 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Comm Line Compression:  None
  Snapshot/VSS:           no
  Encryption:             no
  Accurate:               no
  Volume name(s):         
  Volume Session Id:      341
  Volume Session Time:    1757324145
  Last Volume Bytes:      451,054,879,141 (451.0 GB)
  Non-fatal FD errors:    1
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Waiting on FD
  Termination:            *** Backup Error ***
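
The version pair at the top of the description can be checked mechanically. The sketch below is a minimal illustration, not part of the task: it assumes that a file daemon whose upstream version is newer than the storage daemon's is the unsupported combination, and the function names are hypothetical.

```python
from typing import Tuple


def upstream_version(debian_version: str) -> Tuple[int, ...]:
    """Return the upstream part of a Debian version string as a tuple of ints.

    '15.0.3-3' gives (15, 0, 3) and '9.6.7-7+wmf13u1' gives (9, 6, 7).
    Epochs and non-numeric upstream parts are not handled in this sketch.
    """
    # Everything before the last hyphen is the upstream version.
    upstream = debian_version.rsplit("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def fd_newer_than_sd(fd_version: str, sd_version: str) -> bool:
    """True when the client (bacula-fd) is newer than the server (bacula-sd).

    Assumption: this is the combination that fails, as seen in this task.
    """
    return upstream_version(fd_version) > upstream_version(sd_version)


# Versions taken from the description above.
print(fd_newer_than_sd("15.0.3-3", "9.6.7-7"))         # trixie client vs. current SD
print(fd_newer_than_sd("9.6.7-7+wmf13u1", "9.6.7-7"))  # forward-ported client vs. current SD
```

Only the upstream part is compared on purpose, so a rebuilt package such as 9.6.7-7+wmf13u1 counts as the same Bacula release as 9.6.7-7.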

Event Timeline

jcrespo triaged this task as High priority. (Sep 9 2025, 4:54 PM)
jcrespo added subscribers: MoritzMuehlenhoff, Dzahn.

High because it blocks many Debian upgrades.

I built a forward port of Bacula 9.6.7 for trixie-wikimedia (which only builds bacula-fd and bacula-common) and installed the packages on people1005/2004. Please verify whether that makes backups work again. If that is confirmed, I'll make a follow-up patch that puts these packages in a separate repository component and installs them on trixie hosts until the backup servers themselves are moved to Bacula 15.

Thank you, that was fast! I sent a command; it may take some time to process, as we are still dealing with the aftermath (backup re-runs) of some issues that happened during the weekend.

Let me help you with the patch. The change would go here: https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/bacula/manifests/client.pp$39 but we probably want it as a flag in the profile: https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/profile/manifests/backup/host.pp

The backup went through according to the logs, so that's good. But I would like to double-check that the restore operation also works properly (@Dzahn, let me know when you have some time today or tomorrow to do a recovery on people1005, so you can help me check that it went OK).

Once that is done, we can start work on the puppet patch.

Change #1187019 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Create component/bacula9

https://gerrit.wikimedia.org/r/1187019

Change #1187020 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] bacula::client: On Trixie hosts install the FD from component/bacula9

https://gerrit.wikimedia.org/r/1187020
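
The idea behind the two patches (a separate apt component, used only on trixie hosts) can be sketched as follows. This is not the Puppet code from the changes above; the mapping and function names are hypothetical, and the field values are the ones that appear in the sources file pasted later in this task.

```python
from typing import Optional

# Hypothetical mapping for illustration: distro codename -> extra apt
# component that carries the forward-ported Bacula 9 file daemon.
LEGACY_FD_COMPONENTS = {"trixie": "component/bacula9"}


def fd_apt_component(codename: str) -> Optional[str]:
    """Return the extra component a host needs, or None if the distro's
    own bacula-fd package is used unchanged."""
    return LEGACY_FD_COMPONENTS.get(codename)


def render_sources_stanza(codename: str, component: str) -> str:
    """Render a deb822-style apt source stanza for the given component."""
    return "\n".join(
        [
            "Types: deb deb-src",
            "URIs: http://apt.wikimedia.org/wikimedia",
            f"Suites: {codename}-wikimedia",
            f"Components: {component}",
            "Signed-By: /etc/apt/keyrings/wikimedia-archive-keyring.gpg",
        ]
    )


for codename in ("bookworm", "trixie"):
    component = fd_apt_component(codename)
    if component is None:
        print(f"{codename}: no extra component needed")
    else:
        print(render_sources_stanza(codename, component))
```

Keeping the codename-to-component mapping in one place means the entry can simply be dropped once the backup servers run Bacula 15.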

Great news! The patches are rather simple, so I've gone ahead and submitted them to Gerrit. We can review and merge them when your restore tests are complete.

@jcrespo I tested a restore on people1005. Just selected 3 image files from my own home dir from "latest backup" and restored to the host itself. They showed up in /var/tmp/bacula-restores almost instantly. Worked for me.

Change #1187019 merged by Muehlenhoff:

[operations/puppet@production] Create component/bacula9

https://gerrit.wikimedia.org/r/1187019

Mentioned in SAL (#wikimedia-operations) [2025-09-11T07:52:31Z] <moritzm> upload bacula 9.6.7-7+wmf13u1 to component/bacula9 for trixie-wikimedia T404114

Change #1187020 merged by Jcrespo:

[operations/puppet@production] bacula::client: On Trixie hosts install the FD from component/bacula9

https://gerrit.wikimedia.org/r/1187020

Change #1187490 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] bacula: Fix repo configuration for bacula9

https://gerrit.wikimedia.org/r/1187490

Change #1187490 merged by Jcrespo:

[operations/puppet@production] bacula: Fix repo configuration for bacula9

https://gerrit.wikimedia.org/r/1187490

Configuration looks as intended:

root@people1005:/etc/apt/sources.list.d$ cat component-bacula9-apt.wikimedia.org-wikimedia-trixie-wikimedia.sources
# SPDX-License-Identifier: Apache-2.0
#
# This file is managed by puppet.
# Any local changes will be swiftly overwritten
#
# Most cloud-vps projects can make persistent changes to apt sources
# by adding a new .list file in /etc/apt/sources.list.d.
#
# Some cloud-vps projects have 'cloud.yaml:profile::apt::purge_sources'
# set to 'true', in which case apt sources can only be managed
# via puppet.
#
Types: deb deb-src
URIs: http://apt.wikimedia.org/wikimedia
Suites: trixie-wikimedia
Components: component/bacula9
Signed-By: /etc/apt/keyrings/wikimedia-archive-keyring.gpg
✔ root@people1005:/etc/apt/sources.list.d$ apt show bacula-fd -a
Package: bacula-fd
Version: 15.0.3-3
Priority: optional
Section: admin
Source: bacula
Maintainer: Debian Bacula Team <pkg-bacula-devel@lists.alioth.debian.org>
Installed-Size: 463 kB
Pre-Depends: init-system-helpers (>= 1.54~)
Depends: bacula-common (= 15.0.3-3), sysvinit-utils (>= 3.05-1), ucf, libacl1 (>= 2.2.23), libc6 (>= 2.38), liblzo2-2 (>= 2.02), libstdc++6 (>= 5), zlib1g (>= 1:1.1.4)
Breaks: bacula-common (<< 9.6.4-1)
Replaces: bacula-common (<< 9.6.4-1)
Homepage: https://www.bacula.org/
Tag: admin::backup, interface::daemon, network::client, network::service,
 role::program, use::storing, works-with::file
Download-Size: 146 kB
APT-Sources: http://mirrors.wikimedia.org/debian trixie/main amd64 Packages
Description: network backup service - file daemon
 Bacula is a set of programs to manage backup, recovery and verification of
 data across a network of computers of different kinds.
 .
 The file daemon has to be installed on the machine to be backed up. It is
 responsible for providing the file attributes and data when requested by
 the Director and also for the file system-dependent part of restoration.

Package: bacula-fd
Version: 9.6.7-7+wmf13u1
Priority: optional
Section: admin
Source: bacula
Maintainer: Debian Bacula Team <pkg-bacula-devel@lists.alioth.debian.org>
Installed-Size: 542 kB
Pre-Depends: init-system-helpers (>= 1.54~)
Depends: bacula-common (= 9.6.7-7+wmf13u1), sysvinit-utils (>= 3.05-1), ucf, libacl1 (>= 2.2.23), libc6 (>= 2.38), liblzo2-2 (>= 2.02), libstdc++6 (>= 5), zlib1g (>= 1:1.1.4)
Breaks: bacula-common (<< 9.6.4-1)
Replaces: bacula-common (<< 9.6.4-1)
Homepage: https://www.bacula.org/
Download-Size: 221 kB
APT-Manual-Installed: yes
APT-Sources: http://apt.wikimedia.org/wikimedia trixie-wikimedia/component/bacula9 amd64 Packages
Description: network backup service - file daemon
 Bacula is a set of programs to manage backup, recovery and verification of
 data across a network of computers of different kinds.
 .
 The file daemon has to be installed on the machine to be backed up. It is
 responsible for providing the file attributes and data when requested by
 the Director and also for the file system-dependent part of restoration.

✔ root@people1005:/etc/apt/sources.list.d$ apt list --installed bacula-fd
bacula-fd/trixie-wikimedia,now 9.6.7-7+wmf13u1 amd64 [installed]
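
The same check can be done across several hosts by parsing the `apt list --installed` line shown above. The sketch below is a minimal illustration with hypothetical function names; it assumes the one-line format seen in this output, and apt itself warns that its command-line output is not a stable interface for scripts.

```python
from typing import Dict, List, Union


def parse_apt_list_line(line: str) -> Dict[str, Union[str, List[str]]]:
    """Split one `apt list --installed` line into its fields.

    Expected shape (as in the output above):
    'bacula-fd/trixie-wikimedia,now 9.6.7-7+wmf13u1 amd64 [installed]'
    """
    name_and_suites, version, arch, status = line.split(maxsplit=3)
    name, _, suites = name_and_suites.partition("/")
    return {
        "package": name,
        "suites": suites.split(","),
        "version": version,
        "arch": arch,
        "status": status.strip("[]").split(","),
    }


def is_forward_ported_fd(line: str) -> bool:
    """True if the line reports an installed bacula-fd from the 9.6.7 series."""
    info = parse_apt_list_line(line)
    return (
        info["package"] == "bacula-fd"
        and "installed" in info["status"]
        and str(info["version"]).startswith("9.6.7-")
    )


line = "bacula-fd/trixie-wikimedia,now 9.6.7-7+wmf13u1 amd64 [installed]"
print(parse_apt_list_line(line)["version"])
print(is_forward_ported_fd(line))
```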

Backup ran successfully (recovery was tested too):

650661  Incr           3    2.416 K  OK       11-Sep-25 16:15 people1005.eqiad.wmnet-Monthly-1st-Wed-productionEqiad-home

Change #1187492 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] Revert "bacula: Ignore backup failures from people1005 & people2004"

https://gerrit.wikimedia.org/r/1187492

Change #1187492 merged by Dzahn:

[operations/puppet@production] Revert "bacula: Ignore backup failures from people1005 & people2004"

https://gerrit.wikimedia.org/r/1187492

jcrespo claimed this task.