Fri, Sep 16
Thu, Sep 15
I'm not sure if you are still thinking about solving this. We can continue trying to fix it or, if it's not an issue, we should try to avoid the alert.
Wed, Sep 14
Yep, it does :) So that part is done; the next step is deploying it on toolsbeta.
Btw, backups are also failing, as they pull from the secondary (labstore1005).
Tue, Sep 13
It seems that the network cards on labstore1005 overheated and decided to turn themselves off:
[10704668.955975] bnx2x: [bnx2x_attn_int_deasserted0:4170(enp133s0f0)]SPIO5 hw attention
[10704668.964747] bnx2x 0000:85:00.0 enp133s0f0: Fan Failure on Network Controller has caused the driver to shutdown the card to prevent permanent damage. Please contact OEM Support for assistance
[10704669.040066] bnx2x: [bnx2x_attn_int_deasserted0:4170(enp133s0f1)]SPIO5 hw attention
[10704669.048824] bnx2x 0000:85:00.1 enp133s0f1: Fan Failure on Network Controller has caused the driver to shutdown the card to prevent permanent damage. Please contact OEM Support for assistance
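For anyone wanting to check other hosts for the same symptom, here is a minimal sketch that scans kernel log text for the bnx2x fan-failure shutdown message quoted above and lists the affected interfaces. The function name `find_fan_failures` and the regex are illustrative assumptions based only on the two messages in this log, not an existing tool or a guaranteed-stable message format.

```python
import re

# Assumed pattern, derived from the messages quoted above:
#   "[<ts>] bnx2x <pci address> <iface>: Fan Failure on Network Controller ..."
FAN_FAILURE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+bnx2x\s+"
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s+"
    r"(?P<iface>\S+): Fan Failure on Network Controller"
)


def find_fan_failures(log_text):
    """Return (timestamp, pci_address, interface) for each fan-failure shutdown."""
    return [
        (float(m.group("ts")), m.group("pci"), m.group("iface"))
        for m in FAN_FAILURE.finditer(log_text)
    ]


# Sample input copied from the log above (message tails shortened).
SAMPLE_LOG = (
    "[10704668.955975] bnx2x: [bnx2x_attn_int_deasserted0:4170(enp133s0f0)]SPIO5 hw attention\n"
    "[10704668.964747] bnx2x 0000:85:00.0 enp133s0f0: Fan Failure on Network Controller has caused the driver to shutdown the card\n"
    "[10704669.040066] bnx2x: [bnx2x_attn_int_deasserted0:4170(enp133s0f1)]SPIO5 hw attention\n"
    "[10704669.048824] bnx2x 0000:85:00.1 enp133s0f1: Fan Failure on Network Controller has caused the driver to shutdown the card\n"
)

failures = find_fan_failures(SAMPLE_LOG)
for ts, pci, iface in failures:
    print(f"{iface} ({pci}) shut down at kernel time {ts}")
```

In practice the input would be the output of `dmesg` or the kernel journal on the host rather than a hard-coded sample.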
This actually was not related (I think).
@Andrew I'm not sure if you are still working on these, let me know otherwise.
Fri, Sep 9
Wed, Sep 7
Done, you should have access
Oh! let's fix that then!
Nothing, the patch just needs merging if it looks good to everyone.
What is left here? What is blocking you?
Just created https://gitlab.wikimedia.org/repos/cloud/toolforge/packaging-pack/-/tree/main/ to package the pack CLI. The tekton CLI is just a binary too, so you might be able to reuse all the scripts there (it was surprisingly complicated :/)
Tue, Sep 6
Closing as it's solved, but pinging @Andrew as he was playing with this at some point.
Manually rebased the /var/lib/git/labs/private repositories onto the latest master. There was a conflict in hieradata/common.yaml because an entry had been added at the bottom of it (and there's a local patch that does the same).
Mon, Sep 5
Sat, Sep 3
It's going down already. It started at ~8:00 UTC and lasted until ~9:00 UTC.
Fri, Sep 2
It seems that the runbook did not clean up puppetdb, or it was repopulated right after, as the host still shows up there:
Thu, Sep 1
Wed, Aug 31
It seems to be an issue with sqlalchemy>1.4 and backy2 when cleaning up: https://github.com/wamdam/backy2/pull/93