
LibUp hasn't run since 5 June 2023
Closed, Resolved, Public

Description

… was the last run. It would be great to get it back up.

Event Timeline

Does anyone know what needs to happen to get it running again?

On the technical side: I have no clue.
On the organizational side: someone needs to step up, look at the logs, write some fixes and get it running again. It's written in Python with decent code quality, so it shouldn't be that hard (I've read and copied parts of that code for LSC). Anyone can be that someone, but in the words of esteemed software engineering coach Selena Gomez, "It ain't me."

Reedy triaged this task as High priority. Jan 30 2024, 9:01 PM

… indicates that it was restarted yesterday by @bd808, but looking at the most recent log on …, it does not seem like that resolved the issue.

The Cloud-VPS Grafana logs seem to start only on Jan 1st of this year? … seems to return a 500 in general? Though I'm not sure whether there would be more than what's on Wikitech.

Looking at …, it seems the database has some issue?

From looking at the docs, these things seem to be the main moving parts:

A systemd timer triggers the script, which gathers a list of repositories and queues jobs for them in our celery instance (backed by RabbitMQ). Celery is a job runner (currently running with a concurrency of 2), and it will spawn Docker containers that execute …


Do we have a way to access any of the logs there?

@Legoktm do you have interest in maintaining LibUp? If not, we could make a Code-Stewardship-Reviews task to see if we can find an owner.

This comment was removed by Michael.

LibUp was turned off because there was some bug (which I don't remember, but it probably has a ticket somewhere) and because I was adding GitLab support (there's a branch on GitLab), and I thought I'd have it back running in a few days, which obviously didn't happen. So my apologies for not communicating that properly, and then for not being a responsible maintainer by adding backups and, well, getting it running again. I've rectified the maintainer issue by giving @Ladsgroup, @Jdforrester-WMF and @Reedy access (you are all welcome to add others as well). The only things I haven't shared are the Gerrit password and SSH passphrase. I'm happy to share them over a secure channel (e.g. Signal), or, since y'all have access to the email address via Toolforge, you can trigger a reset and add a new SSH key.

@Legoktm do you have interest in maintaining LibUp? If not, we could make a Code-Stewardship-Reviews task to see if we can find an owner.

Interest yes, time not so much. I've never really been satisfied with the libup code; despite rewriting it twice, I think it still sucks for a number of reasons. It's also basically impossible to run locally, which causes all sorts of problems. I can write a more detailed analysis if it would be useful.

I also think it's worth spending real time investigating to see whether it meets our needs.

Update: at FOSDEM, @taavi has been working on freeing up space in both the DB and the VM. The VM disk was filled because a cache directory kept growing (a memleak, but on disk?), and the DB is full because the logs table is 9GB. I'm not sure what we can do there; can we drop really old runs? I suggest taking a snapshot with a MariaDB backup and then dropping all the old runs. Otherwise, we probably need to normalize the table or do something similar, but I'm not sure.
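The "snapshot first, then drop old runs" suggestion can be sketched as below. This is only an illustration under stated assumptions: `sqlite3` stands in for MariaDB, `Connection.backup` stands in for a MariaDB backup snapshot, and the `logs` table schema (`repo`, `run_date`, `output`) is hypothetical, not LibUp's real schema.

```python
"""Sketch: snapshot the database, then prune log rows older than a cutoff.

Assumptions: sqlite3 stands in for MariaDB, Connection.backup stands in for a
MariaDB backup snapshot, and the `logs` schema below is hypothetical.
"""
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical schema and rows, only to exercise the prune logic.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs (id INTEGER PRIMARY KEY, repo TEXT, run_date TEXT, output TEXT)"
    )
    rows = [
        ("mediawiki/core", "2023-01-10", "..."),
        ("mediawiki/core", "2023-06-05", "..."),
        ("mediawiki/extensions/Foo", "2022-11-02", "..."),
        ("mediawiki/extensions/Foo", "2024-02-04", "..."),
    ]
    conn.executemany(
        "INSERT INTO logs (repo, run_date, output) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def snapshot_then_prune(
    conn: sqlite3.Connection, cutoff: str
) -> tuple[sqlite3.Connection, int]:
    # 1. Snapshot first, so nothing is lost if the prune turns out to be wrong.
    snapshot = sqlite3.connect(":memory:")
    conn.backup(snapshot)
    # 2. Delete every log row from runs before the cutoff.
    #    ISO-8601 date strings compare correctly as plain text.
    with conn:  # commits the DELETE on success
        deleted = conn.execute(
            "DELETE FROM logs WHERE run_date < ?", (cutoff,)
        ).rowcount
    return snapshot, deleted


if __name__ == "__main__":
    db = make_demo_db()
    snap, n = snapshot_then_prune(db, "2023-06-01")
    print(f"deleted {n} old rows; snapshot kept a full copy")
```

Running the same prune on a schedule is one way to keep the table from growing without bound. On MariaDB with InnoDB, note that deleting rows does not by itself shrink the table's file on disk; the table typically has to be rebuilt (e.g. with `OPTIMIZE TABLE`) before the space is returned.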

We reset the SSH key. It's doing "something".

Ladsgroup assigned this task to taavi.

It's back up again. I will file a task to make sure it stops growing without bound.

Note: Added Taavi to the project

Mentioned in SAL (#wikimedia-cloud) [2024-02-05T17:21:03Z] <taavi> add James_F, Amir1, Reedy and myself to labs-libraryupgrader Gerrit group T345930