Jan 4 2018

Dfko added a comment to T183970: wikidumpparse is using 1.2TB of 5T available NFS misc storage.

Hi, I am looking around for the offending files to delete them, but it has been a long while since I worked on any of this and I don't recall how to find them.
Can you provide the exact path of the offending files?

Jan 4 2018, 8:11 PM · cloud-services-team, Cloud-VPS

Aug 28 2016

Dfko added a comment to T140110: Packages to be installed in Toolforge Kubernetes Images (Tracking).

I need libxml2 (and probably also libxml2-dev) in Kubernetes for recitation-bot, since it uses the command-line xsltproc to run XSLT processing. The -dev package will allow me to convert it to use Python's lxml wrapper to do the same. (python3 container)

Aug 28 2016, 8:01 PM · User-bd808, Toolforge, Kubernetes, Tracking-Neverending

May 8 2015

Dfko added a comment to T91979: Audit redis usage on toollabs.

Thanks

May 8 2015, 9:50 PM · Cloud-Services, Toolforge

Dfko added a comment to T91979: Audit redis usage on toollabs.

Interesting...
Getting back on it.
Can you post that dump for me somewhere?

May 8 2015, 9:44 PM · Cloud-Services, Toolforge

May 1 2015

Dfko added a comment to T91979: Audit redis usage on toollabs.

We have workers on the failed queue already. Jobs were cycling from failed -> failed forever because of this, but that should be resolved.

May 1 2015, 7:57 AM · Cloud-Services, Toolforge

Dfko added a comment to T91979: Audit redis usage on toollabs.

Do you think the default of 15 minutes will be too long? >20 events per second is rare, so a very conservative estimate of queue load would be 18,000 events in flight.
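
As a rough check of that estimate, assuming a sustained 20 events per second for the whole 15-minute TTL (both figures are from the comment above; the function name is only illustrative):

```python
def events_in_flight(events_per_second, ttl_minutes):
    """Upper bound on events held at once if every event lives for the full TTL."""
    return events_per_second * ttl_minutes * 60


print(events_in_flight(20, 15))  # 18000
```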

May 1 2015, 7:56 AM · Cloud-Services, Toolforge

Dfko added a comment to T91979: Audit redis usage on toollabs.

I've run it for about 20 minutes; the queue seems to have plateaued at about 2 GB and is stable. I'll keep an eye on it for a bit longer.

May 1 2015, 7:50 AM · Cloud-Services, Toolforge

Dfko added a comment to T91979: Audit redis usage on toollabs.

It happens whenever I pass anything via the command line parameter of worker, whether it can be parsed as an int or not, so I am guessing they just forgot to ever call int() on that parameter. Haven't yet gone to track down where that should be happening though.
I think we will have to live with dropping events under load rather than filling queues, but 15 minutes seems plenty to keep up with the bursts we've seen so far, as the final stage in the pipeline does not take long.
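
The int() guess above can be illustrated with a small sketch. This is not RQ's code: the `--ttl` flag, `parse_ttl`, and `expiry_timestamp` are hypothetical names, and the only assumption is that a command-line value reaches arithmetic without being converted, while the built-in default is already an integer.

```python
import argparse


def parse_ttl(argv, coerce=True):
    """Parse a hypothetical --ttl flag; with coerce=False the value stays a str."""
    parser = argparse.ArgumentParser()
    # Without type=int, argparse returns command-line values as strings.
    parser.add_argument("--ttl", default=500, type=int if coerce else None)
    return parser.parse_args(argv).ttl


def expiry_timestamp(now, ttl):
    """Stand-in for any arithmetic done with the TTL."""
    return now + ttl


# Any explicit value fails when it is never converted, numeric-looking or not.
try:
    expiry_timestamp(1000, parse_ttl(["--ttl", "900"], coerce=False))
except TypeError as exc:
    print("uncoerced TTL fails:", exc)

# Omitting the flag works, because the default is already an int.
print(expiry_timestamp(1000, parse_ttl([], coerce=False)))

# Converting the argument avoids the failure.
print(expiry_timestamp(1000, parse_ttl(["--ttl", "900"])))
```

That pattern would match the symptom that every explicit value fails while leaving the parameter out does not.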

May 1 2015, 7:47 AM · Cloud-Services, Toolforge

Dfko added a comment to T91979: Audit redis usage on toollabs.

I was not able to find an explicit TTL format that would not trigger this bug, but not specifying it at all seems not to trigger it. I am going to try to start things up again and see if it works that way. The default TTL is 500 seconds.

May 1 2015, 7:27 AM · Cloud-Services, Toolforge

Dfko added a comment to T91979: Audit redis usage on toollabs.

We considered it early on, and this is making me consider it again, though there might not be enough time left in the project for a big move like that.

May 1 2015, 7:17 AM · Cloud-Services, Toolforge

Dfko added a comment to T91979: Audit redis usage on toollabs.

Still figuring out the significance of this, but there is a failure in a line dealing with TTLs in registry.py in the RQ library that is causing churn from the failed queue back onto the failed queue, which may be behind this:

May 1 2015, 7:14 AM · Cloud-Services, Toolforge

Apr 30 2015

Dfko added a comment to T91979: Audit redis usage on toollabs.

We started it up again to get a key dump (see previous 3 comments)
@yuvipanda

Apr 30 2015, 5:02 AM · Cloud-Services, Toolforge

Apr 29 2015

Dfko added a comment to T91979: Audit redis usage on toollabs.

@valhallasw We're up to about a gigabyte, can I get a dump now?

Apr 29 2015, 11:36 PM · Cloud-Services, Toolforge

Dfko added a comment to T91979: Audit redis usage on toollabs.

Great, I was going to ask you about that anyway.

Apr 29 2015, 8:50 AM · Cloud-Services, Toolforge

Dfko added a comment to T91979: Audit redis usage on toollabs.

Yeah, I've been trying to figure this out. My best guess from the documentation so far is that failed jobs may not get timeouts set the same way originally submitted jobs do. I need to work with this hypothesis more and maybe dig into the source of rq.

Apr 29 2015, 8:31 AM · Cloud-Services, Toolforge

Apr 28 2015

Dfko added a comment to T91979: Audit redis usage on toollabs.

Can you specify exactly what action you took to remove these jobs from the queue? Our service has been down while I've been hunting for the bug, to no avail so far. I'd like to start it up again with an additional task that periodically clears the queue manually, so we can stay up until a better solution is found.
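
A minimal sketch of the kind of periodic clearing task I have in mind, assuming the actual removal step is supplied as a callable once it is known. Everything here is hypothetical: `clear_periodically`, `clear_stale_jobs`, and the in-memory list stand in for the real queue and are not Redis or RQ calls.

```python
import time


def clear_periodically(clear_queue, interval_seconds, rounds, sleep=time.sleep):
    """Call clear_queue() once per interval for `rounds` rounds.

    clear_queue must return how many jobs it removed; the total is returned.
    """
    removed = 0
    for _ in range(rounds):
        removed += clear_queue()
        sleep(interval_seconds)
    return removed


# Stand-in for the real queue: a plain list, not Redis.
stale_jobs = ["job-1", "job-2", "job-3"]


def clear_stale_jobs():
    """Empty the stand-in queue and report how many jobs were dropped."""
    count = len(stale_jobs)
    stale_jobs.clear()
    return count


print(clear_periodically(clear_stale_jobs, interval_seconds=0, rounds=2))  # 3
```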

Apr 28 2015, 10:05 PM · Cloud-Services, Toolforge

Dfko (Dfko)
User
