Maniphest T220423

Show "timed out" error to the user when an event update has been running for over an hour
Open, Low, Public, BUG REPORT

Assigned To

None

Authored By

	MusikAnimal
	Apr 8 2019, 4:43 PM

Description

Currently, if an update (aka job) has been running for over an hour, the system assumes the worst and deletes it without giving the user any indication of what happened. We now have different states for jobs (internally there's "queued", "started", "failed timeout" and "failed unknown"). Instead of deleting the job, we can set its state to "failed timeout" and then show the same timeout error that users see when individual queries time out.
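The change proposed in the description can be sketched as follows. This is a minimal Python sketch, not the Event Metrics code. The `Job` and `JobState` types, the `mark_timed_out` and `user_message` helpers, the one-hour `MAX_RUNTIME` constant and the message wording are all hypothetical. The sketch shows a periodic cleanup step that marks stale "started" jobs as "failed timeout" instead of deleting them, which gives the UI a state it can turn into an error message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional


class JobState(Enum):
    # The four states named in the task; the string values here are illustrative.
    QUEUED = "queued"
    STARTED = "started"
    FAILED_TIMEOUT = "failed timeout"
    FAILED_UNKNOWN = "failed unknown"


@dataclass
class Job:
    # Hypothetical job record: the event it updates, its state, and its start time.
    event_id: int
    state: JobState
    started_at: Optional[datetime] = None


# Assumed threshold; the thread itself questions whether one hour is the right interval.
MAX_RUNTIME = timedelta(hours=1)


def mark_timed_out(jobs: List[Job], now: datetime,
                   max_runtime: timedelta = MAX_RUNTIME) -> List[Job]:
    """Flag stale started jobs as FAILED_TIMEOUT instead of deleting them.

    Jobs stay in the list; the newly marked ones are returned so a caller can log them.
    """
    marked = []
    for job in jobs:
        if (job.state is JobState.STARTED
                and job.started_at is not None
                and now - job.started_at > max_runtime):
            job.state = JobState.FAILED_TIMEOUT
            marked.append(job)
    return marked


def user_message(job: Job) -> Optional[str]:
    """Return the error text to show for a job, or None if there is nothing to report."""
    if job.state is JobState.FAILED_TIMEOUT:
        # Hypothetical wording; the task asks to reuse the existing per-query timeout error.
        return "The update timed out."
    if job.state is JobState.FAILED_UNKNOWN:
        return "The update failed for an unknown reason."
    return None


if __name__ == "__main__":
    now = datetime(2019, 4, 8, 17, 0)
    jobs = [
        Job(1, JobState.STARTED, now - timedelta(hours=2)),      # stale: gets marked
        Job(2, JobState.STARTED, now - timedelta(minutes=10)),   # still within the limit
        Job(3, JobState.QUEUED),                                  # never started
    ]
    marked = mark_timed_out(jobs, now)
    print([job.event_id for job in marked])  # [1]
    print(user_message(jobs[0]))             # The update timed out.
```

Because the job record is kept rather than deleted, the event page can read the job's state on its next load and show an error, instead of silently reverting to its pre-update state.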

Related Objects

		Status	Subtype	Assigned	Task
		Resolved	BUG REPORT	None	T219835 [Spike: 3hrs] Event Metrics 'crunched numbers' for hours then stopped, but showed neither results nor an error msg.
		Open	BUG REPORT	None	T220423 Show "timed out" error to the user when an event update has been running for over an hour

Event Timeline

MusikAnimal created this task. Apr 8 2019, 4:43 PM

Restricted Application added a subscriber: Aklapper. Apr 8 2019, 4:43 PM

MusikAnimal mentioned this in T219835: [Spike: 3hrs] Event Metrics 'crunched numbers' for hours then stopped, but showed neither results nor an error msg. Apr 8 2019, 4:44 PM

• jmatazzoni added a project: Community-Tech-Sprint. Apr 8 2019, 5:26 PM

• jmatazzoni moved this task from New & TBD Tickets to In Sprint 🏃‍♀️🏃‍♂️ on the Community-Tech board.

• jmatazzoni moved this task from Backlog to In Sprint on the Event Metrics board. Apr 8 2019, 5:28 PM

MaxSem claimed this task. Apr 8 2019, 9:26 PM

MaxSem moved this task from Ready to In Development on the Community-Tech-Sprint board.

Are we sure that after an hour it's not going to work? I.e., is that the right interval to declare defeat?

• jmatazzoni added a parent task: T219835: [Spike: 3hrs] Event Metrics 'crunched numbers' for hours then stopped, but showed neither results nor an error msg. Apr 8 2019, 10:20 PM

Ready for review: https://github.com/wikimedia/eventmetrics/pull/280

MaxSem mentioned this in rGEVM5b6c97bf4993: Mark long running jobs as timed out. Apr 8 2019, 10:41 PM

MaxSem mentioned this in T220463: Make jobs not time out in 30 seconds. Apr 9 2019, 12:15 AM

MaxSem moved this task from Needs Review/Feedback to QA on the Community-Tech-Sprint board. Apr 9 2019, 8:19 PM

Waiting for T220463 to be resolved before testing this. Currently, we time out long before the 24-hour limit that this change introduces.

@MaxSem, to test, I started https://eventmetrics-dev.wmflabs.org/programs/150/events/364 this morning at 8:30. It was still crunching at 11:30. When I refreshed at that point, it reverted to its original state, with no message and no metrics. So is the standard that nothing should run for over an hour? I'm not sure what we're aiming for...

Starting with plan B then!

• jmatazzoni changed the subtype of this task from "Task" to "Bug Report". Apr 15 2019, 6:16 PM

@MaxSem has tried what he was going to try, and it seems like this problem is intermittent. @aezell, I'm going to pull this task off the board, and we'll monitor to see how things progress.

• jmatazzoni removed projects: Community-Tech-Sprint, Community-Tech. Apr 15 2019, 11:03 PM

• jmatazzoni moved this task from In Sprint to High value ideas on the Event Metrics board.

Restricted Application added a project: Community-Tech. Apr 15 2019, 11:03 PM

Thanks. Max did notice that the cron jobs on staging were not configured correctly. That could have contributed to some of the confusion around what was going on with this. I don't think we can consider it complete though.

MBinder_WMF moved this task from In Sprint 🏃‍♀️🏃‍♂️ to Resolved 2018-19 Q4 on the Community-Tech board. May 28 2019, 3:58 PM

MBinder_WMF edited projects, added Community-Tech (Resolved 2018-19 Q4); removed Community-Tech.

I assume we didn't mean to start working on EM again? :P

Low priority.

MaxSem removed MaxSem as the assignee of this task. May 20 2020, 7:44 PM

MaxSem subscribed.
