
[WikiStreamService] 30-Minute SseEmitter Timeout with Event Resumption
Closed, Resolved | Public | Feature

Description

📝 Feature Summary

Implement a 30-minute strict timeout limit for client SSE (Server-Sent Events) connections. Upon reaching the 30-minute threshold, the connection should be safely terminated. Additionally, the system and client must support seamless reconnection to resume the event stream exactly where it left off, utilizing the Last-Event-ID to prevent any loss of monitored events.

🎯 Motivation / Use Case

Currently, SseEmitter instances in WikiStreamService are configured with an infinite timeout (Long.MAX_VALUE). Allowing indefinite connections can lead to stale connections, memory leaks, and excessive resource consumption on the server. Enforcing a 30-minute rotation ensures that "zombie" connections are cleared out. However, to maintain a seamless monitoring experience, users must not lose edits/events during the brief reconnection window.

📋 Implementation Plan

  1. Update Emitter Timeout: Modify WikiStreamService.subscribe(Principal) to instantiate SseEmitter with a 30-minute timeout (30 * 60 * 1000L) instead of Long.MAX_VALUE.
  2. Handle Client Disconnect: Ensure the client-side JavaScript listens for the connection closure/timeout event and automatically triggers a reconnection attempt.
  3. Capture Last Event ID: The client must capture the ID of the last successfully processed RecentChange event.
  4. Resumption Logic:
    • Modify the client's reconnection request to include the Last-Event-ID header.
    • Architectural Note: Since the server currently maintains a single global EventSource connection to Wikimedia and broadcasts to all SseEmitter clients, handling individual client catch-ups requires either caching recent events in memory/Redis or allowing clients to fetch missed events via a separate REST endpoint before joining the live stream again.
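For steps 3 and 4 to work, every event the server emits has to carry an `id:` field, because that is the value a client reports back as Last-Event-ID. A browser's native EventSource remembers the last `id:` it received and sends the header by itself when it reconnects. The sketch below shows the wire framing and how a handler might normalise the incoming header. `SseFrames`, `frame`, and `resumePoint` are hypothetical names for illustration, not code from wikimonitor; in Spring the same `id` field would typically be set through `SseEmitter.event().id(...)`.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the wikimonitor code base. */
final class SseFrames {
    private SseFrames() {}

    /** Builds one SSE frame. Each line of the payload becomes its own "data:" field. */
    static String frame(String id, String data) {
        if (id.contains("\n") || id.contains("\r")) {
            throw new IllegalArgumentException("event id must not contain line breaks");
        }
        StringBuilder sb = new StringBuilder("id: ").append(id).append('\n');
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        // A blank line terminates the event on the wire.
        return sb.append('\n').toString();
    }

    /** Normalises a raw Last-Event-ID header value; null or blank means "no resume point". */
    static Optional<String> resumePoint(String lastEventIdHeader) {
        if (lastEventIdHeader == null || lastEventIdHeader.isBlank()) {
            return Optional.empty();
        }
        return Optional.of(lastEventIdHeader.trim());
    }
}
```

An empty `resumePoint` would mean a first-time subscriber that simply joins the live stream; a present value is the key to look up in whatever buffer holds recent events.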

🛠️ Technical Details

  • File to modify: src/main/java/org/qrdlife/wikiconnect/wikimonitor/service/WikiStreamService.java
  • Current Code: SseEmitter emitter = new SseEmitter(Long.MAX_VALUE);
  • Proposed Code: SseEmitter emitter = new SseEmitter(1800000L); // 30 minutes
  • Challenges: Implementing per-user event resumption. The backend must reconcile the global lastEventId with the user's specific Last-Event-ID to replay missed flagged events specifically for that user's active filters.
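The reconciliation described under Challenges can be sketched as a window-plus-filter selection: replay only events newer than the client's Last-Event-ID, no newer than the global lastEventId at the moment of resubscription, and matching that user's filters. This is a minimal sketch under loud assumptions: it presumes the server assigns each buffered event a strictly increasing numeric sequence (`seq`), which is not how Wikimedia's own event IDs look, and `BufferedChange`, `FilteredReplay`, and `missedFor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical event shape; seq is an assumed server-assigned, strictly increasing number. */
record BufferedChange(long seq, String wiki, String title) {}

final class FilteredReplay {
    private FilteredReplay() {}

    /**
     * Returns the buffered events one user missed, in buffer order.
     * Only events with clientLastSeq < seq <= globalLastSeq that pass userFilter are kept;
     * anything newer than globalLastSeq is expected to arrive through the live stream.
     */
    static List<BufferedChange> missedFor(List<BufferedChange> buffered,
                                          long clientLastSeq,
                                          long globalLastSeq,
                                          Predicate<BufferedChange> userFilter) {
        List<BufferedChange> missed = new ArrayList<>();
        if (clientLastSeq >= globalLastSeq) {
            return missed; // client is already caught up
        }
        for (BufferedChange change : buffered) {
            boolean inWindow = change.seq() > clientLastSeq && change.seq() <= globalLastSeq;
            if (inWindow && userFilter.test(change)) {
                missed.add(change);
            }
        }
        return missed;
    }
}
```

Capping the window at the global ID read when the client resubscribes is one way to avoid delivering an event twice, once from the replay and once from the live broadcast.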

Event Timeline

Gerges triaged this task as High priority. Apr 2 2026, 4:53 PM
Praffq subscribed.

I'd like to try to solve it.

@Praffq, what will you do about this?

Architectural Note: Since the server currently maintains a single global EventSource connection to Wikimedia and broadcasts to all SseEmitter clients, handling individual client catch-ups requires either caching recent events in memory/Redis or allowing clients to fetch missed events via a separate REST endpoint before joining the live stream again.

@Gerges I'm adding caching for this: a deque that every relevant event passes through before it is sent to the client for processing. While a client is timed out, events keep being added to the deque. When the client comes back, we check the Last-Event-ID header, find that event in the deque, and replay the events after it to the client.

I'm choosing this because we have a single server, so we can keep the cache as a variable inside the container with a fixed length (default 1000, but it can be changed using an env var).

If in the future we want to scale to multiple servers, we can replace it with Redis, but for now I think an in-memory cache of reasonable size is a good fit for wikimonitor.
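The in-memory proposal above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the eventual pull request: `ReplayBuffer` and the env var name `REPLAY_BUFFER_SIZE` are hypothetical (the comment only says the size is configurable), and event IDs are treated as opaque strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

/** Fixed-size buffer of recent events; the oldest entry is evicted once capacity is reached. */
final class ReplayBuffer {
    /** Hypothetical env var name; the thread only says the size is configurable. */
    static final String CAPACITY_ENV = "REPLAY_BUFFER_SIZE";
    static final int DEFAULT_CAPACITY = 1000;

    record Entry(String id, String data) {}

    private final int capacity;
    private final Deque<Entry> entries = new ArrayDeque<>();

    ReplayBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Reads the capacity from the environment, falling back to the default when unset or invalid. */
    static ReplayBuffer fromEnvironment() {
        int capacity = DEFAULT_CAPACITY;
        String raw = System.getenv(CAPACITY_ENV);
        if (raw != null) {
            try {
                capacity = Integer.parseInt(raw.trim());
            } catch (NumberFormatException e) {
                capacity = DEFAULT_CAPACITY;
            }
        }
        return new ReplayBuffer(capacity > 0 ? capacity : DEFAULT_CAPACITY);
    }

    /** Appends an event, dropping the oldest one first if the buffer is full. */
    synchronized void append(String id, String data) {
        if (entries.size() == capacity) {
            entries.removeFirst();
        }
        entries.addLast(new Entry(id, data));
    }

    /**
     * Returns the events recorded after lastEventId, oldest first.
     * Optional.empty() means the ID is not in the buffer (unknown or already evicted),
     * so the caller cannot tell how many events were missed.
     */
    synchronized Optional<List<Entry>> eventsAfter(String lastEventId) {
        List<Entry> after = new ArrayList<>();
        boolean found = false;
        for (Entry entry : entries) {
            if (found) {
                after.add(entry);
            } else if (entry.id().equals(lastEventId)) {
                found = true;
            }
        }
        return found ? Optional.of(after) : Optional.empty();
    }
}
```

Returning an empty Optional for an evicted ID keeps "nothing missed" distinct from "gap too large to replay", which matters when a client has been away for longer than the buffer covers.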

Your thoughts?

@Praffq, you can use Redis via Toolforge: wikitech:Help:Toolforge/Redis
It’s a better fit because it provides a shared, persistent queue, supports list operations, and works well if we scale to multiple servers. It also handles reconnections more reliably using Last-Event-ID.
Additionally, this task is mainly about reducing load on the JVM container, and moving the event buffer to Redis helps offload memory and connection pressure from the application itself.

@Gerges thanks for the suggestion, I'll go through the Toolforge Redis docs and try to implement it that way.

Hi @Praffq, are you still planning to submit a new pull request instead of reopening the closed one?

@Gerges yeah, I'm working on a new PR that follows all the suggestions. I'd like to try it one more time.

Gerges renamed this task from 30-Minute SseEmitter Timeout with Event Resumption to [WikiStreamService] 30-Minute SseEmitter Timeout with Event Resumption. Wed, Apr 22, 6:25 PM

Mentioned in SAL (#wikimedia-cloud) [2026-05-03T09:36:21Z] <wmbot~gergesshamon@tools-bastion-15> [DEPLOY] Starting deployment | ref=v1.5.5-beta (T422194)

Mentioned in SAL (#wikimedia-cloud) [2026-05-03T09:37:31Z] <wmbot~gergesshamon@tools-bastion-15> [DEPLOY] Build triggered successfully | ref=v1.5.5-beta (T422194)

Mentioned in SAL (#wikimedia-cloud) [2026-05-03T09:37:32Z] <wmbot~gergesshamon@tools-bastion-15> [DEPLOY] Restarting buildservice | ref=v1.5.5-beta (T422194)

Mentioned in SAL (#wikimedia-cloud) [2026-05-03T09:37:37Z] <wmbot~gergesshamon@tools-bastion-15> [DEPLOY] Deployment completed successfully | ref=v1.5.5-beta (T422194)

Mentioned in SAL (#wikimedia-cloud) [2026-05-06T18:42:25Z] <wmbot~gergesshamon@tools-bastion-15> A pull request (https://github.com/wiki-connect/wikimonitor/pull/56) has been submitted by praffq-dev has been merged (T422194)

Mentioned in SAL (#wikimedia-cloud) [2026-05-06T18:42:47Z] <wmbot~gergesshamon@tools-bastion-15> A pull request (https://github.com/wiki-connect/wikimonitor/pull/53) has been submitted by praffq-dev has been merged (T422194)