Replace Redis queue with custom http solution
Closed, Resolved | Public | Feature

Description

As hinted in T360860#9656485, I have been thinking for a while about replacing Redis in the wikibugs2 stack with a custom service. After some proof of concept work I think I have the outline of a reasonable solution. It piggybacks on the implementation for T360860: Reimagine channel configuration (re)loading to avoid need for git pull, using the wikibugs2 web component as the storage and distribution service for the work queue. The web component would expose two endpoints:

  • PUT /api/event - Push a new event into the RPC queue watched by wikibugs2 irc. Callable by any wikibugs2 component.
  • GET /api/eventstream - Establish a Server-Sent Events (SSE) session that receives server push notifications of newly enqueued events. Callable by wikibugs2 irc.
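
As a rough illustration of how these two endpoints could share state inside wikibugs2 web, here is a minimal in-memory fan-out sketch using only the Python standard library. The names EventBroker and handle_put_event are hypothetical and do not come from the wikibugs2 code base. The sketch also assumes that an event published while no client is attached is simply dropped; the real service may buffer instead.

```python
import json
import queue
import threading


class EventBroker:
    """Hypothetical in-memory fan-out: each SSE client gets its own queue."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = []

    def subscribe(self):
        """Called when a client opens GET /api/eventstream."""
        q = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def unsubscribe(self, q):
        """Called when the SSE client disconnects."""
        with self._lock:
            if q in self._subscribers:
                self._subscribers.remove(q)

    def publish(self, event):
        """Copy the event to every attached client; return how many got it."""
        with self._lock:
            targets = list(self._subscribers)
        for q in targets:
            q.put(event)
        return len(targets)


def handle_put_event(broker, body):
    """Roughly what a PUT /api/event handler might do with a JSON body."""
    event = json.loads(body)
    return broker.publish(event)
```

A web framework handler for GET /api/eventstream would then loop on the queue returned by subscribe() and write each event to the open response, calling unsubscribe() when the connection closes.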

The initial events supported by this system will be:

  • irc send event - A pre-formatted IRC message and a list of channels to send it to. This can be used by any component that has something to say on IRC, but will most typically be used by wikibugs2 gerrit to notify channels of code review events.
  • phorge event - Data collected from Phorge about a Maniphest task transaction. These will be produced by wikibugs2 phorge.
  • ping event - An event periodically injected into the event stream by wikibugs2 web to help ensure that the push path from the server to its attached client is functioning.
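
To make the event types above a little more concrete, the following sketch shows one way an irc send event and a ping event could be serialized into SSE frames and parsed back on the client side. The field names (type, message, channels) are assumptions made for illustration and are not taken from the actual wikibugs2 wire format. The phorge event is left out because its fields are not described here. Only the framing itself (an `event:` line, a `data:` line, and a blank line as terminator) follows the SSE format.

```python
import json


def irc_send_event(message, channels):
    """Hypothetical payload: a pre-formatted message plus target channels."""
    return {"type": "irc_send", "message": message, "channels": list(channels)}


def ping_event():
    """Hypothetical keepalive payload injected periodically by the server."""
    return {"type": "ping"}


def to_sse(event):
    """Serialize one event as a single SSE frame ending in a blank line."""
    payload = json.dumps(event, sort_keys=True)  # single line, no raw newlines
    return "event: {}\ndata: {}\n\n".format(event["type"], payload)


def parse_sse(frame):
    """Parse a frame produced by to_sse() back into (event name, payload)."""
    fields = {}
    for line in frame.strip("\n").split("\n"):
        name, _, value = line.partition(": ")
        fields[name] = value
    return fields["event"], json.loads(fields["data"])
```

With a shape like this, the irc consumer could dispatch on the event name and treat a long gap between ping events as a sign that the stream needs to be re-established.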

This design makes the wikibugs2 web webservice newly stateful: it may lose events previously sent by a client if it crashes or restarts. The current belief is that this potential data loss will be no more disruptive than the losses already seen with the stateful wikibugs2 gerrit and wikibugs2 phorge data collectors. The working hypothesis is that the overall stability of the system will improve once the currently untrustworthy Redis queue is removed. That queue is backed by stateful disk storage, but it also requires traversing the Kubernetes<->Cloud VPS network boundary to both store and retrieve events.

Details

Title | Reference | Author | Source Branch | Dest Branch
Replace Redis queue with custom http solution | toolforge-repos/wikibugs2!28 | bd808 | work/bd808/sse | main

Event Timeline

bd808 changed the task status from Open to In Progress. (Apr 1 2024, 8:26 PM)
bd808 claimed this task.
bd808 triaged this task as Medium priority.

I have code running in my local environment for the whole stack without Redis anywhere! It needs a bit more polish before I push to GitLab and start testing it at scale in the wikibugs-testing deployment, but that should happen very soon.

Mentioned in SAL (#wikimedia-cloud) [2024-04-05T23:52:57Z] <wmbot~bd808@tools-sgebastion-10> Build new container based on MR!28 and restarted web, irc, gerrit, and phorge tasks (T361518)

Mentioned in SAL (#wikimedia-cloud) [2024-04-08T16:29:17Z] <wmbot~bd808@tools-bastion-12> Built new image from git hash 0c4ecb64. (T361518)

Mentioned in SAL (#wikimedia-cloud) [2024-04-08T16:36:55Z] <wikibugs> Restarted web, irc, gerrit, and phorge tasks to pick up new image. (T361518)