
[EPIC] Develop a JobQueue backend based on EventBus
Closed, Resolved · Public

Description

After the discussion at the Developer Summit (T149408) it was decided to create a Kafka-based backend for the MediaWiki JobQueue.

Simple overview of the JobQueue system.

The JobSpecification is the interface for jobs. A job is a unit of work sent by MediaWiki for later backend processing. Each job has a type and parameters. Jobs are posted to the queue using the JobQueueGroup class, which maintains a mapping between job types and individual queues; each job type has its own queue. The job type, parameters and other info are then serialised and sent to Redis.

On the consumer side there are JobRunners - they pop the jobs out of the queue, deserialise them and execute them.

There are plenty of other features built on top of this: individual job de-duplication, de-duplication based on the root job, and delayed execution. The new system should match all of these features.
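
For concreteness, a serialised job carries data roughly like the sketch below (the field names are illustrative assumptions, not the exact wire format used by the Redis-backed queue):

```python
# Illustrative sketch only -- field names are assumptions, not the exact
# serialisation used by the Redis-backed JobQueue.
example_job = {
    "type": "refreshLinks",                            # job type; selects the queue
    "params": {"pages": {"12345": [0, "Some_page"]}},  # job-specific parameters
    "sha1": "<hash-of-params>",                        # used for individual job de-duplication
    "root_job": {                                      # used for de-duplication against the root job
        "signature": "<root-job-signature>",
        "timestamp": "2017-05-01T00:00:00Z",
    },
    "delay_until": None,                               # optional delayed-execution timestamp
}
```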

Outline of the new solution

In order to utilise Kafka as a medium for job processing, we need to implement a producer-side API for the JobQueue that serialises a Job into a schema-ed JSON event and sends it to EventBus.

Event-Platform Value Stream will post the event to a Kafka topic. There will be one topic per job type, so that different event types do not interfere.
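
For illustration, the producer side could look roughly like this (the proxy URL, topic naming and event fields are assumptions for the sketch, not the final schema):

```python
import requests

# Hypothetical EventBus proxy URL -- a placeholder, not the real service address.
EVENTBUS_URL = "http://eventbus.example.internal:8085/v1/events"

def enqueue_job(job_type, params):
    """Serialise a job into a schema-ed JSON event and POST it to EventBus."""
    event = {
        "meta": {
            # One Kafka topic per job type, so that types do not interfere.
            "topic": "mediawiki.job.{}".format(job_type),
            "domain": "en.wikipedia.org",
        },
        "type": job_type,
        "params": params,
    }
    # Assumes the proxy accepts a JSON array of events in a single request.
    resp = requests.post(EVENTBUS_URL, json=[event])
    resp.raise_for_status()

enqueue_job("refreshLinks", {"pages": {"12345": [0, "Some_page"]}})
```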

ChangeProp would pick up the events from those topics, do all the de-duplication and management work, and guarantee delivery using Kafka commits. For the sake of backwards compatibility the final destination would be the existing JobRunner.

The JobRunner would be converted to expose a run.php endpoint where the jobs would be posted, JSON-deserialised and executed. HTTP status codes would be used to communicate success or failure of job execution, so that ChangeProp can manage retries.
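
To make the delivery contract concrete, here is a rough, hypothetical sketch of the consume-and-retry loop in Python (the real logic lives in ChangeProp's Node.js rule engine; the topic name, the JobRunner URL and the in-memory de-duplication shortcut are assumptions):

```python
import json

import requests
from kafka import KafkaConsumer  # kafka-python, standing in for ChangeProp's consumer

# Hypothetical endpoint exposed by the converted JobRunner.
JOBRUNNER_URL = "http://jobrunner.example.internal/rpc/run.php"

consumer = KafkaConsumer(
    "mediawiki.job.refreshLinks",             # one topic per job type
    bootstrap_servers="kafka.example.internal:9092",
    group_id="jobqueue-refreshLinks",
    enable_auto_commit=False,                 # commit only after successful execution
)

seen_hashes = set()                           # toy stand-in for real de-duplication state

for message in consumer:
    job = json.loads(message.value)
    sig = job.get("sha1")
    if sig and sig in seen_hashes:
        consumer.commit()                     # duplicate: skip it, but advance the offset
        continue
    # POST the job to the JobRunner; the HTTP status signals success or failure.
    resp = requests.post(JOBRUNNER_URL, json=job)
    if resp.status_code == 200:
        if sig:
            seen_hashes.add(sig)
        consumer.commit()                     # the Kafka commit is the delivery guarantee
    # On a non-200 response the offset is not committed, so the job is picked up
    # again after a restart or rebalance; real ChangeProp schedules explicit retries.
```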

The Wikimedia Developer Summit presentation regarding the proposal is available here.

This would be an umbrella task for all the things that need to happen before this is done.


Event Timeline

There are a very large number of changes, so older changes are hidden.

Ping @Ottomata and @Pchelolo: it seems like this work needs to happen after the work to move Kafka to tier-1 in two datacenters, right? Let me know otherwise.

Correct. We have a yearly goal for next FY to move the JobQueue onto EventBus/ChangeProp.

Yo dudes! We are ordering more Kafka hardware (see T152015). We've decided not to provision a beefy aggregate Kafka cluster in codfw, and instead, focus on the existing main Kafka clusters for all cross-DC production use cases. As such, there is some extra budget for this FY. Do you think we should spend this money expanding the main Kafka clusters in anticipation of this JobQueue project?

Luca and I poked around in JobQueue stats yesterday, and we think that the existing main Kafka hardware can handle the intake of JobQueue (I say that with low confidence). But, I'm not so sure about eventbus as it is deployed there. Will eventbus as deployed now handle a template change that causes a huge spike in enqueued jobs? How much more capacity will eventbus need to have in order to handle such a spike? I can try to answer such questions, but I need some help understanding the potential increase in usage.

We should do our best to answer this question ASAP, as IF we do want to expand main Kafka clusters, we really need to get a quote and get a hardware order placed yesterday. My understanding is that if we don't receive the new hardware before the end of June, the extra money for this will dry up and wither away. Mark says: don't come asking us for this next year! (Unless y'all have already budgeted for such an expansion in prep for this project? If so, then this won't be such a hurry.)

FYI, we could double the size of each main Kafka cluster (with the same server profiles) in each DC with this remainder budget.

> Yo dudes! We are ordering more Kafka hardware (see T152015). We've decided not to provision a beefy aggregate Kafka cluster in codfw, and instead, focus on the existing main Kafka clusters for all cross-DC production use cases. As such, there is some extra budget for this FY. Do you think we should spend this money expanding the main Kafka clusters in anticipation of this JobQueue project?
>
> Luca and I poked around in JobQueue stats yesterday, and we think that the existing main Kafka hardware can handle the intake of JobQueue (I say that with low confidence).

Petr & I briefly discussed this today here in Vienna. After a full job queue migration, we expect the number of Kafka events to roughly double relative to what we have right now. Some of those job events might contain significant data, so disk usage is set to grow. However, right now it is very low.

> But, I'm not so sure about eventbus as it is deployed there. Will eventbus as deployed now handle a template change that causes a huge spike in enqueued jobs?

While template edits should only enqueue a constant number of small jobs directly, I suspect that the expansion into "leaf" sub-jobs will initially still happen during job processing itself. This means that those would hit eventbus, rather than following the changeprop model of producing directly. @Pchelolo, is this correct?

> How much more capacity will eventbus need to have in order to handle such a spike? I can try to answer such questions, but I need some help understanding the potential increase in usage.

I think the current job execution rate of about 1000/s provides a good initial estimate for the total expected extra traffic. It might make sense to set up a dedicated eventbus instance for job processing. This would provide isolation between current eventbus users & new, experimental job queue usage. Once we know about the load & are confident, we can then merge the instances. In any case, a purely CPU-bound service like eventbus could live anywhere; there is no real need to expand the Kafka cluster just to scale eventbus. We could for example consider running eventbus on the misc-cpu-bound-service SCB cluster.

> We should do our best to answer this question ASAP, as IF we do want to expand main Kafka clusters, we really need to get a quote and get a hardware order placed yesterday. My understanding is that if we don't receive the new hardware before the end of June, the extra money for this will dry up and wither away. Mark says: don't come asking us for this next year! (Unless y'all have already budgeted for such an expansion in prep for this project? If so, then this won't be such a hurry.)
>
> FYI, we could double the size of each main Kafka cluster (with the same server profiles) in each DC with this remainder budget.

Looking at current usage, I would expect iops on rotating disks to become the next bottleneck on the Kafka nodes, especially with many clients reading from different offsets through eventstream. SSDs would have a lot more headroom in that regard. Perhaps it would be worth considering using some of the budget for replacing the current rotating disks with SSDs? If more money is left then, perhaps we could spend it on more SCB (misc cpu bound services) nodes, and then use some bandwidth on SCB for eventbus?

> a purely CPU-bound service like eventbus could live anywhere; there is no real need to expand the Kafka cluster just to scale eventbus. We could for example consider running eventbus on the misc-cpu-bound-service SCB cluster.

+1, cool, so if we need more eventbus capacity, we will find space somewhere other than main Kafka clusters. I like.

> I would expect iops on rotating disks to become the next bottleneck on the Kafka nodes, especially with many clients reading from different offsets through eventstream.

You mean eventbus?! AHHHh https://wikitech.wikimedia.org/wiki/Event* :)

Yeah. I was wondering what the average size of these job messages is. We might be able to reason about how long events will remain in RAM with the average size and message rate. If we can be reasonably sure that job queue workers will be able to power through the jobs before the messages get pushed out of Kafka's RAM, we might not see much increase in disk iops.
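
As a back-of-the-envelope check (only the ~1000 jobs/s rate comes from this thread; the average message size and page-cache figures below are made-up assumptions):

```python
# Back-of-the-envelope estimate. Only the ~1000 jobs/s rate is taken from
# this discussion; the other numbers are assumptions for illustration.
rate_per_s = 1000              # expected job events per second
avg_msg_bytes = 2 * 1024       # assumed average serialised job size (2 KiB)
page_cache_gib = 16            # assumed RAM available to Kafka's page cache per broker

bytes_per_s = rate_per_s * avg_msg_bytes
per_day_gib = bytes_per_s * 86400 / 2**30
cache_window_h = page_cache_gib * 2**30 / bytes_per_s / 3600

print("~{:.1f} MiB/s intake, ~{:.0f} GiB/day before replication".format(
    bytes_per_s / 2**20, per_day_gib))
print("~{:.1f} h of messages would fit in the page cache under these assumptions".format(
    cache_window_h))
```

Under those assumptions, consumers would have a couple of hours of headroom before reads start hitting the disks.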

> Perhaps it would be worth considering using some of the budget for replacing the current rotating disks with SSDs? If more money is left then, perhaps we could spend it on more SCB (misc cpu bound services) nodes, and then use some bandwidth on SCB for eventbus?

Also +1. We are using about 1 out of 7 TB on the main-eqiad Kafka brokers right now. If we can replace their disks with SSDs and still have enough headroom to grow, I'm all for it.

Since we have to spend this budget super soon, perhaps we should just do it? I can create a task to get quotes for SSDs, if y'all create the ones for extra SCB hosts. Even if we don't need to expand eventbus capacity, it'd be nice to move eventbus off of the Kafka broker nodes.

yes/no?

> While template edits should only enqueue a constant number of small jobs directly, I suspect that the expansion into "leaf" sub-jobs will initially still happen during job processing itself. This means that those would hit eventbus, rather than following the changeprop model of producing directly. @Pchelolo, is this correct?

In the beginning of the project it's not correct. The leaf job posting is handled by the job runner and done from PHP land, so leaf jobs will be posted via the EventBus proxy service. When (if) we start migrating some jobs to be handled solely by ChangeProp that will change, but in the beginning all the jobs will go via EventBus.

However, I wouldn't worry about sudden spikes in load, because the JobQueue follows the same pattern as ChangeProp: first a batch of leaf jobs is processed, and only afterwards is the continuation job posted. In ChangeProp we see very gradual changes in event rates due to this algorithm.

> Yeah. I was wondering what the average size of these job messages is.

Although I've spent a significant amount of time trying to figure this out, I still don't have a definitive answer to this question. I guess we could invest some time into it, dump the contents of the Redis job store and do some analysis on it. I'll do that after the hackathon.

> Since we have to spend this budget super soon, perhaps we should just do it? I can create a task to get quotes for SSDs, if y'all create the ones for extra SCB hosts. Even if we don't need to expand eventbus capacity, it'd be nice to move eventbus off of the Kafka broker nodes.

How pressing is the issue? Can we have a meeting after the hackathon, discuss it and set the plan, or should we decide right now?

> How pressing is the issue? Can we have a meeting after the hackathon, discuss it and set the plan, or should we decide right now?

From the ops point of view we'd need to make a decision during the next couple of days, since the time to place the order (allowing also time to get quotes and review them) is limited. If you guys could have a chat with Faidon during the Vienna hackathon it would be really great, so that we'll have a shared plan with ops.

Ya, if we are going to do this, we should get a procurement ticket up early next week. Thanks!

GWicke triaged this task as Medium priority. Aug 8 2017, 5:52 PM