Page MenuHomePhabricator

Introduce dependency injection into jobs
Open, Needs TriagePublic

Description

This task is currently a draft. I would like to hear opinions:

  • Does it make sence?
  • Is this something we think we want to do?
  • Should this be an RFC?

Problem

The current system for submitting and executing jobs has the following issues:

  • The interface for job submission is unclear. JobQueue::push accepts any instance of IJobSpecification, which could be an instance of JobSpecification, a simple wrapper for the job parameters, or an instantiated Job. The latter is an artifact from before IJobSpecification existed.
  • Job construction is overly flexible. Before each job was bound to an instance of Title. However, for a lot of jobs that is not the case, so GenericParameterJob was introduced. The constructor of the former accepts a title, while the latter does not. This makes it hard to create a generic factory for all the jobs.
  • There is no way to inject service classes into Jobs.
  • Some of the initialization done in job constructors is unnecessary on the submission side, while some is unnecessary on the execution side.

Proposal

  • Jobs could only be submitted as a JobSpecification.
  • Title has no special treatment within jobqueue infracture. Jobs bound to a page title should set it via parameters just like any other parameter, so all jobs become GenericParameterJob.
  • Static Job::factory is removed, jobs are constructed via JobFactory service class.
  • $wgJobClasses configuration is removed or it's semantics is changed. It's replaced with a series of object specifications for ObjectFactory with required service classes. All service classes are injected into Job instance.
  • Job instances are only constructed on the execution side when they are actually needed.

Steps

  1. Pushing instances of a Job into the JobQueue becomes deprecated in favor of JobSpecification.
  2. Once no Job instances are pushed into the queue anymore, Job stops being IJobSpecification - this will completely decouple pushing and popping the jobs. Also, prevents from calling Job constructor and Job::factory from random places.
  3. JobSpecification::getTitle becomes deprecated, the constructor still supports providing the title for convenience, but namespace/dbkey are passed through params. This ensures that all the pushed jobs have all the necessary information for execution within the parameters.
  4. All the static helper methods for constructing specific kinds of jobs start returning a JobSpecification. Prior to that relying on the fact it's a Job is deprecated.
  5. Now the individual job constructor is called from a single place so special constructors with Title can be deprecated and removed. All jobs become a GenericParameterJob
  6. Job::factory is deprecated, JobFactory service class is created to encapsulate Job deserialization.
  7. $wgJobClasses is deprecated, or just the format is changed to accept ObjectFactory spec. All services needed for the job are declared there. The spec is modified in the JobFactory in order to include ‘params’.

Steps 1-6 are needed for:

  1. Prohibit construction of full Job instances just for submitting them. In DI world this would be wasteful since all the service objects are needed only during job execution and not during job submission.
  2. Consolidate all the Job construction and get all the constructor in line, so that they could be built with object spec.

These are moving us towards step 7, however each of these steps would be an improvement over the current system.

Event Timeline

Pchelolo renamed this task from Path to dependency injection in jobs to Introduce dependency injection into jobs.Feb 25 2020, 4:41 PM
daniel added a subscriber: daniel.

This all makes sense to me, but perhaps we can take this one or two steps further:

We could turn jobs into events (in the general sense). The only conceptual difference I see between jobs and events is that with a job, there is only one handler, and that handler is known to the code that triggers the job/event. This seems more like a problem than a feature. In my mind, it makes much more sense to have code emit events that other code can subscribe to. This reduces coupling. Event handlers for core would be registered in a kind of "listener wiring" similar to "service wiring".

To take it two steps further: Extensions should be able to register their own handlers into the "listener wiring" (and define their own events). What's the difference then between an "event" and an "action hook"? They seem to be the same thing. Can we have a unified system of "code that reacts asynchronously to events triggered by other code"?

If we finish the decoupling between JobSpecification and Job, we're pretty much done changing the concept to event. Rename JobSpecification into EventSpecification and Job into EventHandler - and you're pretty much done. And allow the handler specification to hold and array instead of of a single value.

But that's on the coding level. Conceptually, currently the boundaries are a bit incorrect in the jobs - JobSpecification and job types are much more about what needs to be done, than about what happened. Moving the boundaries so that the job spec only contained info about what happened in the system and the handler figured out what needs to be done is an even further step on this road.

But that's on the coding level. Conceptually, currently the boundaries are a bit incorrect in the jobs - JobSpecification and job types are much more about what needs to be done, than about what happened. Moving the boundaries so that the job spec only contained info about what happened in the system and the handler figured out what needs to be done is an even further step on this road.

That's a very nice way of putting it, thank you!

To me, the question is: do we transition to JobDescription first, and then transition again to events, or should we do both at the same time? Which option is better depends on how complex (and thus, how risky) the change is. We'd have to survey about 50 Job classes to find out.

We could turn jobs into events (in the general sense). […] it makes much more sense to have code emit events that other code can subscribe to. This reduces coupling. […] What's the difference then between an "event" and an "action hook"?

An "action hook" (as defined in T212482) indeed acts as an event. The primary characteristics of hooks (or events) are I think that they are naturally decoupled. There might be no users of a hook (event listeners) and that is perfectly valid.

However as "event" was introduced here without definition, I think the better question is: What is the difference between a "job" and an "action hook". There, I think we are starting to see an important disctinction.

For example, when an edit is being saved then the action is "edit". After the action is completed, an action hook (or event) could let additional logic execute which is not part of the "edit" action itself, but rather in response to it, based on the domain interests of whomever subscribes to that event.

A job on the other hand is often critically part of the user action itself. The action may even be considered incomplete until the job has run. For example, if I send an e-mail, or delete a page - the implementation of that should not be an event listener to some dummy user action that by itself does nothing. When this code queues the job, it explicitly requests its own code to run. If there were no listeners, or only unrelated listeners, that must be a fatal error. Of course, if the logic is meant to be swappable, it could be backed by service class. Or if we want to allow others to also partake, it could also run an action hook once the job has completed. But, I think it is important that we not treat these all the same. That would I think remove important business logic and contracts currently expressed and clearly present in the code.

Having said that, I do think there many jobs that could be converted to simple action hook callers (event listeners). If my proposal for having action hooks be asynchronous is accepted, then there would be no need for an extension to register an action hook that merely queues a job or deferred update. Rather, it would register its callback or class directly with the action hook, which core then internally naturally runs on the job queue somehow (perhaps using a generic "Event" or "RunActionHook" job).

I agree that jobs are not always "events" or "event handler hooks" (what about jobs the subdivide or are added by maintenance scripts and so on?). There is probably a lot of stuff that can indeed be moved in that direction though.

A job on the other hand is often critically part of the user action itself. The action may even be considered incomplete until the job has run. For example, if I send an e-mail, or delete a page - the implementation of that should not be an event listener to some dummy user action that by itself does nothing. When this code queues the job, it explicitly requests its own code to run. If there were no listeners, or only unrelated listeners, that must be a fatal error.

I think that this kind of tight coupling between the code that triggers the action, and the code that implements it, tends to be an architectural flaw. We should not encourage it, but rather try to avoid it. We have quite a few places in the code where one subsystem reacts to an event in another (e.g. when a page is edited, the search index needs to be updated). Implementing this in a way that has the event source know about the receiving subsystem introduces undesirable dependencies.

I see your point in the email example, and perhaps we'd need the concept of "consuming" or "acknowledging" an event to make sure it is indeed handled. And in some cases, it does make conceptual sense to defer the execution of code that is tightly coupled. But it seems to me that this is rare, and most usages of jobs in our system should not work that way. Cache puring, for instance, should not be something the code that triggers the event knows about. The caching subsystem would register a listener that facilitates purging. The application logic shouldn't know or care.

This architectural discussion is not something that needs to be resolved before this ticket can go forward. I agree that it's a good thing as proposed. But we should have it at the back of our heads when thinking about this, so we don't further commit to a pattern that we would rather want to replace, at least for most use cases.