Page MenuHomePhabricator

Decouple AbuseFilterVariableHolder and AFComputedVariable
Closed, ResolvedPublic

Description

AbuseFilterVariableHolder has knowledge about several AFComputedVariable objects, and that's fine because they represent single variables. However, we also have a dependency the other way around: AFComputedVariable requires an AbuseFilterVariableHolder to be passed in when lazy-loading a variable, because computing a variable usually requires having the values of some other variables.

There are various possible solutions to this (require only an array of variables to be passed in, as opposed to a whole AbuseFilterVariableHolder object; use closures to lazy-load variables, creating the closures in VariableGenerator and passing them around; moving the compute method to a separate service; other solutions that I already forgot). However, attention should be paid because this code is currently very fragile. Some things that make this harder:

  • We have many different variables, each requiring its own set of other variables
  • Variable A may require variable B, which in turn might require variable C and so on
  • Whenever a variable is lazy-loaded, the AbuseFilterVariableHolder object it belongs to should be updated with the newly-computed value

I had already analyzed this problem some time ago, but couldn't come up with a perfect solution. r568530 is a related experiment. Either way, we CANNOT change the AFComputedVariable class before T213006 is done (and this will introduce other problems, because we'd also have to wait for at least one release before dropping BC code, see r482518 and relation chain).

Event Timeline

I looked into this again, and wrote a quick note with a possible solution.


We have to two problems here:

  1. the AFComputedVariable class is a smart object: it knows how to compute itself (and it knows about its VariableHolder parent object)
  2. the VariableHolder class is a smart object, because it has AFComputedVariable objects as members

This can be resolved with the two steps outlined below, where the first step addresses problem 2, and the second step addresses problem 1.

First of all, an optional (but recommended, as we're going to need something like this anyway) change: encapsulate the current AFComputedVariable and AFPData classes into a new Variable class that would act as a wrapper for these two objects, providing a common interface to the VariableHolder.

First step

The first real step is to convert the AFComputedVariable class into a real value object, called e.g. LazyVariable (perhaps inheriting from Variable). The compute method and its dependencies would be moved to a new LazyVariableComputer service, and the method would then take a VariableHolder and a LazyVariable and update the Holder with the newly-computed value (note, the Holder would also be used to grab any other required variable). The VariableHolder::getVar would poll the service locator for a LazyVariableComputer (no point in injecting it right now) and use that when requested the value of a lazy variable.
At this point, LazyVariable would be a real value object, and the business logic would have been moved up a level. Note that there'd still be a cyclic dependency between VariableHolder and LazyVariableComputer, so we can think of these objects as being the same. Hence, we can think of the situation as if we just pulled the business logic from AFComputedVariable to VariableHolder. However, this cyclic dependency is easier to break now, as it's not between two "bad" objects, but between a bad object and a good one.

Second step

The first step focused on breaking the dependency on the AFComputedVariable side; the second step focuses on the VariableHolder instead. There are two ways to do this, pretty similar, and both of them end up with VariableHolder becoming a real value object, and to be precise, a wrapper around an array of Variable/LazyVariable objects [side note: even if we don't have to do that, it should be roughly equivalent to an ArrayIterator].

Variant 1

The code using VariableHolder's would get a LazyVariableComputer service injected. The code for updating the VariableHolder would be moved to the callers, and VariableHolder::getVar() would really just return a value, so it might be something like

if ( $myVar->isLazy() ) {
  $lazyVariableComputer->computeLazyVar( $varHolder, $myVar );
}
$value = $varHolder->getVar( $myVarName );

Now the VariableHolder wouldn't need to know about LazyVariableComputer, which is great, and the cyclic dependency is resolved.

Variant 2

This starts exactly like the first variant, but a new service is created, e.g. VariableRetriever. In turn, this service would have a LazyVariableComputer injected, and if using a factory, we can inject a VariableHolder as well. The callers of VariableHolder::getVar would then take just a VariableRetriever, and they'd use something like

$variableRetriever->getVar( $myVarName );

and VariableRetriever::getVar would be responsible for checking whether the variable is lazy, using the LazyVariableComputer if that's the case, and then returning the value. Important note: the code for updating the Holder with the new value might be moved inside VariableRetriever, but LazyVariableComputer would still need a VariableHolder to retrieve any other variable it needs (and actually, it'd also need a VariableRetriever to update the Holder; this doesn't seem doable this way).


Unless I missed something, this should be an acceptable way to resolve the problem; perhaps not perfect, but way better than the status quo. Obviously, it's not improbable that I missed something, given that this code is quite complicated.

Change 630307 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Add a LazyVariableComputer service

https://gerrit.wikimedia.org/r/630307

As noted in the task description, T213006 must be completed first.

This can be resolved with the two steps outlined below, where the first step addresses problem 2, and the second step addresses problem 1.

Patch above is for step 1.

First of all, an optional (but recommended, as we're going to need something like this anyway) change: encapsulate the current AFComputedVariable and AFPData classes into a new Variable class that would act as a wrapper for these two objects, providing a common interface to the VariableHolder.

Thinking about it again, this will be easier to implement after the migration.

and update the Holder with the newly-computed value (note, the Holder would also be used to grab any other required variable).

I was wrong, the value is updated by VariableHolder::getVar, so we only need a Holder for the second part.

The VariableHolder::getVar would poll the service locator for a LazyVariableComputer (no point in injecting it right now) and use that when requested the value of a lazy variable.

This was implemented in AFComputedVariable::compute for now, which is almost the same.

Important note: the code for updating the Holder with the new value might be moved inside VariableRetriever, but LazyVariableComputer would still need a VariableHolder to retrieve any other variable it needs (and actually, it'd also need a VariableRetriever to update the Holder; this doesn't seem doable this way).

Again, this is already inside the Holder, so no problem here (but LazyVariableComputer would still need a Holder).

The VariableHolder::getVar would poll the service locator for a LazyVariableComputer (no point in injecting it right now) and use that when requested the value of a lazy variable.

This was implemented in AFComputedVariable::compute for now, which is almost the same.

Uhm, not really the same, keeping it in AFComputedVariable makes testing quite hard, so it was moved to the VariableHolder as per original plan.

That said, I did miss something. First of all, "Variant 1" above doesn't look good, because we have too many callers, so let's assume "Variant 2" from now on. As explained above, we can create a VariableRetriever, that would have roughly the same methods of the current VariableHolder. However, at that point LazyVariableComputer would also need a VariableRetriever (and not a VariableHolder!) to read the other variables it's missing. This would be a cyclical dependency between two "good" objects, but still a cyclical dependency. I'm not really sure how to fix that. A possible is solution would be:

  1. Make LazyVariableComputer have its own method for each variable
  2. These methods would not require a VariableHolder/VariableRetriever, but just the variable(s) they need (as either AFPData or AFComputedVariable, here it'd be useful to have the common interface described as "introductory" step above)
  3. VariableRetriever (or, more generically, every caller of LazyVariableComputer) would need to find out the right method to call (depending on which variable it wants), and pass its dependencies

Note, in theory (1.) is not needed, as we might just make compute() variadic, taking an arbitrary amount of Variable's. However, the caller would still have to figure out the dependencies [1], so we may as well ask it to figure out the method name, too.

At any rate, this requires more effort than planned above, and given that this task is blocked anyway, I'm pausing the work here.


[1] - To be precise, this code might also live in the Computer, e.g.

LazyVariableComputer
public function getDependenciesForVar( string $varName ) { // Or $varMethod, or $afComputedVariable
  switch ( $varName ) {
    case 'first-var':
       return [ 'second-var', 'third-var' ]
    default:
      return [];
  }
}
VariableRetriever
$dependencies = $lazyComputer->getDependenciesForVar( $varName );
$args = [];
foreach ( $dependencies as $dep ) {
  $args[] = $variableHolder->getVar( $dep );
}
$value = $lazyComputer->compute( $varName, ...$args );

Change 634831 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Create a VariableManager

https://gerrit.wikimedia.org/r/634831

Change 630307 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Add a LazyVariableComputer service

https://gerrit.wikimedia.org/r/630307

Change 634831 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Create a VariablesManager service

https://gerrit.wikimedia.org/r/634831

Memo: this is basically done, using callbacks is the final solution. The remaining thing to do is migrate external callers away from VariableHolder::getVar, which has a fallback to VariableManager, and then we can call this resolved.

Change 653823 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Remove deprecated VariableHolder::getVar

https://gerrit.wikimedia.org/r/653823

Change 653823 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Remove deprecated VariableHolder::getVar

https://gerrit.wikimedia.org/r/653823