
Add ability to create derivative features in ltr plugin
Closed, Resolved (Public)

Description

It would be useful to be able to combine learning to rank features in novel ways. Some plausible examples:

  • title match / num_terms
  • title match * text match
  • title match * popularity
  • etc.

There are two main options for doing this; I am not sure which is better:

  • Turn feature computation into a graph: model the dependencies between features as a graph, so that the features a derivative feature needs are computed before the derivative feature itself
  • Split feature computation into two phases: a first phase that runs all the index queries, and a second phase that uses those scores to compute derivative features
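
As a rough illustration of the second option, here is a minimal Python sketch. It is not the ltr plugin's API: `compute_features`, the feature names, and the score values are all hypothetical, and phase 1 (the index queries) is assumed to have already produced `base`.

```python
from typing import Callable, Dict

Scores = Dict[str, float]


def compute_features(base_scores: Scores,
                     derived: Dict[str, Callable[[Scores], float]]) -> Scores:
    """Phase 2 of a two-phase computation (hypothetical helper).

    Phase 1 is assumed to be done: base_scores holds the index-query scores.
    """
    features = dict(base_scores)
    for name, fn in derived.items():
        # Each derived feature sees only the phase-1 scores,
        # never another derived feature.
        features[name] = fn(base_scores)
    return features


# Made-up phase-1 scores for one document.
base = {"title_match": 6.0, "text_match": 2.0, "num_terms": 3.0, "popularity": 0.5}

# The examples from the description, written as functions of the base scores.
derived = {
    "title_per_term": lambda s: s["title_match"] / s["num_terms"],
    "title_x_text": lambda s: s["title_match"] * s["text_match"],
    "title_x_popularity": lambda s: s["title_match"] * s["popularity"],
}

result = compute_features(base, derived)
```

In this sketch a derived feature cannot read another derived feature, because phase 2 only receives the phase-1 scores. A real implementation would also have to decide what a ratio such as title match / num_terms yields when num_terms is 0.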

Event Timeline

This one is tricky.

The first option is certainly better:

  • When building the feature set, we will need to detect dependencies and add features in an order that guarantees every dependency has been set in the feature vector before the feature that uses it. It should fail on cyclic dependencies.
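
One possible way to get such an ordering is a depth-first topological sort that rejects cycles. The Python sketch below only illustrates the idea; `order_features`, the `deps` mapping, and the feature names are hypothetical and not taken from the plugin.

```python
from typing import Dict, List, Set


def order_features(deps: Dict[str, List[str]]) -> List[str]:
    """Return feature names so that every feature follows its dependencies.

    deps maps each feature name to the names it depends on (hypothetical
    input format). Raises ValueError on a cyclic dependency.
    """
    ordered: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(name: str) -> None:
        if name in done:
            return
        if name in in_progress:
            # We came back to a feature that is still being resolved.
            raise ValueError(f"cyclic dependency involving feature '{name}'")
        if name not in deps:
            raise KeyError(f"unknown feature '{name}'")
        in_progress.add(name)
        for dep in deps[name]:
            visit(dep)
        in_progress.remove(name)
        done.add(name)
        ordered.append(name)

    for name in deps:
        visit(name)
    return ordered


deps = {
    "title_per_term": ["title_match", "num_terms"],
    "title_match": [],
    "num_terms": [],
    "boosted": ["title_per_term", "popularity"],  # derived from a derived feature
    "popularity": [],
}
order = order_features(deps)
# order == ["title_match", "num_terms", "title_per_term", "popularity", "boosted"]
```

Because "boosted" depends on "title_per_term", this also covers dependencies between derived features. The recursion is only suitable for shallow dependency chains; an iterative variant would be safer for very deep ones.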

The second approach might work, but it would not allow dependencies between derived features.

Implementation-wise, I still don't know how to achieve this:

  • How should the math operation be described? Can we reuse the expression script language? This will have to be extremely abstract because we cannot depend directly on other modules/plugins; only classes from core are available at compile time.
  • How to write a scorer that is dependent on the featureVector?
  • Ideally, dependencies should be resolved to ordinals early: when accessing a value in the featureVector, it would be good not to resolve the feature ordinal from the featureNames HashMap on every doc*feature*deps.
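
The last point could look roughly like the Python sketch below: names are mapped to ordinals once, when the feature set is built, and the per-document path only indexes into the vector. `compile_derived`, `fill_derived`, and the spec format are hypothetical, and the feature set is assumed to be already ordered with dependencies first.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (target ordinal, dependency ordinals, function over the dependency values)
Compiled = Tuple[int, Tuple[int, ...], Callable[..., float]]
Spec = Tuple[Sequence[str], Callable[..., float]]


def compile_derived(feature_names: Sequence[str],
                    specs: Dict[str, Spec]) -> List[Compiled]:
    """Resolve names to ordinals once per feature set (hypothetical helper)."""
    ordinal_of = {name: i for i, name in enumerate(feature_names)}
    compiled: List[Compiled] = []
    for name, (dep_names, fn) in specs.items():
        target = ordinal_of[name]
        dep_ordinals = tuple(ordinal_of[d] for d in dep_names)
        # Assumption: dependencies always sit before the feature that uses them.
        if any(d >= target for d in dep_ordinals):
            raise ValueError(f"feature '{name}' is placed before a dependency")
        compiled.append((target, dep_ordinals, fn))
    compiled.sort(key=lambda c: c[0])  # evaluate in feature-set order
    return compiled


def fill_derived(vector: List[float], compiled: List[Compiled]) -> None:
    """Per-document path: integer indexing only, no name lookups."""
    for target, dep_ordinals, fn in compiled:
        vector[target] = fn(*(vector[d] for d in dep_ordinals))


feature_names = ["title_match", "num_terms", "title_per_term"]
specs = {"title_per_term": (["title_match", "num_terms"], lambda t, n: t / n)}
compiled = compile_derived(feature_names, specs)

vector = [6.0, 3.0, 0.0]  # made-up base scores; the derived slot is still empty
fill_derived(vector, compiled)
# vector == [6.0, 3.0, 2.0]
```

The dictionary lookup cost is paid once in `compile_derived` rather than on every doc*feature*deps combination.
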
debt triaged this task as Medium priority. Aug 22 2017, 5:36 PM
debt moved this task from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
debt subscribed.

This has been merged.

debt added a project: Discovery-ARCHIVED.