Experiment with different grouping of queries that get fed into the DBN
Closed, Declined · Public

Description

To improve the quality of training data we could try a few things:

  • For the most part we don't care about the ordering of terms. Currently we pre-group queries that have exact matches on the stemmed query string, but we could also try sorting the terms within the stemmed query (see the sketch after this list).
  • [NO TASK] Experiment with different thresholds for the minimum group size fed into the DBN. Currently we filter out groups with fewer than 10 sessions, but we could experiment with both larger and smaller thresholds to see whether the training data improves.
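
A minimal sketch of what both ideas could look like, in plain Python; `stem` is a placeholder for whatever per-term stemmer the pipeline actually uses, and `filter_small_groups` is a hypothetical helper rather than real pipeline code:

```python
from collections import defaultdict

def stem(term: str) -> str:
    # Placeholder: the real pipeline applies an actual per-term stemmer.
    return term.lower()

def grouping_key(query: str, sort_terms: bool = True) -> str:
    # Stem each term, then optionally sort so that term order is ignored.
    terms = [stem(t) for t in query.split()]
    if sort_terms:
        terms.sort()
    return " ".join(terms)

def filter_small_groups(sessions, min_sessions=10):
    # Drop query groups observed in fewer than min_sessions distinct
    # sessions; `sessions` is an iterable of (session_id, query) pairs.
    sessions = list(sessions)
    groups = defaultdict(set)
    for session_id, query in sessions:
        groups[grouping_key(query)].add(session_id)
    keep = {key for key, ids in groups.items() if len(ids) >= min_sessions}
    return [(sid, q) for sid, q in sessions if grouping_key(q) in keep]

# With sort_terms=True, "barack obama" and "Obama barack" share one group:
assert grouping_key("barack obama") == grouping_key("Obama barack")
```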

Unfortunately, evaluating these changes to the training data is difficult. The best approach might be to simply train models and run A/B tests with them, as long as the results don't look particularly bad.

Event Timeline

debt triaged this task as Medium priority. Oct 5 2017, 5:10 PM
debt moved this task from needs triage to Up Next on the Discovery-Search board.
debt subscribed.

This will ramp up after T177302 is completed

Moving to the sprint board as T177302 is done and we've already done a portion of this in T176493.

dcausse renamed this task from Experiement with different grouping of queries that get fed into the DBN to Experiment with different grouping of queries that get fed into the DBN. Nov 20 2017, 9:57 AM
dcausse claimed this task.
dcausse removed dcausse as the assignee of this task. Edited Nov 20 2017, 12:53 PM
dcausse subscribed.

Moving back to the backlog, as this task actually covers 2 experiments and I thought it was new:

Should we experiment with a smaller group size, e.g. 5?
Should we experiment with the second suggestion, reordering query terms?

> Moving back to the backlog, as this task actually covers 2 experiments and I thought it was new:
>
> Should we experiment with a smaller group size, e.g. 5?

This depends on the results of the previous A/B test. Since both 20 and 35 were noticeably worse than our arbitrarily chosen default of 10, I think it's worth testing groups with a smaller minimum size.

> Should we experiment with the second suggestion, reordering query terms?

I think this is worthwhile. I do wonder if it will invalidate some of the things we learn about query group sizing though. I suppose it really depends on what hidden variable is changing the user behaviour:

  • Is less training data from larger group sizes resulting in less optimization? I don't think this is the case, as the model's offline NDCG@10 shows a larger increase over the baseline with the larger query groupings (see the NDCG@10 sketch after this list).
  • Are the larger minimum DBN group sizes throwing out important long-tail (or middle-tail? there is still a huge long tail beyond 10 sessions per group) training data? My intuition is that this is what's happening, but I'm not sure how to validate it.
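
For reference, a small sketch of the NDCG@10 metric mentioned in the first bullet, using the common linear-gain formulation (the pipeline may well use the exponential 2^rel - 1 gain instead); the relevance labels in the example are made up:

```python
import math

def dcg_at_k(rels, k=10):
    # Discounted cumulative gain: graded relevance, discounted by rank.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

def ndcg_at_k(rels, k=10):
    # Normalize by the DCG of the same labels in ideal (sorted) order.
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# rels: e.g. DBN-estimated relevance labels for one query's ranked results.
print(ndcg_at_k([3, 2, 3, 0, 1, 2]))  # ≈ 0.96
```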

Assuming the second is true, it seems sorting the terms may allow us to get even more of the long-tail information into the DBN. On the other hand, it may have the unintended effect of putting together queries that aren't as related as we hope. I'm optimistic the second stage of grouping will negate any poor handling here.

I also think this can at least partly be tested offline:

  • Measure the number of first- and second-stage groups (or only second-stage? the first might be interesting but not particularly useful) for both methods of normalizing. It might also be interesting to pull basic count statistics (min/max/std, from Spark's df.describe()) on sessions per group; see the sketch after this list.
  • We could perhaps look at a small sampling of the groups to decide whether they look any better or worse, or whether there are obvious cases where the wrong things are being grouped together.
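
A sketch of that first measurement in PySpark; the parquet path and the norm_query / session_id column names are hypothetical stand-ins for whatever the click-log data actually uses:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical input: one row per (session_id, norm_query) observation,
# where norm_query is the grouping key (stemmed, optionally term-sorted).
clicks = spark.read.parquet("hdfs:///path/to/click_logs")

sessions_per_group = (
    clicks
    .groupBy("norm_query")
    .agg(F.countDistinct("session_id").alias("num_sessions")))

# How many groups would survive each candidate minimum-size threshold:
for threshold in (5, 10, 20, 35):
    n = sessions_per_group.filter(F.col("num_sessions") >= threshold).count()
    print(threshold, n)

# Basic count statistics (count/mean/stddev/min/max) on sessions per group:
sessions_per_group.describe("num_sessions").show()
```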

Maybe more? Not sure.

As discussed in today's sprint planning, we'll need to run this test again with a smaller sample size.

To clarify, the "smaller" refers to the sizes of the groups. Basically, 10 (the default) performed noticeably better than both grouping sizes we tested (20 and 35). We will run again with smaller sizes (TBD; maybe 5, 8, and 15? 12? I'm not sure).

Gehel subscribed.

Closing for now. This might be reactivated as part of a larger initiative to improve MLR, but it does not make much sense on its own.