Result table should include:
- wiki the search was performed on
- normalized search query
- page id of the result
- relevance label
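
A minimal sketch of what one row of such a table could look like, assuming Python as the implementation language. The field names (`wikiid`, `norm_query`, `page_id`, `relevance`), the `build_rows` helper, the float type for the label, and the sample values are illustrative assumptions, not taken from the actual MjoLniR code or data.

```python
from typing import Dict, List, NamedTuple, Tuple


class ResultRow(NamedTuple):
    """One row of the result table (hypothetical field names)."""
    wikiid: str       # wiki the search was performed on
    norm_query: str   # normalized search query
    page_id: int      # page id of the result
    relevance: float  # relevance label (assumed numeric here)


def build_rows(scores: Dict[Tuple[str, str, int], float]) -> List[ResultRow]:
    """Flatten {(wiki, normalized query, page id): label} into sorted rows."""
    return sorted(ResultRow(w, q, p, r) for (w, q, p), r in scores.items())


# Made-up values, purely to show the shape of the table.
rows = build_rows({
    ("enwiki", "dbn", 42): 0.75,
    ("enwiki", "dbn", 7): 0.25,
})
```

Keying the input on (wiki, normalized query, page id) keeps a single label per query/result pair, which is the granularity the list above implies.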
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| Add DBN training | search/MjoLniR | master | +344 -130 |

| Status | Assigned | Task |
|---|---|---|
| Invalid | None | T174064 [FY 2017-18 Objective] Implement advanced search methodologies |
| Resolved | EBernhardson | T161632 [Epic] Improve search by researching and deploying machine learning to re-rank search results |
| Resolved | EBernhardson | T162053 backend data engineering and plumbing for LTRank |
| Duplicate | EBernhardson | T162075 Oozie job for merging click data with DBN relevance scores |
Perhaps this should simply be the output of the DBN job (T162056)? Not sure, but on review it seems like we have quite a few intermediate data steps that might not be necessary. On the other hand, I think it's likely we want some intermediate steps, so that if one of those steps has a problem we only have to rerun from there forward in the pipeline, rather than rerunning the whole pipeline from the beginning.
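
To illustrate the trade-off, here is a rough sketch of a pipeline that persists each step's output and, on a rerun, resumes from the first step whose output is missing. It is not how the oozie jobs are defined; `run_pipeline`, the step names, and the JSON files are all hypothetical stand-ins for the real intermediate tables.

```python
import json
import os
import tempfile


def run_pipeline(steps, initial, workdir):
    """Run steps in order, reusing saved outputs until one is missing.

    Once a step has to run, every later step runs too, so stale
    downstream outputs are never reused. Returns (final data, names of
    the steps that actually executed).
    """
    data, executed, rerun = initial, [], False
    for name, fn in steps:
        path = os.path.join(workdir, name + ".json")
        if os.path.exists(path) and not rerun:
            with open(path) as f:
                data = json.load(f)  # reuse the intermediate output
            continue
        rerun = True
        data = fn(data)
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # publish only complete outputs
        executed.append(name)
    return data, executed


def demo():
    # Hypothetical step names; each step just appends a marker.
    steps = [
        ("clicks", lambda d: d + [1]),
        ("dbn", lambda d: d + [2]),
        ("merge", lambda d: d + [3]),
    ]
    with tempfile.TemporaryDirectory() as workdir:
        first = run_pipeline(steps, [], workdir)
        # Simulate a problem with the "dbn" output.
        os.remove(os.path.join(workdir, "dbn.json"))
        second = run_pipeline(steps, [], workdir)
    return first, second


first, second = demo()
```

In this sketch the second run skips `clicks` and reruns only `dbn` and `merge`. Each extra intermediate output costs storage and one more job to maintain, which is the concern raised above.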
Change 347038 had a related patch set uploaded (by EBernhardson):
[search/MjoLniR@master] Add DBN training
The attached patch is only half the work, though: it adds the DBN training to mjolnir, but it doesn't set up the oozie half of the pipeline. I wonder if we should have separate tasks for these, as the oozie pipelines will only make sense once most of the initial code in mjolnir is ready to start running a complete pipeline.