T314863: [SPIKE] Section topics article-level relevance score has led to a first working computation of article-level relevance scores.
This task lists the next steps, and may be broken down into subtasks.
Engineering
- compute the full TF, i.e., raw TF / count of all topic QIDs per page QID - done in commit 47df610d
- add constant columns at the very end of the pipeline - done in commit a8f32455
- don't select TF-IDF component columns, just the final score - blocked by T318348: [SPIKE] Section-level topic relevance score
- refactor the relevance computation function - done in commit a8f32455
- consider refactoring tests with chispa - done in commit 415329db
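For reference, the full TF from the first item above (raw TF divided by the count of all topic QIDs per page QID) can be sketched in plain Python. This is only an illustration, not the pipeline code: the function name `normalized_tf` and the sample QIDs are hypothetical, and it assumes the denominator counts every topic occurrence on a page, repeats included.

```python
from collections import Counter, defaultdict


def normalized_tf(pairs):
    """Return {(page_qid, topic_qid): raw TF / total topic occurrences on the page}.

    `pairs` is an iterable of (page_qid, topic_qid) tuples, one per topic occurrence.
    Assumption: the denominator counts all occurrences, not distinct topic QIDs.
    """
    # Raw TF: how many times each topic QID occurs on each page QID.
    raw_tf = Counter(pairs)

    # Denominator: total number of topic occurrences per page QID.
    totals = defaultdict(int)
    for (page_qid, _topic_qid), count in raw_tf.items():
        totals[page_qid] += count

    return {key: count / totals[key[0]] for key, count in raw_tf.items()}


# Hypothetical sample: page Q1 links topic Q10 twice and Q20 once; page Q2 links Q10 once.
sample = [("Q1", "Q10"), ("Q1", "Q10"), ("Q1", "Q20"), ("Q2", "Q10")]
tf = normalized_tf(sample)
```

With this normalization the TF values of a single page sum to 1, so pages with many topic links do not dominate the score.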
Product
- decide whether we should keep null page QIDs and/or topic QIDs [no]
- decide whether we should also implement section-level relevance [yes, but not part of this ticket - will be part of T318348 and T318324]
- manual checks on data samples - see T325318: Data check with a focus on article-level topic relevance score instead