[SPIKE] Section topics article-level relevance score
Closed, Resolved · Public

Description

Goal

To enable section topics ranking.

Definition

General section topics relevance is a score that measures the extent to which a given section topic helps summarize and understand the content of a given piece of text in a Wikipedia article.

NOTE: we must distinguish between article-level and section-level relevance.

Proposal

The baseline score can be computed through a custom TF-IDF weight, where:

  • the term frequency (TF) component is computed cross-wiki, thanks to section alignment. This is already implemented as the initial score; see T311750: Combine section extraction with blue links relevance score. UPDATE: the initial score is at the article level, since it doesn't take sections into account
  • the inverse document frequency (IDF) component is instead computed within a single wiki
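The proposal above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual code. It assumes that topics are identified by Wikidata QIDs of linked articles, that the TF of a topic is the number of aligned wikis whose version of the article links to it, and that the IDF takes the smoothed form log((1 + N) / (1 + df)) over the N articles of the target wiki. The function names (`cross_wiki_tf`, `same_wiki_document_frequency`, `article_relevance`) and the input shapes are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Set

# Hypothetical input shapes, for illustration only (not the pipeline's schema):
#   cross_wiki_links: wiki name -> set of topic IDs (e.g. Wikidata QIDs) linked
#                     from that wiki's aligned version of ONE article.
#   wiki_articles:    article title -> set of topic IDs linked from it, for
#                     every article in the target wiki (used only for IDF).


def cross_wiki_tf(cross_wiki_links: Dict[str, Set[str]]) -> Counter:
    """TF (assumed): number of wikis whose version of the article links the topic."""
    tf: Counter = Counter()
    for topics in cross_wiki_links.values():
        tf.update(topics)  # a set, so each wiki counts a topic at most once
    return tf


def same_wiki_document_frequency(wiki_articles: Dict[str, Set[str]]) -> Counter:
    """DF: number of articles in the target wiki that link each topic."""
    df: Counter = Counter()
    for topics in wiki_articles.values():
        df.update(topics)
    return df


def article_relevance(
    cross_wiki_links: Dict[str, Set[str]],
    wiki_articles: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Article-level score: cross-wiki TF times same-wiki IDF."""
    tf = cross_wiki_tf(cross_wiki_links)
    df = same_wiki_document_frequency(wiki_articles)
    n_docs = len(wiki_articles)
    # Smoothed IDF (an assumption): log((1 + N) / (1 + df)). A Counter returns
    # 0 for topics that no article in the target wiki links.
    return {
        topic: count * math.log((1 + n_docs) / (1 + df[topic]))
        for topic, count in tf.items()
    }


if __name__ == "__main__":
    links = {
        "enwiki": {"Q1", "Q2", "Q3"},
        "itwiki": {"Q1", "Q2"},
        "frwiki": {"Q2"},
    }
    target_wiki = {
        "Article": {"Q1", "Q2", "Q3"},
        "Other 1": {"Q1", "Q3"},
        "Other 2": {"Q1"},
    }
    scores = article_relevance(links, target_wiki)
    for topic, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(topic, round(score, 3))
```

In this toy example Q2 ranks first: it is linked in all three wikis but by only one article in the target wiki. Q1 scores 0 because every article in the target wiki links it, which is the IDF penalty for generic topics at work.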

Event Timeline

CBogen subscribed.

Removing Growth since this is about general section topics relevance and not section-level image suggestions relevance, and the Growth team has previously expressed interest in being involved in the latter but not the former. Feel free to add Growth back if you want to track the ticket.

Moving back to ready for development: T315851: Finalize code base should come before this ticket.

mfossati changed the task status from Open to In Progress. Aug 26 2022, 12:41 PM
CBogen renamed this task from [SPIKE] Section topics relevance score to [SPIKE] Section topics article-level relevance score. Oct 3 2022, 3:22 PM

@Cparle I've just opened https://gitlab.wikimedia.org/repos/structured-data/section-topics/-/merge_requests/3 and requested @MunizaA 's review.
While the merge request mainly caters for T318092: [M] Exclude certain sections from having topics in the section topics pipeline, it also includes this relevance score.

I suggest waiting for the review, as an extra pair of eyes on this crucial task.

Additional review done, resolving.