
[L] Crystallize article-level section topics relevance score
Closed, Resolved, Public

Description

T314863: [SPIKE] Section topics article-level relevance score has led to a first working computation of article-level relevance scores.
This task lists the next steps, and may be broken down into subtasks.

NOTE: QID = Wikidata identifier

Engineering

Product

CC @AUgolnikova-WMF @CBogen.

Event Timeline

@mfossati I'm not sure I understand what you mean when you say that T314863 led to article-level relevance scores. Also not sure what keeping null page QIDs means. Maybe we need to discuss when you're back from vacation at the next Section Topics meeting on Sept 19?

@mfossati I'm not sure I understand what you mean when you say that T314863 led to article-level relevance scores.

Given a topic, relevance can be measured with respect to the article or to the section.
With the SEO use case in mind, the first implementation is at the article level. This would enable a ranking of topics per article.
The section-level score can be computed separately.
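To illustrate what a per-article ranking could look like, here is a minimal sketch. It is not the actual pipeline code: the row layout `(page_qid, topic_qid, score)`, the function name `rank_topics_per_article`, and the sample values are all assumptions made up for this example.

```python
from collections import defaultdict

def rank_topics_per_article(rows):
    """Group rows by article QID and sort each article's topics by descending score.

    `rows` is an iterable of (page_qid, topic_qid, score) tuples; this layout
    is hypothetical and only serves the illustration.
    """
    by_article = defaultdict(list)
    for page_qid, topic_qid, score in rows:
        by_article[page_qid].append((topic_qid, score))
    return {
        page_qid: sorted(topics, key=lambda t: t[1], reverse=True)
        for page_qid, topics in by_article.items()
    }

# Made-up sample data: two articles, two distinct topics.
rows = [
    ("Q1", "Q10", 0.2),
    ("Q1", "Q11", 0.9),
    ("Q2", "Q10", 0.5),
]
ranking = rank_topics_per_article(rows)
```

A section-level ranking could follow the same shape with a (page, section) pair as the grouping key, assuming a section-level score is available.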

Also not sure what keeping null page QIDs means.

Page titles or topic titles don't necessarily have a QID on Wikidata, resulting in null values in the dataset. The score is based on QIDs, so we can't compute it if they're null.
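As a rough sketch of the null-QID situation, the snippet below separates rows that can be scored from rows that cannot. The field names `page_qid` and `topic_qid`, the helper `split_scorable`, and the sample rows are assumptions for illustration, not the real dataset schema.

```python
def split_scorable(rows):
    """Split rows into those with both QIDs present and those with a null QID.

    Each row is a dict with hypothetical keys `page_qid` and `topic_qid`;
    None stands for a title that has no Wikidata item.
    """
    scorable, unscorable = [], []
    for row in rows:
        if row["page_qid"] is None or row["topic_qid"] is None:
            unscorable.append(row)
        else:
            scorable.append(row)
    return scorable, unscorable

# Made-up sample data.
rows = [
    {"page_qid": "Q1", "topic_qid": "Q10"},
    {"page_qid": None, "topic_qid": "Q10"},  # page title without a QID
    {"page_qid": "Q2", "topic_qid": None},   # topic title without a QID
]
scorable, unscorable = split_scorable(rows)
```

Only the `scorable` rows can receive a score; whether the `unscorable` ones are kept in the dataset is the open question here.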

Maybe we need to discuss when you're back from vacation at the next Section Topics meeting on Sept 19?

Yes, I'll add that to the agenda document.

@mfossati I'm not sure I understand what you mean when you say that T314863 led to article-level relevance scores.

Given a topic, relevance can be measured with respect to the article or to the section.
With the SEO use case in mind, the first implementation is at the article level. This would enable a ranking of topics per article.
The section-level score can be computed separately.

IMO T314863 should not be considered complete until it also has a section-level relevance score. That was my understanding of the purpose of that ticket, since that is what is needed for our main use case (section-level image suggestions) and what will also be needed for the schema.org use case (since the goal is to display information about each section in Google).

Also not sure what keeping null page QIDs means.

Page titles or topic titles don't necessarily have a QID on Wikidata, resulting in null values in the dataset. The score is based on QIDs, so we can't compute it if they're null.

Sounds like there's no reason to keep those, but let's discuss further when we meet.

CBogen updated the task description.

Moving back to the backlog until T314863 is complete and we can estimate.

CBogen updated the task description.
CBogen renamed this task from Crystallize section topics relevance score to [L] Crystallize article-level section topics relevance score. Sep 27 2022, 3:33 PM
mfossati changed the task status from Open to In Progress. Oct 19 2022, 10:02 AM
mfossati claimed this task.
mfossati moved this task from Doing to Blocked on the Structured-Data-Backlog (Current Work) board.

Moving to blocked: discarding the TF-IDF component columns will be done after T318348: [SPIKE] Section-level topic relevance score.