==Goal==
To enable ranking of section topics **at the section level**.
==Definition==
**Section-level** topic relevance is a score that measures to what extent a given section topic helps summarize and understand the content of a given **section** text in a Wikipedia article.
==Proposal 1==
Build on top of the **article-level** relevance score, computed in a **cross-wiki** fashion as proposed in {T314863} and implemented as per T314863#8208535.
This requires a large adaptation of the current logic: we need machine-learned section alignments to compute the topic frequency across wikis.
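As a rough illustration of what a cross-wiki frequency could look like at the section level, here is a minimal sketch. It assumes the section alignments are already available, and that frequency is defined as the share of wikis whose aligned section links a given topic. The function name, the input shape, and the QIDs are hypothetical and are not taken from the existing implementation.

```python
from collections import Counter


def cross_wiki_section_frequency(aligned_sections):
    """Share of wikis whose aligned section links each topic.

    aligned_sections: dict mapping a wiki (e.g. "enwiki") to the set of
    topic IDs linked in the section aligned to the target section.
    Hypothetical input shape: the alignment itself is assumed to come
    from a separate machine-learned model.
    """
    n_wikis = len(aligned_sections)
    if n_wikis == 0:
        return {}
    counts = Counter()
    for topics in aligned_sections.values():
        # Count each topic at most once per wiki.
        counts.update(set(topics))
    return {topic: count / n_wikis for topic, count in counts.items()}


# Made-up example: one target section aligned across four wikis.
aligned = {
    "enwiki": {"Q1", "Q2"},
    "frwiki": {"Q1"},
    "itwiki": {"Q1", "Q3"},
    "ptwiki": {"Q2"},
}
frequencies = cross_wiki_section_frequency(aligned)
```

Wikis with no aligned section would simply be absent from the input, which shrinks the denominator and makes the score noisier for sparsely covered sections.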
===Caveats===
The following points were raised by @matthiasmullie:
> - Machine-learning section alignment might still not be good enough: it is quite realistic that some of the most relevant topics for a page already appear in the leading paragraph (and will continue to be discussed intensively in other sections, where we will no longer be able to pick up on them), and it is quite realistic that this would be true for many other wikis as well - so even if we manage to accurately align sections, we may still not be able to find the topic in any of the wikis (because in all/most of them, it was already linked only once, earlier in the page)
> - It’s also quite plausible that aligning sections only gets us that far. In wikis with fewer coverage/detail, we may not be able to find a matching section (because that content isn’t there, or just a minor snippet within a larger other section)
> - If the above doesn’t work out, we’d have to start to figure out actual (re)occurrences of a topic within a full article. String matching likely won’t cut it, so some form of language processing may be needed - which is essentially what we intended to avoid in the first place by going with links-based topic identification.
==Proposal 2==
Compute the topic frequency **within** the same wiki, pretty much as we do with the IDF component.
This requires a small adaptation of the current logic.
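A minimal sketch of a within-wiki score follows, assuming a plain TF-IDF where each section is treated as a document and each link counts as one topic occurrence. How the current IDF component is actually computed is not specified here, so the formula, the function name, and the sample data are assumptions for illustration only.

```python
import math
from collections import Counter


def section_tf_idf(sections):
    """TF-IDF of topics per section, within a single wiki.

    sections: dict mapping a section ID to the list of topic IDs linked
    in that section, one entry per link (hypothetical input shape).
    """
    n_sections = len(sections)
    # Document frequency: in how many sections does each topic appear?
    doc_freq = Counter()
    for topics in sections.values():
        doc_freq.update(set(topics))

    scores = {}
    for section_id, topics in sections.items():
        term_freq = Counter(topics)
        total_links = len(topics)
        scores[section_id] = {
            topic: (count / total_links) * math.log(n_sections / doc_freq[topic])
            for topic, count in term_freq.items()
        }
    return scores


# Made-up example: Q3 is linked twice in one section, e.g. via a template.
sections = {
    "A#History": ["Q1", "Q2"],
    "A#Legacy": ["Q1", "Q3", "Q3"],
    "B#History": ["Q1"],
}
scores = section_tf_idf(sections)
```

In this sketch a topic linked in every section (Q1) scores zero, and when every topic is linked at most once per section the TF term is identical for all topics in that section, so the ranking is driven by IDF alone.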
===Caveats===
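The score is potentially biased: topics are extracted from links, and a link to a given topic is expected to appear at most once per page.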
See the following discussion with @matthiasmullie:
> a “topic” is extracted based on links, which (per wiki guidelines) only appear once. As such, I wonder how reliable the frequency (never more than 1, for any topic mentioned) is
Currently, a given topic may appear more than once on a given page because links also occur in templates, typically infoboxes. We plan to filter these out as per {T318092}, but we may want to revert this decision.
Are we sure the [guideline](https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking#Duplicate_and_repeat_links) is actually enforced on all Wikipedias? My manual checks on en, fr, it, pt, and es suggest that it is, but can we verify this reliably?
> I doubt whether it’s enforced, but it’s certainly recommended to minimize duplicate links, so we should expect far fewer links (an expectation of max of 1) than the amount of times the topic is actually present
==Acceptance Criteria==
[] Evaluate the two solutions - document their viability and level of effort in this ticket
[] Choose which solution to move forward with and document that in T318324