
[M] Exclude date format topics from section topics pipeline
Closed, Resolved, Public

Description

Context
In the section topics sample used for evaluation, and most noticeably in RU and FR, some topics are dates in one of these formats: day (dd), day and month (dd month), month, or year (yyyy). These topics sometimes score highly but are generally not meaningful, so we would like to exclude them.

Example
For example, in this article, in the section "Клубная карьера" ("Club career"), we identify more than 20 section topics that are dates, such as "18 апреля" ("18 April") and "2010".

AC

  • The section topics pipeline outputs no topics in date formats

Details

Title: Remove links to 'point in time' topics using a denylist of qids.
Reference: repos/structured-data/section-topics!15
Author: xcollazo
Source Branch: T323597-exclude-dates-take-2
Dest Branch: main

Event Timeline

AUgolnikova-WMF renamed this task from Exclude date format topics to Exclude date format topics from section topics pipeline. (Nov 22 2022, 2:38 PM)
AUgolnikova-WMF updated the task description.
MarkTraceur renamed this task from Exclude date format topics from section topics pipeline to [M] Exclude date format topics from section topics pipeline. (Dec 1 2022, 5:58 PM)

xcollazo opened https://gitlab.wikimedia.org/repos/structured-data/section-topics/-/merge_requests/15

Draft: Remove links to 'point in time' topics using a denylist of qids.
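
The merge request title describes the approach as a denylist of QIDs. The task does not show the actual implementation, so the following is only a minimal sketch of how such a filter might look. The `Topic` record, its `instance_of` field, the function names, and the QIDs in `DENIED_QIDS` are all assumptions made for illustration and are not taken from the section-topics repository.

```python
from dataclasses import dataclass, field

# Hypothetical denylist. These are believed to be the Wikidata items for
# "year", "month" and "day"; verify against Wikidata before relying on them.
# The real denylist used by the merge request is not shown in this task.
DENIED_QIDS = frozenset({"Q577", "Q5151", "Q573"})


@dataclass(frozen=True)
class Topic:
    """Hypothetical section topic record (not the pipeline's real schema)."""
    title: str
    qid: str
    # QIDs the topic is an instance of (assumed to be available upstream).
    instance_of: frozenset = field(default_factory=frozenset)
    score: float = 0.0


def is_denied(topic, denylist=DENIED_QIDS):
    """True if the topic itself, or any class it is an instance of, is denied."""
    return topic.qid in denylist or not denylist.isdisjoint(topic.instance_of)


def exclude_denied(topics, denylist=DENIED_QIDS):
    """Return the topics that do not match the denylist, preserving order."""
    return [t for t in topics if not is_denied(t, denylist)]
```

Matching on what a topic is an instance of, rather than on the format of its title, would avoid language-specific date parsing for titles such as "18 апреля". Whether the merge request works this way is not stated in the task.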