Description
To make articles searchable by topic, ElasticSearch needs to index them by their ORES drafttopic scores. ORES predictions come as a map from topic name to probability. The best way to represent them in ElasticSearch seems to be a text field, with topics stored as words and scores as word frequencies. This means the probabilities need to be scaled and rounded to some integer range; 0-1000 should be accurate enough.
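The scaling step described above can be sketched as follows. This is a minimal illustration only, not the code that was merged: the function names, the `topic|frequency` token format, and the sample topic names and probabilities are all assumptions made up for the example.

```python
from typing import Dict, List

SCALE = 1000  # upper end of the 0-1000 integer range suggested in the description


def scale_topic_scores(predictions: Dict[str, float], scale: int = SCALE) -> Dict[str, int]:
    """Convert topic probabilities in [0, 1] to integers in [0, scale]."""
    scaled = {}
    for topic, probability in predictions.items():
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability out of range for {topic!r}: {probability}")
        scaled[topic] = round(probability * scale)
    return scaled


def to_field_tokens(scaled: Dict[str, int], separator: str = "|") -> List[str]:
    """Encode each topic as a 'topic|frequency' token (hypothetical format).

    Topics whose scaled score is 0 are dropped, since a word with
    frequency 0 would not appear in a text field at all.
    """
    return [
        f"{topic}{separator}{score}"
        for topic, score in sorted(scaled.items())
        if score > 0
    ]


# Made-up sample prediction, shaped like a topic name -> probability map.
sample = {"Culture.Music": 0.8734, "STEM.Physics": 0.0312, "Geography.Europe": 0.0}
scaled_sample = scale_topic_scores(sample)
tokens = to_field_tokens(scaled_sample)
```

Rounding to 0-1000 keeps three decimal digits of the probability, which is the precision the description considers sufficient. How the resulting frequencies are actually fed into the text field depends on the mapping and analyzer, which this sketch does not cover.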
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | Rileych | T240517 [EPIC] Growth: Newcomer tasks 1.1.1 (ORES topics) |
| Declined | | None | T240558 Update ORES articletopic data score in ElasticSearch when an article gets edited |
| Resolved | | dcausse | T240550 Add mapping for ORES topic field in ElasticSearch |
Event Timeline
Change 558577 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/mediawiki-config@master] [WIP] [cirrus] add elastic mapping for ores drafttopics
Change 559142 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/GrowthExperiments@master] Add ores_drafttopics field mapping
Change 558577 abandoned by DCausse:
[cirrus] add elastic mapping for ores drafttopics
Change 559142 abandoned by DCausse:
Add ores_drafttopics field mapping
Reason:
will move this inside CirrusSearch directly
Change 559911 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Add ores_drafttopics field mapping
Change 559911 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Add ores_drafttopics field mapping
This is still waiting for an in-place reindex before it is queryable. We were waiting on wmf.19 and an unrelated mapping change before running that. Now that that change is deployed along with this one, we should be able to run the reindex this week.
Change 573003 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] Enable ores_articletopics field for all wikis
Change 573003 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable ores_articletopics field for all wikis
Change 573056 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/vagrant@master] Enable ORES articletopic handling in cirrussearch role
Change 573056 merged by jenkins-bot:
[mediawiki/vagrant@master] Enable ORES articletopic handling in cirrussearch role
Thanks @EBernhardson!
Does that block T243357: Once the ORES articletopic - ElasticSearch pipeline is set up, update data about all articles? I figured we could do it now so we don't have to wait a week for the full dataset.
The reindex won't block anything. Essentially, Elasticsearch will store all the data we send to it, but the data is only searchable once the reindex process reaches that wiki.