
Add mapping for ORES topic field in ElasticSearch
Closed, Resolved · Public

Description

To make articles searchable by topic, ElasticSearch needs to index them by their ORES drafttopic scores. ORES predictions come as a topic name -> probability map (example). The best way to represent them in ElasticSearch seems to be a text field, with topics stored as words and scores as word frequencies. This means the probabilities need to be scaled and rounded to some integer range; 0-1000 should be accurate enough.
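
As a rough illustration of the scaling step described above, here is a minimal Python sketch. It assumes the predictions arrive as a plain dict of topic name to probability. The topic names, the function names, and the `topic|frequency` token format are all hypothetical and chosen only for illustration; they are not taken from ORES or CirrusSearch.

```python
from typing import Dict, List

# Upper end of the integer range suggested in the description (0-1000).
SCALE = 1000


def scale_scores(predictions: Dict[str, float], scale: int = SCALE) -> Dict[str, int]:
    """Scale each probability in [0, 1] to an integer in [0, scale]."""
    scaled = {}
    for topic, probability in predictions.items():
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability out of range for {topic!r}: {probability}")
        scaled[topic] = round(probability * scale)
    return scaled


def to_tokens(scaled: Dict[str, int]) -> List[str]:
    """Encode topics as 'topic|frequency' tokens (hypothetical format).

    Topics whose scaled score rounds to zero are dropped, since a word
    with frequency 0 would simply not appear in the text field.
    """
    return [f"{topic}|{freq}" for topic, freq in sorted(scaled.items()) if freq > 0]


# Illustrative input only; these are not real ORES predictions.
example = {
    "Culture.Biography": 0.973,
    "STEM.Physics": 0.25,
    "Geography.Europe": 0.0004,
}
scaled_example = scale_scores(example)
tokens_example = to_tokens(scaled_example)
```

With a 0-1000 range the rounding error is at most 0.0005 per score, so very low probabilities (such as the third topic in the example) round to 0 and drop out of the index entirely.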

Event Timeline

Tgr created this task. · Dec 12 2019, 10:57 AM
Restricted Application added a subscriber: Aklapper. · Dec 12 2019, 10:57 AM

Change 558577 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/mediawiki-config@master] [WIP] [cirrus] add elastic mapping for ores drafttopics

https://gerrit.wikimedia.org/r/558577

EBernhardson triaged this task as Medium priority.

Change 559142 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/GrowthExperiments@master] Add ores_drafttopics field mapping

https://gerrit.wikimedia.org/r/559142

Change 558577 abandoned by DCausse:
[cirrus] add elastic mapping for ores drafttopics

https://gerrit.wikimedia.org/r/558577

Change 559142 abandoned by DCausse:
Add ores_drafttopics field mapping

Reason:
will move this inside CirrusSearch directly

https://gerrit.wikimedia.org/r/559142

Change 559911 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Add ores_drafttopics field mapping

https://gerrit.wikimedia.org/r/559911

Change 559911 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Add ores_drafttopics field mapping

https://gerrit.wikimedia.org/r/559911

Tgr added a comment. · Feb 13 2020, 12:19 AM

@dcausse this task has been resolved, right?

This is still waiting for an in-place reindex before it is queryable. We were waiting on wmf.19 and an unrelated mapping change before running that. Now that that change is deployed along with this one, we should be able to run the reindex this week.

Change 573003 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] Enable ores_articletopics field for all wikis

https://gerrit.wikimedia.org/r/573003

Change 573003 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable ores_articletopics field for all wikis

https://gerrit.wikimedia.org/r/573003

Change 573056 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/vagrant@master] Enable ORES articletopic handling in cirrussearch role

https://gerrit.wikimedia.org/r/573056

Change 573056 merged by jenkins-bot:
[mediawiki/vagrant@master] Enable ORES articletopic handling in cirrussearch role

https://gerrit.wikimedia.org/r/573056

This reindex process is running and will probably finish late next week.

Tgr added a comment. · Feb 20 2020, 6:13 PM

Thanks @EBernhardson!

Does that block T243357: Once the ORES articletopic - ElasticSearch pipeline is set up, update data about all articles? I figured we could do it now so we don't have to wait a week for the full dataset.

The reindex won't block anything. Essentially, ElasticSearch will store all the data we send to it, but the data is only searchable once the reindex process makes it to that wiki.

Tgr added a comment. · Apr 2 2020, 2:45 PM

Search is working now on zhwiki, so the reindex has probably finished?

dcausse added a subscriber: Mstyles. · Apr 2 2020, 2:48 PM

@Tgr yes, the reindex is done, but beware that it's quite common for some wikis to fail. @Mstyles will double-check the ones that failed and rerun the reindex on them.

TJones closed this task as Resolved. · Apr 29 2020, 2:37 PM