Expose all slots to the search interface
Open, NormalPublic

Description

Search engines such as Cirrus should examine the content of all slots when updating the search index.


Event Timeline

daniel created this task. Mar 19 2018, 4:02 PM
daniel triaged this task as Normal priority.
Smalyshev added subscribers: dcausse, EBernhardson.

Two big questions here are:

  1. One document or multiple documents? (I think the trend, for now, is toward one document.)
  2. If the answer is one document, how to reconcile slots with potential intersections? I.e., if both slots want to put something in opening_text, what happens? Etc.
Restricted Application added a project: Discovery. Mar 19 2018, 6:12 PM
  2. If the answer is one document, how to reconcile slots with potential intersections? I.e., if both slots want to put something in opening_text, what happens? Etc.

For now, I'd blindly concatenate. That's the baseline.
We have to answer similar questions for a lot of things, including the generation of the HTML the user will see. I plan an RFC about that question.
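
To make the "blindly concatenate" baseline concrete, here is a minimal Python sketch. It is an illustration only, not CirrusSearch code: the function name `merge_slot_fields`, the slot names, and the sample values are all hypothetical, and only the `opening_text` field name comes from the discussion above.

```python
from typing import Dict


def merge_slot_fields(
    slots: Dict[str, Dict[str, str]], separator: str = "\n"
) -> Dict[str, str]:
    """Build one search document by concatenating same-named fields across slots.

    Hypothetical helper: slots are visited in insertion order, and no field
    is ever dropped or overwritten.
    """
    document: Dict[str, str] = {}
    for fields in slots.values():
        for field, value in fields.items():
            if field in document:
                # Two slots claim the same field: append instead of choosing one.
                document[field] = document[field] + separator + value
            else:
                document[field] = value
    return document


# Hypothetical sample data: two slots that both provide opening_text.
slots = {
    "main": {"opening_text": "A photo of a cat.", "text": "Full wikitext here."},
    "mediainfo": {"opening_text": "Cat sitting on a mat"},
}
doc = merge_slot_fields(slots)
```

With this approach the result depends on slot order, which is one of the things a real implementation would have to pin down.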

  1. At least for Cirrus, it pretty much needs to be one document if we want any kind of interaction between fields of multiple content types.
  2. I think, again only with respect to Cirrus, this is going to depend heavily on how those fields get into the queries issued. The current method, with a variety of hard-coded field names, really pushes for the ability to overwrite; for example, the work on file media info will overwrite the opening_text field on file pages. The two will have to be figured out in parallel, I suppose.
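
One way to picture the overwrite-versus-concatenate tension is a per-field merge policy. The Python sketch below is an assumption-laden illustration, not how CirrusSearch works: `build_document`, `concatenate`, `overwrite`, the slot names, and the sample values are all hypothetical, and only the `opening_text` field name is taken from this thread.

```python
from typing import Callable, Dict

# A merge policy takes the value already in the document and the incoming one.
MergePolicy = Callable[[str, str], str]


def concatenate(existing: str, incoming: str) -> str:
    """Keep both values, in slot order."""
    return existing + "\n" + incoming


def overwrite(existing: str, incoming: str) -> str:
    """Let the later slot win."""
    return incoming


def build_document(
    slots: Dict[str, Dict[str, str]],
    policies: Dict[str, MergePolicy],
    default: MergePolicy = concatenate,
) -> Dict[str, str]:
    """Merge all slots into one document, resolving collisions per field."""
    document: Dict[str, str] = {}
    for fields in slots.values():
        for field, value in fields.items():
            if field in document:
                merge = policies.get(field, default)
                document[field] = merge(document[field], value)
            else:
                document[field] = value
    return document


# Hypothetical sample data: the media info slot overwrites opening_text only.
slots = {
    "main": {"opening_text": "Wikitext intro.", "text": "Wikitext body."},
    "mediainfo": {"opening_text": "Caption from media info.", "text": "Caption terms."},
}
document = build_document(slots, {"opening_text": overwrite})
```

Here `opening_text` ends up holding only the media info value, while `text` falls back to concatenation.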

(Sorry, I'm very new to MCR.)
How will this work regarding namespaces?
I mean, can there be a mix of namespaces here, or is there a single top-level namespace somewhere?

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. Apr 23 2018, 7:28 AM

Should we set up some kind of meeting to sync on this and develop a strategy? Maybe at the hackathon? I am personally still rather fuzzy on how this whole thing is supposed to work, and on MCR details too, and I suspect I am not the only one :)

Cparle added a subscriber: Cparle. May 1 2018, 9:13 AM

+1 ... if we're setting up a meeting please count me in (I'll be at the hackathon)

daniel removed a project: Epic. Oct 4 2018, 9:46 AM

Not an epic; this is a concrete task.

FWIW for the initial release of the SDoC multi-lingual captions stuff, I used the onSearchDataForIndex hook to write search data for MediaInfo slots

Update: We switched to CirrusSearchBuildDocumentParse in 2a0610b8a2d05d872878da292117f140520f5098.

That hook's interface is actually not MCR compatible, since it only takes a single Content object. I commented on the patch here in phab.

I worked around that in MediaInfo by using WikiPage::factory( $title )->getRevisionRecord() ... ought we raise a ticket to make the hook MCR compatible? Not really sure what's using the hook, so I'm not sure how to proceed ...

@Cparle this ticket here *is* about making sure all slots are passed to Cirrus. Cirrus should then also pass them on via its own hooks. Changing a hook signature isn't trivial, though; it's generally better to introduce a new hook.

I think this ticket here is sufficient to track the need to do this. Your workaround should be fine for MediaInfo for now. Perhaps, add a comment to your hook handler that points to this ticket.

Change 472647 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/WikibaseMediaInfo@master] Adding note about workaround pending T190066

https://gerrit.wikimedia.org/r/472647

Change 472647 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Adding note about workaround pending T190066

https://gerrit.wikimedia.org/r/472647

greg added a project: Multimedia. Mar 7 2019, 10:59 PM
Restricted Application added a project: Multimedia. Apr 3 2019, 4:22 PM
Ramsey-WMF moved this task from Untriaged to Tracking on the Multimedia board. Apr 3 2019, 4:23 PM