
StructuredDiscussions posts are not indexed in builtin search
Open, High, Public

Description

This refers to the site search (Special:Search)

Splitting from T59512.

(In reply to bug 57512 comment #12)

>unless we can get lucky in being able to find them with the
>search engine without false positives

You'd have to get really lucky, since from what I understand, the search
engine does not index flow posts.

>The non-updating of link tables also makes it impossible to search for external
>links (e.g. to find past discussions of the reliability of a source), or to
>find discussions that link to some page, or to find discussions that use an
>image, and so on.

Or to find everywhere the spambots added linkspam to a specific website.
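The quoted comments point out that posts don't update the link tables, so there is no way to ask which discussions link to a given site. Below is a minimal sketch of what a per-post external-link lookup would need to do, assuming each post's content is available as rendered HTML. `ExternalLinkCollector`, `external_links`, and `posts_linking_to` are hypothetical names, not Flow or MediaWiki code, and the domain check is a simple host/subdomain comparison:

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ExternalLinkCollector(HTMLParser):
    """Collect absolute http(s) link targets from a post's rendered HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only absolute web links; relative (internal) links are skipped.
        if href and urlsplit(href).scheme in ("http", "https"):
            self.links.append(href)


def external_links(post_html):
    collector = ExternalLinkCollector()
    collector.feed(post_html)
    collector.close()
    return collector.links


def posts_linking_to(posts, domain):
    """Return IDs of posts linking to `domain` or any of its subdomains.

    `posts` maps a (hypothetical) post ID to that post's rendered HTML.
    """
    domain = domain.lower()
    hits = []
    for post_id, html in posts.items():
        for url in external_links(html):
            host = (urlsplit(url).hostname or "").lower()
            if host == domain or host.endswith("." + domain):
                hits.append(post_id)
                break  # one matching link is enough to report this post
    return hits
```

Protocol-relative links (`//example.com/...`) have an empty scheme here and are skipped; a real implementation would need to decide how to treat them.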

(In reply to bug 57512 comment #13)

(In reply to comment #12)
> >unless we can get lucky in being able to find them with the
> >search engine without false positives
>
> You'd have to get really lucky, since from what I understand, the search
> engine does not index flow posts.

Sounds like that should be another blocker for bug 60178. Is there a bug for
that yet?


Merge from Trello:

This card is for an engineer to pick up Matthias's work, answer more of the questions from the checklist below, then meet with Danny/S/Nick and Nik and Chad (and other interested developers) to figure out how to get Flow Topic search results into site search.


Questions to answer
  • how could CirrusSearch index Topic text?
  • how would edits & replies to a Topic trigger CirrusSearch reindexing?
  • Could intitle:math work for the topic's title rather than only the Topic: *S0n4m7q632z8ycpv* page title?
  • will CirrusSearch index the HTML of pages, or their wikitext? Note that indexing wikitext would match content expanded using the latest templates, whereas posts don't pick up the latest templates until they are re-edited.
  • will Topic search work for wikis using Flow but not CirrusSearch extension?
  • can relevance ranking of search results prefer topic title, then reply text?
  • what about the other full-text search features, like prefix and Topic:math?
  • how can we get Nik and Chad to do all the work?
  • Should https://trello.com/c/dTRhYBdY be a separate card?
  • can site search show the topic's h2 title instead of Topic: *S0n4m7q632z8ycpv*?
  • can we support the existing prefix:Talk:Some_Flow_board to search within topics that appeared on a particular Flow board?
  • can we support prefix:Talk:Beta_Features/ to search across all the Beta features Flow boards that are subpages of Talk:Beta_Features?
  • will edits made outside Topic: pages (e.g. the board description) be included?
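Several of these questions (indexing topic text, reindexing on replies and edits, ranking the topic title above reply text, showing the h2 title, and `prefix:` by board) can be illustrated together. The sketch below is a toy in-memory index, not CirrusSearch; the `Topic` fields, the weights of 3.0 and 1.0, and the rule "rebuild the whole topic document on every reply or edit" are all assumptions chosen for illustration:

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


# Hypothetical, simplified topic model; Flow's real storage schema differs.
@dataclass
class Topic:
    workflow_id: str  # the ID in the Topic: page title, e.g. "S0n4m7q632z8ycpv"
    board: str        # board the topic appeared on, e.g. "Talk:Beta_Features/Flow"
    title: str        # the h2 topic title readers see
    replies: list[str] = field(default_factory=list)


# Assumed weights: a term found in the topic title outranks one found in replies.
WEIGHTS = {"title": 3.0, "replies": 1.0}


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class TopicIndex:
    """Toy in-memory index with one document per topic."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}

    def reindex(self, topic: Topic) -> None:
        # Rebuild the topic's whole document; call after creation, a reply, or an edit.
        self.docs[topic.workflow_id] = {
            "board": topic.board,
            "display": topic.title,
            "title": tokenize(topic.title),
            "replies": tokenize(" ".join(topic.replies)),
        }

    def search(self, query: str, prefix: str | None = None) -> list[tuple[float, str, str]]:
        terms = tokenize(query)
        results = []
        for workflow_id, doc in self.docs.items():
            # Emulates prefix:Talk:Some_Flow_board by filtering on the board title.
            if prefix is not None and not doc["board"].startswith(prefix):
                continue
            score = sum(w for term in terms for name, w in WEIGHTS.items() if term in doc[name])
            if score > 0:
                # Show the h2 title; keep the Topic: page as the link target.
                results.append((score, doc["display"], "Topic:" + workflow_id))
        results.sort(key=lambda r: (-r[0], r[1]))
        return results
```

Rebuilding the whole topic on each change keeps the logic simple at the cost of reindexing long topics repeatedly; an incremental per-post update is the obvious alternative.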

See also:

Details

Reference
bz60493

Related Objects

Event Timeline

bzimport raised the priority of this task to High.
bzimport set Reference to bz60493.
bzimport added a subscriber: Unknown Object (MLST).
matmarex created this task. Jan 27 2014, 9:09 PM

bingle-admin wrote:

The WMF core features team tracks this bug on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/flow/cards/737, but people from the community are welcome to contribute here and in Gerrit.

Updated Trello card in backlog: https://trello.com/c/VnlAS6vN

Quiddity removed a subscriber: Maryana. Dec 19 2014, 1:42 AM
Quiddity updated the task description. Feb 20 2015, 6:54 PM
Quiddity set Security to None.
demon removed a subscriber: demon. Aug 19 2015, 3:35 PM
Qgil added a subscriber: Qgil. May 11 2016, 10:10 AM

High priority, assigned to none? One of the arguments is probably wrong. :)

DannyH removed a subscriber: DannyH. May 11 2016, 6:41 PM
Elitre added a subscriber: Elitre. May 23 2016, 3:36 PM
Tgr updated the task description. Mar 8 2017, 9:25 PM
Tgr updated the task description.
Tgr updated the task description.
Qgil removed a subscriber: Qgil. Jun 13 2017, 3:56 PM
Trizek-WMF updated the task description. Sep 28 2017, 3:44 PM
Pasleim added a subscriber: Pasleim.
Johnywhy added a subscriber: Johnywhy. Edited Apr 12 2018, 4:06 PM

+1
https://www.mediawiki.org/wiki/Topic:Ub3lef1vgnrccfb6
Is this a job for the StructuredDiscussions team?

Niridya renamed this task from "Flow posts are not indexed in builtin search" to "StructuredDiscussions posts are not indexed in builtin search". Jun 23 2018, 12:39 PM
Restricted Application added a project: Growth-Team. Jul 18 2018, 7:00 PM
SBisson moved this task from Inbox to Triaged but Future on the Growth-Team board. Jul 20 2018, 6:06 PM
Alsee added a subscriber: Alsee. Oct 3 2018, 4:47 PM

The way Flow splits a page into a multitude of independent objects is a notable issue. It's common to try to find something based on fragments that come from different comments or different parts of the page. Hopefully this will help answer some of the questions asked in this task. Fragments of a search may include any or all of the following:

  • "noticeboard" matches in page title
  • "dispute resolution" matches in board description
  • "foo" matches in topic title
  • "bar" matches in a comment
  • "johndoe" matches author of a different comment
  • "january" matches in the timestamp of a different comment
  • "baz" matches in topic summary
  • "insource:spammer.com" matches hidden within a comment, in the board description, in the topic summary, in the topic title, or in a username. An insource search could credibly match in the timestamp, although I find it hard to picture a use case requiring insource while wanting flow_timestamp hits.
  • Plus anything I missed.
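The list above is essentially an argument about document granularity. A minimal sketch, assuming a simplified board/topic/comment model (all class and function names here are hypothetical, not Flow's schema or CirrusSearch's behavior), shows why a multi-term query whose terms land in different objects fails when each object is indexed separately but succeeds when the board is indexed as one aggregate document:

```python
import re
from dataclasses import dataclass, field


# Hypothetical, simplified Flow objects; field names are illustrative only.
@dataclass
class Comment:
    author: str
    timestamp: str  # human-readable, e.g. "15 January 2018"
    text: str


@dataclass
class FlowTopic:
    title: str
    summary: str = ""
    comments: list = field(default_factory=list)


@dataclass
class Board:
    page_title: str
    description: str
    topics: list = field(default_factory=list)


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def object_texts(board):
    """Yield the searchable text of each separately stored object on a board."""
    yield board.page_title
    yield board.description
    for topic in board.topics:
        yield topic.title
        yield topic.summary
        for c in topic.comments:
            yield f"{c.author} {c.timestamp} {c.text}"


def per_object_match(board, query):
    # Each object indexed separately: all terms must occur in ONE object.
    terms = words(query)
    return any(terms <= words(text) for text in object_texts(board))


def aggregate_match(board, query):
    # Board indexed as one document: terms may come from ANY of its objects.
    terms = words(query)
    combined = set().union(*(words(text) for text in object_texts(board)))
    return terms <= combined
```

Aggregating a whole board can over-match when the terms are scattered across unrelated topics; a per-topic document that also carries the board title and description is one possible middle ground. The sketch ignores `insource:`, which would need each object's raw source rather than tokenized text.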
Jony added a subscriber: Jony. Jan 1 2019, 11:01 PM