Search

includes/api/ApiSearchFlow.php will be the API endpoint.
includes/Search/* provides the search backend (mimicking Cirrus' classes, but
adapted to our Flow data).
includes/Formatter/SearchQuery.php is the glue between our API endpoint & the
search classes - it's similar to our other formatter classes.
maintenance/FlowForceSearchIndex.php is the script to index all Flow data in ES.

To index all data, run:

$ php maintenance/FlowForceSearchIndex.php

To search all indexed data, call the new API, e.g.:

http://mediawiki.dev/api.php?action=flow&submodule=search&qterm=test
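
As a rough illustration, a request URL like the one above could be assembled programmatically. This is only a sketch: the helper name `build_search_url` is hypothetical, and the `qtitle` parameter is taken from the page-restricted example further down in this message.

```python
from urllib.parse import urlencode


def build_search_url(base, term, title=None):
    """Build a Flow search API URL (hypothetical helper, for illustration)."""
    # Parameters as used in the example request in this commit message.
    params = {"action": "flow", "submodule": "search", "qterm": term}
    if title is not None:
        # Optional restriction to a single page, e.g. "Talk:Test".
        params["qtitle"] = title
    return base + "?" + urlencode(params)


url = build_search_url("http://mediawiki.dev/api.php", "test")
print(url)
```

Using `urlencode` rather than string concatenation keeps titles containing characters such as `:` or spaces correctly escaped.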

Here's how the data is indexed in ES:

http://localhost:9200/enwiki_flow/topic/:
{
    "namespace": 1,
    "namespace_text": "Talk",
    "pageid": 2,
    "title": "Main Page",
    "timestamp": "2014-02-07T01:42:57Z",
    "update_timestamp": "2014-02-25T14:12:40Z",
    "revisions": [ {
        "id": "rpvwvywl9po7ih77",
        "text": "topic title content",
        "source_text": "topic title content",
        "moderation_state": "",
        "timestamp": "2014-02-07T01:42:57Z",
        "update_timestamp": "2014-02-07T01:42:57Z",
        "type": "topic"
    }, {
        "id": "ropuzninqgyf19ko",
        "text": "reply content",
        "source_text": "reply '''content'''",
        "moderation_state": "hide",
        "timestamp": "2014-02-25T14:12:40Z",
        "update_timestamp": "2014-02-25T14:12:40Z",
        "type": "post"
    } ]
}

http://localhost:9200/enwiki_flow/header/:
{
    "namespace": 1,
    "namespace_text": "Talk",
    "pageid": 2,
    "title": "Main Page",
    "timestamp": "2014-02-07T01:42:57Z",
    "update_timestamp": "2014-02-07T01:42:57Z",
    "revisions": [ {
        "id": "s1ijdhjhqeoq2b2r",
        "text": "header content",
        "source_text": "header content",
        "moderation_state": "",
        "timestamp": "2014-02-07T01:42:57Z",
        "update_timestamp": "2014-02-07T01:42:57Z",
        "type": "header"
    } ]
}

We can do a full-text search, optionally restricted to one or more pages or
namespaces. For example, this should translate to the following in ES:

Search for text, but only in a specific page:
API: api.php?action=flow&submodule=search&qterm=test&qtitle=Talk:Test
curl -XGET http://localhost:9200/enwiki_flow/topic/_search -d '{
    "query": {
        "filtered": {
            "query": {
                "term": { "revisions.text": "test" }
            },
            "filter": {
                "term": { "pageid": 24 }
            }
        }
    }
}'
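
The request body above could be generated from the API parameters roughly as follows. This is a minimal sketch, not the code in includes/Search/: the function name `build_filtered_query` is hypothetical, and it assumes the page title has already been resolved to a page ID.

```python
import json


def build_filtered_query(term, page_id=None):
    """Build a "filtered" ES query body like the curl example (illustrative)."""
    # Full-text part: match the term against the text of any revision.
    filtered = {"query": {"term": {"revisions.text": term}}}
    if page_id is not None:
        # Optional restriction to one page.
        filtered["filter"] = {"term": {"pageid": page_id}}
    return {"query": {"filtered": filtered}}


body = build_filtered_query("test", page_id=24)
print(json.dumps(body, indent=4))
```

Leaving out `page_id` produces the same query without the `filter` clause, i.e. a search across all indexed pages.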

The stored data also allows us to query for a bit more down the road, e.g.:

Example queries (to ES):

Find moderated content.
This will return topics; we can then find the specific posts ourselves.
curl -XGET http://localhost:9200/enwiki_flow/topic/_search -d '{
    "query": {
        "filtered": {
            "filter": {
                "term": { "revisions.moderation_state": "hide" }
            }
        }
    }
}'
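
Since the query returns whole topics, the matching posts still have to be picked out of each topic's revisions. A possible sketch of that step, assuming a topic document shaped like the indexed example earlier in this message (the helper name `revisions_with_state` is hypothetical):

```python
def revisions_with_state(topic, state):
    """Return the revisions of a topic document with the given moderation state."""
    return [
        rev for rev in topic.get("revisions", [])
        if rev.get("moderation_state") == state
    ]


# Trimmed-down copy of the topic document shown earlier in this message.
topic = {
    "pageid": 2,
    "title": "Main Page",
    "revisions": [
        {"id": "rpvwvywl9po7ih77", "moderation_state": "", "type": "topic"},
        {"id": "ropuzninqgyf19ko", "moderation_state": "hide", "type": "post"},
    ],
}

hidden = revisions_with_state(topic, "hide")
print([rev["id"] for rev in hidden])  # ['ropuzninqgyf19ko']
```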

Find unread posts (will probably just check posts more recent than last visit).
This will return topics; we can then find the specific posts ourselves.
curl -XGET http://localhost:9200/enwiki_flow/topic/_search -d '{
    "query": {
        "filtered": {
            "filter": {
                "range": {
                    "revisions.timestamp": { "gt": "2014-04-01T00:00:00Z" }
                }
            }
        }
    }
}'
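
The same client-side narrowing applies here: given a returned topic, the posts newer than the last visit can be selected by comparing timestamps. A minimal sketch, assuming timestamps in the format used by the indexed documents; the helper name `revisions_newer_than` and the last-visit value are made up for illustration:

```python
from datetime import datetime

# Timestamp format used in the indexed documents, e.g. "2014-02-07T01:42:57Z".
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def revisions_newer_than(topic, last_visit):
    """Return the revisions of a topic created after the last visit."""
    cutoff = datetime.strptime(last_visit, TS_FORMAT)
    return [
        rev for rev in topic.get("revisions", [])
        if datetime.strptime(rev["timestamp"], TS_FORMAT) > cutoff
    ]


# Trimmed-down copy of the topic document shown earlier in this message.
topic = {
    "revisions": [
        {"id": "rpvwvywl9po7ih77", "timestamp": "2014-02-07T01:42:57Z"},
        {"id": "ropuzninqgyf19ko", "timestamp": "2014-02-25T14:12:40Z"},
    ],
}

unread = revisions_newer_than(topic, "2014-02-10T00:00:00Z")
print([rev["id"] for rev in unread])  # ['ropuzninqgyf19ko']
```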

Fixes T78788
Fixes T78789
Fixes T78791

Bug: T78788
Bug: T78789
Bug: T78791
Change-Id: Ib1229acc09b26c65dc9f08c7ea0f19b8191b799b