Index Flow data

Authored by matthiasmullie.

Description

To index all data, run:

$ php maintenance/FlowForceSearchIndex.php

Here's how the data is indexed in ES:

http://localhost:9200/enwiki_flow/topic/:
{
    "namespace": 1,
    "namespace_text": "Talk",
    "pageid": 2,
    "title": "Main Page",
    "timestamp": "2014-02-07T01:42:57Z",
    "update_timestamp": "2014-02-25T14:12:40Z",
    "revisions": [
        {
            "id": "rpvwvywl9po7ih77",
            "text": "topic title content",
            "source_text": "topic title content",
            "moderation_state": "",
            "timestamp": "2014-02-07T01:42:57Z",
            "update_timestamp": "2014-02-07T01:42:57Z",
            "type": "topic"
        },
        {
            "id": "ropuzninqgyf19ko",
            "text": "reply content",
            "source_text": "reply '''content'''",
            "moderation_state": "hide",
            "timestamp": "2014-02-25T14:12:40Z",
            "update_timestamp": "2014-02-25T14:12:40Z",
            "type": "post"
        }
    ]
}

http://localhost:9200/enwiki_flow/header/:
{
    "namespace": 1,
    "namespace_text": "Talk",
    "pageid": 2,
    "title": "Main Page",
    "timestamp": "2014-02-07T01:42:57Z",
    "update_timestamp": "2014-02-07T01:42:57Z",
    "revisions": [
        {
            "id": "s1ijdhjhqeoq2b2r",
            "text": "header content",
            "source_text": "header content",
            "moderation_state": "",
            "timestamp": "2014-02-07T01:42:57Z",
            "update_timestamp": "2014-02-07T01:42:57Z",
            "type": "header"
        }
    ]
}

We can run a full-text search, optionally filtered to one or more pages or
namespaces. In ES, this should translate to something like the following.

Search for text, but only in a specific page:
API: api.php?action=flow&submodule=search&qterm=test&qtitle=Talk:Test
curl -XGET http://localhost:9200/enwiki_flow/topic/_search -d '{
    "query": {
        "filtered": {
            "query": {
                "term": { "revisions.text": "test" }
            },
            "filter": {
                "term": { "pageid": 24 }
            }
        }
    }
}'
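
The request body above can be assembled from the search parameters. The following is a minimal Python sketch, not the code in this commit: `build_search_body` is a hypothetical helper, and it assumes the page title from `qtitle` has already been resolved to a page id. It uses the `filtered` query from the example, and combines multiple filters with a `bool` filter, which is an assumption about how page and namespace restrictions would be combined.

```python
import json


def build_search_body(term, page_ids=None, namespaces=None):
    """Build a 'filtered' query body shaped like the curl example.

    Hypothetical helper for illustration only; not part of Flow.
    """
    filters = []
    if page_ids:
        # "terms" matches documents whose pageid is any of the given values
        filters.append({"terms": {"pageid": list(page_ids)}})
    if namespaces:
        filters.append({"terms": {"namespace": list(namespaces)}})

    filtered = {"query": {"term": {"revisions.text": term}}}
    if len(filters) == 1:
        filtered["filter"] = filters[0]
    elif filters:
        # Assumption: when both restrictions are given, both must match
        filtered["filter"] = {"bool": {"must": filters}}
    return {"query": {"filtered": filtered}}


body = build_search_body("test", page_ids=[24])
print(json.dumps(body, indent=2))
```

With a single page id this produces the same structure as the curl body, except that the filter is a one-element `terms` filter rather than `term`.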

The stored data also lets us query for a bit more down the road, e.g.:

Example queries (to ES):

Find moderated posts.
This returns topics; we can then find the specific posts ourselves.
curl -XGET http://localhost:9200/enwiki_flow/topic/_search -d '{
    "query": {
        "filtered": {
            "filter": {
                "term": { "revisions.moderation_state": "hide" }
            }
        }
    }
}'
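
Because the query matches whole topic documents, picking out the moderated posts is left to the caller. A minimal Python sketch of that step, assuming the topic document has the shape shown earlier; `moderated_revisions` is a hypothetical name, and the sample document is trimmed from the topic example:

```python
def moderated_revisions(topic_source, state="hide"):
    """Return the revisions of one topic document with the given moderation state."""
    return [
        rev
        for rev in topic_source.get("revisions", [])
        if rev.get("moderation_state") == state
    ]


# Trimmed copy of the topic document from the indexing example
topic = {
    "pageid": 2,
    "title": "Main Page",
    "revisions": [
        {"id": "rpvwvywl9po7ih77", "moderation_state": "", "type": "topic"},
        {"id": "ropuzninqgyf19ko", "moderation_state": "hide", "type": "post"},
    ],
}

hidden = moderated_revisions(topic)
print([rev["id"] for rev in hidden])
```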

Find unread posts (will probably just check for posts more recent than the last visit).
This returns topics; we can then find the specific posts ourselves.
curl -XGET http://localhost:9200/enwiki_flow/topic/_search -d '{
    "query": {
        "filtered": {
            "filter": {
                "range": {
                    "revisions.timestamp": { "gt": "2014-04-01T00:00:00Z" }
                }
            }
        }
    }
}'
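
As with moderation, the range filter returns whole topics, so the posts newer than the last visit still have to be selected from each topic's revisions. A minimal Python sketch under the same assumptions about document shape; `revisions_since` and `parse_ts` are hypothetical names, and the cutoff used here is an arbitrary illustrative value:

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an index timestamp such as '2014-02-25T14:12:40Z' as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def revisions_since(topic_source, last_visit):
    """Return revisions whose timestamp is strictly later than last_visit."""
    cutoff = parse_ts(last_visit)
    return [
        rev
        for rev in topic_source.get("revisions", [])
        if parse_ts(rev["timestamp"]) > cutoff
    ]


# Trimmed copy of the topic document from the indexing example
topic = {
    "revisions": [
        {"id": "rpvwvywl9po7ih77", "timestamp": "2014-02-07T01:42:57Z"},
        {"id": "ropuzninqgyf19ko", "timestamp": "2014-02-25T14:12:40Z"},
    ],
}

unread = revisions_since(topic, "2014-02-10T00:00:00Z")
print([rev["id"] for rev in unread])
```

The comparison is strict, mirroring the `gt` operator in the range filter.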

Fixes T78788

Bug: T78788
Change-Id: I4c5e868459ba5980b3a13ea402b2f5e2026178b2

Details

Committed
matthiasmullie, Apr 17 2015, 7:10 AM
Parents
rEFLWf7d58eeda493: Flow ES config
References
refs/changes/89/195889/7
Tasks
T78788: U7. Index Flow data in ES
ChangeId
I4c5e868459ba5980b3a13ea402b2f5e2026178b2