
Updated mediawiki/extensions Project: mediawiki/extensions/Flow…

Description

Updated mediawiki/extensions Project: mediawiki/extensions/Flow cfc10edca88cd07e68ba338ccc0365b660c6ffde

Index Flow data

To index all data, run:

$ php maintenance/FlowForceSearchIndex.php

Here is how the data is stored in the Elasticsearch (ES) index:

http://localhost:9200/enwiki_flow/topic/:
{
    "namespace": 1,
    "namespace_text": "Talk",
    "pageid": 2,
    "title": "Main Page",
    "timestamp": "2014-02-07T01:42:57Z",
    "update_timestamp": "2014-02-25T14:12:40Z",
    "revisions": [
        {
            "id": "rpvwvywl9po7ih77",
            "text": "topic title content",
            "source_text": "topic title content",
            "moderation_state": "",
            "timestamp": "2014-02-07T01:42:57Z",
            "update_timestamp": "2014-02-07T01:42:57Z",
            "type": "topic"
        },
        {
            "id": "ropuzninqgyf19ko",
            "text": "reply content",
            "source_text": "reply '''content'''",
            "moderation_state": "hide",
            "timestamp": "2014-02-25T14:12:40Z",
            "update_timestamp": "2014-02-25T14:12:40Z",
            "type": "post"
        }
    ]
}
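The topic document above can be assembled from a page record plus its revisions. Below is a minimal Python sketch, not Flow's (PHP) indexer: `build_topic_doc` and its input shapes are hypothetical. It also assumes the top-level `timestamp` and `update_timestamp` are the earliest and latest revision times. The example data is consistent with that, but the commit does not say so.

```python
from typing import Dict, List


def build_topic_doc(page: Dict, revisions: List[Dict]) -> Dict:
    """Assemble a topic document shaped like the example above (hypothetical helper).

    Assumption: the top-level timestamp is the earliest revision timestamp and
    update_timestamp is the latest revision update_timestamp. The example data
    is consistent with this, but the real indexer may derive them differently.
    """
    if not revisions:
        raise ValueError("a topic needs at least one revision")
    return {
        "namespace": page["namespace"],
        "namespace_text": page["namespace_text"],
        "pageid": page["pageid"],
        "title": page["title"],
        # UTC timestamps in this fixed-width ISO 8601 format sort lexicographically,
        # so min()/max() on the strings yield the earliest/latest moment.
        "timestamp": min(r["timestamp"] for r in revisions),
        "update_timestamp": max(r["update_timestamp"] for r in revisions),
        "revisions": list(revisions),
    }
```

Fed the page fields and the two revisions from the example, this reproduces the top-level `timestamp` and `update_timestamp` shown above.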

http://localhost:9200/enwiki_flow/header/:
{
    "namespace": 1,
    "namespace_text": "Talk",
    "pageid": 2,
    "title": "Main Page",
    "timestamp": "2014-02-07T01:42:57Z",
    "update_timestamp": "2014-02-07T01:42:57Z",
    "revisions": [
        {
            "id": "s1ijdhjhqeoq2b2r",
            "text": "header content",
            "source_text": "header content",
            "moderation_state": "",
            "timestamp": "2014-02-07T01:42:57Z",
            "update_timestamp": "2014-02-07T01:42:57Z",
            "type": "header"
        }
    ]
}

We can run a full-text search, optionally restricted to one or more pages or
namespaces. For example, an API request should translate to an ES query like this:

Search for text, but only in a specific page:
API: api.php?action=flow&submodule=search&qterm=test&qtitle=Talk:Test
curl -XGET http://localhost:9200/enwiki_flow/topic/_search -d '{
    "query": {
        "filtered": {
            "query": {
                "term": { "revisions.text": "test" }
            },
            "filter": {
                "term": { "pageid": 24 }
            }
        }
    }
}'
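The same request body can be generated programmatically. This Python sketch is illustrative only: `build_search_query` is a hypothetical helper. Restricting to several pages or namespaces with a `terms` filter, and combining page and namespace restrictions in a `bool` filter, is an assumption about how the multi-page/namespace case would look in ES 1.x filtered-query syntax.

```python
from typing import Dict, List, Optional


def build_search_query(text: str,
                       pageids: Optional[List[int]] = None,
                       namespaces: Optional[List[int]] = None) -> Dict:
    """Build an ES 1.x "filtered" query body like the curl example above.

    Hypothetical helper. Assumption: several pages or namespaces are matched
    with a "terms" filter, and page and namespace restrictions are combined
    in a "bool" filter so that both must match.
    """
    filtered: Dict = {"query": {"term": {"revisions.text": text}}}
    clauses = []
    if pageids:
        clauses.append({"terms": {"pageid": list(pageids)}})
    if namespaces:
        clauses.append({"terms": {"namespace": list(namespaces)}})
    if len(clauses) == 1:
        filtered["filter"] = clauses[0]
    elif len(clauses) > 1:
        filtered["filter"] = {"bool": {"must": clauses}}
    # With no restrictions, the query searches all topics.
    return {"query": {"filtered": filtered}}
```

Serializing the result with `json.dumps` gives a body that can be passed to `curl -d` as above. A `term` query does not analyze its input. If `revisions.text` is an analyzed field, user-entered text such as "Test" would therefore not match the lowercased indexed token, and a `match` query may suit free-text input better. The sketch keeps `term` to mirror the example.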

The stored data also lets us run other kinds of queries later on, e.g.:

Example queries (to ES):

Find moderated content. This returns whole topics; we can then find the
specific posts ourselves:
curl -XGET http://localhost:9200/enwiki_flow/topic/_search -d '{
    "query": {
        "filtered": {
            "filter": {
                "term": { "revisions.moderation_state": "hide" }
            }
        }
    }
}'
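Because the query matches whole topic documents, ES returns the entire topic rather than the revision that matched, so the moderated posts have to be picked out on our side. Below is a minimal Python sketch. It assumes `hits` is the `hits.hits` list of a standard ES search response and that each hit's `_source` looks like the topic document above. Both function names are hypothetical.

```python
from typing import Dict, List, Tuple


def moderated_topics_query(state: str = "hide") -> Dict:
    """Filter-only ES 1.x query matching topics with any revision in `state`."""
    return {"query": {"filtered": {"filter": {
        "term": {"revisions.moderation_state": state}}}}}


def moderated_revisions(hits: List[Dict], state: str = "hide") -> List[Tuple[int, str]]:
    """Return (pageid, revision id) pairs for the moderated revisions in topic hits.

    `hits` is assumed to be the hits.hits list of an ES search response,
    with each hit's "_source" shaped like the topic document above.
    """
    found = []
    for hit in hits:
        source = hit["_source"]
        # Check every revision of the matched topic; only some of them may be moderated.
        for rev in source.get("revisions", []):
            if rev.get("moderation_state") == state:
                found.append((source["pageid"], rev["id"]))
    return found
```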

Find unread posts (probably by just checking for posts more recent than the
user's last visit). This returns whole topics; we can then find the specific
posts ourselves:
curl -XGET http://localhost:9200/enwiki_flow/topic/_search -d '{
    "query": {
        "filtered": {
            "filter": {
                "range": {
                    "revisions.timestamp": { "gt": "2014-04-01T00:00:00Z" }
                }
            }
        }
    }
}'
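The unread case works the same way: build the range filter from the user's last visit, then compare each post's timestamp on our side. This Python sketch is an assumption for illustration. `unread_topics_query`, `unread_posts` and the timezone-aware `last_visit` parameter are hypothetical, and only revisions of type "post" are treated as posts. Comparing timestamps as strings works here only because every document uses the same fixed-width UTC format shown above.

```python
from datetime import datetime, timezone
from typing import Dict, List

# Fixed-width UTC format used by the documents above, e.g. 2014-02-25T14:12:40Z.
ES_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_es_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime in the documents' UTC format."""
    if moment.tzinfo is None:
        raise ValueError("last visit must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime(ES_FORMAT)


def unread_topics_query(last_visit: datetime) -> Dict:
    """ES 1.x range filter: topics with any revision newer than last_visit."""
    return {"query": {"filtered": {"filter": {
        "range": {"revisions.timestamp": {"gt": to_es_timestamp(last_visit)}}}}}}


def unread_posts(source: Dict, last_visit: datetime) -> List[str]:
    """Ids of "post" revisions in one topic hit's _source newer than last_visit."""
    since = to_es_timestamp(last_visit)
    # String comparison is safe only because every timestamp uses ES_FORMAT in UTC.
    return [rev["id"] for rev in source.get("revisions", [])
            if rev.get("type") == "post" and rev["timestamp"] > since]
```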

Fixes T78788

Bug: T78788
Change-Id: I4c5e868459ba5980b3a13ea402b2f5e2026178b2

Details

Provenance
Authored by jenkins-bot
Committed by Gerrit Code Review on Apr 17 2015, 6:49 PM
Parents
rMEXTe04047c50af5: Updated mediawiki/extensions Project: mediawiki/extensions/CentralAuth…
Branches
Unknown
Tags
Unknown
Tasks
T78788: U7. Index Flow data in ES
ChangeId
I4c5e868459ba5980b3a13ea402b2f5e2026178b2

Event Timeline

Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT35d02c83026a: Updated mediawiki/extensions Project: mediawiki/extensions/Flow… (authored by jenkins-bot <jenkins-bot@gerrit.wikimedia.org>). Apr 17 2015, 6:49 PM