
Log Wikidata Query Service queries to the event gate infrastructure
Closed, Resolved (Public)

Description

Event gate is the standard infrastructure for... well, event logging. We should switch Wikidata Query Service's query logging to this system to ease data analysis by reusing our existing workflows.

  • Add a schema to event-schema
  • Adapt wdqs to collect and send events to event gate
NOTE: the reason we would like a dedicated dataset rather than webrequest (filtered using the sparql tag) is that more than 1% of the queries we receive on the backend are POST requests; such queries cannot be inspected using webrequest.

Event Timeline

Deskana raised the priority of this task to Needs Triage.
Deskana updated the task description.
Deskana subscribed.
Deskana set Security to None.
Deskana moved this task from Needs triage to WDQS on the Discovery-ARCHIVED board.

Which events should we be logging via EventLogging?

Deskana claimed this task.

I think our discussions last week led to us deciding that we didn't want to do this. If I'm wrong, this can be reopened.

dcausse renamed this task from "Switch Wikidata Query Service logging to EventLogging infrastructure" to "Log Wikidata Query Service queries to the event gate infrastructure". Oct 2 2019, 8:12 PM
dcausse reopened this task as Open.
dcausse removed Deskana as the assignee of this task.
dcausse updated the task description.
dcausse triaged this task as Medium priority.

Change 541563 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/event-schemas@master] Add wdqs/sparqlquery 0.0.1

https://gerrit.wikimedia.org/r/541563

Q: I assume you'll want this to go into the eventgate-analytics instance, yes? I.e., you won't be building production services on this stream, just using it for offline analytics (well, offline performance testing).

@Ottomata absolutely, this is for analysis purposes.

@Ottomata I updated the task description to indicate the steps needed to make this happen. Could you take a quick look and let us know if I missed something important here (e.g. do we need to do something on the refinery/hadoop side to create the Hive table and/or add a purge mechanism for the 90-day retention)?

do we need to do something on the refinery/hadoop side to create the hive table

Depending on the name of the stream, it will probably need to be whitelisted in the Camus and Refine job configs.

and/or add a purge mechanism for the 90days retention

I believe the tables are (or should be) purged by default, so if you just want the data dropped after 90 days, it will be.
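
Once the stream is deployed and whitelisted, a quick sanity check on the import side might look like this (a rough sketch; the broker host, HDFS path and Hive database naming below are assumptions about the usual setup, not verified for this stream):

# Check that the topic exists on the analytics Kafka cluster (broker host is an assumption)
kafkacat -L -b kafka-jumbo1001.eqiad.wmnet:9092 | grep -i sparql-query
# Check whether Camus has started importing raw data onto HDFS (base path is an assumption)
hdfs dfs -ls /wmf/data/raw/event/ | grep -i sparql
# Check whether Refine has created a Hive table (database/table naming is an assumption)
hive -e "SHOW TABLES IN event LIKE '*sparql*';"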

Change 543004 had a related patch set uploaded (by DCausse; owner: DCausse):
[wikidata/query/rdf@master] WIP: Add sparql query event gate generator

https://gerrit.wikimedia.org/r/543004

Added a link to task T236251: Add header returning time millis to first solution similar to TTFB measured in Blazegraph.
The corresponding header, X-FIRST-SOLUTION-MILLIS, might be very useful when analyzing long-running queries and when comparing query performance. If the time reported by Blazegraph is significantly less than the total query execution time, this might be caused by:

  1. The total result set is very large, and much time was spent on serialization/deserialization (which is basically fine if the number of results is large).
  2. There are connectivity issues, over the network and/or between processes. In this case the X-FIRST-SOLUTION-MILLIS metric will be the same for subsequent calls, but the total query time will vary over time.
  3. The query is very unselective, but additional constraints filter out many potential solutions, so the first solution is computed quickly while collecting all of the requested results takes much longer. Such queries are candidates for analysis and might need fixes in the Blazegraph code or the data layout.

The X-FIRST-SOLUTION-MILLIS header will return the number of milliseconds spent before the first solution is available to be written to the response payload (that is the last point at which headers can still be added to the result). The value shall not exceed the query timeouts set at the Jetty and query levels.
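
As a rough sketch, once T236251 lands (and assuming the header is exposed on the public SPARQL endpoint, which is not guaranteed), it could be inspected with something like:

# Fetch only the response headers for a trivial query and look for the proposed header
curl -s -D - -o /dev/null 'https://query.wikidata.org/sparql?query=SELECT%20%2A%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D%20LIMIT%201' | grep -i x-first-solution-millis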

Change 541563 merged by Ottomata:
[mediawiki/event-schemas@master] Add sparql/query 1.0.0

https://gerrit.wikimedia.org/r/541563

sparql/query schema is merged. We'll need to do an eventgate-analytics k8s deploy before it can be used. Let me know when you want to start testing this. We can also update eventgate-analytics in deployment-prep first if you can/want to test there.

@Igorkim78 thanks for the suggestion; we could perhaps log all response HTTP headers (we already log all request HTTP headers).
@Ottomata thanks for the merge; I'll perhaps send a quick patch to add this before asking you to deploy anything.
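
For illustration only, an event could then carry response headers next to the request headers; the payload below is hypothetical (the exact field shape is an assumption based on the patch title, and it may not pass schema validation as-is):

# Hypothetical test event showing http.response_headers alongside http.request_headers
curl -XPOST -H 'Content-Type: application/json; charset=UTF-8' http://deployment-eventgate-3.deployment-prep.eqiad.wmflabs:8192/v1/events -d '[{"$schema":"/sparql/query/1.0.0","meta":{"stream":"mystream.name","dt":"1970-01-01T00:00:00Z"},"namespace":"ns123","query":"select","format":"xml","backend_host":"hostname.domain.local","query_time":1000,"http":{"method":"POST","status_code":200,"request_headers":{"Accept":"application/json"},"response_headers":{"X-FIRST-SOLUTION-MILLIS":"42"}}}]'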

Change 545563 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/event-schemas@master] Add http.response_headers to sparql/query/1.0.0

https://gerrit.wikimedia.org/r/545563

Change 545563 merged by Ottomata:
[mediawiki/event-schemas@master] Add http.response_headers to sparql/query/1.0.0

https://gerrit.wikimedia.org/r/545563

Change 543004 merged by jenkins-bot:
[wikidata/query/rdf@master] Add sparql query event gate generator

https://gerrit.wikimedia.org/r/543004

Change 548764 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventgate-analytics - stream config for new stream wdqs.sparql-query

https://gerrit.wikimedia.org/r/548764

Change 548779 had a related patch set uploaded (by DCausse; owner: DCausse):
[wikidata/query/rdf@master] Make the stream name configurable for sparql/query events

https://gerrit.wikimedia.org/r/548779

Tested sending an event generated by the new code to eventgate in beta:

curl -XPOST -H"Content-Type: application/json; charset=UTF-8" http://deployment-eventgate-3.deployment-prep.eqiad.wmflabs:8192/v1/events -d '[{"namespace":"ns123","query":"select","format":"xml","params":{"foo":"bar","baz":"bat,bats\\,\\\\qux"},"meta":{"id":"00000000-0000-0001-0000-000000000002","dt":"1970-01-01T00:00:00Z","request_id":"the_request_id","domain":"hostname.domain.local","stream":"mystream.name"},"http":{"method":"POST","clientIp":"10.1.2.3","request_headers":{"Accept":"application/json","X-Request-Id":"the_request_id","X-Custom":"one,two\\,t\\\\hree"},"status_code":200,"has_cookies":true},"backend_host":"hostname.domain.local","query_time":1000,"$schema":"/sparql/query/1.0.0"}]'

And I could see it in Kafka:

dcausse@deployment-kafka-jumbo-1:~$ kafkacat -b localhost:9092 -t eqiad.mystream.name
% Auto-selecting Consumer mode (use -P or -C to override)
{"namespace":"ns123","query":"select","format":"xml","params":{"foo":"bar","baz":"bat,bats\\,\\\\qux"},"meta":{"id":"00000000-0000-0001-0000-000000000002","dt":"1970-01-01T00:00:00Z","request_id":"the_request_id","domain":"hostname.domain.local","stream":"mystream.name"},"http":{"method":"POST","clientIp":"10.1.2.3","request_headers":{"Accept":"application/json","X-Request-Id":"the_request_id","X-Custom":"one,two\\,t\\\\hree"},"status_code":200,"has_cookies":true},"backend_host":"hostname.domain.local","query_time":1000,"$schema":"/sparql/query/1.0.0"}

Sorry about this stupid queue name; I forgot to change what the unit test generates. I could not find a way to delete this topic, as I failed to find the ZooKeeper instance that holds this metadata.
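
For reference, if topic deletion is enabled on the cluster (delete.topic.enable=true), the standard Kafka tooling would normally handle it; a sketch, with the ZooKeeper host left as a placeholder since it was not found here:

# Newer Kafka versions can delete via the brokers directly
kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic eqiad.mystream.name
# Older clusters need the ZooKeeper ensemble that holds the topic metadata
kafka-topics.sh --zookeeper <zookeeper-host>:2181 --delete --topic eqiad.mystream.name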

Change 549081 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/puppet@production] [wdqs] configure eventgate endpoint for sparql/query events

https://gerrit.wikimedia.org/r/549081

Sorry about this stupid queue name

No worries, it's beta ┐|・ิω・ิ#|┌

Change 548779 merged by jenkins-bot:
[wikidata/query/rdf@master] Make the stream name configurable for sparql/query events

https://gerrit.wikimedia.org/r/548779

Change 550754 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/puppet@production] [wdqs] Fix event service configuration

https://gerrit.wikimedia.org/r/550754

Change 550754 merged by Gehel:
[operations/puppet@production] [wdqs] Fix event service configuration

https://gerrit.wikimedia.org/r/550754

Does this being closed mean we can access the data on Kafka?

@JAllemandou no, I moved this to Done too early and it was closed before the data was available.
The code is not yet deployed; I'll ping you once it is available.
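
Once the code is deployed and the stream config is live, the events should be readable straight from Kafka, along the lines of the beta test above (a sketch; the broker host and exact topic name are assumptions derived from the stream config patch and may differ):

# Tail the production topic once events start flowing
kafkacat -C -b kafka-jumbo1001.eqiad.wmnet:9092 -t eqiad.wdqs-external.sparql-query -o end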

Change 548764 merged by Ottomata:
[operations/deployment-charts@master] eventgate-analytics - stream config for new sparql-query streams

https://gerrit.wikimedia.org/r/548764

Change 549081 merged by Gehel:
[operations/puppet@production] [wdqs] configure eventgate endpoint for sparql/query events

https://gerrit.wikimedia.org/r/549081

Change 554327 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Import *sparql-query topics and refine them

https://gerrit.wikimedia.org/r/554327

Change 554327 merged by Ottomata:
[operations/puppet@production] Import *sparql-query topics and refine them

https://gerrit.wikimedia.org/r/554327

Change 554353 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Import sparql-query events in mediawiki_analytics camus job

https://gerrit.wikimedia.org/r/554353

Change 554353 merged by Ottomata:
[operations/puppet@production] Import sparql-query events in mediawiki_analytics camus job

https://gerrit.wikimedia.org/r/554353

Change 554357 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use mediawiki_analytics_events, no mediawiki_events camus job for sparql-query

https://gerrit.wikimedia.org/r/554357

Change 554357 merged by Ottomata:
[operations/puppet@production] Use mediawiki_analytics_events, no mediawiki_events camus job for sparql-query

https://gerrit.wikimedia.org/r/554357

Mentioned in SAL (#wikimedia-operations) [2019-12-04T10:31:10Z] <gehel> rolling restart of wdqs for config change (event logging) - T101013
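
With the wdqs hosts restarted and the Camus/Refine jobs in place, the refined events should end up in the event database in Hive; a sanity check could look like this (the table name is a guess derived from the stream name, and the year/month/day partition layout is an assumption):

# Count refined sparql-query events for the day of the rollout
hive -e "SELECT COUNT(*) FROM event.wdqs_external_sparql_query WHERE year=2019 AND month=12 AND day=4;"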