
Create UDFs for analyzing SPARQL queries
Open, Medium, Public

Description

After T164021 has been implemented, we want to create some tags for SPARQL queries, to make it easier to collect statistics.

Ideas so far:

  • Tag for a request containing SPARQL query
  • SPARQL query type (SELECT/ASK/etc.)
  • UDF for properties used in a query (wdt:P1234 etc.)
  • UDF for entities used in a query (wd:Q2234 etc.)
  • UDF for services usage

More can be added on user request.
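
As a rough illustration of the ideas listed above, here is a minimal Python sketch of the extraction logic such UDFs might contain. The function names, the regular expressions, and the set of property prefixes (`wdt:`, `p:`, `ps:`, `pq:`) are assumptions made for this example, not the code from the Gerrit patch; real UDFs for refinery would be written against Hive, and a production version would likely use a real SPARQL parser rather than regexes.

```python
import re

# All names below are hypothetical; nothing here comes from refinery itself.
_IRI = re.compile(r"<[^>]*>")
_COMMENT = re.compile(r"#[^\n]*")
_QUERY_FORM = re.compile(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)
# Assumption: only these property prefixes are of interest.
_PROPERTY = re.compile(r"\b(?:wdt|p|ps|pq):(P[1-9][0-9]*)\b")
_ENTITY = re.compile(r"\bwd:(Q[1-9][0-9]*)\b")
_SERVICE = re.compile(
    r"\bSERVICE\s+(?:SILENT\s+)?(<[^>]+>|[A-Za-z][\w-]*:[\w-]+)", re.IGNORECASE
)


def query_type(sparql):
    """Return SELECT/ASK/CONSTRUCT/DESCRIBE, or None if no form is found."""
    # Blank out IRIs first so a '#' inside <...> is not treated as a comment.
    text = _IRI.sub("<>", sparql)
    text = _COMMENT.sub("", text)
    match = _QUERY_FORM.search(text)
    return match.group(1).upper() if match else None


def properties_used(sparql):
    """Return the distinct property IDs (e.g. 'P31') as a sorted list."""
    return sorted(set(_PROPERTY.findall(sparql)))


def entities_used(sparql):
    """Return the distinct item IDs (e.g. 'Q146') as a sorted list."""
    return sorted(set(_ENTITY.findall(sparql)))


def services_used(sparql):
    """Return the distinct SERVICE targets as a sorted list."""
    return sorted(set(_SERVICE.findall(sparql)))


EXAMPLE_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# select is mentioned here only inside a comment
SELECT ?item WHERE {
  ?item wdt:P31/wdt:P279* wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
"""
```

The regex approach is deliberately crude: it does not distinguish identifiers inside string literals or comments from real usage, which may or may not matter for aggregate statistics.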

Details

Related Gerrit Patches:
analytics/refinery/source (master): Add tagger for Wikidata Query Service requests

Event Timeline

Smalyshev created this task. Jul 5 2017, 7:30 PM
Restricted Application added projects: Wikidata, Discovery. Jul 5 2017, 7:30 PM
Restricted Application added a subscriber: Aklapper.
Nuria added a subscriber: Smallyen03. Edited Jul 5 2017, 8:03 PM

@Smallyen03: the idea of the tags is to split the webrequest dataset into "partitions" that make subsequent querying more efficient, so tags have to be coarse. This one sounds good: "Tag for a request containing SPARQL query". The rest are too fine-grained (if that makes sense). After tagging, we will move all SPARQL queries to their own small dataset that is kept for 60 days. You can then query this dataset for property usage and query type, and create a file with those metrics for the outside world to consume.
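
To make the "coarse tag" idea concrete, here is a hedged Python sketch of what such a tagger could decide per request. The tag names, the host and path constants, and the assumption that the query arrives in a `query` URL parameter are illustrative guesses for this example, not the logic of the merged patch; the field names `uri_host`, `uri_path`, and `uri_query` are assumed to mirror the webrequest columns.

```python
from urllib.parse import parse_qs

# Assumptions for illustration only.
WDQS_HOST = "query.wikidata.org"
SPARQL_PATHS = ("/sparql", "/bigdata/namespace/wdq/sparql")


def tags_for_request(uri_host, uri_path, uri_query):
    """Return a set of coarse tags for a single request.

    The tagger only answers "which partition does this row belong to?".
    It does not look inside the SPARQL text at all.
    """
    tags = set()
    if uri_host != WDQS_HOST:
        return tags
    tags.add("wikidata-query")  # hypothetical tag name
    # Assumed: uri_query may carry a leading '?', which parse_qs does not strip.
    params = parse_qs(uri_query.lstrip("?"))
    if uri_path in SPARQL_PATHS and "query" in params:
        tags.add("sparql")  # hypothetical tag name
    return tags
```

Keeping the tagger this shallow is the point of the design: finer-grained features such as property usage would be computed later, only over the much smaller tagged dataset.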

@Nuria so you're saying tags are not the right way to extract features like property usage? What would be the right way then: just creating other UDFs?

Eventually I think we do want to have a dataset which can be queried (and summarized, etc.) by property IDs, item IDs, etc., so we want to have those in datasets. The question is what the best way to achieve that is.

Change 364542 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[analytics/refinery/source@master] [WIP] Add tagger for Wikidata Query Service requests

https://gerrit.wikimedia.org/r/364542

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. Jul 12 2017, 2:50 PM

Change 364542 merged by jenkins-bot:
[analytics/refinery/source@master] Add tagger for Wikidata Query Service requests

https://gerrit.wikimedia.org/r/364542

Restricted Application added a subscriber: PokestarFan. Jul 22 2017, 12:20 AM

Next steps: after T164020 is done:

  1. Get WDQS data into a separate partition
  2. Create UDFs to extract more data
  3. Design a table of derived statistical data for WDQS using the UDFs above, and implement it
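
One possible shape for the derived table in step 3 is a count of queries per day and property. The sketch below shows that aggregation in plain Python over in-memory rows; the row layout `(day, sparql)`, the function name, and the `wdt:`-only regex are assumptions for illustration, and in practice this would presumably run as a Hive or Spark job over the WDQS partition.

```python
import re
from collections import Counter

# Assumption: only direct-claim properties (wdt:P...) are counted here.
_PROPERTY = re.compile(r"\bwdt:(P[1-9][0-9]*)\b")


def property_usage_by_day(rows):
    """Aggregate (day, sparql) pairs into a Counter keyed by (day, property).

    Each property is counted once per query, so a single query that repeats
    a property many times does not dominate the statistics.
    """
    counts = Counter()
    for day, sparql in rows:
        for prop in set(_PROPERTY.findall(sparql)):
            counts[(day, prop)] += 1
    return counts


# Made-up sample rows, not real WDQS traffic.
SAMPLE_ROWS = [
    ("2017-08-01", "SELECT ?x WHERE { ?x wdt:P31 wd:Q5 . ?x wdt:P31 ?y }"),
    ("2017-08-01", "ASK { wd:Q42 wdt:P31 wd:Q5 }"),
    ("2017-08-02", "SELECT ?x WHERE { ?x wdt:P279 wd:Q5 }"),
]
```

Each `(day, property)` key with its count would correspond to one row of the derived table.
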
Smalyshev updated the task description. Aug 4 2017, 12:46 AM
Smalyshev moved this task from Backlog to Waiting/Blocked on the User-Smalyshev board.
Smalyshev removed Smalyshev as the assignee of this task. Jan 30 2018, 8:05 AM