
Analyze queries performed on Wikidata Query Service to identify what users are using it for, and produce report
Closed, Resolved · Public · 10 Estimated Story Points

Assigned To
Authored By
Deskana
Sep 14 2015, 11:46 PM
Referenced Files
F2682919: WDQS Analysis Report.html
Oct 9 2015, 11:51 PM

Description

Output of this task should be a short report with some of the queries users are making using the Wikidata Query Service, so that we have an answer to the question "What kind of things are people doing with WDQS?".

If it's helpful for setting expectations, I intend that the analyst who does this spend no more than three or four hours producing the report. It shouldn't be much more than a simple list of the queries with some notes added.

Event Timeline

Deskana raised the priority of this task from to Medium.
Deskana updated the task description. (Show Details)
Deskana added a subscriber: Deskana.

I don't want to call dibs on this task, but I want to note that I would very much like to work on this one if I can find the time.

mpopov renamed this task from Analyse queries performed on Wikidata Query Service to identify what users are using it for, and produce report to Analyze queries performed on Wikidata Query Service to identify what users are using it for, and produce report.Sep 17 2015, 1:09 AM
mpopov set Security to None.
mpopov edited a custom field.

Putting this on the back burner in favor of user satisfaction work. In the meantime, I'm running a data acquisition script on the past 30 days of WDQS queries.
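For context, recovering the SPARQL text from a logged request can be sketched roughly as below. This is a minimal illustration, not the actual acquisition script; the endpoint path and field handling are assumptions, since the real log schema isn't shown here.

```python
# Minimal sketch (assumed request shape, not the actual log schema):
# recover the SPARQL text from a logged WDQS request URI.
from urllib.parse import urlparse, parse_qs, unquote

def decode_query(uri: str) -> str:
    """Extract and percent-decode the SPARQL query from a request URI."""
    parsed = urlparse(uri)
    params = parse_qs(parsed.query)
    if "query" in params:
        return params["query"][0]
    # Fall back to decoding the path itself, since the encoded query
    # sometimes ends up there.
    return unquote(parsed.path)

uri = "/bigdata/namespace/wdq/sparql?query=SELECT%20%3Fitem%20WHERE%20%7B%7D"
print(decode_query(uri))  # SELECT ?item WHERE {}
```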

I am currently having to do approximate string matching in order to separate out queries that are user-written (interesting) from those that are examples provided by us (not interesting).

It's a non-trivial task because (1) the sample queries as run by users are not exact matches for the examples we provide, and (2) Varnish crops uri_path (which is where the encoded query is stored), so the decoded query is a truncated version of the query the user actually ran.

Snapshot of the report at the moment: {F2662935}

Note that I have more cleanup to do. Many of the queries that are sample queries aren't marked as such yet, so take the query analysis sections with a grain of salt.

(Uploading in case @Smalyshev or @Deskana want to know some stuff about WDQS users and queries.)

Pinging @Ironholds! Care to review {F2674990}?

  1. Could we put the summary above the TOC, and not in italics? They make it hard to read.
  2. vacillating, not vascillating (and vacillating at around I guess)
  3. The lack of a title on those graphs makes it hard to see what they're about since I have to turn my head at a 90 degree angle to find out.
  4. Can you put the countries represented in a footnote?
  5. www.wikidata.org/wiki/property_talk isn't a domain. Is there a urltools bug? ;)
  6. The referral graph is so dense in lines that I can't read it easily.
  7. The lack of a legend in the query graph makes it of limited utility as a visual aid, since you have to read past it to understand it. Can you add one?
  8. I'm seeing two categories, but the label still says four.

Okay, done. Have fun reading the final draft, @Smalyshev & @Deskana. Maybe @Tfinc and @Wwes might want to take a look too.

Again, thanks @TJones and @Ironholds for your advice and feedback!


Repo for code to get the data and replicate the analysis: https://github.com/wikimedia-research/WQDSUsageAnalysis