
Grouping by property is not powerful enough for some use-cases
Open, Needs Triage · Public

Description

For both the SOAP and VG dashboards, we expected a direct relationship between the items and the group: P195 (collection) is on each artwork item; P400 (platform) is on each game item, etc.

The idea was to have a very simple interface (i.e., simple parameters) so that folks with little SPARQL-fu could easily use it.

By doing so, however, the kind of grouping possible was arbitrarily restricted.

Typically, in some use cases, the grouping is one (or more) items away. For example, building a coverage dashboard of French churches per department: on the church item, P131 points to the city, and from the city P131 points to the department.

(An added complexity here is that the city will have more than one value of P131, which means we need to restrict further in SPARQL.)
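A minimal sketch of what such a two-hop grouping clause could look like, written as a small Python helper that emits SPARQL. The function name `two_hop_grouping_clause`, the `?intermediate` variable, and the idea of restricting the final item by its P31 class are assumptions for illustration, as is Q6465 standing for "department of France"; the tool itself may build its queries differently.

```python
def two_hop_grouping_clause(hop_property: str, grouping_class: str) -> str:
    """Return SPARQL lines that group ?entity by an item two hops away.

    hop_property   -- property followed twice, e.g. "P131"
    grouping_class -- class the final item must be an instance of (P31);
                      this drops the other P131 values found on the city
    """
    return "\n".join([
        f"?entity wdt:{hop_property} ?intermediate .",
        f"?intermediate wdt:{hop_property} ?grouping .",
        f"?grouping wdt:P31 wd:{grouping_class} .",
    ])


# Q6465 is assumed here to be "department of France".
clause = two_hop_grouping_clause("P131", "Q6465")
print(clause)
```

The P31 restriction is one possible answer to the multiple-values problem: only P131 values that are themselves departments survive as groupings.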


The general use case is reasonable overall; however, I’m unsure how to build a path towards this that maintains the original goal of accessibility:

  • |grouping_property=P195 → easy;
  • |grouping_sparql=?entity wdt:P195 ?grouping . → less so

(We would also need to think through how to reconstruct the no-group query if 'free-form' input is allowed.)
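One way the two parameters could coexist is sketched below in Python, under stated assumptions: the helper names `grouping_clause` and `build_queries` are hypothetical, the query shapes are simplified, and the selector (with Q3305213 assumed to be "painting") is only an example. The simple parameter is expanded into the same kind of clause a free-form parameter would provide, and the no-group query wraps that clause in MINUS.

```python
from typing import Optional, Tuple


def grouping_clause(grouping_property: Optional[str] = None,
                    grouping_sparql: Optional[str] = None) -> str:
    """Pick the clause binding ?grouping; the simple parameter wins."""
    if grouping_property:
        return f"?entity wdt:{grouping_property} ?grouping ."
    if grouping_sparql:
        return grouping_sparql.strip()
    raise ValueError("either grouping_property or grouping_sparql is required")


def build_queries(selector_sparql: str, clause: str) -> Tuple[str, str]:
    """Return (grouped query, no-group query) for the same clause."""
    grouped = (
        "SELECT ?grouping (COUNT(DISTINCT ?entity) AS ?count) WHERE {\n"
        f"  {selector_sparql}\n"
        f"  {clause}\n"
        "} GROUP BY ?grouping"
    )
    # Entities for which the grouping clause has no solution at all.
    ungrouped = (
        "SELECT (COUNT(DISTINCT ?entity) AS ?count) WHERE {\n"
        f"  {selector_sparql}\n"
        f"  MINUS {{ {clause} }}\n"
        "}"
    )
    return grouped, ungrouped


# Example selector; Q3305213 is assumed here to be "painting".
selector = "?entity wdt:P31 wd:Q3305213 ."
grouped, ungrouped = build_queries(selector, grouping_clause(grouping_property="P195"))
print(ungrouped)
```

Note that MINUS only removes rows when the clause shares a variable with the rest of the query, so a free-form clause would have to mention `?entity` for the no-group query to be meaningful.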

Event Timeline

JeanFred created this task. · May 23 2019, 2:38 PM
Restricted Application added a subscriber: Aklapper. · May 23 2019, 2:38 PM
JeanFred added a subscriber: Ayack. · May 23 2019, 2:38 PM
JeanFred updated the task description. · May 23 2019, 2:40 PM
JeanFred updated the task description. · May 23 2019, 2:57 PM

For grouping by some sub-national level, maybe a separate query to get levels first is the most efficient. A shorthand for that could be a P31 statement or a property (if there is one).

Below is a sample for Italy that could replace one that times out:

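SELECT ?grouping (COUNT(DISTINCT ?entity) AS ?count) (SAMPLE(?entity) AS ?sample)
WITH
{
  # Named subquery: compute the candidate groupings once, up front.
  SELECT DISTINCT ?grouping
  {
    { ?grouping wdt:P31 wd:Q15089 } UNION { ?grouping wdt:P31 wd:Q15110 }
    # Skip groupings that have a P576 statement.
    MINUS { ?grouping wdt:P576 [] }
  }
} AS %groupings
WHERE
{
  INCLUDE %groupings
  # Entities with country (P17) Q38 and coordinates (P625).
  ?entity wdt:P17 wd:Q38; wdt:P625 [].
  # Reach the grouping through one or more P131 hops.
  ?entity wdt:P131/wdt:P131* ?grouping .
}
GROUP BY ?grouping
HAVING (?count > 100)
ORDER BY DESC(?count)
LIMIT 1000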

For (non-contemporary) people, I think an interesting grouping could be by century, but I'm not entirely sure how that could work without a dedicated statement.
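One possibility, without a dedicated statement, might be to compute the century from a date inside the query. The sketch below is only an illustration in Python: the helper names are hypothetical, P569 (date of birth) is an assumed choice of date property, and only CE years are handled. Note that the resulting ?grouping would be a number rather than an item, which the tool may not support as-is.

```python
def century_of(year: int) -> int:
    """Century number for a CE year, e.g. 1801..1900 all map to 19."""
    if year < 1:
        raise ValueError("only CE years are handled in this sketch")
    return (year + 99) // 100


def century_bind_clause(date_property: str = "P569") -> str:
    """SPARQL lines deriving ?grouping as the century of a date value."""
    return "\n".join([
        f"?entity wdt:{date_property} ?date .",
        # Same arithmetic as century_of, expressed in SPARQL.
        "BIND(CEIL(YEAR(?date) / 100) AS ?grouping)",
    ])


print(century_bind_clause())
```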

seav added a subscriber: seav. · May 25 2019, 4:05 PM

As mentioned on Facebook, I agree that the tool should retain grouping_property to make using this tool easy for simple cases. But for more complex types of grouping, I think having a separate field like grouping_sparql (that will be used in case grouping_property is not present) would be nice.

I was able to "abuse" the current tool because grouping_property is currently not sanitized to match the pattern /^P[0-9]+$/, so I could insert arbitrary SPARQL to group the selected items by whatever I wanted (basically, the SPARQL version of an SQL injection exploit). However, I had to use really weird SPARQL clauses so that my "hack" works with both the positive and the MINUS (for the "No grouping" row) SPARQL queries.
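For reference, a check against that pattern could look like the following Python sketch. The function name `is_valid_property_id` is hypothetical, and this is not necessarily how the tool would implement it.

```python
import re

# Same pattern as /^P[0-9]+$/, anchored via fullmatch below.
PROPERTY_ID = re.compile(r"P[0-9]+")


def is_valid_property_id(value: str) -> bool:
    """True only if the whole value is a bare property ID such as P195."""
    return PROPERTY_ID.fullmatch(value) is not None


print(is_valid_property_id("P195"))
print(is_valid_property_id("P195 ?grouping . ?entity wdt:P131"))
```

Using `fullmatch` rather than `match` with `$` is deliberate: in Python, `$` also matches just before a trailing newline, so "P195\n" would otherwise slip through.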