
Grouping by property is not powerful enough for some use-cases
Open, Needs Triage, Public

Description

For both the SOAP and VG dashboards, we expected a direct relationship between the items and the group: P195 (collection) is on each artwork item, P400 (platform) is on each game item, etc.

The idea was to have a very simple interface (i.e., simple parameters) so that folks with little SPARQL-fu could easily use it.

By doing so, however, we arbitrarily restricted the kinds of grouping that are possible.

In some use cases, the grouping is one or more items away. For example, consider building a coverage dashboard of French churches per department: on the church item, P131 points to the city, and from the city P131 points to the department.

(An added complexity here is that the city will have more than one P131 value, so the SPARQL needs a further restriction.)
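A minimal sketch, in Python that only assembles SPARQL text, of how such an indirect grouping could be expressed: a property path from the item to the grouping, plus a class restriction on the grouping that discards the extra P131 values. The helper name `indirect_grouping_pattern` is hypothetical, and wd:Q6465 is assumed to be the class for French departments; check both before relying on them.

```python
def indirect_grouping_pattern(path: str, grouping_class: str) -> str:
    """Build a SPARQL pattern grouping ?entity by an item reached via `path`.

    `path` is a SPARQL property path, e.g. "wdt:P131/wdt:P131" (exactly two
    hops) or "wdt:P131+" (one or more hops). Because an intermediate item can
    have several P131 values, ?grouping is restricted to instances of
    `grouping_class`, so only the intended administrative level is kept.
    """
    return (
        f"?entity {path} ?grouping .\n"
        f"?grouping wdt:P31 {grouping_class} ."
    )


# Assumption: wd:Q6465 is "department of France"; verify before use.
pattern = indirect_grouping_pattern("wdt:P131+", "wd:Q6465")
print(pattern)
```

Using `wdt:P131+` rather than a fixed two-hop path may be more robust if some churches sit at a different depth in the P131 hierarchy; the class restriction then picks the department level wherever it appears.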


The general use case is reasonable overall; however, I'm unsure how to build a path towards it that maintains the original goal of accessibility:

  • |grouping_property=P195 → easy;
  • |grouping_sparql=?entity wdt:P195 ?grouping . → less so

(We would also need to think through how to reconstruct the no-group query if we allow free-form input.)
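One possible way to reconstruct the no-group query from free-form input, sketched in Python under the assumption that the grouping snippet binds `?grouping` from `?entity`: reuse the same snippet in the grouped query and inside `MINUS` for the "no group" row. The function `build_queries` and both query shapes are hypothetical, not the tool's actual queries.

```python
def build_queries(base: str, pattern: str) -> tuple[str, str]:
    """Return (grouped_query, no_group_query) for one grouping pattern.

    `base` selects the items and must bind ?entity; `pattern` must bind
    ?grouping from ?entity. The same pattern is reused inside MINUS to
    count the items that have no group at all.
    """
    # Naive substring check: enough to reject snippets that cannot work.
    for var in ("?entity", "?grouping"):
        if var not in pattern:
            raise ValueError(f"grouping pattern must mention {var}")
    grouped = (
        "SELECT ?grouping (COUNT(DISTINCT ?entity) AS ?count) WHERE {\n"
        f"  {base}\n"
        f"  {pattern}\n"
        "}\nGROUP BY ?grouping"
    )
    no_group = (
        "SELECT (COUNT(DISTINCT ?entity) AS ?count) WHERE {\n"
        f"  {base}\n"
        f"  MINUS {{ {pattern} }}\n"
        "}"
    )
    return grouped, no_group


# Assumption: wd:Q3305213 is "painting"; any base pattern binding ?entity works.
grouped, no_group = build_queries(
    "?entity wdt:P31 wd:Q3305213 .", "?entity wdt:P195 ?grouping ."
)
print(no_group)
```

Because `MINUS` only removes solutions that share a variable with the outer pattern, this works only if the snippet mentions `?entity`; a helper variable in the snippet that also appears in `base` would additionally have to match, which changes the result.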

Event Timeline

For grouping by some sub-national level, a separate query that fetches the levels first may be the most efficient approach. A shorthand for that could be a P31 statement or a property (if there is one).

Below is a sample for Italy that could replace one that times out:

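```sparql
# Count geolocated items in Italy per grouping, using a named subquery
# (Blazegraph WITH ... INCLUDE syntax, available on the Wikidata Query Service)
SELECT ?grouping (COUNT(DISTINCT ?entity) AS ?count) (SAMPLE(?entity) AS ?sample)
WITH {
  # First fetch the grouping levels: instances of either class, minus dissolved ones
  SELECT DISTINCT ?grouping {
    { ?grouping wdt:P31 wd:Q15089 } UNION { ?grouping wdt:P31 wd:Q15110 }
    MINUS { ?grouping wdt:P576 [] }
  }
} AS %groupings
WHERE {
  INCLUDE %groupings
  ?entity wdt:P17 wd:Q38 ; wdt:P625 [] .   # items in Italy that have coordinates
  ?entity wdt:P131/wdt:P131* ?grouping .   # one or more P131 hops up to the grouping
}
GROUP BY ?grouping
HAVING (COUNT(DISTINCT ?entity) > 100)
ORDER BY DESC(?count)
LIMIT 1000
```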

For (non-contemporary) people, I think an interesting grouping could be by century, but I'm not entirely sure how that could work without a dedicated statement.
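Without a dedicated statement, one hypothetical option is to compute the century from the date of birth (P569) at query time. This assumes the tool could accept a computed number as the grouping value rather than an item, which it may not; `century` and `CENTURY_GROUPING` are illustrative names.

```python
def century(year: int) -> int:
    """1-based century of a positive (CE) year, e.g. 1900 -> 19, 1901 -> 20."""
    if year < 1:
        raise ValueError("years before 1 CE are not handled in this sketch")
    return (year - 1) // 100 + 1


# SPARQL counterpart of century(): ?grouping becomes a number, not an item.
# As in the Python version, BCE dates are not handled correctly here.
CENTURY_GROUPING = (
    "?entity wdt:P569 ?dob .\n"
    "BIND(FLOOR((YEAR(?dob) - 1) / 100) + 1 AS ?grouping)"
)

print([century(y) for y in (1, 100, 101, 1900, 1901, 2000)])  # [1, 1, 2, 19, 20, 20]
```

People with several P569 values could end up in more than one century group, much like the multiple P131 values mentioned above.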

As mentioned on Facebook, I agree that the tool should retain grouping_property so that simple cases stay easy. But for more complex kinds of grouping, I think a separate field like grouping_sparql (used when grouping_property is not present) would be nice.
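A minimal sketch of that fallback, assuming the template parameters arrive as a plain dictionary; `grouping_pattern` is a hypothetical helper, and it deliberately does no validation of either parameter.

```python
from typing import Mapping, Optional


def grouping_pattern(params: Mapping[str, str]) -> str:
    """Pick the grouping triple pattern from (hypothetical) template parameters.

    grouping_property keeps the simple case simple and takes precedence;
    grouping_sparql is only consulted when grouping_property is absent.
    """
    prop: Optional[str] = params.get("grouping_property")
    if prop:
        return f"?entity wdt:{prop} ?grouping ."
    snippet: Optional[str] = params.get("grouping_sparql")
    if snippet:
        return snippet.strip()
    raise ValueError("need grouping_property or grouping_sparql")


print(grouping_pattern({"grouping_property": "P195"}))  # ?entity wdt:P195 ?grouping .
print(grouping_pattern({"grouping_sparql": "?entity wdt:P131/wdt:P131 ?grouping ."}))
```

Giving grouping_property precedence means existing dashboards keep their behavior even if someone later adds a grouping_sparql value to the same template.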

I was able to "abuse" the current tool because grouping_property is currently not sanitized to match the pattern /^P[0-9]+$/, so I could insert arbitrary SPARQL to group the selected items by whatever I want (basically the SPARQL equivalent of SQL injection). However, I had to use really weird SPARQL clauses so that my "hack" works with both the positive query and the MINUS query (for the "No grouping" row).
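A sketch of the missing check, in Python: `re.fullmatch` anchors both ends of the pattern, which also rejects a trailing newline that `$` would still accept in Python's `re`. The names `PROPERTY_ID` and `checked_property` are hypothetical.

```python
import re

# Same rule as /^P[0-9]+$/: a bare Wikidata property ID and nothing else.
PROPERTY_ID = re.compile(r"P[0-9]+")


def checked_property(value: str) -> str:
    """Return value unchanged if it is a property ID such as "P195", else raise."""
    if PROPERTY_ID.fullmatch(value) is None:
        raise ValueError(f"grouping_property must look like P123, got {value!r}")
    return value


for candidate in ("P195", "P131/wdt:P131"):
    try:
        print(candidate, "->", checked_property(candidate))
    except ValueError as err:
        print(candidate, "-> rejected:", err)
```

Note that enforcing this check would also close the workaround described above, so it only makes sense together with some other route for complex groupings.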

Trying to chart a path here:

  1. Do nothing. Assume that SPARQL injection (either "easy" ones like Maarten’s |grouping_property=P131/wdt:P131 or more complex ones like |grouping_property=p:P195 [ ps:P195 ?id ; pq:P2868 wd:Q29188408 ]) is enough.
  2. Allow arbitrary SPARQL: