Paste P5921

script to count the total number of format constraint checks to be done on Wikidata

Authored by Lucas_Werkmeister_WMDE on Aug 25 2017, 3:30 PM.
#!/bin/bash
# script to count the total number of format constraint checks to be done on Wikidata

# Step 1: list every property that has a format constraint (wd:Q21502404),
# with the number of such constraints. The result is cached in ./predicates
# as "<property ID><TAB><constraint count>" lines.
if ! [[ -f predicates ]]; then
    curl -s -H 'Accept: application/json' --data @- 'https://query.wikidata.org/sparql' << 'EOF' |
query=
SELECT ?p (COUNT(?constraintStatement) AS ?count) WHERE {
  ?property p:P2302 ?constraintStatement;
            wikibase:claim ?p.
  ?constraintStatement ps:P2302 wd:Q21502404.
}
GROUP BY ?p
EOF
        jq -r '.results.bindings | .[] | (.p.value + "\t" + .count.value)' | sed 's|^http://www.wikidata.org/prop/||' > predicates
fi
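
# Step 2: count the statements of each property. Each count is cached in
# counts/<property ID>, so an interrupted run can be resumed.
mkdir -p counts
while read -r predicate constraintCount; do
    if ! [[ -s counts/$predicate ]]; then
        # Unquoted heredoc delimiter: $predicate is expanded into the query.
        curl -s -H 'Accept: application/json' --data @- 'https://query.wikidata.org/sparql' << EOF |
query=
SELECT (COUNT(?statement) AS ?count) WHERE {
  ?entity p:$predicate ?statement.
}
EOF
            jq -r '.results.bindings[0].count.value' > "counts/$predicate"
        # An empty file means no count came back; remove it so a later run retries.
        if ! [[ -s counts/$predicate ]]; then
            printf >&2 'error: could not get count for %q\n' "$predicate"
            rm -- "counts/$predicate"
        fi
    fi
done < predicates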
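
# Step 3: every statement is checked once per format constraint on its
# property, so the number of checks is the sum of count * constraintCount.
constraints=0
total=0
while read -r predicate constraintCount; do
    if [[ -s counts/$predicate ]]; then
        count=$(<"counts/$predicate")
    else
        printf >&2 'warning: count for %q not included\n' "$predicate"
        count=0
    fi
    # $(( )) instead of let: the unquoted * in a let argument is subject to globbing.
    total=$(( total + count * constraintCount ))
    constraints=$(( constraints + constraintCount ))
done < predicates
printf '%d constraints on %d total statements\n' "$constraints" "$total"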

Event Timeline

Current result:

1786 constraints on 77862938 total statements

But this doesn’t include P2093 (author name string), of which there are apparently too many statements to count with SPARQL.
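
As a rough illustration of the arithmetic behind that result, the summation step can be sketched in isolation. The function name `sum_checks`, the properties P1 to P3, and every number below are made up for the example. It is written to run under plain `sh` as well as bash, so it uses `[ ]` and `cat` where the script uses `[[ ]]` and `$(<...)`.

```shell
#!/bin/bash
# Sketch of the summation step only. P1..P3 and all numbers are hypothetical.

# sum_checks PREDICATES_FILE COUNTS_DIR
# Prints "<constraints> <total>" and warns on stderr about predicates
# that have no stored count.
sum_checks() {
    predicates_file=$1
    counts_dir=$2
    constraints=0
    total=0
    while read -r predicate constraintCount; do
        if [ -s "$counts_dir/$predicate" ]; then
            count=$(cat "$counts_dir/$predicate")
        else
            printf >&2 'warning: count for %s not included\n' "$predicate"
            count=0
        fi
        total=$(( total + count * constraintCount ))
        constraints=$(( constraints + constraintCount ))
    done < "$predicates_file"
    printf '%d %d\n' "$constraints" "$total"
}

# Made-up input in the same layout the script uses.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/counts"
printf 'P1\t1\nP2\t2\nP3\t1\n' > "$demo_dir/predicates"
echo 100 > "$demo_dir/counts/P1"
echo 50 > "$demo_dir/counts/P2"
# No counts/P3: stands in for a property whose COUNT query returned nothing.

result=$(sum_checks "$demo_dir/predicates" "$demo_dir/counts" 2>/dev/null)
echo "$result"
rm -rf -- "$demo_dir"
```

With these inputs it prints `4 200`: 100 × 1 + 50 × 2 checks, plus nothing for P3. P3 still adds its one constraint to the first number, so a property without a stored count raises the constraint figure while contributing no statements to the total.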