
Wikidata Truthy dump is missing important metadata triples
Open, Needs Triage, Public, BUG REPORT

Description

The "Wikidata Truthy" dump is missing:

  • schema:Article schema:about links (Wikipedia pages). It only has schema:Dataset schema:about links
  • the counts and other statements after the blank line below. These are useful counts for ranking entities.
data:Q131171 a schema:Dataset ;
	schema:about wd:Q131171 ;
	schema:version "1304894739"^^xsd:integer ;
	schema:dateModified "2020-11-10T21:06:45Z"^^xsd:dateTime ;

	cc:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
	schema:softwareVersion "1.0.0" ;
	wikibase:statements "59"^^xsd:integer ;
	wikibase:sitelinks "71"^^xsd:integer ;
	wikibase:identifiers "39"^^xsd:integer .

Could we please have them?

I've asked my colleagues to report which version of the Truthy dump we use (I think we got it 5-6 months ago).
I don't know whether the statements are present in the full dump.
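
One way to check either dump for these triples is to stream it and count the predicates in question. The following is only a minimal sketch, assuming the dump is a gzip-compressed N-Triples file; the function names and the file path passed to `count_in_dump` are hypothetical and not part of any Wikidata tooling.

```python
import gzip
from collections import Counter

# Predicates named in this report, written as full IRIs
# (N-Triples has no prefix declarations).
WANTED = {
    "<http://wikiba.se/ontology#statements>",
    "<http://wikiba.se/ontology#sitelinks>",
    "<http://wikiba.se/ontology#identifiers>",
    "<http://schema.org/about>",
}


def count_predicates(lines, wanted=WANTED):
    """Count occurrences of each wanted predicate in an iterable of N-Triples lines."""
    counts = Counter()
    for line in lines:
        # Subject and predicate contain no spaces, so two splits are enough.
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[1] in wanted:
            counts[parts[1]] += 1
    return counts


def count_in_dump(path, wanted=WANTED):
    """Stream a gzip-compressed N-Triples file (hypothetical path) line by line."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return count_predicates(fh, wanted)
```

A predicate that never appears has a count of zero in the result, which would confirm that the corresponding triples are absent from the file that was scanned.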

Event Timeline

Here are some queries that add the counts directly to the Item node, using a namespace ontoRecon:. They cover the counts above, plus 3 more.
They would need a lot of memory (group over 90M items) and a lot of time (especially statements).
Not tested yet.

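PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX ontoRecon: <http://example.org/ontoRecon#> # placeholder IRI: substitute the real ontoRecon namespace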

insert {?item ontoRecon:sitelinks ?n}
where {
  {select ?item (count(*) as ?n) {
    ?x a schema:Article; schema:about ?item. # TODO check the type of Category and Commons links
  } group by ?item}
};

insert {?item ontoRecon:identifiers ?n}
where {
  {select ?item (count(*) as ?n) {
    ?wd wikibase:propertyType wikibase:ExternalId; wikibase:directClaim ?wdt.
    ?item ?wdt ?extId
  } group by ?item}
};

insert {?item ontoRecon:statements ?n}
where {
  {select ?item (count(*) as ?n) {
    ?item a wikibase:Item; ?p ?y
    filter(?p not in (skos:prefLabel, skos:altLabel, schema:name, schema:description, rdfs:label))
  } group by ?item}
};

insert {?item ontoRecon:prefLabels ?n}
where {
  {select ?item (count(*) as ?n) {
    ?item a wikibase:Item; skos:prefLabel ?label
  } group by ?item}
};

insert {?item ontoRecon:allLabels ?n}
where {
  {select ?item (count(*) as ?n) {
    ?item a wikibase:Item; skos:prefLabel|skos:altLabel  ?label
  } group by ?item}
};

insert {?item ontoRecon:descriptions ?n}
where {
  {select ?item (count(*) as ?n) {
    ?item a wikibase:Item; schema:description ?descr
  } group by ?item}
};

Hello!
The "Wikidata Truthy" dump we are currently using, which we think is missing the data that my colleague @VladimirAlexiev specified above, is this one: https://dumps.wikimedia.org/wikidatawiki/entities/20201123/ (from 23.11.2020).

Thank you!