[feature request] add count of external-ids to rdf (Wikidata)
Closed, ResolvedPublic

Description

Per T129046 the following are added:

wdata:Q2 a schema:Dataset ;
	schema:about wd:Q2 ;
	wikibase:sitelinks "1"^^xsd:integer ;
	wikibase:statements "25"^^xsd:integer .

It would be interesting to have an additional statement

wdata:Q2 wikibase:externalids "15"^^xsd:integer

This is because external identifiers are conceptually closer to sitelinks than to other statements.

This way we could use it on query.wikidata.org and elsewhere.

Event Timeline

Hi @Esc3300. Please associate at least one project with this task to allow others to find this task when searching in the corresponding project(s). Thanks!

Esc3300 renamed this task from [feature request] add count of external-ids to rdf to [feature request] add count of external-ids to rdf (Wikidata). Sep 1 2016, 9:04 AM
Esc3300 updated the task description.
thiemowmde triaged this task as Lowest priority. Sep 1 2016, 12:51 PM
thiemowmde added subscribers: Jonas, aude.

What you are describing as "externalids" is, as far as I understand, the number of statements with property type "external-id". I'm not sure if it's a good idea to have such a count per property type. I believe it's possible to count statements per type in SPARQL, isn't it?

Not that I'm aware of. This is why we had T129046. If it's possible, would you add a new view to Grafana (number of external-ids per item)?

The idea is not to do it for any type, but for external ids.
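To make concrete what such a count would cover, here is a minimal Python sketch that counts the statements of an entity whose main snak has the datatype "external-id". It assumes the usual Wikibase entity JSON layout (claims keyed by property, each statement carrying a mainsnak with a datatype). The function name and the trimmed sample entity are hypothetical, and whether an actual implementation would count exactly these statements is an assumption.

```python
from typing import Any, Dict


def count_external_ids(entity: Dict[str, Any]) -> int:
    """Count statements whose main snak has datatype 'external-id'."""
    total = 0
    # "claims" maps a property ID to the list of statements for that property.
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            if statement.get("mainsnak", {}).get("datatype") == "external-id":
                total += 1
    return total


# Hypothetical, heavily trimmed entity; property IDs are for illustration only.
sample_entity = {
    "id": "Q2",
    "claims": {
        "P31": [{"mainsnak": {"datatype": "wikibase-item"}}],
        "P214": [{"mainsnak": {"datatype": "external-id"}}],
        "P227": [
            {"mainsnak": {"datatype": "external-id"}},
            {"mainsnak": {"datatype": "external-id"}},
        ],
    },
}

print(count_external_ids(sample_entity))  # prints 3
```

The sketch counts statements rather than distinct properties, matching the "number of statements with property type external-id" reading above.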

@thiemowmde Sample workaround that doesn't work:

SELECT ?ct (COUNT(?item) AS ?ct2)
{
  SELECT ?item (COUNT(?wdt) AS ?ct)
  {
    # hint:Query hint:optimizer "None" .
    ?item wdt:P31 wd:Q11424 .
    ?item ?wdt [] .
    ?p wikibase:propertyType wikibase:ExternalId .
    ?p wikibase:directClaim ?wdt .
  }
  GROUP BY ?item
}
GROUP BY ?ct
ORDER BY DESC(?ct2)

I'm afraid this ticket still misses a user story. Why is this needed? What is the benefit? Which workflows will benefit from this? At the moment the only hint given in the task description is that this may be "interesting". That's not enough to form a decision.

If that's a problem, feel free to reject it ..

What you are describing as "externalids" is, as far as I understand, the number of statements with property type "external-id". I'm not sure if it's a good idea to have such a count per property type. I believe it's possible to count statements per type in SPARQL, isn't it?

I filed T114617 for that some time ago. External-id type properties are a bit different from the other types: they are used to link to the same concept in another database. As mentioned in the task description, they are a bit more like sitelinks. Why do we track the number of sitelinks? Because it gives an indication of the importance of an item. The same goes for properties of type external-id. We already moved them into a special "identifiers" section in the interface. Important items also seem to have a lot of identifiers; see for example https://www.wikidata.org/wiki/Q5582#identifiers

I would probably use it myself to make a list of interesting people to write about on Wikipedia. For example the painters that don't have a Wikipedia article in any language, sorted by the number of identifiers descending. That sure would yield some interesting suggestions!

If this count is implemented as a page property, it'd be easy to add it. Otherwise, not so much. So I think T114617 is necessary for this one.

I found a workaround that .. works:

By replacing

	?p wikibase:propertyType wikibase:ExternalId .

in the above query with

	?p wdt:P31 wd:Q22964274 .

Should that be another bug?

If this count is implemented as a page property, it'd be easy to add it. Otherwise, not so much. So I think T114617 is necessary for this one.

Agreed. Phabricator blocker/child/subtask/parent syntax is confusing. Is this currently correctly linked?

Multichill claimed this task.

Guess what, it's in the RDF!

wdata:Q76 a schema:Dataset ;
	schema:about wd:Q76 ;
	cc:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
	schema:softwareVersion "0.0.5" ;
	schema:version "484349714"^^xsd:integer ;
	schema:dateModified "2017-05-11T08:06:19Z"^^xsd:dateTime ;
	wikibase:statements "202"^^xsd:integer ;
	wikibase:identifiers "88"^^xsd:integer ;
	wikibase:sitelinks "291"^^xsd:integer .

And it got picked up by the query service so we can do things like https://t.co/CD5tX4DVre
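As an illustration of the kind of query this enables, here is a hedged Python sketch of the use case mentioned earlier in the thread: painters without a Wikipedia article, sorted by number of identifiers. It is not the query behind the shortened link above. The helper names are hypothetical. Using P106 (occupation) with Q1028181 (painter) and treating a wikibase:sitelinks value of 0 as "no article in any language" are assumptions. So is the idea that the query service exposes both counts directly on the item.

```python
from typing import Any, Dict, List, Tuple


def build_query(occupation_qid: str = "Q1028181", limit: int = 50) -> str:
    """Return a SPARQL query ranking items by their wikibase:identifiers count."""
    return f"""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>

SELECT ?item ?identifiers WHERE {{
  ?item wdt:P106 wd:{occupation_qid} ;       # occupation (default: painter)
        wikibase:sitelinks 0 ;               # assumption: zero sitelinks ~ no article
        wikibase:identifiers ?identifiers .  # identifier count from the RDF
}}
ORDER BY DESC(?identifiers)
LIMIT {limit}"""


def parse_bindings(payload: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Extract (item IRI, identifier count) pairs from SPARQL JSON results."""
    rows = payload["results"]["bindings"]
    return [(row["item"]["value"], int(row["identifiers"]["value"])) for row in rows]


print(build_query(limit=10))
```

The string returned by build_query can be pasted into query.wikidata.org. parse_bindings expects the standard SPARQL 1.1 JSON results layout.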

According to @Smalyshev the service has recently been reloaded:

SELECT (COUNT(*) as ?count) WHERE {
  ?a wikibase:statements ?b
}

yields 26153440

SELECT (COUNT(*) as ?count) WHERE {
  ?a wikibase:identifiers ?b
}

yields 2,875,589 so the import was probably about a week ago?

It's at 12,194,663 now.

Thanks for having implemented this!