Track the number of identifier properties on an item in page_props
Closed, Resolved, Public

Description

In T95287 we'll get a new identifier datatype for properties. In the page_props table we track several attributes of a page, like the number of claims (wb-claims) and the number of sitelinks (wb-sitelinks). It would be nice to track the number of identifier properties on an item in the same way as the number of sitelinks. I'm not sure exactly how to count. One option is to count the number of distinct identifier properties (P ids) used on the item; the other is to count the total number of claims that use an identifier property. If the same identifier property is used multiple times on an item, the results will differ. I would probably go for the second option (number of claims), because that's closer to how wb-claims counts.
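To make the difference between the two counting options concrete, here is a minimal sketch. It is not Wikibase code: the statements are modelled as hypothetical (property id, data type) pairs, the function names are made up for illustration, and the data type name "external-id" is assumed to be the identifier type.

```python
# Hypothetical, simplified model of an item's statements:
# each entry is (property_id, data_type_of_that_property).
statements = [
    ("P214", "external-id"),
    ("P214", "external-id"),   # same identifier property used twice
    ("P1392", "external-id"),
    ("P31", "wikibase-item"),  # not an identifier
]

IDENTIFIER_TYPE = "external-id"  # assumed name of the identifier data type


def count_identifier_properties(statements):
    """Option 1: number of distinct identifier properties on the item."""
    return len({pid for pid, dtype in statements if dtype == IDENTIFIER_TYPE})


def count_identifier_claims(statements):
    """Option 2: total number of claims that use an identifier property."""
    return sum(1 for _, dtype in statements if dtype == IDENTIFIER_TYPE)


print(count_identifier_properties(statements))  # 2
print(count_identifier_claims(statements))      # 3
```

The two numbers only diverge when an identifier property is repeated on the same item, as P214 is in this made-up data.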

This page_prop could be used in a similar way to wb-sitelinks: to rank items and to suggest items to work on.

Event Timeline

Multichill raised the priority of this task from to Needs Triage.
Multichill updated the task description.
Multichill subscribed.

Bump. Now that we have identifiers on Wikidata it would be really nice to have this.

Looks like I created a (partial) duplicate with T144476. Still, this could be exposed both in RDF and in the page_props table.

thiemowmde triaged this task as Lowest priority. Sep 5 2016, 2:52 PM
Lydia_Pintscher raised the priority of this task from Lowest to Low. Sep 8 2016, 7:14 PM

@aude, @daniel, @hoo: Is there anything fundamentally different from how we do the same thing for sitelink count already? How much work do you think it'd be to get this added?

I've looked into this a little and it seems it's not easily doable. Item->getStatements() returns a StatementList, and without doing several SQL lookups we cannot tell which statements use a property of a certain data type.

I need to add that I don't know the codebase well enough to make the final call. That's definitely something Daniel or Thiemo should take a look at.

I've looked into this a little and it seems it's not easily doable. Item->getStatements() returns a StatementList, and without doing several SQL lookups we cannot tell which statements use a property of a certain data type.

Link? Why several lookups? Looking at MariaDB [wikidatawiki_p]> SELECT * from wb_property_info LIMIT 1000,100; I get things back like:

1392 | external-id | {"type":"external-id","formatterURL":"http:\/\/www.comicbookdb.com\/creator.php?ID=$1"}

for https://www.wikidata.org/wiki/Property:P1392. This could be cached so you don't have to hit the database each time. If the item is getting parsed anyway, that same pass could also populate the externallinks table. I wonder how the imagelinks table gets populated; the solution is probably similar.

@Ladsgroup there is a PropertyDataTypeLookup service. The implementation used in production is based on a cached list of data types, IIRC. That should be even quicker than the query suggested by Multichill.
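As a rough illustration of that idea, here is a sketch of counting identifier statements through an in-memory data type lookup instead of one database query per statement. It is written in Python rather than PHP and does not use the real Wikibase API: the class, its method name, and the choice to skip unknown properties are all assumptions made for the example.

```python
class InMemoryPropertyDataTypeLookup:
    """Hypothetical stand-in for a cached property data type lookup.

    The mapping is loaded once (for example from wb_property_info) and
    then answers every request from memory.
    """

    def __init__(self, data_types):
        self._data_types = dict(data_types)

    def get_data_type_id(self, property_id):
        # Returns None for unknown properties (an assumption of this sketch).
        return self._data_types.get(property_id)


def count_identifier_statements(property_ids, lookup, identifier_type="external-id"):
    """Count statements whose property has the identifier data type.

    `property_ids` holds the property id of each statement on the item,
    so a property that is used twice is counted twice.
    """
    return sum(
        1 for pid in property_ids
        if lookup.get_data_type_id(pid) == identifier_type
    )


lookup = InMemoryPropertyDataTypeLookup({
    "P1392": "external-id",
    "P31": "wikibase-item",
})
print(count_identifier_statements(["P1392", "P31", "P1392"], lookup))  # 2
```

Because the lookup is a plain dictionary access, the cost per statement stays constant no matter how many statements an item has.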

@Ladsgroup code review was done. Any update on this?

I just got back. The CI test failures look complicated and will take some time to fix. The main part is done though.

Change 345974 abandoned by Ladsgroup:
[WIP] Add tests for adding wb-identifiers to page_props

Reason:
Done in I53c8a1162fccc2b3f5b4179d9b1d5327decc5ff4

https://gerrit.wikimedia.org/r/345974

Change 345809 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add wb-identifiers field to page_props

https://gerrit.wikimedia.org/r/345809

Ladsgroup moved this task from Review to Done on the Wikidata-Former-Sprint-Board board.
Ladsgroup moved this task from Blocked on others to Done on the User-Ladsgroup board.