Allow accessing data from a Wikidata item not connected to the current page - arbitrary access (tracking)
Open, Public

Assigned To
None
Priority
High
Author
Lydia_Pintscher
Blocks
T89594: Use the arbitrary access to Wikidata feature on Commons (tracking)
T76007: add ability to link/refer to foreign items and properties
T68108: Store media information for files on Wikimedia Commons as structured data
T74815: Add mw.wikibase.getEntityObject by site link (title) Lua function
T4007: Tracking bug (tracking)
Blocked By
T95567: Preload sitelinks based on usage tracking data
T93885: Implement a limit for entities accessed via arbitrary access features and mark as expensive
T93607: Make mw.wikibase.getEntity and mw.wikibase.getEntityObject the same function
T89002: Track multi-lingual label usage
T86187: Prepare deployment of usage tracking to Wikidata
T75460: Sane Lua label access in non-content languages
T76156: mw.wikibase: Use __index to lazy load entity contents
T76159: Preload labels and descriptions for Lua and the parser function based on usage tracking data
T76805: Allow getting descriptions in Lua without loading the whole entity into memory
T70029: allow arbitrary data access on Wikidata (parser function)
T69538: allow arbitrary data access on Wikidata (LUA)
T60856: Lua: Add expensive function getEntity(id) for non-connected entities
T68544: Notify client about changes to redirects
T49288: Track Wikidata entity usage on client pages
T49071: Allow use of the Lua API on a Wikibase repository
T46946: Allow use of property parser function on repo
Subscribers
Accurimbono, Rschen7754, Candalua and 56 others
Projects
Tokens
"Love" token, awarded by Ricordisamoa.
"Like" token, awarded by He7d3r.
Reference
bz47930
Security
None
Description

We should make it possible on the client to access data from an item that is not connected to the current page.

(filing as someone asked for a bug to follow progress on this)


Version: unspecified
Severity: major
Whiteboard: u=dev c=story p=0

bzimport added a subscriber: wikidata-bugs.
bzimport set Reference to bz47930.
Lydia_Pintscher created this task. Via Legacy, May 1 2013, 1:55 PM
bzimport added a comment. Via Conduit, May 2 2013, 3:56 PM

soulkeeper.wikipedia wrote:

Very important functionality IMO. Two possible, realistic, use cases:

  1. We have an infobox in the article about sunflower (Q171497). We want to show that Linnaeus (Q1043) is the author of this species. Because this is botany, we don't want the infobox in Q171497 to show Linnaeus' full name.

Instead we want the infobox to show the "botanist author abbreviation" (Property:P428) from the external item Q1043. In other words, we need to be able to get the value of a property of a different item than the article in question.

  2. The species sunflower belongs to the genus Helianthus which belongs to the tribe Heliantheae which belongs to the subfamily Helianthoideae which belongs to the family Asteraceae which belongs to the order Asterales which belongs to the unranked (!) group Asterids which belongs to the unranked group Eudicots which belongs to the unranked group Angiosperms which belongs to the kingdom Plantae.

Storing all this information in Q171497 plus all of sunflower's close relatives requires a proverbial ton of redundancy in the data. I would much prefer to be able to traverse the items in at least 10 or 20 levels, to get the relevant name, rank, and possibly some other information from each group/level in the hierarchy. All this will be displayed in the infobox of sunflower (Q171497).
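
The traversal described in this use case can be sketched roughly as follows. This is only an illustration in Python against an in-memory dictionary, not the Wikibase client API: the `ITEMS` store, the field names `rank` and `parent`, and the `lineage` function are all hypothetical, and the taxon names are the ones listed in the comment above.

```python
from typing import Dict, List, Optional, Tuple

# Ranks and names as listed in the comment, from species up to kingdom.
CHAIN = [
    ("species", "sunflower"), ("genus", "Helianthus"), ("tribe", "Heliantheae"),
    ("subfamily", "Helianthoideae"), ("family", "Asteraceae"), ("order", "Asterales"),
    ("unranked", "Asterids"), ("unranked", "Eudicots"), ("unranked", "Angiosperms"),
    ("kingdom", "Plantae"),
]

# Hypothetical stand-in for separate Wikidata items: each item only knows
# its own rank and a pointer to its parent item (no redundant copies).
ITEMS: Dict[str, Dict[str, Optional[str]]] = {}
for i, (rank, name) in enumerate(CHAIN):
    parent = CHAIN[i + 1][1] if i + 1 < len(CHAIN) else None
    ITEMS[name] = {"rank": rank, "parent": parent}


def lineage(start: str, items: Dict[str, Dict[str, Optional[str]]],
            max_depth: int = 20) -> List[Tuple[str, str]]:
    """Follow parent pointers from `start`, returning (rank, name) pairs."""
    chain: List[Tuple[str, str]] = []
    seen = set()
    current: Optional[str] = start
    while current is not None and len(chain) < max_depth:
        if current in seen:  # guard against cyclic data
            break
        seen.add(current)
        item = items[current]
        chain.append((item["rank"], current))
        current = item["parent"]
    return chain


if __name__ == "__main__":
    for rank, name in lineage("sunflower", ITEMS):
        print(f"{rank}: {name}")
```

The `max_depth` cap mirrors the "10 or 20 levels" mentioned above; every step of the loop corresponds to one lookup of an item that is not connected to the current page.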

Mattflaschen added a comment. Via Conduit, Jun 3 2013, 6:13 AM

I have another use case for this, as explained at https://www.wikidata.org/w/index.php?title=Wikidata:Properties_for_deletion&oldid=49338731#Property:P289 and https://www.wikidata.org/w/index.php?title=Wikidata:Requests_for_comment/How_to_classify_items:_lots_of_specific_type_properties_or_a_few_generic_ones%3F&oldid=49339875#Discussion_2 .

USS Carl Vinson (CVN-70) is an instance of (P31) Nimitz-class aircraft carrier. Nimitz-class aircraft carrier in turn is an instance of ship class. In such cases (or analogous), the following algorithm should work. In a template such as {{Infobox ship}}:

  1. Take L as the list of items the page (ship) is an instance of.
  2. For every member x in L, check if x is an instance of ship class (Q559026). If so, display it in the ship class field.

Step 1 can be done now, but 2 requires a fix to this bug.
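
A minimal sketch of the two steps, assuming a hypothetical in-memory map from item ID to its "instance of" (P31) targets. Only Q559026 (ship class) comes from the comment above; the other IDs and the `ship_classes` function are made up for illustration.

```python
from typing import Dict, List

SHIP_CLASS = "Q559026"  # "ship class", as given in the comment above

# Hypothetical data: item ID -> IDs it is an instance of (P31).
# "Q_SHIP", "Q_NIMITZ" and "Q_OTHER" are placeholder IDs, not real items.
INSTANCE_OF: Dict[str, List[str]] = {
    "Q_SHIP": ["Q_NIMITZ", "Q_OTHER"],
    "Q_NIMITZ": [SHIP_CLASS],
    "Q_OTHER": [],
}


def ship_classes(page_item: str, instance_of: Dict[str, List[str]]) -> List[str]:
    """Return the classes of `page_item` that are themselves ship classes."""
    # Step 1: the list L of items the page's item is an instance of.
    candidates = instance_of.get(page_item, [])
    # Step 2: keep each x in L that is an instance of ship class.
    # This second lookup is the one that needs arbitrary access.
    return [x for x in candidates if SHIP_CLASS in instance_of.get(x, [])]


if __name__ == "__main__":
    print(ship_classes("Q_SHIP", INSTANCE_OF))
```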

Mattflaschen added a comment. Via Conduit, Jun 4 2013, 6:18 AM

To be more concrete about this, I'd like mw.wikibase.lua to have a getEntity overload (or another method that also returns an item) that accepts an arbitrary ID (e.g. Q123). Libraries on top of that can be built on-wiki.

Qgil added a comment. Via Conduit, Jun 5 2013, 4:28 PM

Another use case, as explained at http://www.wikidata.org/wiki/Help_talk:Lua

Provide localization and links to articles for all flag icons in all languages, based on single English table: http://en.wikipedia.org/wiki/Module:Sandbox/QuimGil/FlagTranslations

I'm working on a Lua-based template that would already save hundreds of subtemplates on every Wikipedia. If it also saved the work of maintaining all the local translations for all languages, that would be amazing.

Wikidata already has all this data in place. It's "just" a matter of leveraging it.

daniel added a comment. Via Conduit, Sep 4 2013, 10:55 AM

Trying to summarize a discussion we had internally about this a few weeks ago (from memory, please correct me if I got something wrong):

Tracking on clients:

  • have a table mapping local pages <---> entity IDs
  • ...should track both implicit and explicit item usage
  • ...should track property usage
  • ...should track indirect item usage (labels)
  • this needs to be updated whenever the page is edited
  • ...and when the page is moved (for implicit item usage) or deleted
  • ...and when any of the referenced items change (for implicit item usage, and indirect/label usage)
  • the tracking table is expected to be roughly as big as the pagelinks table

Tracking on the repo:

  • tracking table saying which client wiki uses which entities (but not where)
  • the information in that table is derived from the client side tables:
  • when the client side tracking table is updated, we need to note which entities were used before, and which are used after
  • then we can update the repo side tracking table accordingly

So, the flow of information is:

  • client edit: client tracking table -> repo tracking table
  • repo edit: repo change -> client change handler -> client page update -> client tracking table -> repo tracking table
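
The "client edit" flow above could look roughly like this. It is a sketch under assumptions, not the actual Wikibase schema: the class names `ClientTracker` and `RepoTracker`, their methods, and the in-memory dictionaries standing in for the two tracking tables are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class RepoTracker:
    """Repo-side table: which client wiki uses which entities (but not where)."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, Set[str]] = defaultdict(set)

    def update(self, wiki: str, added: Set[str], removed: Set[str]) -> None:
        for entity in added:
            self.subscribers[entity].add(wiki)
        for entity in removed:
            self.subscribers[entity].discard(wiki)
            if not self.subscribers[entity]:
                del self.subscribers[entity]


class ClientTracker:
    """Client-side table: local page <-> entity IDs used on that page."""

    def __init__(self, wiki: str, repo: RepoTracker) -> None:
        self.wiki = wiki
        self.repo = repo
        self.page_usage: Dict[str, Set[str]] = {}

    def _used(self) -> Set[str]:
        return set().union(*self.page_usage.values())

    def on_page_saved(self, page: str, entities: Iterable[str]) -> None:
        """Record usage after an edit and forward only the difference to the repo."""
        before = self._used()  # entities used on this wiki before the edit
        entities = set(entities)
        if entities:
            self.page_usage[page] = entities
        else:
            self.page_usage.pop(page, None)  # page deleted or no longer uses data
        after = self._used()  # entities used after the edit
        self.repo.update(self.wiki, added=after - before, removed=before - after)


if __name__ == "__main__":
    repo = RepoTracker()
    client = ClientTracker("enwiki", repo)
    client.on_page_saved("Sunflower", {"Q171497", "Q1043"})
    print(dict(repo.subscribers))
```

Computing the before/after difference on the client keeps the repo-side table small: the repo only hears about an entity when a wiki starts or stops using it at all, not on every page edit.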
Filceolaire added a comment. Via Conduit, Oct 9 2013, 7:03 PM

Use case:
We have an article on "Bonnie and Clyde" on Wikipedia.
We want it to have two infoboxes, importing data from the Wikidata pages for "Bonnie Parker" and for "Clyde Barrow".

Use case:
We have a Wikivoyage page for an area. It has datacards for three museums, two hotels and four bars, each of which has a Wikidata item separate from the Wikidata item linked to the current page. We want to import the info for these datacards into this page.

Lydia_Pintscher added a comment. Via Conduit, Nov 5 2013, 11:59 AM
*** Bug 49805 has been marked as a duplicate of this bug. ***
Kersti added a comment. Via Conduit, Nov 24 2013, 4:45 PM

Additional Use Case:
The labels could be used for internationalization on Commons. On pages like Acrocephalus (https://commons.wikimedia.org/wiki/Acrocephalus) and Acrocephalidae (https://commons.wikimedia.org/wiki/Acrocephalidae), the labels of the Wikidata item https://www.wikidata.org/wiki/Q1585161, which corresponds to Acrocephalus aequinoctialis (https://commons.wikimedia.org/wiki/Acrocephalus_aequinoctialis), could be used to internationalize the bird name.

Lydia_Pintscher added a comment. Via Conduit, Dec 3 2013, 4:19 PM

Here's the break-down of this task from the backlog page:

  1. create Subscription table on the repo
  2. create EntityUsage table on the client
  3. populate Subscription table on the repo
  4. populate EntityUsage table on the client
  5. update Subscription table on the repo
  6. (#47288) update EntityUsage table on the client
  7. use Subscriptions table to push changes to the right clients from the repo
  8. use EntityUsage table to deploy the changes on the client
  9. allow arbitrary access
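
Steps 7 and 8 of the break-down can be sketched like this: the repo consults the Subscription table to find the affected client wikis, and each client consults its EntityUsage table to find the affected pages. The dictionary layouts, the placeholder ID `Q_OTHER`, and the function `pages_to_update` are assumptions for illustration, not the real table schemas.

```python
from typing import Dict, List, Set

# Hypothetical Subscription table on the repo: entity ID -> client wikis.
SUBSCRIPTIONS: Dict[str, Set[str]] = {
    "Q937": {"enwiki", "commonswiki"},
}

# Hypothetical EntityUsage tables, one per client: page title -> entity IDs.
ENTITY_USAGE: Dict[str, Dict[str, Set[str]]] = {
    "enwiki": {"Albert Einstein": {"Q937"}, "Some other page": {"Q_OTHER"}},
    "commonswiki": {"Creator:Albert Einstein": {"Q937"}},
}


def pages_to_update(entity_id: str,
                    subscriptions: Dict[str, Set[str]],
                    entity_usage: Dict[str, Dict[str, Set[str]]]) -> Dict[str, List[str]]:
    """Map each subscribed wiki to the pages that use the changed entity."""
    result: Dict[str, List[str]] = {}
    # Step 7: push the change only to wikis subscribed to this entity.
    for wiki in sorted(subscriptions.get(entity_id, ())):
        usage = entity_usage.get(wiki, {})
        # Step 8: on the client, find the pages that actually use it.
        pages = sorted(page for page, used in usage.items() if entity_id in used)
        if pages:
            result[wiki] = pages
    return result


if __name__ == "__main__":
    print(pages_to_update("Q937", SUBSCRIPTIONS, ENTITY_USAGE))
```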
Jarekt added a comment. Via Conduit, Jan 24 2014, 7:10 PM

Two use cases (which I hope will be possible after this is fixed):

  1. https://www.wikidata.org/wiki/Q1278115 for "Fort Ross". Commons category https://commons.wikimedia.org/wiki/Category:Fort_Ross needs access to Q1278115's properties. I hope that I will be able to access them if I know the Q code once this bug is fixed. Ideally there would be a way to look up the properties without the Q code, for example Commons Category:Albert Einstein -> Q7213562 -> P301 -> Q937 and its properties.
  2. https://commons.wikimedia.org/wiki/Creator:Albert_Einstein has the Wikidata Q937 code, but if that were missing I should still be able to access the Q937 properties: each Creator page is associated with a category, so I should be able to follow the trail from category to article Q code outlined above.
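
The trail from a Commons category to the article item might be followed as sketched below. The IDs Q7213562, P301 and Q937 are the ones given in the comment; the `SITELINKS` and `CLAIMS` dictionaries and the function `main_topic_item` are hypothetical stand-ins, not an existing API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sitelink index: (site, page title) -> item ID.
SITELINKS: Dict[Tuple[str, str], str] = {
    ("commonswiki", "Category:Albert Einstein"): "Q7213562",
}

# Hypothetical claims store: item ID -> property ID -> list of target item IDs.
CLAIMS: Dict[str, Dict[str, List[str]]] = {
    "Q7213562": {"P301": ["Q937"]},
}


def main_topic_item(site: str, title: str) -> Optional[str]:
    """Resolve page -> category item -> P301 -> main item, or None if the trail breaks."""
    category_item = SITELINKS.get((site, title))
    if category_item is None:
        return None  # the page is not connected to any item
    targets = CLAIMS.get(category_item, {}).get("P301", [])
    return targets[0] if targets else None


if __name__ == "__main__":
    print(main_topic_item("commonswiki", "Category:Albert Einstein"))
```

Once the main item ID is known, reading its properties is the ordinary arbitrary-access lookup that this task asks for.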

Once that is possible, it will be embedded in templates/modules and used a lot. Or is there a better way?

bzimport added a comment. Via Conduit, Jan 26 2014, 12:10 PM

soulkeeper.wikipedia wrote:

> Or is there a better way?

Ideally, Albert Einstein should be the same item as Category:Albert Einstein, Template:Albert Einstein, Creator:Albert Einstein, Wikipedia:Albert Einstein etc. In other words, one item should be able to link to many different namespaces in each language/project.

The developer team has, as far as I understand, acknowledged that something like this would be a good idea, but I don't know when or if it will be implemented. It is a huge change, technically.

See https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Commons_links

Jarekt added a comment. Via Conduit, Mar 17 2014, 3:13 PM

(In reply to soulkeeper.wikipedia from comment #11)

> In other words, one item should be able to
> link to many different namespaces in each language/project.

That would be a better solution, and we would probably need a separate bug report for it. The software would then show a link either to only one page per project or to all of them, while somehow indicating whether a page is in a different namespace. This would really simplify handling properties associated with the same subject across many namespaces.

SamB added a comment. Via Conduit, Mar 25 2014, 12:23 AM
*** Bug 58856 has been marked as a duplicate of this bug. ***
gerritbot added a comment. Via Conduit, Jul 9 2014, 1:14 PM

Change 144965 had a related patch set uploaded by Hoo man:
Lua: Allow arbitrary item access

https://gerrit.wikimedia.org/r/144965

bzimport added a comment. Via Conduit, Oct 2 2014, 9:40 AM

vlsergey wrote:

Why does this bug depend on 44946 and 68029? That doesn't look like a real dependency to me.

Florian added a comment. Via Conduit, Oct 22 2014, 11:25 AM

(In reply to Sergey Vladimirov from comment #15)

> Why does this bug depend on 44946 and 68029? That doesn't look like a real
> dependency to me.

+1, it seems unrelated.

He7d3r awarded a token. Via Web, Nov 24 2014, 11:58 AM
matej_suchanek added a subscriber: matej_suchanek. Via Web, Nov 24 2014, 2:44 PM
mxn added a subscriber: mxn. Via Web, Nov 24 2014, 8:56 PM
Liuxinyu970226 added a subscriber: Liuxinyu970226. Via Web, Nov 25 2014, 10:44 AM
HenkvD added a subscriber: HenkvD. Via Web, Nov 25 2014, 1:43 PM
Snipre added a subscriber: Snipre. Via Web, Nov 29 2014, 11:45 AM
Lydia_Pintscher added a project: Wikidata. Via Web, Dec 1 2014, 2:23 PM
Lydia_Pintscher removed a subscriber: wikidata-bugs.
Laddo added a subscriber: Laddo. Via Web, Dec 7 2014, 2:55 PM
greg added a subscriber: greg. Via Web, Dec 8 2014, 9:15 PM
Lydia_Pintscher moved this task to consider for next sprint on the Wikidata workboard. Via Web, Dec 11 2014, 1:39 PM
Rical added a subscriber: Rical. Via Web, Dec 12 2014, 8:13 PM
Ricordisamoa awarded a token. Via Web, Jan 3 2015, 1:24 PM
-jem- added a subscriber: -jem-. Via Web, Jan 14 2015, 6:57 PM
RP88 added a subscriber: RP88. Via Web, Jan 19 2015, 2:03 AM
Perhelion changed the title from "allow accessing data from an item not connected to the current page - arbitrary access (tracking)" to "Allow accessing Wikidata from an item not connected to the current page - arbitrary access (tracking)". Via Web, Jan 24 2015, 2:54 PM
Perhelion edited the task description.
Perhelion set Security to None.
Perhelion changed the title from "Allow accessing Wikidata from an item not connected to the current page - arbitrary access (tracking)" to "Allow accessing Wikidata durch an item not connected to the current page - arbitrary access (tracking)". Via Web, Jan 24 2015, 3:00 PM
Perhelion changed the title from "Allow accessing Wikidata durch an item not connected to the current page - arbitrary access (tracking)" to "Allow accessing Wikidata per item not connected to the current page - arbitrary access (tracking)". Via Web, Jan 24 2015, 3:03 PM
Perhelion edited the task description.
Perhelion added a comment. Via Web, Jan 24 2015, 3:08 PM
This comment was removed by Perhelion.
Lydia_Pintscher changed the title from "Allow accessing Wikidata per item not connected to the current page - arbitrary access (tracking)" to "Allow accessing data from an item not connected to the current page - arbitrary access (tracking)". Via Web, Jan 24 2015, 3:17 PM
Perhelion removed a subscriber: Perhelion. Via Web, Feb 7 2015, 3:45 PM
jeremyb-phone added a subscriber: jeremyb. Via Web, Feb 15 2015, 1:27 AM
Dr_Brains added a subscriber: Dr_Brains. Via Web, Feb 21 2015, 11:48 PM
Pengo added a subscriber: Pengo. Via Web, Feb 25 2015, 11:06 PM
Pengo added a comment. Via Web, Feb 26 2015, 9:32 AM

This first comment includes the use case I'm interested in...

soulkeeper.wikipedia wrote:

  1. The species sunflower belongs to the genus Helianthus which belongs to the tribe Heliantheae which belongs to the subfamily Helianthoideae which belongs to the family Asteraceae which belongs to the order Asterales which belongs to the unranked (!) group Asterids which belongs to the unranked group Eudicots which belongs to the unranked group Angiosperms which belongs to the kingdom Plantae.

I've created a module to test this use case. It can only run on Wikidata's own wiki until this issue is resolved. Perhaps it would be useful for performance testing:

https://www.wikidata.org/wiki/User:Pengo/sunflower_example

CPU time usage: 0.172 seconds

Note that not only are the taxa being read from separate Wikidata items ("Magnoliidae", "Asteranae", etc.), but so are the ranks ("class", "subclass", etc.). I haven't written any Lua code to access Wikidata before, so my code is probably a mess.

Here's another species which has a ridiculously long phylogenetic tree (as all birds in Wikidata do). I count approx. 90 entity lookups for this one:

https://www.wikidata.org/wiki/User:Pengo/longer_example

CPU time usage: 0.554 seconds

I realize that the devs need to do more than just some simple performance tweaks to get this resolved, but maybe this will help.

Lydia_Pintscher added a comment. Via Web, Feb 26 2015, 1:07 PM

Thanks for the analysis, Pengo!

greg added a comment. Via Web, Feb 26 2015, 11:54 PM

The Deployment calendar has this as "late February": https://wikitech.wikimedia.org/w/index.php?title=Deployments&oldid=146073#Upcoming

What's the current plan?

aude added a comment. Via Web, Feb 27 2015, 12:04 AM

@greg there are some more issues (e.g. T89002) that we need to resolve for Commons, since it has multilingual content. So, definitely not "late February".

We still might be able to proceed with enabling the usage tracking part, without arbitrary access, on some more wikis but need to double check with daniel if T89002 or anything else is a blocker for that.

Reaper35 added a subscriber: Reaper35. Via Web, Mar 1 2015, 11:47 AM
Eloquence added a subscriber: Eloquence. Via Web, Mar 5 2015, 7:43 AM

Can we specify a month to shoot for and add it to the appropriate Roadmap column?

Multichill added a comment. Via Web, Mar 5 2015, 7:27 PM

> We still might be able to proceed with enabling the usage tracking part, without arbitrary access, on some more wikis but need to double check with daniel if T89002 or anything else is a blocker for that.

Any news on this? A two-step deploy seems very sensible to me. After the first step we can see whether that part works and performs, and users won't notice anything yet. In case of any problems it could just be pulled without any user impact.

greg added a project: Roadmap. Via Web, Mar 6 2015, 11:28 PM
Lydia_Pintscher added a comment. Via Web, Mar 9 2015, 12:22 PM

Hey folks,

Update on this: I can't make any meaningful predictions for this yet because it depends on how smoothly the rollout of usage tracking goes from now on. We are in the process of rolling that out but hit a roadblock with multilingual wikis. We've discussed how to proceed; we will work on that over the next 2 or 3 weeks and then roll usage tracking out to a few wikis. Once that is working smoothly we will start gradually rolling out arbitrary access. We're not going to roll this out to all wikis at once and will take a bit of time with each wiki since we need to evaluate performance and stability. The impact of arbitrary access on performance and stability will not be immediately visible because it'll only kick in once people start changing templates and Lua modules.

I'll post another timeline update on usage tracking rollout in the next 2 days.

Eloquence added a comment. Via Web, Mar 9 2015, 6:18 PM

Thanks Lydia. Does it sound reasonable to you to schedule arbitrary access tentatively for the April-June quarter for now, or do you still want to see if you can hit a March deployment goal?

adrianheine added a subscriber: adrianheine. Via Web, Mar 10 2015, 10:17 AM
Lydia_Pintscher added a comment. Via Web, Mar 10 2015, 10:22 AM

Yeah. April-June is more realistic. March is definitely out at this point.

greg moved this task to May-June 2015: Platform on the Roadmap workboard. Via Web, Mar 10 2015, 3:19 PM
Lydia_Pintscher added a comment. Via Web, Mar 12 2015, 3:39 PM

We just discussed this. The remaining blocker for a rollout on non-multilingual wikis should be fixed this sprint. This would mean we can start rolling out usage tracking to the first wikis in roughly 3 weeks. I suggest starting with French Wikisource and then Dutch Wikipedia. Neither of them should notice any changes but we should get a pretty good idea if there are any issues with scaling and performance.

greg added a comment. Via Web, Mar 12 2015, 4:09 PM

Thanks Lydia! Is there a task that tracks just the usage tracking rollout bit? It's something I want to add to the Roadmap workboard (and deployment calendar) :)

Lydia_Pintscher added a comment. Via Web, Mar 12 2015, 4:41 PM

@greg: That'd be T49288.

Candalua added a subscriber: Candalua. Via Web, Mar 13 2015, 3:37 PM
Rschen7754 added a subscriber: Rschen7754. Via Web, Mar 14 2015, 5:09 AM
Eloquence changed the title from "Allow accessing data from an item not connected to the current page - arbitrary access (tracking)" to "Allow accessing data from a Wikidata item not connected to the current page - arbitrary access (tracking)". Via Web, Mar 17 2015, 4:00 AM
Restricted Application added a project: notice. Via Herald, Thu, Mar 26, 5:27 PM
gpaumier added a project: user-notice. Via Web, Thu, Mar 26, 10:10 PM
gpaumier moved this task to Not ready to announce on the user-notice workboard.
gpaumier moved this task to Triaged on the notice workboard. Via Web, Thu, Mar 26, 10:15 PM
gpaumier moved this task to Archive on the notice workboard. Via Web, Thu, Apr 2, 7:00 PM
Accurimbono added a subscriber: Accurimbono. Via Web, Sat, Apr 4, 11:02 AM
