
Improved property recommender for Wikidata
Closed, Invalid, Public

Description

Build an AI that suggests property sets for a Wikidata item based on its instance of/subclass of and other statements.

What it does:

  • If the item is an instance of human, suggest the 10 most frequently used properties for humans
  • If the item has occupation "actor", suggest properties about actors, such as awards, films, etc.
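The two examples above can be illustrated with a minimal Python sketch, assuming a hypothetical toy corpus in which each item maps property labels to sets of values. Instead of hard-coding "human" or "actor", it treats every statement the item already has (for example instance of = human, occupation = actor) as a condition, finds the other items sharing that statement, and ranks properties by how often those peers use them. The function name, data layout and scoring rule are illustrative assumptions, not the existing recommender's method.

```python
from collections import Counter

# Hypothetical data layout: property label -> set of value labels.
Item = dict[str, set[str]]


def suggest_properties(item: Item, corpus: list[Item], k: int = 10) -> list[str]:
    """Rank properties used by corpus items that share a statement with `item`."""
    scores: Counter = Counter()
    for prop, values in item.items():
        for value in values:
            # Peers carry the same (property, value) statement, e.g. instance of = human.
            peers = [other for other in corpus if value in other.get(prop, set())]
            if not peers:
                continue
            usage = Counter(p for other in peers for p in other)
            for p, n in usage.items():
                scores[p] += n / len(peers)  # fraction of peers that use property p
    # Never suggest a property the item already has.
    for prop in item:
        scores.pop(prop, None)
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:k]]


# Toy usage example (made-up items, not real Wikidata data).
corpus = [
    {"instance of": {"human"}, "occupation": {"actor"}, "date of birth": {"1970"}, "award received": {"Oscar"}},
    {"instance of": {"human"}, "occupation": {"actor"}, "date of birth": {"1980"}, "notable work": {"Film A"}},
    {"instance of": {"human"}, "occupation": {"chemist"}, "date of birth": {"1900"}, "doctoral advisor": {"X"}},
]
item = {"instance of": {"human"}, "occupation": {"actor"}}
print(suggest_properties(item, corpus, k=3))
```

Because the occupation = actor statement adds its own peer group, "award received" and "notable work" outrank "doctoral advisor", which only the generic human peers contribute.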

Wiki thing it helps with:

  • Helps avoid human error, such as choosing the wrong properties
  • Leads to more complete items by encouraging people to complete the data for items

Things that might help us get this built:

  • Featured or "complete" items
  • Item "completeness" estimation (there's a project doing that IIRC)

Event Timeline

We already have a recommender system. It should be improved. It is very important to me that we do not hard-code classes for the recommendation.

Great. I figured that was the case. What would be the best way to understand how prediction currently works and to find where improvements could be made?

Halfak renamed this task from Property recommender for Wikidata to Improved property recommender for Wikidata. Jan 27 2017, 3:38 PM

Related: Steinhauser, S., Gassler, W., Pichl, M., & Zangerle, E. Evaluation of Property Recommenders for Wikidata. http://informatik.uibk.ac.at/en/teaching/smb/theses/410.pdf

See also: Zangerle, E., Gassler, W., Pichl, M., Steinhauser, S., & Specht, G. (2016, August). An Empirical Evaluation of Property Recommender Systems for Wikidata and Collaborative Knowledge Bases. In Proceedings of the 12th International Symposium on Open Collaboration (p. 18). ACM. http://www.opensym.org/os2016/proceedings-files/p503-zangerle.pdf

@Lydia_Pintscher, are these the docs for the current recommender? https://www.mediawiki.org/wiki/WikidataEntitySuggester

Yes.

If work on this starts, let's talk about it first.

@Lydia_Pintscher just showed me https://cool-wd.inf.unibz.it/ and apparently there's also a gadget for recommending properties from the same group.

I wonder if Blazegraph could do this. @Smalyshev, how familiar are you with the property suggestion system, and do you think Blazegraph might be a good solution for recommending properties for entities (items and properties themselves)?

@Halfak It depends on whether we can make a query that is fast enough. I think @Jonas did some work on that: for some cases the queries are fast, while for others they are rather slow, and it would work better as a pre-calculated data set.
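The "pre-calculated data set" option can be illustrated with a minimal Python sketch. It assumes a hypothetical toy corpus in which each item is just its set of property labels. An offline step estimates P(p2 | p1), the share of items using p1 that also use p2, from co-occurrence counts; at request time only dictionary lookups remain, so no slow live query is needed. The function names and the scoring rule (summing conditional probabilities) are illustrative assumptions, not how WikidataEntitySuggester or any Blazegraph query actually works.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_pair_table(corpus: list[set[str]]) -> dict[str, dict[str, float]]:
    """Offline step: estimate P(p2 | p1) from property co-occurrence counts."""
    single: Counter = Counter()
    pair: Counter = Counter()
    for item in corpus:
        props = set(item)
        single.update(props)
        pair.update(permutations(props, 2))  # ordered pairs (p1, p2) with p1 != p2
    table: defaultdict = defaultdict(dict)
    for (p1, p2), n in pair.items():
        table[p1][p2] = n / single[p1]
    return dict(table)


def suggest_from_table(item_props: set[str], table: dict[str, dict[str, float]], k: int = 10) -> list[str]:
    """Online step: score candidates with table lookups only (no corpus scan)."""
    scores: Counter = Counter()
    for p1 in item_props:
        for p2, prob in table.get(p1, {}).items():
            if p2 not in item_props:  # skip properties the item already has
                scores[p2] += prob  # sum of P(p2 | p1) over the item's properties
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:k]]


# Toy usage example (made-up property sets, not real Wikidata data).
corpus = [
    {"instance of", "occupation", "date of birth", "award received"},
    {"instance of", "occupation", "date of birth"},
    {"instance of", "taxon name"},
]
table = build_pair_table(corpus)  # in practice, recomputed periodically and stored
print(suggest_from_table({"instance of", "occupation"}, table))
```

The trade-off is freshness: the table only reflects the data as of its last rebuild, in exchange for predictable lookup cost regardless of how expensive the equivalent live query would be.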

Addshore changed the task status from Open to Stalled. Jan 24 2020, 10:37 AM
Aklapper changed the task status from Stalled to Open. Aug 16 2022, 10:55 AM
Aklapper removed subscribers: Jonas, DarTar.

@Addshore: The previous comments don't explain who or what (task?) exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status, as tasks should not be stalled (and then potentially forgotten) for years for unclear reasons.

(Smallprint, as general orientation for task management:
If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead.
If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks...Edit Subtasks.
If this task is stalled on an upstream project, then the Upstream tag should be added.
If this task requires info from the task reporter, then there should be instructions which info is needed.
If this task needs retesting, then the TestMe tag should be added.
If this task is out of scope and nobody should ever work on this, or nobody else managed to reproduce the situation described here, then it should have the "Declined" status.
If the task is valid but should not appear on some team's workboard, then the team project tag should be removed while the task has another active project tag.)

A new system is being worked on, and the remaining work for that is tracked in T285098. I am going to close this one because it doesn't seem useful to spend more time on this here.