Property suggester data treats P279 as classifying, while the PropertySuggester extension does not
Closed, Resolved (Public)

Description

PropertySuggester on wikidata.org does not treat P279 as classifying, while the suggester data is generated under the assumption that P279 is classifying. This inconsistency may lead to sub-optimal suggestions on Items that use P279.

Evidence

PropertySuggester's extension.json sets PropertySuggesterClassifyingPropertyIds to [ 31 ]; see https://github.com/Wikidata-lib/PropertySuggester/blob/master/extension.json. The configuration for the live site does not override this; see https://phabricator.wikimedia.org/source/mediawiki-config/browse/master/wmf-config/Wikibase-production.php (the value is only overridden for test.wikidata.org).

However, PropertySuggester-Python's analyzer.ini sets it to 31,279; see https://github.com/Wikidata-lib/PropertySuggester-Python/blob/master/propertysuggester/analyzer/analyzer.ini. A look at the wbs_propertypairs table confirms that this configuration is used to produce the data for the live site.
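
The mismatch described above could be checked mechanically. The following is only a minimal sketch: the sample file contents, the `classifying` section name, the `property_ids` key, and the JSON layout are assumptions made for illustration and may not match the real extension.json or analyzer.ini.

```python
import configparser
import json

# Hypothetical sample contents. The real files may use different
# section names, key names, or nesting; adjust before use.
EXTENSION_JSON = '{"config": {"PropertySuggesterClassifyingPropertyIds": [31]}}'
ANALYZER_INI = "[classifying]\nproperty_ids = 31,279\n"


def extension_ids(text):
    """Return the classifying property IDs from an extension.json-like string."""
    config = json.loads(text)["config"]
    return set(config["PropertySuggesterClassifyingPropertyIds"])


def analyzer_ids(text, section="classifying", key="property_ids"):
    """Return the classifying property IDs from an analyzer.ini-like string.

    The section and key names are assumed, not taken from the real file.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw = parser.get(section, key)
    return {int(part) for part in raw.split(",") if part.strip()}


def compare(ext, ana):
    """Report IDs that appear on only one side."""
    return {
        "only_in_analyzer": sorted(ana - ext),
        "only_in_extension": sorted(ext - ana),
    }


report = compare(extension_ids(EXTENSION_JSON), analyzer_ids(ANALYZER_INI))
print(report)
```

With the sample values, 279 shows up as present only on the analyzer side, which is the inconsistency this task is about. An empty report on both sides would mean the two configurations agree.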

Proposal

Add 279 to PropertySuggesterClassifyingPropertyIds in PropertySuggester's extension.json. This makes the default config consistent, and it causes the live site to treat P279 as classifying, allowing it to use the data in wbs_propertypairs correctly. Settings for labs and test should be adjusted accordingly. However, we may want to test this on the live site first, before changing the defaults.

We could also do it the other way around by removing 279 from analyzer.ini, but that would require the data to be re-generated. Also, treating P279 (subclass of) as classifying seems sensible.

Event Timeline

daniel triaged this task as Medium priority. Jun 20 2017, 5:29 PM

Let's add it and see if the suggestions improve.

daniel updated the task description.
daniel updated the task description.
daniel added a subscriber: aude.

@Lydia_Pintscher I can make a config patch, but the config change should be announced in advance, and we should ask people to test and comment. We should also make sure we can get it reverted quickly if need be.

thiemowmde added subscribers: hoo, thiemowmde.

I think we can consider this a bugfix. Why do you think it needs announcement in advance?

We got a few messages in the past from people who said "this used to show up in the suggestions and now it doesn't anymore". We need to check that the impact is not too large, but I highly doubt it will be.

Is the below related?

It seems that P31/P279 gets suggested on items that only have coordinates (P625), but not on items that only have P641 (sport). I'd expect it to appear on any item that lacks it.

Sport is removed from the propertypairs datafile because it is used on various types of items, while coordinates is only used on geographical items.

That makes sense for additional suggestions, but it still doesn't explain why P31/P279 aren't offered at all.

Try adding a statement to any item on these reports: https://www.wikidata.org/wiki/Category:P641_only_reports

It doesn't have anything to base suggestions on. My opinion is that it should fall back to the initial list (which could use some additions imo, like image), but that is unrelated to this task.

This seems resolved now; the impact was not that big.