Engineer features for item quality model
Closed, Resolved (Public)

Description

We have to specify features based on the quality criteria. These will be used for developing the prediction model.

Related Objects

Status	Assigned	Task
Resolved	• johl	T127047 Collection of topics for HPI hackathon
Resolved	awight	T187836 [Epic] Audit of pending ORES GUI deployments
Resolved	Glorian_WD	T127470 Deploy item quality classification model for Wikidata
Resolved	Glorian_WD	T157498 Train/test item quality model for Wikidata
Resolved	Glorian_WD	T157497 Engineer features for item quality model
Resolved	Ladsgroup	T158430 Use suggested properties to get signal for completeness
Resolved	hoo	T164994 Enable wbgetsuggestions API to get recommended properties even if they have existed in an item

Event Timeline

Glorian_WD created this task. Feb 7 2017, 9:22 PM

@Glorian_WD and I have been discussing how we'll get features that give us some signal about which properties are expected for specific types of items. Here's my skeleton proposal:

Query for the most used statements (e.g. instance-of:human).
For the top N most used properties, query for the most common secondary properties (e.g. instance-of:human, occupation:author).
For all items that pass some basic threshold of quality (e.g. has an external reference and >= N site-links), find the frequency of all other properties.
Build an index on this so it can be looked up quickly during scoring.
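
A rough sketch of how the proposal above might look in Python, working on an in-memory list of items rather than real queries. The item layout (a dict with "statements", "has_external_reference" and "sitelinks" keys), the function names, and the default thresholds are all assumptions made for illustration; none of them come from ORES or the Wikidata APIs.

```python
from collections import Counter, defaultdict


def passes_quality_threshold(item, min_sitelinks=3):
    """Basic filter: an external reference and at least min_sitelinks site-links.

    The item keys used here are hypothetical, not a Wikidata data format.
    """
    return item["has_external_reference"] and item["sitelinks"] >= min_sitelinks


def build_property_index(items, min_sitelinks=3):
    """Map each (property, value) statement to the share of good-enough items
    carrying that statement which also use each other property."""
    counts = defaultdict(Counter)
    totals = Counter()
    for item in items:
        if not passes_quality_threshold(item, min_sitelinks):
            continue
        properties = {prop for prop, _ in item["statements"]}
        for statement in set(item["statements"]):
            totals[statement] += 1
            for prop in properties:
                if prop != statement[0]:
                    counts[statement][prop] += 1
    return {
        statement: {prop: n / totals[statement] for prop, n in counter.items()}
        for statement, counter in counts.items()
    }


def missing_expected_properties(index, item, min_frequency=0.5):
    """Look up the index at scoring time: properties that commonly co-occur
    with the item's statements but are absent from the item."""
    present = {prop for prop, _ in item["statements"]}
    expected = set()
    for statement in item["statements"]:
        for prop, frequency in index.get(statement, {}).items():
            if frequency >= min_frequency:
                expected.add(prop)
    return sorted(expected - present)
```

The index is a plain dict keyed by statement, so the lookup during scoring is a constant-time access per statement. The number of missing expected properties, or the fraction of expected properties that are present, could then serve as a completeness feature.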