Engineer features for item quality model
Closed, Resolved, Public

Description

We have to specify features based on the quality criteria. These will be used for developing the prediction model.

Event Timeline

@Glorian_WD and I have been discussing how to get features that give us some signal about which properties are expected for specific types of items. Here's my skeleton proposal:

  1. query for the most used statements (e.g. instance-of:human)
  2. for each of the top N from step 1, query for the most common secondary statements (e.g. instance-of:human together with occupation:author)
  3. for all items that pass some basic threshold of quality (e.g. has an external reference and >= N site-links), find the frequency of all other properties.
  4. build an index on this so it can be looked up quickly during scoring.
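
To make the proposal concrete, here is a rough Python sketch of steps 1, 3 and 4 over an in-memory list of items. Everything about it is an assumption for illustration: the item format (`claims`, `sitelinks`, `has_external_ref`) is a simplified stand-in rather than the Wikidata JSON schema, the function names are hypothetical, and a real run would read from a dump or query service instead of a list. Step 2 (secondary statements) is left out, but it would follow the same pattern with pairs of statements as the index key.

```python
from collections import Counter, defaultdict

# Hypothetical item format, for illustration only:
# {"claims": {"P31": ["Q5"]}, "sitelinks": 12, "has_external_ref": True}


def statements(item):
    """Yield every (property, value) pair found on an item."""
    for prop, values in item["claims"].items():
        for value in values:
            yield (prop, value)


def top_statements(items, n):
    """Step 1: return the n most used (property, value) statements."""
    counts = Counter()
    for item in items:
        # Use a set so an item counts once per distinct statement.
        counts.update(set(statements(item)))
    return [stmt for stmt, _ in counts.most_common(n)]


def passes_threshold(item, min_sitelinks):
    """Step 3 filter: an external reference and enough site-links."""
    return item["has_external_ref"] and item["sitelinks"] >= min_sitelinks


def build_index(items, n, min_sitelinks):
    """Steps 3 and 4: map each top statement to the share of
    threshold-passing items carrying it that also use each other property."""
    top = set(top_statements(items, n))
    totals = Counter()
    prop_counts = defaultdict(Counter)
    for item in items:
        if not passes_threshold(item, min_sitelinks):
            continue
        for stmt in set(statements(item)) & top:
            totals[stmt] += 1
            for prop in item["claims"]:
                if prop != stmt[0]:  # skip the statement's own property
                    prop_counts[stmt][prop] += 1
    return {
        stmt: {p: c / totals[stmt] for p, c in prop_counts[stmt].items()}
        for stmt in totals
    }
```

During scoring, a feature could then be something like "fraction of expected properties this item has", computed by looking up `index[("P31", "Q5")]` for a human and comparing the properties whose frequency is above some cutoff against the item's own claims. The cutoff, like the site-link minimum, would need tuning.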