Engineer features for item quality model
We need to specify features based on the quality criteria. These features will be used to develop the prediction model.

@Glorian_WD and I have been discussing how to get features that give us some signal about which properties are expected for specific types of items. Here's my skeleton proposal:

  1. query for most used statements (e.g. instance-of:human)
  2. for the top N most used statements, query for the most common secondary statements that co-occur with them (e.g. instance-of:human + occupation:author)
  3. for all items that pass some basic threshold of quality (e.g. has an external reference and >= N site-links) find the frequency of all other properties.
  4. build an index on this so it can be looked up quickly during scoring.
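
To make the proposal concrete, here is a minimal sketch of the four steps run over a small in-memory sample. Everything in it is an assumption for illustration: the item layout (`statements`, `has_external_ref`, `sitelinks`), the function names, and the toy data are hypothetical, and a real version would query Wikidata rather than a Python list.

```python
from collections import Counter

HUMAN = ("instance-of", "human")
AUTHOR = ("occupation", "author")

# Hypothetical toy items; each statement is a (property, value) pair.
ITEMS = [
    {"statements": {HUMAN, AUTHOR, ("date-of-birth", "1950")},
     "has_external_ref": True, "sitelinks": 10},
    {"statements": {HUMAN, AUTHOR, ("date-of-birth", "1962"), ("place-of-birth", "Paris")},
     "has_external_ref": True, "sitelinks": 5},
    {"statements": {HUMAN, ("occupation", "politician"), ("date-of-birth", "1970")},
     "has_external_ref": True, "sitelinks": 8},
    {"statements": {HUMAN, AUTHOR},
     "has_external_ref": False, "sitelinks": 1},
    {"statements": {("instance-of", "city"), ("country", "France")},
     "has_external_ref": True, "sitelinks": 20},
]


def top_statements(items, n):
    """Step 1: the n most used statements across all items."""
    counts = Counter(s for item in items for s in item["statements"])
    return [s for s, _ in counts.most_common(n)]


def top_secondary(items, primary, n):
    """Step 2: the n statements that most often co-occur with `primary`,
    ignoring statements that use the same property as `primary`."""
    counts = Counter(
        s
        for item in items
        if primary in item["statements"]
        for s in item["statements"]
        if s[0] != primary[0]
    )
    return [s for s, _ in counts.most_common(n)]


def passes_threshold(item, min_sitelinks):
    """Step 3 filter: has an external reference and enough site-links."""
    return item["has_external_ref"] and item["sitelinks"] >= min_sitelinks


def build_index(items, type_keys, min_sitelinks):
    """Steps 3 and 4: for each type key (a frozenset of statements), map every
    other property to the fraction of good-quality matching items that use it."""
    index = {}
    for key in type_keys:
        matching = [
            item for item in items
            if passes_threshold(item, min_sitelinks) and key <= item["statements"]
        ]
        if not matching:
            continue
        key_props = {prop for prop, _ in key}
        counts = Counter(
            prop
            for item in matching
            for prop in {p for p, _ in item["statements"]} - key_props
        )
        index[key] = {prop: c / len(matching) for prop, c in counts.items()}
    return index


primary = top_statements(ITEMS, 1)[0]
secondary = top_secondary(ITEMS, primary, 1)[0]
index = build_index(ITEMS, [frozenset({primary, secondary})], min_sitelinks=3)
expected_props = index[frozenset({primary, secondary})]
```

During scoring, an item's type key would be looked up in `index`, and the expected-property frequencies could be compared against the properties the item actually has. Keying on a frozenset of statements is just one possible design; it keeps the lookup a single dictionary access.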