We have to specify features based on the quality criteria. These will be used for developing the prediction model.
Description
Status | Assigned | Task
---|---|---
Resolved | johl | T127047 Collection of topics for HPI hackathon
Resolved | awight | T187836 [Epic] Audit of pending ORES GUI deployments
Resolved | Glorian_WD | T127470 Deploy item quality classification model for Wikidata
Resolved | Glorian_WD | T157498 Train/test item quality model for Wikidata
Resolved | Glorian_WD | T157497 Engineer features for item quality model
Resolved | Ladsgroup | T158430 Use suggested properties to get signal for completeness
Resolved | hoo | T164994 Enable wbgetsuggestions API to get recommended properties even if they have existed in an item
Event Timeline
@Glorian_WD and I have been discussing how to get features that give us some signal about which properties are expected for specific types of items. Here's my skeleton proposal:
- query for most used statements (e.g. instance-of:human)
- for the top N most used properties, query for the most common secondary properties (e.g. instance-of:human, occupation:author)
- for all items that pass some basic threshold of quality (e.g. has an external reference and >= N site-links) find the frequency of all other properties.
- build an index on this so it can be looked up quickly during scoring.
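To make the proposal concrete, here is a rough Python sketch of the frequency index and the lookup at scoring time. It is only an illustration under assumptions: the in-memory item shape (`claims`, `sitelinks`, `has_external_reference`), the function names, and the default thresholds are all hypothetical, and it keys the index on a single primary statement, so the secondary-property step is left out.

```python
from collections import Counter, defaultdict


def passes_threshold(item, min_sitelinks=3):
    # Basic quality gate from the proposal: an external reference
    # and at least N site-links (N=3 is an arbitrary placeholder).
    return item["has_external_reference"] and item["sitelinks"] >= min_sitelinks


def build_property_index(items, primary="P31", min_sitelinks=3):
    """Map each (primary, value) statement to the share of qualifying
    items carrying that statement that also use each other property."""
    totals = Counter()
    prop_counts = defaultdict(Counter)
    for item in items:
        if not passes_threshold(item, min_sitelinks):
            continue
        # set() so a repeated value is not counted twice for one item.
        for value in set(item["claims"].get(primary, [])):
            key = (primary, value)
            totals[key] += 1
            for prop in item["claims"]:
                if prop != primary:
                    prop_counts[key][prop] += 1
    return {
        key: {prop: count / totals[key] for prop, count in prop_counts[key].items()}
        for key in totals
    }


def missing_expected(item, index, primary="P31", min_freq=0.5):
    """Properties that are common for this item's type but absent from it.
    The size of this set is one candidate completeness feature."""
    missing = set()
    for value in item["claims"].get(primary, []):
        for prop, freq in index.get((primary, value), {}).items():
            if freq >= min_freq and prop not in item["claims"]:
                missing.add(prop)
    return missing
```

With the index built once offline, scoring an item only needs a dictionary lookup per primary statement, which is the "quick look-up" the last step asks for. A real version would read from a dump or query service rather than a Python list.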