
property datatype information is missing in articlequality/ORES
Closed, Declined | Public | 13 Estimated Story Points

Description

While training the ORES articlequality model to estimate the quality of Wikidata items, we use a set of features, some of which depend on the datatype of a statement. However, it turns out that the datatype is currently never available (it is always None) in the actual data that we are working with. This means that some of the features simply do not work, and our model isn't as good as it could be.

Extensive digging revealed that the problem is that we are using the "wrong" API endpoint. Currently, we use action=query&prop=revisions&rvprop=content, which gives us the plain content of the revision as it is stored in the database. However, a property's datatype is not stored in the serialization of an item in the database, and thus it is not available in the response.
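To make the gap concrete, here is a minimal sketch of what a datatype-based feature sees when it reads the stored serialization. The item document below is hand-written for illustration (it is an assumption, not captured API output), and `main_snak_datatypes` is a hypothetical helper, not part of revscoring. The `titles` and `format` parameters are added only to make the request complete.

```python
# Request parameters quoted in the description; "titles" and "format" are
# added here for illustration only.
REVISION_CONTENT_PARAMS = {
    "action": "query",
    "prop": "revisions",
    "rvprop": "content",
    "titles": "Q42",  # hypothetical example item
    "format": "json",
}


def main_snak_datatypes(item_doc):
    """Return {property_id: [datatype or None, ...]} for every statement."""
    return {
        prop_id: [st.get("mainsnak", {}).get("datatype") for st in statements]
        for prop_id, statements in item_doc.get("claims", {}).items()
    }


# Hand-written sample shaped like a stored item serialization: the main snak
# carries the property ID and the value, but no "datatype" key.
stored_item = {
    "claims": {
        "P31": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P31",
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "numeric-id": 5},
                    },
                }
            }
        ]
    }
}

print(main_snak_datatypes(stored_item))  # {'P31': [None]}
```

Any feature that filters statements by datatype therefore matches nothing on this input.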

Acceptance criteria:

  • The property datatype in statements is available for feature analysis

Notes:

  • This probably has to be implemented by changing which API the revscoring Extractor uses, maybe to a native Wikibase API?
  • It should be considered whether to use some of the existing caching
    • but it also needs to be considered how these changes affect the dumps-based analysis (which might use the same API)
    • and the dumps-based analysis should maybe use the entities dump instead of the XML dump
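As a rough sketch of the first note, the Wikibase `wbgetentities` module could be queried instead, on the assumption that its JSON output includes a `datatype` key on each snak. The endpoint URL, the helper names, and the sample response are illustrative assumptions. Nothing here touches the revscoring Extractor, and no network request is made.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"  # assumed endpoint


def build_wbgetentities_url(item_id, api_url=WIKIDATA_API):
    """Build a wbgetentities request URL that asks only for the claims."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "claims",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def datatypes_from_response(response, item_id):
    """Return {property_id: [datatype or None, ...]} from a parsed response."""
    claims = response.get("entities", {}).get(item_id, {}).get("claims", {})
    return {
        prop_id: [st.get("mainsnak", {}).get("datatype") for st in statements]
        for prop_id, statements in claims.items()
    }


# Hand-written sample shaped like a wbgetentities response (an assumption,
# not captured output).
sample_response = {
    "entities": {
        "Q42": {
            "claims": {
                "P31": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P31",
                            "datatype": "wikibase-item",
                        }
                    }
                ]
            }
        }
    }
}

print(build_wbgetentities_url("Q42"))
print(datatypes_from_response(sample_response, "Q42"))
```

Whether this fits the existing caching and the dumps-based analysis is exactly the open question raised in the notes above.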

Event Timeline

It was solved in a different way. We don't add the property datatype to ORES and instead hardcode a list of properties that have a certain datatype.
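A minimal sketch of what such a hardcoded lookup could look like follows. The property IDs are a handful of well-known Wikidata properties chosen for illustration; this is not the actual list used in ORES, and the names `PROPERTIES_BY_DATATYPE` and `count_statements_with_datatype` are hypothetical.

```python
# Illustrative entries only; NOT the actual list used in ORES.
PROPERTIES_BY_DATATYPE = {
    "commonsMedia": frozenset({"P18"}),       # image
    "time": frozenset({"P569", "P570"}),      # date of birth, date of death
    "globe-coordinate": frozenset({"P625"}),  # coordinate location
    "url": frozenset({"P856"}),               # official website
}


def count_statements_with_datatype(item_doc, datatype):
    """Count statements whose property is hardcoded as having `datatype`."""
    claims = item_doc.get("claims", {})
    property_ids = PROPERTIES_BY_DATATYPE.get(datatype, frozenset())
    return sum(len(claims.get(prop_id, [])) for prop_id in property_ids)


# Hand-written item with one image, one date of birth, and two other statements.
item = {"claims": {"P18": [{}], "P569": [{}], "P31": [{}, {}]}}

print(count_statements_with_datatype(item, "commonsMedia"))  # 1
print(count_statements_with_datatype(item, "url"))           # 0
```

The trade-off is that the list has to be maintained by hand: properties that are not listed are silently ignored by the features.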