Change signal: Ensure that the number of statements is counted correctly and consistently
Closed, Resolved (Public)

Description

Problem:
It seems that the number of statements is not always counted correctly (a statement group should not be counted as just one statement) and consistently (external ID statements should be either always included or always excluded). We should change this. The number of statements should be the actual number of individual statements, not the number of statement groups.

Acceptance criteria:

  • we have checked all signals that touch the number of statements to ensure that they use the correct number

Event Timeline

I'm not sure what this task is referring to. I looked through our current features, and it seems there are barely any for which the number of statements is relevant in the first place.

wikibase.py has two features: one counts the statement groups ("properties") and the other counts the statements ("claims") individually. Both include external identifiers.

wikidatawiki.py has the new external_identifiers feature, which counts all individual statements that are external identifiers.
It also has the item_completeness feature, which compares the existing statement groups/properties with the ones suggested by the property suggester. That makes sense to me.

There is also a pull request to add the number of statements that are not external identifiers. I'm not sure I understand machine learning well enough to gauge whether that feature adds any useful new signal or whether it is already fully covered, since it equals the total number of statements minus the number of external identifier statements.
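
To make the distinction between these counts concrete, here is a minimal sketch in Python. It is not the code from wikibase.py or wikidatawiki.py: the function names and the sample item are hypothetical, and the item is a simplified plain dict in the general shape of Wikibase entity JSON, where "claims" maps each property ID to the list of statements in that group.

```python
# Hypothetical, simplified item in the shape of Wikibase entity JSON.
# Each key under "claims" is one statement group (one property);
# each list element is one individual statement.
sample_item = {
    "claims": {
        "P31": [{"mainsnak": {"datatype": "wikibase-item"}}],
        "P569": [{"mainsnak": {"datatype": "time"}}],
        "P214": [
            {"mainsnak": {"datatype": "external-id"}},
            {"mainsnak": {"datatype": "external-id"}},
        ],
        "P106": [
            {"mainsnak": {"datatype": "wikibase-item"}},
            {"mainsnak": {"datatype": "wikibase-item"}},
            {"mainsnak": {"datatype": "wikibase-item"}},
        ],
    }
}


def count_statement_groups(item):
    """Number of statement groups, i.e. distinct properties used."""
    return len(item.get("claims", {}))


def count_statements(item):
    """Number of individual statements, external identifiers included."""
    return sum(len(group) for group in item.get("claims", {}).values())


def count_external_id_statements(item):
    """Number of individual statements whose main value is an external ID."""
    return sum(
        1
        for group in item.get("claims", {}).values()
        for statement in group
        if statement.get("mainsnak", {}).get("datatype") == "external-id"
    )


def count_non_external_id_statements(item):
    """Derived count: all statements minus external identifier statements."""
    return count_statements(item) - count_external_id_statements(item)


groups = count_statement_groups(sample_item)
statements = count_statements(sample_item)
external = count_external_id_statements(sample_item)
non_external = count_non_external_id_statements(sample_item)
print(groups, statements, external, non_external)  # 4 7 2 5
```

In this sample the item has 4 statement groups but 7 individual statements, which is the difference the task description is concerned about. The last function also shows why the proposed feature is a linear combination of two existing ones; whether a given model benefits from having it spelled out is a separate question that this sketch does not answer.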

Ladsgroup subscribed.

I double-checked, and they are indeed counted correctly and consistently. I added some tests to make sure they stay that way: https://github.com/wikimedia/articlequality/pull/156