Change $wgArticleCountMethod in Wikidata from default ('link') to 'any'
Closed, Resolved (Public)

Description

The value for the MediaWiki variable $wgArticleCountMethod in Wikidata should be changed from its default ('link') to 'any'.

See related talk.

Event Timeline

Urbanecm moved this task from Backlog to Doing on the good first task board.
Urbanecm moved this task from Backlog to Working on on the Wikimedia-Site-requests board.
Urbanecm triaged this task as Medium priority. Sep 4 2016, 11:33 AM

Change 308430 had a related patch set uploaded (by Urbanecm):
Change $wgArticleCountMethod in Wikidata from default ('link') to 'any'

https://gerrit.wikimedia.org/r/308430

Should be deployed during Morning SWAT window (September 5th, 18:00-19:00 UTC).

Urbanecm moved this task from Working on to To deploy on the User-Urbanecm board.
Urbanecm moved this task from Doing to Ready to go on the good first task board.

(Cross-posted from gerrit) Wikibase deliberately counts articles differently from normal MediaWiki. Changing this should only be done after a thorough discussion, I think… this should probably also be changed in Wikibase in case the community no longer agrees with the measurement methods that were once put in place.

Thanks. Does that mean that $wgArticleCountMethod doesn't behave in Wikidata like the usual 'link'? What are the current measurement methods?

Not sure whether it will work, or whether we fully override the behavior. Currently we consider an Item a "valid" (countable) article if it has either at least one statement or at least one sitelink (as far as I remember).

@hoo Why shouldn't empty items be counted as items? Is there any reason for it?

I've descheduled this change. Please gather more consensus.

Urbanecm lowered the priority of this task from Medium to Low. Sep 5 2016, 11:37 AM
Urbanecm moved this task from Ready to go to Doing on the good first task board.
Urbanecm moved this task from To deploy to To deploy - scheduled for SWAT on the User-Urbanecm board.

Currently we consider an Item a "valid" (countable) article if it has either at least one statement or at least one sitelink (as far as I remember).

I doubt it's implemented like that. Both the number of items with at least one statement (20,964,244) and the number of items with at least one sitelink (20,338,226) are higher than the value currently returned by the parser function (20,023,628). Hence, even some items with a claim or sitelink are not considered valid articles.

Values taken from db replica:

select count(*) from page_props where pp_propname='wb-claims' and pp_value>0;
select count(*) from page_props where pp_propname='wb-sitelinks' and pp_value>0;
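
To make the comparison above concrete, here is a small Python check of the three quoted figures. The variable names are illustrative and not taken from the thread; only the numbers come from the comment above.

```python
# Figures quoted in the comment above (db replica counts and parser function value).
items_with_statement = 20_964_244    # pp_propname='wb-claims', pp_value > 0
items_with_sitelink = 20_338_226     # pp_propname='wb-sitelinks', pp_value > 0
reported_article_count = 20_023_628  # value returned by the parser function

# If an item counted whenever it had a statement OR a sitelink, the reported
# count could not be smaller than the larger of the two totals.
lower_bound_if_either = max(items_with_statement, items_with_sitelink)

gap_statement = items_with_statement - reported_article_count
gap_sitelink = items_with_sitelink - reported_article_count

print(gap_statement)  # 940616
print(gap_sitelink)   # 314598
print(reported_article_count < lower_bound_if_either)  # True
```

Since the reported count is below both totals, the "statement or sitelink" rule cannot be what is actually implemented.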

We are using [[https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/891e6c851740da99ea1676f1bf8bdc5e3a696612/repo/includes/Content/ItemContent.php#L236|ItemContent::isStub]] right now which basically checks that the item is not a redirect and has at least one statement (the Item::isEmpty check on line 238 is redundant with the Item::getStatements()->isEmpty() call).

I think the question hinges on whether an item without a statement is useful or not. Right now we have 2.75 million of them out of 23.8 million items in total. That is quite a lot and we definitely need to get this number down. At the same time the number on the main page is incorrect; several times I have had to correct people about it and have had people complain to me about it.

So: Let's change it. Redirects should still not be counted but items without statements should be counted.
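
A minimal Python sketch of the two counting rules discussed above, assuming the current rule is "not a redirect and at least one statement" as described for ItemContent::isStub, and the proposed rule is "not a redirect". The `Item` class and both function names are hypothetical and only model the idea; this is not Wikibase code.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a Wikidata item (not the Wikibase class)."""
    is_redirect: bool = False
    statement_count: int = 0
    sitelink_count: int = 0


def countable_current(item: Item) -> bool:
    # Rule as described above: not a redirect and at least one statement.
    return not item.is_redirect and item.statement_count > 0


def countable_proposed(item: Item) -> bool:
    # Proposed rule: every item except redirects is counted.
    return not item.is_redirect


items = [
    Item(statement_count=3, sitelink_count=1),  # statements and sitelinks
    Item(sitelink_count=2),                     # sitelinks only, no statements
    Item(),                                     # completely empty item
    Item(is_redirect=True),                     # redirect
]

current_total = sum(countable_current(i) for i in items)
proposed_total = sum(countable_proposed(i) for i in items)
print(current_total, proposed_total)  # 1 3
```

Under the proposed rule the sitelink-only item and the empty item are both counted, while the redirect is excluded in either case.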

Wikidata items without statements usually still are containers for sitelinks and are thus actively used from various Wikimedia projects. We cannot call them useless. We should not ignore or hide them, even if it is only in the count.

Properties and whatever new types of entities we will be getting should probably be included as well.

Most items without any statements do include sitelinks: https://www.wikidata.org/wiki/Wikidata:Database_reports/without_claims_by_site

Will be deployed today, 13:00-14:00 UTC.

Change 308430 merged by jenkins-bot:
Change $wgArticleCountMethod in Wikidata from default ('link') to 'any'

https://gerrit.wikimedia.org/r/308430

Mentioned in SAL (#wikimedia-operations) [2016-09-20T13:05:33Z] <zfilipin@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:308430|Change $wgArticleCountMethod in Wikidata from default (link) to any (T144687)]] (duration: 00m 47s)

Urbanecm closed this task as Resolved. Sep 20 2016, 1:34 PM
Urbanecm moved this task from Doing to Done on the good first task board.
Urbanecm moved this task from To deploy to Done on the Wikimedia-Site-requests board.

If anybody sees any problem, please reopen this task and comment on it.

My comment (T144687#2608508) was not answered: Why was the configuration changed and not the actual definition of what is an "article"?

@hoo What counts as an article is defined by the config, isn't it? Or at least how Wikidata counts them.

But let's say an article is a content page or a valid item. Is that clearer?

The relevant code is in Wikibase's Item::isCountable (see Content::isCountable in MediaWiki). I think this should have been altered instead (or additionally at least).

I think the question hinges on whether an item without a statement is useful or not. Right now we have 2.75 million of them out of 23.8 million items in total. That is quite a lot and we definitely need to get this number down. At the same time the number on the main page is incorrect; several times I have had to correct people about it and have had people complain to me about it.

So: Let's change it. Redirects should still not be counted but items without statements should be counted.

On many Wikipedias we periodically have to update the number of items manually in the system, as it is a calculation and not an article/item count.

Maybe it does not fit here, but I would also like to know how many statements are added to Wikidata.

This was completed and it isn't a technical issue. There was consensus, there was a request, and it was processed. We can't do more. The discussion is still open if somebody needs it...

Urbanecm raised the priority of this task from Low to Medium. Oct 10 2016, 11:50 AM