Change $wgArticleCountMethod in Wikidata from default ('link') to 'any'
Closed, Resolved · Public

Description

The value for the MediaWiki variable $wgArticleCountMethod in Wikidata should be changed from its default ('link') to 'any'.

See related talk.

Event Timeline

abian created this task. Sep 4 2016, 8:57 AM
Restricted Application added subscribers: TerraCodes, Aklapper. Sep 4 2016, 8:57 AM
Pasleim added a subscriber: Pasleim. Sep 4 2016, 9:05 AM
Restricted Application added subscribers: JEumerus, Matanya. Sep 4 2016, 10:26 AM
Urbanecm claimed this task. Sep 4 2016, 11:13 AM
Restricted Application added a project: User-Urbanecm. Sep 4 2016, 11:13 AM
Urbanecm moved this task from Backlog to Working on on the User-Urbanecm board. Sep 4 2016, 11:18 AM
Urbanecm moved this task from Backlog to Doing on the good first task board.
Urbanecm moved this task from Backlog to Working on on the Wikimedia-Site-requests board.
Urbanecm triaged this task as Medium priority. Sep 4 2016, 11:33 AM

Change 308430 had a related patch set uploaded (by Urbanecm):
Change $wgArticleCountMethod in Wikidata from default ('link') to 'any'

https://gerrit.wikimedia.org/r/308430

Should be deployed during Morning SWAT window (September 5th, 18:00-19:00 UTC).

Urbanecm moved this task from Working on to To deploy on the User-Urbanecm board.
Urbanecm moved this task from Doing to Ready to go on the good first task board.
hoo added a subscriber: hoo. Sep 5 2016, 10:55 AM

(Cross-posted from gerrit) Wikibase on purpose counts articles differently than normal MediaWiki. Changing this should only be done after a thorough discussion, I think… this should probably also be changed in Wikibase in case the community no longer agrees to the measurement methods once put in place.

abian added a comment. Sep 5 2016, 11:04 AM

Thanks. Does that mean that $wgArticleCountMethod isn't working in Wikidata like the usual 'link'? What are the current measurement methods?

hoo added a comment. Sep 5 2016, 11:07 AM

Thanks. Does that mean that $wgArticleCountMethod isn't working in Wikidata like the usual 'link'? What are the current measurement methods?

Not sure whether it will work, or whether we fully override the behavior. Currently we consider an Item a "valid" (countable) article if it has either at least one statement or at least one sitelink (as far as I remember).

@hoo Why shouldn't empty items be counted as items? Is there any reason for it?

I've descheduled this change. Please gather more consensus.

Urbanecm lowered the priority of this task from Medium to Low. Sep 5 2016, 11:37 AM
Urbanecm moved this task from Ready to go to Doing on the good first task board.
Urbanecm moved this task from To deploy to To deploy - scheduled for SWAT on the User-Urbanecm board.

Currently we consider an Item a "valid" (countable) article if it has either at least one statement or at least one sitelink (as far as I remember).

I doubt it's implemented like that. Both the number of items with at least one statement (20,964,244) and the number of items with at least one sitelink (20,338,226) are higher than the value currently returned by the parser function (20,023,628). Hence, even some items with a claim or sitelink are not considered valid articles.

Values taken from db replica:

-- items with at least one statement (claim)
select count(*) from page_props where pp_propname = 'wb-claims' and pp_value > 0;
-- items with at least one sitelink
select count(*) from page_props where pp_propname = 'wb-sitelinks' and pp_value > 0;
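
The argument above can be checked with simple arithmetic. If an item were countable whenever it has a statement or a sitelink, the count would be the size of the union of both groups, so it could not be smaller than the larger group. A minimal sketch using only the figures quoted in this comment (the variable names are illustrative):

```python
# Figures quoted above (db replica and parser function, September 2016).
items_with_statement = 20_964_244
items_with_sitelink = 20_338_226
reported_article_count = 20_023_628

# Under a "statement OR sitelink" rule the count is the size of the union,
# which can never be smaller than the larger of the two groups.
lower_bound_if_union = max(items_with_statement, items_with_sitelink)

gap_vs_statements = items_with_statement - reported_article_count
gap_vs_sitelinks = items_with_sitelink - reported_article_count
union_rule_is_possible = reported_article_count >= lower_bound_if_union

print(gap_vs_statements, gap_vs_sitelinks, union_rule_is_possible)
# 940616 314598 False
```

The reported count falls short of either group on its own, so the "statement or sitelink" rule cannot be what is implemented.
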
hoo added a comment. Sep 5 2016, 6:41 PM

We are using ItemContent::isStub right now which basically checks that the item is not a redirect and has at least one statement (the Item::isEmpty check on line 238 is redundant with the Item::getStatements()->isEmpty() call).

I think the question hinges on whether an item without a statement is useful or not. Right now we have 2.75 million of them out of 23.8 million items in total. That is quite a lot and we definitely need to get this number down. At the same time, the number on the main page is incorrect; several times I have had to correct people about it and have had people complain to me about it.

So: Let's change it. Redirects should still not be counted but items without statements should be counted.
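
The two counting rules under discussion can be put side by side. This is a minimal sketch, not Wikibase code: the `Item` class and both function names are hypothetical, the first rule follows the description of ItemContent::isStub given earlier in this task, and the second follows the proposal in this comment.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a Wikidata item (not the Wikibase class)."""
    statement_count: int = 0
    sitelink_count: int = 0
    is_redirect: bool = False


def countable_current(item: Item) -> bool:
    # Rule as described earlier in this task: not a redirect
    # and at least one statement.
    return not item.is_redirect and item.statement_count > 0


def countable_proposed(item: Item) -> bool:
    # Proposed rule: every item counts unless it is a redirect.
    return not item.is_redirect


items = [
    Item(statement_count=3, sitelink_count=2),  # ordinary item
    Item(statement_count=0, sitelink_count=1),  # sitelink container only
    Item(statement_count=0, sitelink_count=0),  # empty item
    Item(is_redirect=True),                     # redirect
]

current_total = sum(countable_current(i) for i in items)
proposed_total = sum(countable_proposed(i) for i in items)
print(current_total, proposed_total)  # 1 3
```

With the figures quoted above, the items without statements make up roughly 11.6% of all items (2.75 / 23.8), which is the group whose treatment differs between the two rules.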

YMS added a subscriber: YMS. Sep 19 2016, 8:00 AM

Wikidata items without statements usually still are containers for sitelinks and thus actively used from various Wikimedia projects. We cannot call them useless. We should not ignore or hide them, even if it's only in the count.

Properties and whatever new types of entities we will be getting should probably be included as well.

Most items without any statements do include sitelinks: https://www.wikidata.org/wiki/Wikidata:Database_reports/without_claims_by_site

Will be deployed today, 13:00-14:00 UTC.

@Lea_Lacroix_WMDE please announce in next weekly summary

Change 308430 merged by jenkins-bot:
Change $wgArticleCountMethod in Wikidata from default ('link') to 'any'

https://gerrit.wikimedia.org/r/308430

Mentioned in SAL (#wikimedia-operations) [2016-09-20T13:05:33Z] <zfilipin@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:308430|Change $wgArticleCountMethod in Wikidata from default (link) to any (T144687)]] (duration: 00m 47s)

Urbanecm closed this task as Resolved. Sep 20 2016, 1:34 PM (edited)
Urbanecm moved this task from Doing to Done on the good first task board.
Urbanecm moved this task from To deploy to Done on the Wikimedia-Site-requests board.

If anybody sees any problem, please reopen this task and comment on it.

hoo reopened this task as Open. Sep 20 2016, 6:34 PM

My comment (T144687#2608508) was not answered: Why was the configuration changed and not the actual definition of what an "article" is?

@hoo What counts as an article is defined by the config, isn't it? Or at least how Wikidata counts them.

But let's say an article is a content page or valid item. Is that clearer?

For other entities, I wrote T146272

Urbanecm moved this task from To deploy to Working on on the User-Urbanecm board. Sep 27 2016, 10:18 AM
Urbanecm moved this task from Working on to Backlog on the User-Urbanecm board.
hoo added a comment. Sep 27 2016, 5:17 PM

@hoo What counts as an article is defined by the config, isn't it? Or at least how Wikidata counts them.
But let's say an article is a content page or valid item. Is that clearer?

The relevant code is in Wikibase's Item::isCountable (see Content::isCountable in MediaWiki). I think this should have been altered instead (or additionally at least).
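
The point about where countability is decided can be illustrated with a small model. This is only a sketch: the class names, the `ARTICLE_COUNT_METHOD` variable, and the method bodies are hypothetical Python stand-ins, not MediaWiki or Wikibase code, and whether Wikibase really ignores the setting is not established in this discussion.

```python
ARTICLE_COUNT_METHOD = "any"  # hypothetical stand-in for $wgArticleCountMethod


class WikitextLikeContent:
    """Content type whose countability follows the configured method."""

    def __init__(self, text: str, is_redirect: bool = False):
        self.text = text
        self.is_redirect = is_redirect

    def is_countable(self) -> bool:
        if self.is_redirect:
            return False
        if ARTICLE_COUNT_METHOD == "any":
            return True
        if ARTICLE_COUNT_METHOD == "link":
            # Simplified: treat "[[" as evidence of an internal link.
            return "[[" in self.text
        raise ValueError(f"unknown count method: {ARTICLE_COUNT_METHOD}")


class ItemLikeContent(WikitextLikeContent):
    """Content type that overrides the check and never reads the setting."""

    def __init__(self, statement_count: int, is_redirect: bool = False):
        super().__init__(text="", is_redirect=is_redirect)
        self.statement_count = statement_count

    def is_countable(self) -> bool:
        return not self.is_redirect and self.statement_count > 0


page = WikitextLikeContent("plain text without links")
empty_item = ItemLikeContent(statement_count=0)
print(page.is_countable(), empty_item.is_countable())  # True False
```

In this model, changing the setting affects only the first class. The override itself has to change before empty items are counted, which is the concern raised in this comment.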

Urbanecm moved this task from Backlog to Watching on the User-Urbanecm board. Oct 3 2016, 5:27 PM
Romaine added a subscriber: Romaine. Oct 3 2016, 9:48 PM

I think the question hinges on whether an item without a statement is useful or not. Right now we have 2.75 million of them out of 23.8 million items in total. That is quite a lot and we definitely need to get this number down. At the same time, the number on the main page is incorrect; several times I have had to correct people about it and have had people complain to me about it.
So: Let's change it. Redirects should still not be counted but items without statements should be counted.

On many Wikipedias we periodically have to update the number of items manually in the system, as it is a calculation and not an article/item count.

Maybe it does not fit here, but I would like to know also how many statements are added to Wikidata.

Urbanecm closed this task as Resolved. Oct 10 2016, 11:50 AM

This was completed, and it isn't a technical issue. There was consensus, there was a request, and it was processed. We can't do more. The discussion is still open if somebody needs it...

Urbanecm raised the priority of this task from Low to Medium. Oct 10 2016, 11:50 AM