Page/edit count milestone tool
Open, Lowest, Public

Description

Feature request to allow better tracking of page count milestones, such as the recent 6 millionth article on ENWP.

As part of the group that recently determined what the 6 millionth article was, we felt stymied by software limitations. While we could use the stat tracking page's report of how many articles there were, we had no software-based way of knowing exactly which page that was. Our stopgap method was to watch the page counter, note roughly when it ticked over, and then use the new page feed to try to figure it out more precisely. Still, we were only able to narrow it down to the minute, and there were 15 pages created within that minute. In the end, the 6M page was chosen by community consensus on what we thought was the best page created in that minute, not on what the actual 6M page was, as we have no way of knowing exactly which one it was. Many editors accused this process of being unfair, rigged, or a sham, and were disappointed to hear that there was no software way to know for sure.

As other page milestones approach on ENWP (chiefly 50 million total pages, ~6 months out), it would be useful to have some sort of tool that accurately tracks page milestones. The billionth edit milestone is also fast approaching (~1 month out), and a similar concern exists that we have no software-based way to determine exactly which edit that will be.

Event Timeline

I wonder if Product-Analytics has already thought about any potential approach to this problem.

Re: billionth edit, the current plan at https://en.wikipedia.org/wiki/Wikipedia:Billionth_edit_pool seems to be to use the oldid field, as the count provided by {{NUMBEROFEDITS}} is 4.7 million too low. Article count is another story, as we have seen.
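For anyone who wants to check a given oldid by hand, here is a minimal sketch of how the lookup URL could be built. It assumes the standard MediaWiki Action API on ENWP (`action=query` with `prop=revisions` and `revids`); the function name `revision_lookup_url` and the choice of `rvprop` fields are mine, for illustration only, and are not part of the plan on the pool page.

```python
from urllib.parse import urlencode

# Assumption: the public Action API endpoint for English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def revision_lookup_url(revid: int) -> str:
    """Build an Action API URL that asks for the revision with this oldid.

    Hypothetical helper: it only builds the URL and does not send a request.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": revid,
        # Ask for the revision ID, its timestamp, and the editing user.
        "rvprop": "ids|timestamp|user",
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # The oldid the billionth-edit pool is presumably watching for.
    print(revision_lookup_url(1_000_000_000))
```

Note that this only helps for the edit milestone. If the revision with that oldid ends up deleted, the query may not return it to ordinary users, so treat the result as a starting point rather than an authoritative answer.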

I feel like this is always going to be "more art than science". Determining the seven millionth article is limited by the fact that we do not count deleted articles as part of the total article count. This is different from using oldid for edits since, as I understand it, the oldid number includes deleted edits.

Suppose in 2024 we have 6,999,995 live articles, and we number each of those articles in order of the time they were created. After we do this, six more articles are created, so we declare the fifth of them to be the seven-millionth article on Wikipedia. Seems reasonable. Now suppose, however, it later came to light that while those six articles were being created, the article that was previously number 100 on the list was deleted at AfD, meaning the article that was previously number 101 should now be moved to number 100, and likewise for all subsequent articles. This means that what we earlier described as the 7,000,001st article should now be declared the 7,000,000th article.
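To make the renumbering concrete, here is a small sketch of the same scenario scaled down so that the milestone is the 10th article instead of the 7,000,000th. The article names and the helper `milestone_article` are made up for illustration; this is a toy model of the argument, not how MediaWiki counts articles.

```python
def milestone_article(live_articles, milestone):
    """Return the article at 1-based position `milestone`, or None.

    `live_articles` is a list of live (non-deleted) article titles,
    ordered by creation time.
    """
    if len(live_articles) < milestone:
        return None
    return live_articles[milestone - 1]


MILESTONE = 10  # stands in for 7,000,000

# Five existing live articles (stands in for 6,999,995).
live = [f"old-{i}" for i in range(1, 6)]

# Six more articles are created, in this order.
live += [f"new-{i}" for i in range(1, 7)]

# Counting only by creation order, the fifth new article is the milestone.
before_deletion = milestone_article(live, MILESTONE)

# An older article turns out to have been deleted in the meantime,
# so every later article moves up one position.
live.remove("old-3")
after_deletion = milestone_article(live, MILESTONE)

print(before_deletion)  # new-5
print(after_deletion)   # new-6
```

The answer depends on when you take the snapshot of live articles, which is the core of the problem: the milestone is defined on a count that can go down as well as up.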

The current practice of declaring the seven-millionth article requires looking through Special:NewPages and finding the point in time around which the seven-millionth article should have been created, creating a shortlist of articles based on that point, then choosing by consensus the best, most promising article from the bunch. I feel this method would be superior to a technical method, given the limitations any such tool would have.

Edit: this analysis also doesn't factor in restorations of previously deleted articles, of which there are around a dozen every day according to Special:Log/delete. A small number, but nontrivial for the purposes of determining the millionth article milestone "authoritatively".

I wonder if Product-Analytics has already thought about any potential approach to this problem.

Nope. Also, I don't think Product Analytics can or should take this on, which is why the team was untagged.

Aklapper triaged this task as Lowest priority. Feb 3 2020, 7:16 PM
Aklapper removed a project: Growth-Team.

In that case I do not expect anybody to work on this - see also the previous comment here.