
[AOI] Investigation: Can we improve page stats?
Closed, Resolved · Public

Description

Per http://www.allourideas.org/wikimediagadgets/results.

Some existing tools:

Quite a few of the tools (I could only investigate those that disclose their data source) use data from http://dumps.wikimedia.org/other/pagecounts-all-sites/, which is provided by WMF's Analytics team. See also the open tasks list for the Datasets-Webstatscollector project.

Please answer the following questions:

  • Are there high priority bugs or features that the Community Tech team could address in a short period of time?
    • This project is already being worked on by the Analytics team.
  • If so, are the maintainers amenable to working with us and is the code publicly available?
    • Not Applicable.
  • Would either of the tools above be good tools to convert into MediaWiki extensions or add as functionality to existing extensions?
    • Again, tool already being worked on by Analytics. We should not pursue this any further.

See also:

Event Timeline

kaldari created this task. Aug 8 2015, 1:29 AM
kaldari raised the priority of this task to Needs Triage.
kaldari updated the task description.
kaldari added a project: Community-Tech.
kaldari added a subscriber: kaldari.
Restricted Application added a subscriber: Aklapper. Aug 8 2015, 1:29 AM
kaldari moved this task from Untriaged to Ready on the Community-Tech board. Aug 8 2015, 1:29 AM
kaldari renamed this task from [AOI] Spike: Can we improve page stats? to [AOI] Investigation: Can we improve page stats?. Aug 12 2015, 2:35 AM
kaldari set Security to None.
kaldari updated the task description. Aug 12 2015, 10:18 PM
kaldari updated the task description.
Niharika claimed this task. Aug 14 2015, 4:18 PM

Some information about http://stats.grok.se:

It'd be a good idea to get in touch with the maintainer (he apparently does not reply on his talk page) and see if there's interest in improving this. The tool itself does not seem complex, so if the maintainer isn't interested we might be better off writing a new tool. Some issues are mentioned at https://en.wikipedia.org/wiki/User_talk:Killiondude/stats

Another, more modern tool, available since March 2014, is Wikipedia Trends: http://www.wikipediatrends.com

  • Run by a private group of individuals apparently. More info on their about page.
  • Source code: Unknown.

It doesn't appear to be open source. We can reach out to them to see if they'd like to open source it.

Niharika updated the task description. Aug 14 2015, 5:39 PM
Niharika updated the task description. Aug 14 2015, 5:52 PM

Hi @kevinator, @Milimetric, @DarTar. Page stats is one of the community requests the community tech team is looking into. We found a lot of tools and scripts (see list above!) developed for this purpose by individuals which indicates that this is indeed a much-needed tool. Is there any official app for providing page/category statistics? If not, are there plans to develop one?

@NiharikaKohli absolutely. We delayed for much too long on releasing tools that show pageview stats. There are many problems with the existing tools, and they mostly stem from the quality of the data we provide on dumps.wikimedia.org. We have committed to deliver a pageview api on top of quality data by the end of this quarter (end of September). The discussion about it is happening on this task: T44259 and on the analytics mailing list. As a matter of fact, I owe that phab task an update, so I'll go do that right now. Until then, you might want to check this out: https://vital-signs.wmflabs.org/

That's a daily updated dashboard that has metrics such as pageviews for all individual language / project pairs. It also has totals: https://vital-signs.wmflabs.org/#projects=all/metrics=Pageviews

Unfortunately, we've had some outages in recent days that caused some data quality issues. There are annotations on that graph roughly describing them, and we're posting more information as we have it.

Andrew West puts out weekly pageviews by Wikiproject (at least for WPMED) and breaks it down into desktop and mobile. https://en.wikipedia.org/wiki/User:West.andrew.g/Popular_medical_pages

I think he is the only one breaking out mobile at this point in time. This tool here needs to have mobile added to it: https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Medicine/Popular_pages

I think it is based on stats.grok.se, but I'm not sure.

OsamaK added a subscriber: OsamaK. Aug 15 2015, 1:22 AM

Thanks for the information, @Milimetric! That was helpful.

@Doc_James, the tool linked by @Milimetric above does show mobile statistics. Click on 'Data Breakdowns' on the left. More information here: https://meta.wikimedia.org/wiki/Research:Page_view#Resulting_format (see 'access_method').

Niharika updated the task description. Aug 15 2015, 2:32 PM
Niharika moved this task from In Dev/Progress to Needs Review/Feedback on the Community-Tech board.
Quiddity updated the task description. Aug 21 2015, 11:01 PM
Quiddity added a subscriber: Quiddity.
kaldari added a comment. Edited Sep 11 2015, 8:03 PM

@Doc_James, @Milimetric: I'm confused. Does Domas' pageview data not include mobile traffic? Or is it just not broken out as separate data?

kaldari added a comment. Edited Sep 11 2015, 8:04 PM

Also, could someone please tell me what "Domas' pageview data" actually refers to? Where does this live? Is this the same as http://dumps.wikimedia.org/other/pagecounts-all-sites/?

I'm moving this task back to In Development because I want to investigate whether or not we should add mobile pageviews to Mr.Z-bot's popular pages function (https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/Mr.Z-bot_4), per Doc_James. (This may require waiting for T44259.)

kaldari claimed this task. Sep 11 2015, 8:18 PM
kaldari triaged this task as Low priority.
kaldari moved this task from Done to To be estimated/discussed on the Community-Tech board.
kaldari added a project: Community-Tech-Sprint.
kaldari moved this task from Ready to In Development on the Community-Tech-Sprint board.

@kaldari: we left the title of that task alone for historical and nostalgic reasons. There's nothing in the new API that has much to do with Domas's pageview data. The data will be processed as follows:

  1. consumed from varnish, from all sources (mobile, upload, text, etc.)
  2. passed through our modern pageview definition
  3. titles of articles normalized (parsed out of index.php?title=, api.php?..., etc.)
  4. aggregated hourly, daily, and yearly
  5. served through restbase (this is what we're working on right now)
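For illustration only, steps 3 and 4 above might be sketched roughly as follows in Python. This is a minimal sketch and not the Analytics team's actual code: the function names `normalize_title` and `aggregate`, the (timestamp, URL) input format, and the sample URLs are all assumptions made for the example. It skips step 2 (the pageview definition), ignores which project a request belongs to, only handles `/wiki/...` and `index.php?title=` style URLs, and only shows hourly and daily rollups.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import parse_qs, unquote, urlsplit


def normalize_title(url):
    """Return a normalized article title for a request URL, or None.

    Hypothetical helper: only /wiki/<title> and index.php?title=<title>
    are recognized; everything else (e.g. api.php) is skipped here.
    """
    parts = urlsplit(url)
    if parts.path.startswith("/wiki/"):
        raw = unquote(parts.path[len("/wiki/"):])
    elif parts.path.endswith("/index.php"):
        titles = parse_qs(parts.query).get("title")
        if not titles:
            return None
        raw = titles[0]  # parse_qs already percent-decodes the value
    else:
        return None
    raw = raw.strip().replace(" ", "_")
    return raw or None


def aggregate(requests):
    """Count views per (title, hour) and per (title, day).

    `requests` is an iterable of (iso_timestamp, url) pairs, an assumed
    input shape chosen for this example.
    """
    hourly, daily = Counter(), Counter()
    for timestamp, url in requests:
        title = normalize_title(url)
        if title is None:
            continue
        when = datetime.fromisoformat(timestamp)
        hourly[(title, when.strftime("%Y-%m-%dT%H"))] += 1
        daily[(title, when.strftime("%Y-%m-%d"))] += 1
    return hourly, daily


# Made-up sample requests, including one mobile-domain URL.
sample = [
    ("2015-09-11T20:03:00", "https://en.wikipedia.org/wiki/Main_Page"),
    ("2015-09-11T20:41:00", "https://en.wikipedia.org/w/index.php?title=Main%20Page"),
    ("2015-09-11T21:05:00", "https://en.m.wikipedia.org/wiki/Main_Page"),
]
hourly, daily = aggregate(sample)
print(hourly)
print(daily)
```

The point of normalizing before counting is that the desktop, mobile, and `index.php?title=` forms of the same article end up under one key, so the rollups are not split across URL variants.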

Let me know if you have any questions. Currently there is no data comparable to what we're going to use in any of the dumps.wikimedia.org/other subfolders, but we're aiming to provide this data there too.

@Milimetric: Got it, but I still don't have a good understanding of the current pageview data. According to https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Medicine/Popular_pages, Mr.Z-bot's output comes from Domas' data, but I have no idea what that is, and whether or not it includes mobile traffic.

@kaldari: this page tries to explain a little bit, but I'll expand on the two links I think are relevant:

http://dumps.wikimedia.org/other/

  1. It sounds like this Mr.Z-bot looks at either one of these:
    • "Pagecount data collected by Domas Mituzas"
    • "Pagecount/projectcount data derived by Erik Zachte from Domas Mituzas' archives"
  2. The two links above don't include mobile data; this is the only data on dumps that currently includes mobile data:
    • "Pagecount/projectcount data including mobile/zero sites". This includes mobile but uses the same pageview definition as Domas's data, to be inter-comparable. The data we're gathering in the cluster uses the new pageview definition I called "modern" in my previous comment.

Does that help?

@Milimetric: Yes, that helps a ton!

@Milimetric: One more question (sorry for so many questions): Which dataset does http://stats.grok.se/ use? Does it include mobile traffic? If not, do you have any idea why it doesn't use "Pagecount/projectcount data including mobile/zero sites"?

@kaldari: stats.grok.se does not include mobile traffic; it uses the "Pagecount data collected by Domas Mituzas" dataset. The reason it hasn't switched is that the volunteer who runs it apparently hasn't been able to find the free time to do it. This was one of the last pushes we needed to start the pageview API development.

kaldari closed this task as Resolved. Sep 14 2015, 11:05 PM

Since it looks like the Analytics team is on top of most of this for the time being, the only task that came out of this for the Community Tech team is T112569.

DannyH moved this task from Untriaged to Archive on the Community-Tech board.