Pageview Stats tool
Closed, Resolved (Public)

Assigned To
None
Authored By
DannyH
Dec 5 2015, 12:47 AM
Referenced Files
F3239848: ArticleTraffic.png
Jan 18 2016, 4:29 PM
F3239804: Wikistics1.PNG
Jan 18 2016, 4:11 PM
F3239807: Wikistics2.PNG
Jan 18 2016, 4:11 PM
F3239800: Wiki_ViewStats_-_2014-06-26 (1).png
Jan 18 2016, 4:03 PM
Description

This card tracks a top 10 wish from the Community Wishlist Survey: https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey

Original proposal: Wikipedia uses the old stats.grok.se (http://stats.grok.se/), which should be patched so that it can be used correctly from the other wikis. Several bugs were highlighted a long time ago, but no one took charge of them. On the other hand, wikiviewstats (https://tools.wmflabs.org/wikiviewstats/) was developed recently; it is a more complete and flexible graphical tool. Unfortunately, it has been stopped, and no one has been able to get it back on track. I suppose it would be quicker to fix the above issues than to write from scratch a brand new stats tool able to monitor the accesses of any article (fundamental to understanding visitors' interests); however, either of the two choices would be a good improvement. --Andyrom75 (talk) 22:01, 11 November 2015 (UTC)

Community Tech preliminary assessment:

Support: High. Unanimous support votes.

Impact: High. This tool will help programs demonstrate impact, and will help researchers. Stats.grok.se goes down regularly, and is not reliable.

Feasibility: High (relatively easy, compared to other projects). Analytics has the pageview API, so it should mostly be front-end work. Could be implemented either on Labs or as a Wikimedia-specific extension. Labs implementation would be easier and faster. It needs a front-end spec and design, and iterations with community input.

Risk: Low. Just need to make sure it’s reliable and scalable so it doesn’t flood the API.

Project page: https://meta.wikimedia.org/wiki/Community_Tech/Pageview_stats_tool

https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Miscellaneous#Pageview_Stats_tool

Related Objects

Status      Assigned
Resolved    Lea_WMDE
Resolved    None
Resolved    Niharika
Resolved    Milimetric
Resolved    Milimetric
Resolved    Ottomata
Resolved    mobrovac
Resolved    kaldari
Resolved    kaldari
Resolved    None
Resolved    Johan
Resolved    JAllemandou
Resolved    MusikAnimal
Resolved    kaldari
Resolved    kaldari
Duplicate   kaldari
Open        None
Resolved    kaldari
Resolved    kaldari
Resolved    MusikAnimal
Resolved    kaldari
Resolved    kaldari

Event Timeline

Per discussion on the Analytics mailing list, would it be possible for the students to publish their Project Planning Document on Commons under a CC-BY or CC-BY-SA license?

If it's text, might Wikisource work as well? According to https://commons.wikimedia.org/wiki/Commons:Project_scope, text should be used in Commons only if it's in form of text files that are of use to other Wikimedia Projects.

Per discussion on the Analytics mailing list, would it be possible for the students to publish their Project Planning Document on Commons under a CC-BY or CC-BY-SA license?

We would be glad to release our Project Planning Document under CC BY-SA, but since it is a part of our grade, we need to check with our course supervisor first. Most likely, he will have no problem with it, we just have to make sure.
Fun seeing so much interest in our doings!

If it's text, might Wikisource work as well? According to https://commons.wikimedia.org/wiki/Commons:Project_scope, text should be used in Commons only if it's in form of text files that are of use to other Wikimedia Projects.

The document is more or less our analysis of the project, its risks, our project plan and so on: a PDF consisting mostly of text, with a couple of images.
Commons seems like the better choice to me.

Hey all, discovered that @MusikAnimal has been working on the demo API to make it more functional: https://tools.wmflabs.org/musikanimal/pageviews Might want to piggyback on that, rather than creating a whole new interface, etc.

Indeed. I thought you all were close to finishing the native interface? T125917 My fork of marcelrf's app just adds history push states so it can support deep linking (the pop states I didn't finish... but not important). I also changed it to use Opensearch so you can query for redirect pages. Finally, I'm going to change the project selector to an input field so it will work on all WMF wikis, but I also want to add some validations to it.

FYI my work here was sloppily put together just to get something out that people can use. I don't think it works in some older browsers, and isn't really meant to be a long-term solution.

Indeed. I thought you all were close to finishing the native interface?

FYI, the Analytics team is not working on this in the near term.

My fork of marcelrf's app just adds history push states so it can support deep linking (the pop states I didn't finish... but not important).

Sure, our showcase app is not a tool but a means for people to copy code and get something working with the API as soon as possible, so forks are welcome!

I don't think it works in some older browsers,

Well, if it uses the History API it doesn't work in many browsers (not saying that is an issue): http://caniuse.com/#search=history

@Nuria I can add in code to skip push/pop states if it's not supported. That's easy to do, but I don't plan on putting much more effort into supporting old IEs. After all, Microsoft doesn't support them either! :)

I have cross-wiki support working now. I just want to validate the input against the site matrix, as a dropdown with an entry for every WMF wiki would be far too long. I also threw in some meta tags to make the app more mobile-friendly. Finally, I need to fix the pop state bug: basically it pushes, but when you hit the browser Back button, the view isn't updated. There are some conflicting event listeners that need to be ironed out.
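
The push/pop state handling discussed here could look roughly like the following. This is a minimal sketch under stated assumptions, not the tool's actual code: `ViewState`, `HistoryLike`, `stateToQuery`, `pushView`, and `makePopHandler` are hypothetical names, and the query-string layout is made up for illustration. The idea is to feature-detect `pushState` before calling it, and to re-render from the saved state when a `popstate` event fires.

```typescript
// Hypothetical shape of what the app needs in order to restore a view.
interface ViewState {
  project: string;
  pages: string[];
}

// Minimal stand-in for window.history so the sketch also runs outside a browser.
interface HistoryLike {
  pushState?: (state: unknown, title: string, url: string) => void;
}

// Assumed query-string layout; the real tool may encode its state differently.
function stateToQuery(state: ViewState): string {
  const pages = state.pages.map((p) => encodeURIComponent(p)).join("|");
  return "?project=" + encodeURIComponent(state.project) + "&pages=" + pages;
}

// Push a deep-linkable entry, or skip silently when pushState is unavailable.
function pushView(h: HistoryLike | undefined, state: ViewState): boolean {
  if (!h || typeof h.pushState !== "function") {
    return false; // old browser: no history entry, the app keeps working
  }
  h.pushState(state, "", stateToQuery(state));
  return true;
}

// Build a popstate listener that redraws the view from the state saved at push time.
function makePopHandler(render: (s: ViewState) => void) {
  return (event: { state: unknown }): void => {
    if (event.state) {
      render(event.state as ViewState);
    }
  };
}
```

In a browser this would presumably be wired up as `pushView(window.history, state)` after each change and `window.addEventListener("popstate", makePopHandler(render))` once at startup, with a single listener so that handlers do not conflict.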

Per discussion on the Analytics mailing list, would it be possible for the students to publish their Project Planning Document on Commons under a CC-BY or CC-BY-SA license?

We would be glad to release our Project Planning Document under CC BY-SA, but since it is a part of our grade, we need to check with our course supervisor first. Most likely, he will have no problem with it, we just have to make sure.
Fun seeing so much interest in our doings!

If it's text, might Wikisource work as well? According to https://commons.wikimedia.org/wiki/Commons:Project_scope, text should be used in Commons only if it's in form of text files that are of use to other Wikimedia Projects.

The document is more or less our analysis of the project, its risks, our project plan and so on: a PDF consisting mostly of text, with a couple of images.
Commons seems like the better choice to me.

Thank you. If it's possible to do so and you're interested, it would also be great if you could present a lightning talk about this project on February 16th. Please see https://www.mediawiki.org/wiki/Lightning_Talks#February_2016. If you cannot attend at that time of day, perhaps you could record a talk that could be played during the meeting. Thanks!

@MusikAnimal

My fork of marcelrf's app just adds history push states so it can support deep linking (the pop states I didn't finish... but not important).

Sure, our showcase app is not a tool but a means for people to copy code and get something working with the API as soon as possible, so forks are welcome!

Yes, Nuria is right. But anyway, your version of the demo is totally amazing!

@MusikAnimal
btw. It seems that there is code to strip www. from entry, but www.wikidata.org is the official url for wikidata, so it's currently very difficult to get stats for that project (you have to race the JS :) ).
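
One way to avoid the `www.` problem described here is to strip the prefix only when the host is not on a list of domains whose canonical form keeps it. The following is a minimal sketch, not the tool's actual code: `KEEP_WWW` and `normalizeProject` are hypothetical names, and which hosts belong on the list is an assumption.

```typescript
// Hosts whose canonical domain keeps the "www." prefix (assumed list).
const KEEP_WWW: string[] = ["www.wikidata.org", "www.mediawiki.org"];

// Turn user input (a bare domain or a pasted URL) into a project host name.
function normalizeProject(input: string): string {
  const host = input
    .trim()
    .toLowerCase()
    .replace(/^https?:\/\//, "") // drop the scheme if a full URL was pasted
    .replace(/\/.*$/, ""); // drop any path after the host
  if (KEEP_WWW.indexOf(host) !== -1) {
    return host; // canonical form already includes "www."
  }
  return host.replace(/^www\./, "");
}
```

With this approach, `www.wikidata.org` passes through unchanged while an input such as `www.en.wikipedia.org` is reduced to `en.wikipedia.org`.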

Also @MusikAnimal, your tool doesn't seem to work in Safari here. The input field to select pages doesn't function.

@TheDJ Fixed, though I'm not able to actually get any data from the pageviews API. Are we sure wikidata is supported?

@Sjoerddebruin I am not able to reproduce. What version of Safari are you using, and under what operating system?

@mforns I believe you are the one I have to thank for sparking this effort! I would not have thought to use Select2 or even Bootstrap. I've long shunned CSS libraries, but I'm sold now, thanks to you!

I had originally said this was not meant to be a long-term solution, but I take that back. There are many more features coming!

Safari 9.0.1 on OS X 10.11.1. The JavaScript console says that the following file can't be found: https://tools.wmflabs.org/musikanimal/vendor/jquery.min.map But I think it has something to do with my adblocker.

Also, Wikidata should have pageviews AFAIK.

@Sjoerddebruin yes, there is an issue with ad blockers. I have no idea why, but there definitely aren't any ads :) I put a notice about this where the <canvas> is rendered, so that if the chart showed it would hide the notice. Did the chart load for you? That means some ad blockers are blocking different things... ugh. I'd like to know why this issue exists in the first place. I want to blame it on Tool Labs but am not sure :/

The chart also didn't work. I think it has something to do with HTTPS and the location of some files (CORS).

@Sjoerddebruin so you did see the notice, then? which ad blocker extension are you using? I've got to get to the bottom of this.

Currently using uBlock, until a good adblocker that supports the new content blocking API appears.

@kaldari Sure! The only problem is I'm in the middle of a big cleanup. When this first started I just wanted to get something out that worked; now I'm trying to make this a manageable codebase for others to work on too. Glad to know someone is willing to help :)

"Improve CSV format" is one task that is simple and unlikely to cause conflicts at this point, if you're up for that. I can share my plans for the larger features that will require interface changes. Maybe we can talk more on IRC.

@MusikAnimal: Cool, created T127143 so we can track it as part of Community Tech work.

@Egedda have you uploaded the Project Planning Document to Commons yet?

We haven't gotten a response yet from our supervisor (which we should've gotten last week, but hey, that's uni).
But we will be happy to share our PPD as soon as our supervisor gets back to us. Sorry for the delay.

The Graph:PageViews template is now done, and can be inserted anywhere to show the pageviews of any wiki page on any site. For a demo, see US presidential candidates pageviews side by side.

Is there a recommended way to get pageviews for all articles on a daily/weekly basis?
Or should I build my own solution by calling the stats API?

Thanks

Thanks. Not sure if those dumps will be affected by T114019?

Bianjiang, probably not. Dumps 2.0 is about database dumps, not traffic log dumps.

Erik

@Bianjiang: could you be more specific about what you're trying to accomplish? The pageview API provides access to project-level aggregates, per-article counts, and the top 1000 articles for each project on each day or month. So if I can get a more specific question I might be able to help or add a feature request.
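
The three kinds of data mentioned here (project-level aggregates, per-article counts, and top articles) map onto three REST endpoints. The following is a minimal sketch of building the request URLs, assuming the path layout as I recall it from the public Analytics/PageviewAPI documentation on Wikitech; the helper names and default values are hypothetical, and the exact parameter values and timestamp formats should be checked against that documentation.

```typescript
// Base path of the pageview endpoints (as recalled from the public docs).
const PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews";

// Per-article counts for one page over a date range.
function perArticleUrl(
  project: string, article: string, start: string, end: string,
  granularity: string = "daily", access: string = "all-access", agent: string = "user"
): string {
  // Titles use underscores instead of spaces and must be URL-encoded.
  const title = encodeURIComponent(article.replace(/ /g, "_"));
  return [PAGEVIEWS_BASE, "per-article", project, access, agent, title, granularity, start, end].join("/");
}

// Project-level aggregate counts over a date range.
function aggregateUrl(
  project: string, start: string, end: string,
  granularity: string = "daily", access: string = "all-access", agent: string = "user"
): string {
  return [PAGEVIEWS_BASE, "aggregate", project, access, agent, granularity, start, end].join("/");
}

// Most-viewed articles for a given day, or a whole month with day = "all-days".
function topUrl(
  project: string, year: string, month: string,
  day: string = "all-days", access: string = "all-access"
): string {
  return [PAGEVIEWS_BASE, "top", project, access, year, month, day].join("/");
}
```

For example, `perArticleUrl("en.wikipedia.org", "Albert Einstein", "20160201", "20160207")` builds a daily per-article request for the first week of February 2016. Each URL can then be fetched with any HTTP client; a client that issues many requests should throttle itself so it does not flood the API.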

@ezachte, your dataset (https://dumps.wikimedia.org/other/pagecounts-ez/) uses the old pageview definition, and thus is quite different from the new data. We'll be announcing the new pageview dataset on dumps soon, probably early next quarter (we're just backfilling so we have as much data as possible in there).

I guess @ezachte has answered my question, or maybe there are other options I was not aware of ...

my use case:
I build logic to extract various things from an article, and the logic may fail on some articles while working on most articles. When improving the logic, I'd like to use article pageviews as a weight metric to help me focus on popular (more important) articles. So I need to have pageviews for all articles.
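
The weighting idea in this use case can be sketched as follows. This is a minimal illustration with hypothetical names (`RankedFailure`, `viewsFor`, `rankFailures`, `weightedFailureRate`); it assumes the per-article view counts have already been loaded into a plain title-to-count object, for instance from the dumps.

```typescript
interface RankedFailure {
  title: string;
  views: number;
}

// Look up a title's view count, treating unknown titles as zero views.
function viewsFor(views: { [title: string]: number }, title: string): number {
  return Object.prototype.hasOwnProperty.call(views, title) ? views[title] : 0;
}

// Order failing articles so the most-viewed ones are fixed first.
function rankFailures(failing: string[], views: { [title: string]: number }): RankedFailure[] {
  return failing
    .map((title) => ({ title: title, views: viewsFor(views, title) }))
    .sort((a, b) => b.views - a.views);
}

// Share of all pageviews that land on articles where the extraction fails.
function weightedFailureRate(failing: string[], views: { [title: string]: number }): number {
  const titles = Object.keys(views);
  let total = 0;
  for (let i = 0; i < titles.length; i++) {
    total += views[titles[i]];
  }
  if (total === 0) {
    return 0;
  }
  let failed = 0;
  for (let j = 0; j < failing.length; j++) {
    failed += viewsFor(views, failing[j]);
  }
  return failed / total;
}
```

A view-weighted failure rate can differ sharply from the plain fraction of failing articles: a handful of failures on very popular pages may matter more than many failures on rarely read ones.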

So I need to have pageviews for all articles.

You can get that from the pageview API; please see the docs: https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI

my use case:
I build logic to extract various things from an article, and the logic may fail on some articles while working on most articles. When improving the logic, I'd like to use article pageviews as a weight metric to help me focus on popular (more important) articles. So I need to have pageviews for all articles.

@Bianjiang, I see now that I misunderstood. The dataset that @ezachte pointed out is the correct one.

So I need to have pageviews for all articles.

You can get that from the pageview API; please see the docs: https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI

For bulk data, the dumps are the better source. I was just confused about the status of which dataset is derived from which. We'll have to clean up the docs around that.

@Nuria depends on what @Bianjiang meant: I thought separate counts for each article. That might work in theory, but not in practice for the largest wikis, no? If a combined total for all articles, then yes of course.

@Milimetric I updated https://dumps.wikimedia.org/other/pagecounts-ez/ sometime in December to use the newest pageview definition.

@ezachte oooh!!! I thought you did that but then I looked at the docs on the page and they still say they're a derivative of pagecounts-raw. Ok, I'll add that to my todo for cleaning up those docs. Thanks, I'll go edit my comment.

I was going to update those docs, then I forgot. My bad.

Please consider updating this page too.
https://en.m.wikipedia.org/wiki/Wikipedia:Pageview_statistics

Hmm that page should probably be deleted, or completely rewritten. It seems devoted to stats.grok.se, when we wouldn't want it devoted to Pageviews Analysis, either. There are multiple tools out there, and more on the way.

@Shizhao, please update your example link above: if someone's user interface is not "zh", they won't see it. The link should have &uselang=zh at the end.

fixed