Apr 24 2016

ezachte added a comment to T131783: Examine wikistats reports, make a summary of the most granular data needed that would serve all reports.

Regional codes and number of speakers per language.

Apr 24 2016, 10:28 AM · Analytics-Kanban
ezachte added a comment to T131783: Examine wikistats reports, make a summary of the most granular data needed that would serve all reports.

As for counting bytes vs chars vs words, here are some considerations.

Apr 24 2016, 10:16 AM · Analytics-Kanban
ezachte added a comment to T131783: Examine wikistats reports, make a summary of the most granular data needed that would serve all reports.

Countable namespaces

Apr 24 2016, 9:42 AM · Analytics-Kanban

Apr 21 2016

ezachte added a comment to T132761: pagecounts-ez files are missing data.

The daily/monthly aggregates already use the newest data feed (aka Dan's webstatscollector 3.0, hadoop based)
We should abandon the hourly file feeds, based on webstatscollector 1.0/2.0. Not the aggregates.

Apr 21 2016, 9:48 PM · Analytics

Apr 19 2016

ezachte added a comment to T76348: Upgrade stat1001 to Debian Jessie.

I looked at the backups on stat1001. I need to tidy things up. Some backups run too often and contain a lot of garbage. Apologies for the overhead this incurred.

Apr 19 2016, 10:28 PM · Patch-For-Review, Analytics-Kanban, Operations

Apr 13 2016

ezachte added a comment to T76348: Upgrade stat1001 to Debian Jessie.

I can't login right now to check.
The vast majority of that 2TB will be backups, which I thin out every half year or so.
All html files in htdocs should be copies from generated files on stat1002.

Apr 13 2016, 8:14 PM · Patch-For-Review, Analytics-Kanban, Operations

Mar 26 2016

ezachte added a comment to T130406: Add (and default to) a breakdown in percentages also for the line chart..

Changing default to percentages will make WoW changes really small (except when a new browser version is released and people do mass update).

Mar 26 2016, 8:00 PM · Patch-For-Review, Analytics-Kanban

Mar 25 2016

ezachte added a comment to T44318: Restore WikiStats features disabled for mere performance reasons.

Denied. Expect an announcement on Wikistats in the coming weeks, as soon as the migrated traffic reports (breakdown of browser and OS traffic data) are published.

Mar 25 2016, 10:43 AM · Analytics, Analytics-Wikistats

Mar 7 2016

ezachte added a comment to T126579: Total page view numbers on Wikistats do not match new page view definition.

@Not much. I did some consistency checks, but nothing conclusive yet. My approach is to compare Wikistats counts with ad hoc aggregated webstatscollector 3.0 counts. If those match, it's out of my hands, and the mismatch should be found in the hive scripts. If those don't match, hopefully it will become apparent what constitutes the difference. BTW I may be mostly offline for the rest of the week (moving).

Mar 7 2016, 10:08 PM · Analytics, Internet-Archive, Analytics-Wikistats

Mar 5 2016

ezachte added a comment to T127359: Problems with Erik Zachte's Wikipedia Statistics.

The two metrics are incompatible from 2007 onwards

Mar 5 2016, 11:29 PM · Analytics-Wikistats

Mar 2 2016

ezachte added a comment to T120497: Pageview Stats tool.

I was going to update those docs, then I forgot. My bad.

Mar 2 2016, 5:55 PM · Pageviews-API, Analytics, Tools, Phlogiston-Category, Community-Tech, Community-Wishlist-Survey-2015
ezachte added a comment to T120497: Pageview Stats tool.

@Nuria depends on what @Biangjang meant: I thought separate counts for each article. That might work in theory, but not in practice for the largest wikis, no? If a combined total for all articles, then yes of course.

Mar 2 2016, 5:39 PM · Pageviews-API, Analytics, Tools, Phlogiston-Category, Community-Tech, Community-Wishlist-Survey-2015

Feb 29 2016

ezachte added a comment to T120497: Pageview Stats tool.

Bianjiang, probably not. Dumps 2.0 is about database dumps, not traffic log dumps.

Feb 29 2016, 2:56 AM · Pageviews-API, Analytics, Tools, Phlogiston-Category, Community-Tech, Community-Wishlist-Survey-2015
ezachte added a comment to T120497: Pageview Stats tool.

Daily and monthly aggregates are at https://dumps.wikimedia.org/other/pagecounts-ez/merged/

Feb 29 2016, 1:29 AM · Pageviews-API, Analytics, Tools, Phlogiston-Category, Community-Tech, Community-Wishlist-Survey-2015

Feb 25 2016

ezachte added a comment to T127657: Wikistats doesn't yet know of all content namespaces on Wikisource.

Yes, that's why I asked:

Feb 25 2016, 6:13 PM · Analytics, Analytics-Wikistats

Feb 24 2016

ezachte updated subscribers of T127359: Problems with Erik Zachte's Wikipedia Statistics.

2 I updated links at both places, thanks for noticing.

Feb 24 2016, 10:47 PM · Analytics-Wikistats
ezachte added a comment to T127657: Wikistats doesn't yet know of all content namespaces on Wikisource.

@Zdzislaw. Hardcoded extra namespaces for some wikisource projects is older code. The newer way is to follow the API which lists all content namespaces per wiki. Every day I harvest these settings for all wikis.
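
A minimal sketch of reading content namespaces from an API response. It assumes the API meant here is the MediaWiki `siteinfo` query (`action=query&meta=siteinfo&siprop=namespaces`), which the comment does not name. The function name and the sample response are hypothetical and not taken from the Wikistats code.

```python
def content_namespace_ids(siteinfo_response):
    """Return the sorted ids of namespaces flagged as content.

    Assumes the JSON shape of a siteinfo namespaces query:
    {"query": {"namespaces": {"0": {"id": 0, ...}, ...}}}.
    The "content" flag is an empty string in the legacy JSON format and a
    boolean with formatversion=2, so both cases are handled.
    """
    namespaces = siteinfo_response["query"]["namespaces"]
    ids = []
    for ns in namespaces.values():
        flag = ns.get("content", False)
        if flag is not False:
            ids.append(ns["id"])
    return sorted(ids)


# Hypothetical response for a Wikisource-like wiki (illustrative, not real data).
sample = {
    "query": {
        "namespaces": {
            "0": {"id": 0, "content": ""},
            "1": {"id": 1},
            "102": {"id": 102, "content": ""},
            "104": {"id": 104, "content": True},
        }
    }
}

print(content_namespace_ids(sample))
```

Running such a parser once a day per wiki would avoid hardcoding extra namespaces, which is the design choice the comment describes.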

Feb 24 2016, 9:27 PM · Analytics, Analytics-Wikistats

Feb 18 2016

ezachte added a comment to T127359: Problems with Erik Zachte's Wikipedia Statistics.

1 The code is at https://github.com/wikimedia/analytics-wikistats/blob/master/dumps/perl/WikiCountsInput.pm, line 1810 etc, sub CollectArticleCounts
WikiStats is poorly documented (but that fact by itself is pretty well documented, as I keep saying this every year or so).

Feb 18 2016, 10:24 PM · Analytics-Wikistats

Jan 28 2016

ezachte closed T124340: LIMN input file wikilytics_in_pageviews.csv no longer updated as Resolved.

Path names for this step were wrong after major update for https://phabricator.wikimedia.org/T114379 . Fixed

Jan 28 2016, 6:49 PM · Analytics-Wikistats
ezachte closed T123477: Daily/monthly aggregation of hourly page view files halted as Resolved.

Oops. This I fixed some two weeks ago, but I hadn't marked it as resolved yet. Doing that now.

Jan 28 2016, 6:43 PM · Analytics

Jan 22 2016

ezachte added a comment to T120497: Pageview Stats tool.

If we have more human resources than functional requirements, I'd like to propose this idea: what about making this new UI language independent, internationalization done via Translatewiki?

Jan 22 2016, 9:48 PM · Pageviews-API, Analytics, Tools, Phlogiston-Category, Community-Tech, Community-Wishlist-Survey-2015

Jan 21 2016

ezachte created T124340: LIMN input file wikilytics_in_pageviews.csv no longer updated.
Jan 21 2016, 8:06 PM · Analytics-Wikistats

Jan 18 2016

ezachte added a comment to T120497: Pageview Stats tool.

One more defunct tool by EWM, for comparison:

Jan 18 2016, 4:29 PM · Pageviews-API, Analytics, Tools, Phlogiston-Category, Community-Tech, Community-Wishlist-Survey-2015
ezachte added a comment to T120497: Pageview Stats tool.

Another tool that was hugely popular among press people many years ago was Wikistics, built on top of stats.grok.se.
It focuses totally on most accessed pages, in a very simple format.

Jan 18 2016, 4:11 PM · Pageviews-API, Analytics, Tools, Phlogiston-Category, Community-Tech, Community-Wishlist-Survey-2015
ezachte added a comment to T120497: Pageview Stats tool.

Here is a screen copy of WikiViewStats.
https://upload.wikimedia.org/wikipedia/commons/a/a4/Wiki_ViewStats_-_2014-06-26.png

Jan 18 2016, 3:58 PM · Pageviews-API, Analytics, Tools, Phlogiston-Category, Community-Tech, Community-Wishlist-Survey-2015
ezachte added a comment to T113695: Clean the code review queue of analytics/wikistats.

closed https://gerrit.wikimedia.org/r/#/c/92056/ -> 4/5 open

Jan 18 2016, 3:37 PM · DevRel-February-2016, Analytics, DevRel-January-2016, DevRel-December-2015, DevRel-November-2015, Analytics-Wikistats, DevRel-October-2015
ezachte added a comment to T113695: Clean the code review queue of analytics/wikistats.

@Nemo_bis says this may have to wait till Feb
in the meantime I can look further into '[Full dump analysis] Reduce edits_only and reverts_only intricacy'

Jan 18 2016, 3:34 PM · DevRel-February-2016, Analytics, DevRel-January-2016, DevRel-December-2015, DevRel-November-2015, Analytics-Wikistats, DevRel-October-2015

Jan 17 2016

ezachte closed T122864: Mediacounts missing top1000 files after 2016-01-01: rsync fails as Resolved.

There was lingering test code. Files are up to date now.

Jan 17 2016, 2:03 PM · Analytics-Kanban, Datasets-Webstatscollector, Datasets-Archiving, Analytics-Cluster

Jan 15 2016

ezachte added a comment to T117221: [Epic] Update official Wikimedia press kit with accurate numbers.

Since early Dec 2015 there is a new chart on 'Active wikis' which will help us to assess a good cut-off point.
http://stats.wikimedia.org/EN/PlotActivityZZ.png

Jan 15 2016, 2:32 PM · Research-Backlog, Product-Analytics, Reading-analysis, Analytics, Research-consulting
ezachte added a comment to T113695: Clean the code review queue of analytics/wikistats.

I reached out by mail to @Nemo_bis with comments on each open patch.

Jan 15 2016, 1:05 PM · DevRel-February-2016, Analytics, DevRel-January-2016, DevRel-December-2015, DevRel-November-2015, Analytics-Wikistats, DevRel-October-2015

Jan 13 2016

ezachte created T123477: Daily/monthly aggregation of hourly page view files halted.
Jan 13 2016, 12:50 PM · Analytics
ezachte closed T122864: Mediacounts missing top1000 files after 2016-01-01: rsync fails as Resolved.

Well everything did get synced in the end. My assumption on required folder rights was wrong. Still not sure why
hdfs dfs -put -f /a/wikistats_git/mediacounts/daily/2016/mediacounts.top1000.2016-01-02.v00.csv.zip hdfs:///wmf/data/archive/mediacounts/daily/2016
didn't produce an immediate update.

Jan 13 2016, 12:24 AM · Analytics-Kanban, Datasets-Webstatscollector, Datasets-Archiving, Analytics-Cluster

Jan 12 2016

ezachte added a comment to T122864: Mediacounts missing top1000 files after 2016-01-01: rsync fails.

@Ottomata now both 2015 and 2016 have drwxr-xr-x instead of drwxrwxr-x, so I can't update 2016.
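
The two permission strings differ only in the group write bit. The sketch below shows this with Python's standard `stat` module. The helper name is hypothetical, and whether group write access was really what blocked the update is an assumption (a later comment in this feed calls the assumption about folder rights wrong).

```python
import stat


def group_can_write(mode):
    """True when the group write bit is set in a file mode."""
    return bool(mode & stat.S_IWGRP)


# Directory modes matching the two strings quoted in the comment.
for perms in (0o755, 0o775):
    mode = stat.S_IFDIR | perms
    print(stat.filemode(mode), group_can_write(mode))
```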

Jan 12 2016, 9:45 PM · Analytics-Kanban, Datasets-Webstatscollector, Datasets-Archiving, Analytics-Cluster

Jan 11 2016

ezachte updated subscribers of T122864: Mediacounts missing top1000 files after 2016-01-01: rsync fails.
Jan 11 2016, 6:15 PM · Analytics-Kanban, Datasets-Webstatscollector, Datasets-Archiving, Analytics-Cluster

Jan 8 2016

ezachte added a comment to T122864: Mediacounts missing top1000 files after 2016-01-01: rsync fails.

@Hydrix, sorry the fix was incomplete, in that the rsync still fails over folder access rights. I can't fix that myself (and today is all-staff) but should be done no later than Monday I assume.

Jan 8 2016, 4:13 PM · Analytics-Kanban, Datasets-Webstatscollector, Datasets-Archiving, Analytics-Cluster

Jan 6 2016

ezachte closed T122864: Mediacounts missing top1000 files after 2016-01-01: rsync fails as Resolved.

Connection of stat1002 with /mnt/hdfs/wmf/data/archive/projectview/geo/hourly/ was lost
@Ottomata fixed this: "Hadoop namenode was inactive"

Jan 6 2016, 2:13 PM · Analytics-Kanban, Datasets-Webstatscollector, Datasets-Archiving, Analytics-Cluster

Dec 17 2015

ezachte added a comment to T48204: Page view stats: monthly most popular articles not updated.

Revisiting that page, I think my comment about it not being terribly essential was mostly about the 2nd, 3rd and 4th tables on that page, which focus on the most requested non-existing pages and files.
I don't think the new page view api can zoom in on those missing pages/files in particular. But again, there seems to be little demand for it.

Dec 17 2015, 5:23 AM · Analytics-Wikistats

Dec 11 2015

ezachte closed T72900: "Top month" and "Trend last 24 months" missing in Wikipedia columns as Invalid.

The empty row is YoY which will return when new pageview def is there for 13 months. (removing it entirely would be better, but is really a small matter)

Dec 11 2015, 1:12 PM · Internet-Archive, Analytics-Wikistats

Dec 10 2015

ezachte closed T120294: ScanMail not showing months after July 2015 as Resolved.

New location is http://stats.wikimedia.org/mail-lists/index.html

Dec 10 2015, 3:02 PM · Wikimedia-Mailing-lists

Dec 9 2015

ezachte added a comment to T90203: Upgrade daily/monthly aggregations of pageview dumps to new data files.

@Milimetrics FYI the dumps use http://dumps.wikimedia.org/other/pageviews/
What makes them still useful is that they contain page views for all articles (with 5 or more views per month).
Monthly totals, while retaining hourly precision.

Dec 9 2015, 10:08 PM · Analytics
ezachte closed T90203: Upgrade daily/monthly aggregations of pageview dumps to new data files as Resolved.

Actually this was done already some two weeks ago as a subtask of https://phabricator.wikimedia.org/T114379

Dec 9 2015, 10:05 PM · Analytics
ezachte reopened T90203: Upgrade daily/monthly aggregations of pageview dumps to new data files as "Open".
Dec 9 2015, 10:04 PM · Analytics
ezachte added a comment to T120294: ScanMail not showing months after July 2015.

I just learned from this thread that the scripts had stalled for half a year.

Dec 9 2015, 5:26 PM · Wikimedia-Mailing-lists

Dec 1 2015

ezachte added a comment to T112956: Developer summit session: Pageview API from the Event Bus perspective.

The new data in http://dumps.wikimedia.org/other/pageviews/ already exclude spider requests, so contain user data only.
The way I'm reading @Milimetric's comment is that the amount of spider requests could be added as a separate metric if called for.
I would be (somewhat) interested to see the overall share of spider traffic per project (not per wiki), but no big deal at all.
We could do that with an internal hive job using sampled data.

Dec 1 2015, 2:58 PM · Analytics, Wikimedia-Developer-Summit-2016

Nov 27 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

Added 3 more charts for per project totals, e.g. http://stats.wikimedia.org/EN/draft/SummaryZZ.htm (preview location)

Nov 27 2015, 5:19 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Nov 24 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

Three new charts for per project totals, to do: 'Total articles'

Nov 24 2015, 6:28 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Nov 20 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

I migrated daily/monthly aggregates from WC 1 to WC 3. This concludes migration effort for Monthly Page Views stream.

Nov 20 2015, 10:47 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte updated the task description for T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].
Nov 20 2015, 10:27 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Nov 19 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

Updated diagram, a.o. to show new file names + added missing report

Nov 19 2015, 9:03 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte reopened T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] as "Open".
Nov 19 2015, 8:40 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Nov 17 2015

ezachte added a comment to T112956: Developer summit session: Pageview API from the Event Bus perspective.

Also Magnus and I both pleaded for monthly stats earlier, each for different use cases

Nov 17 2015, 6:45 PM · Analytics, Wikimedia-Developer-Summit-2016

Nov 13 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

@JAllemandou, Wow, great find! I guess this affects mostly wikis where a large percentage of page views is from wikipedians editing pages. Looking at http://stats.wikimedia.org/wikispecial/EN/TablesPageViewsMonthly.htm a similar effect seems to occur at meta. But somehow not at wikidata.

Nov 13 2015, 12:32 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Nov 12 2015

ezachte added a comment to T115922: Send email out to community notifying of change {lama} [1 pts].

I'm still working on https://phabricator.wikimedia.org/T114379 (see status report there)
Can we postpone this till everything is in place?

Nov 12 2015, 7:33 PM · Analytics-Wikistats, Analytics-Kanban, Patch-For-Review
ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

Dan,
I'm still working on loose ends for Monthly Page View Reports.
Also, this task was also about Traffic Breakdown Reports, which we just started to work on. Is that another phab task now?

Nov 12 2015, 6:29 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Nov 11 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

@Nemo_bis

Will the "Views/hr" column in the index for each project (https://stats.wikimedia.org/wiktionary/EN/ and friends) be converted too?

Nov 11 2015, 12:08 AM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Nov 10 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

Fixed foundation stats, which uses codes a bit differently:
www.f is foundation desktop, m.f is foundation mobile, zero.f is foundation zero.

Nov 10 2015, 4:21 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte added a comment to T116244: Update reportcard.wmflabs.org with July-October data.

BTW I propose we don't update comScore trends (and say so on the RC). Their sudden drop seems inconsistent with our internal numbers. comScore was asked to comment and they agreed to investigate, but that didn't bring us any further. We won't receive any updates from them anyway.

Nov 10 2015, 1:37 PM · Analytics-Backlog
ezachte added a comment to T116244: Update reportcard.wmflabs.org with July-October data.

Well, I am mostly responsible for this. When the comScore unique visitor and page view counts dropped so suddenly that it raised serious doubt over the numbers, and our internal page views revealed massive corruption of our own data [1], I stalled updates (informing @Tbayer). Our internal page view numbers are mostly fixed now [1], and better than ever (no more bot traffic). I hope to finalize this cut-over in the coming days. We will then present updates to the report card, with better numbers since May 2015, and some older, totally insane and hard-to-fix numbers blanked out (PV for smaller projects).

Nov 10 2015, 1:30 PM · Analytics-Backlog

Nov 6 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

So I checked with hive query on pageviews_hourly

Nov 6 2015, 12:02 AM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Nov 5 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].
Nov 5 2015, 4:18 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Nov 4 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

encore My scripts were processing webstatscollector 1.0 output so far. That's why I encountered it only now.

Nov 4 2015, 9:31 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

Hey Dan, no worries. I should have been more clear. This has nothing to do with your upgrade to webstatscollector 3.0. It's a result of a conscious decision by Christian and me to keep webstatscollector 2.0 totally downward compatible. We chose to keep the upgrade to wc 2.0 transparent for users, who could switch to new files but could ignore new codes. This allowed us to do this upgrade fast.

Nov 4 2015, 9:29 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte added a comment to T117247: Understand the Perl code for "Visiting Country per Wiki" report {lama}.

By all means let's talk. I moved the meeting to Monday (I'm away Fri-Sun).

Nov 4 2015, 6:33 PM · Analytics-Kanban
ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

Status: updates have been tested, see stat1002:/a/dammit.lt/projectviews/projectviews_csv.zip

Nov 4 2015, 5:53 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte added a comment to T117247: Understand the Perl code for "Visiting Country per Wiki" report {lama}.

Once I get https://phabricator.wikimedia.org/T114379 done (hopefully tomorrow) I hope to get geo reports [1] back online using the new hive feed.

Nov 4 2015, 5:34 PM · Analytics-Kanban

Nov 3 2015

ezachte added a comment to T87738: Discrepancies in historical total active editor numbers.

As suggested in June, I think that lists of users who were counted in one dump but not another might be useful for debugging.

Nov 3 2015, 5:34 PM · Analytics, Analytics-Wikistats

Nov 2 2015

ezachte added a comment to T116609: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. [5 pts].

@AndyRussG thanks for chiming in. Now I understand what this is about.

Nov 2 2015, 5:14 PM · Technical-Debt, Patch-For-Review, Analytics-Kanban, Analytics
ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

@Milimetric, projectviews are indeed all I need for this process
(someday when I upgrade daily&monthly aggregates, backfilling pageviews could be helpful) [1]

Nov 2 2015, 3:02 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte added a comment to T67994: Cross-link stats.wikimedia.org and ee-dashboard.wmflabs.org.

What's the best way to detect which language codes have new stats? (other than screen scraping https://meta.wikimedia.org/wiki/Research:VisualEditor)

Nov 2 2015, 11:47 AM · Analytics-Wikistats

Oct 30 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

The 10x larger numbers in webrequest vs. pageview_hourly are probably due to is_pageview being false for 90% of the hits. That makes sense on the regular site where there are a lot of things like JS, CSS, etc. coming down with each pageview. It's a bit surprising on wpzero. You can add the is_pageview filter on webrequest to validate this theory.
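
A small sketch of the validation suggested here: count all hits and count only those flagged as page views, then compare. The `is_pageview` field name comes from the comment. The function name, the `uri_path` values and the rows themselves are made up for illustration and are not real webrequest data.

```python
def hit_counts(rows):
    """Return (total hits, hits flagged as page views)."""
    total = 0
    pageviews = 0
    for row in rows:
        total += 1
        if row["is_pageview"]:
            pageviews += 1
    return total, pageviews


# Made-up rows: one real page view plus nine asset requests (JS, CSS, etc.).
rows = [{"uri_path": "/wiki/Example", "is_pageview": True}] + [
    {"uri_path": f"/static/asset{i}.js", "is_pageview": False} for i in range(9)
]

total, pageviews = hit_counts(rows)
print(total, pageviews, total / pageviews)
```

If the unfiltered count is about ten times the filtered one, the theory that roughly 90% of hits have `is_pageview` set to false holds for that sample.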

Oct 30 2015, 10:05 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte added a comment to T116609: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. [5 pts].

Thanks Nuria, so we're zooming in on what happened. I'm still wondering though, how can we have 56 Special:HideBanners requests for every real page request? Doesn't that seem odd? Would we have to ask ops to explain?

Oct 30 2015, 8:39 PM · Technical-Debt, Patch-For-Review, Analytics-Kanban, Analytics
ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

Is there any sensitivity we need to be aware of when publishing reports for small countries from the unsampled logs? For projects with little to no activity a set of localized pageviews can disclose the location of an editor.

The pageviews aren't localized as part of this dataset, this is just Page Title, View Count. Do you mean the localization that wikistats does used in combination with this? I'm not seeing the connection there either.

Oct 30 2015, 5:46 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

I also vote for doing away with .mw, it's redundant, and confusing indeed.

Oct 30 2015, 5:44 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

+1 hm on redacted numbers.

Oct 30 2015, 4:38 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte created T117236: Track overall traffic, without any filtering, broken down into major categories, for internal use..
Oct 30 2015, 4:30 PM · Analytics

Oct 29 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

Dan, here is a comparison of data for one hour in webstatscollector 1/2/3
Most counts are similar, or understandably different. For a few differences I'm not sure what to make of them. Any idea?

Oct 29 2015, 2:01 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Oct 28 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

Thanks Dan! I'll do some sanity checks, and report back.

Oct 28 2015, 10:06 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Oct 27 2015

ezachte updated subscribers of T116609: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. [5 pts].
Oct 27 2015, 4:14 PM · Technical-Debt, Patch-For-Review, Analytics-Kanban, Analytics

Oct 26 2015

ezachte added a comment to T116609: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. [5 pts].

This text in description is misplaced "From squid logs I get a totally different number yet again (see below)" as both chart above and below this text refer to same data feed. (read below as 'upcoming comments')

Oct 26 2015, 6:15 PM · Technical-Debt, Patch-For-Review, Analytics-Kanban, Analytics
ezachte added a comment to T116609: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. [5 pts].

@Nuria Right, I know actually, so yes 128 (x 1000) for sampled logs (255 - 127 CentralAutoLogin) comes somewhat close to hive number from pageviews_hourly for July 10: 48k spider + 35k user = 82k. The 128k from squid logs is the upper limit as that factors in mime type only.

Oct 26 2015, 6:11 PM · Technical-Debt, Patch-For-Review, Analytics-Kanban, Analytics
ezachte added a comment to T116609: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. [5 pts].

@Ottomata squid logs is 1:1000 sampled at stat1002:/a/squid/archive/sampled>

Oct 26 2015, 4:59 PM · Technical-Debt, Patch-For-Review, Analytics-Kanban, Analytics
ezachte updated the task description for T116531: Monthly page view stats for wikibooks, wikinews, wikiquote, wikisource, wikiversity for July 2015 are extremely anomalous.
Oct 26 2015, 4:49 PM · Analytics, Analytics-Wikistats
ezachte added a comment to T116609: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. [5 pts].

filtering 1:1000 sampled squid logs for wikinews html requests

Oct 26 2015, 4:47 PM · Technical-Debt, Patch-For-Review, Analytics-Kanban, Analytics
ezachte added a comment to T116609: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. [5 pts].

webstatscollector 2.0 output:
in stat1002:/mnt/data/xmldatadumps/public/other/pagecounts-raw/2015/2015-07>

Oct 26 2015, 4:40 PM · Technical-Debt, Patch-For-Review, Analytics-Kanban, Analytics
ezachte added a comment to T116609: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. [5 pts].

hive query for


USE wmf ;

Oct 26 2015, 4:38 PM · Technical-Debt, Patch-For-Review, Analytics-Kanban, Analytics
ezachte created T116609: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. [5 pts].
Oct 26 2015, 4:35 PM · Technical-Debt, Patch-For-Review, Analytics-Kanban, Analytics

Oct 25 2015

ezachte renamed T116531: Monthly page view stats for wikibooks, wikinews, wikiquote, wikisource, wikiversity for July 2015 are extremely anomalous from Monthly page view stats for wikibooks, wikinews, wikiquote, wikisource, wikiversity for Aug 2015 are extremely anomalous to Monthly page view stats for wikibooks, wikinews, wikiquote, wikisource, wikiversity for July 2015 are extremely anomalous.
Oct 25 2015, 3:31 PM · Analytics, Analytics-Wikistats
ezachte added a comment to T116531: Monthly page view stats for wikibooks, wikinews, wikiquote, wikisource, wikiversity for July 2015 are extremely anomalous.

Yes, I mean July. Aug 2011 was a botnet, with 5% of overall page views from less than 100 ip addresses, requesting Random Page (very nifty way to keep us busy).

Oct 25 2015, 3:31 PM · Analytics, Analytics-Wikistats
ezachte updated the task description for T116531: Monthly page view stats for wikibooks, wikinews, wikiquote, wikisource, wikiversity for July 2015 are extremely anomalous.
Oct 25 2015, 12:08 PM · Analytics, Analytics-Wikistats
ezachte created T116531: Monthly page view stats for wikibooks, wikinews, wikiquote, wikisource, wikiversity for July 2015 are extremely anomalous.
Oct 25 2015, 12:07 PM · Analytics, Analytics-Wikistats
ezachte created T116526: Study feedback from Andyrom75 for IT wikivoyage on how articles are classified by Wikistats.
Oct 25 2015, 10:58 AM · Analytics, Analytics-Wikistats

Oct 19 2015

ezachte added a comment to T113406: Quantifying the "sum of all contributors".

Some notes on what seems an unanswerable question.

Oct 19 2015, 3:21 PM · Research-Archive, Research-consulting

Oct 15 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

While waiting for new input for Monthly Pageview Reports (which is coming along, thanks @Milimetric !), I looked into Traffic Breakdown Reports, subset Geo Reports.

Oct 15 2015, 2:48 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Oct 9 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

@Tbayer not sure why you mention Wikistats in this context. Or am I getting you wrong?

Oct 9 2015, 10:11 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Oct 8 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

Dan, using sequence numbers to detect anomalies makes total sense to me. In fact I also used that to repair a multi-month 20%-30% UDP message loss, by measuring per server per hour how much the average gap between sequence numbers exceeded the expected average gap (which of course is 1000 for the 1:1000 sampled log). That will work for capture errors. It's not a cure-all: it won't help for the case I mentioned where massive amounts of bogus 'page views' came our way for two weeks. Neither is my half-automated blacklisting of bad hours a cure-all.
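
A minimal sketch of the gap-based check described in this comment, not the actual repair script. It assumes each sampled log line yields a (server, hour, sequence number) tuple and that a loss fraction p stretches the average gap to 1000 / (1 - p). The function names and the sample records are hypothetical.

```python
from collections import defaultdict

EXPECTED_GAP = 1000  # expected average gap in a 1:1000 sampled log


def average_gaps(records):
    """Average sequence-number gap per (server, hour).

    records: iterable of (server, hour, sequence_number) tuples.
    Groups with fewer than two samples are skipped.
    """
    seqs = defaultdict(list)
    for server, hour, seq in records:
        seqs[(server, hour)].append(seq)
    gaps = {}
    for key, values in seqs.items():
        if len(values) < 2:
            continue
        values.sort()
        gaps[key] = (values[-1] - values[0]) / (len(values) - 1)
    return gaps


def estimated_loss(avg_gap, expected_gap=EXPECTED_GAP):
    """Estimated fraction of lost messages, clamped at zero."""
    return max(0.0, 1.0 - expected_gap / avg_gap)


# Made-up samples for one server and one hour, spaced 1250 apart.
records = [("server-a", "2015-07-10T00", s) for s in (0, 1250, 2500, 3750)]
for key, gap in average_gaps(records).items():
    print(key, gap, round(estimated_loss(gap), 3))
```

Under these assumptions, counts for an affected hour could be scaled by avg_gap / 1000 to compensate. As the comment notes, this only helps with capture errors, not with bogus traffic.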

Oct 8 2015, 1:34 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

So how to proceed?

Oct 8 2015, 10:17 AM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

@Tbayer absolutely, being consistent is important.

Oct 8 2015, 10:15 AM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Oct 7 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

The quick survey shows most support for continuation of the geographic reports (reports 21-24), more than for other breakdowns: https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future_per_report_B2

Oct 7 2015, 10:49 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats
ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

I figured we can produce all breakdowns by geography (middle column of TBD diagram) with two datasets, one for views, one for edits. 8 fields only in each:

Oct 7 2015, 9:14 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats

Oct 6 2015

ezachte added a comment to T114379: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts].

Monthly PageView reports (MPV)

Oct 6 2015, 2:50 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats