Description
- read the Perl
- pick @ezachte's brain about it, see if he has any advice
- summarize anything besides basic aggregation, for example:
  - any fuzziness added or other privacy measures
  - any creation of synthetic data to fill potential holes
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Milimetric | T115607 Traffic Breakdown Report - Visiting Country per Wiki {lama}
Resolved | | JAllemandou | T117247 Understand the Perl code for "Visiting Country per Wiki" report {lama}
Event Timeline
Once I get https://phabricator.wikimedia.org/T114379 done (hopefully tomorrow), I hope to get the geo reports [1] back online using the new Hive feed.
Would it make sense to focus on the non-geo reports first (breakdown by browser, OS, etc.)?
For those non-geo reports I'd advise studying the reports themselves rather than the code, which is ugly in places and complicated because all reports are generated from one Perl file. [2]
[1] http://stats.wikimedia.org/wikimedia/squids/SquidReportsCountriesLanguagesVisitsEdits.htm
[2] Bad programming, I know, but for several years getting things done came before getting things done properly.
By all means let's talk. I moved the meeting to Monday (I'm away Fri-Sun).
For starters, my current thinking is this: reports like the breakdowns by browser, operating system, and MIME type are relatively straightforward. Several aggregation levels and sort sequences are all presented now in one static HTML page. A complete overhaul could make sense, with the user specifying the required detail level, time period, HTML requests only Y/N, and apps Y/N. But alongside a dynamic query tool, a downloadable monthly CSV/JSON file would also help a lot (fixed format or via an API). This has been asked for many times.
So for revitalizing the current reports I'm only thinking of the geo breakdowns, which also present demographics (mined from Wikipedia now) and have a somewhat richer UI. Not that these reports can't be improved, but they have fewer reliability issues and can be brought back to life relatively quickly. Then in the medium term, with experience gained from replacing the non-geo reports, these could also be rethought and improved. So for now I propose to revive these geo reports myself and isolate the presentation code into a separate module.
I didn't read the code, but I did a detailed review of the report.
My view is summarized in the "Pageviews Per Country" section (the first one) of that etherpad.