
Understand the Perl code for "Visiting Country per Wiki" report {lama}
Closed, ResolvedPublic

Description

  • read the Perl
  • pick @ezachte's brain about it, see if he has any advice
  • summarize anything besides basic aggregation, for example:
    • any fuzziness added or other privacy measures
    • any creation of synthetic data to fill potential holes

Event Timeline

Milimetric raised the priority of this task from to Medium.
Milimetric updated the task description.
Milimetric added a project: Analytics-Kanban.
JAllemandou set Security to None.
JAllemandou moved this task from Next Up to In Progress on the Analytics-Kanban board.
JAllemandou renamed this task from Understand the Perl code for this report {lama} to Understand the Perl code for "Visiting Country per Wiki" report {lama}. Nov 4 2015, 5:25 PM

Once I get https://phabricator.wikimedia.org/T114379 done (hopefully tomorrow), I hope to bring the geo reports [1] back online using the new Hive feed.

Would it make sense to focus on the non-geo reports first (breakdown by browser, OS, etc.)?
For those non-geo reports I'd advise studying the reports themselves rather than the code, which is ugly in places and complicated because all reports are generated from one Perl file. [2]

[1] http://stats.wikimedia.org/wikimedia/squids/SquidReportsCountriesLanguagesVisitsEdits.htm
[2] Bad programming, I know, but for several years getting things done came before getting things done properly.

By all means let's talk. I moved the meeting to Monday (I'm away Fri-Sun).

For starters, my current thinking is this: reports like the breakdowns by browser, operating system, and MIME type are relatively straightforward. Several aggregation levels and sort sequences are all presented now in one static HTML page. A complete overhaul could make sense, with the user specifying the required detail level, time period, HTML requests only Y/N, and apps Y/N. But alongside a dynamic query tool, a downloadable monthly CSV/JSON file (fixed format or via an API) would also help a lot. This has been asked for many times.
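To make the monthly-file idea concrete, here is a minimal sketch in Python of a breakdown by browser with the two Y/N switches mentioned above. Everything in it is an assumption for illustration: the record layout, the sample data, and the names `monthly_breakdown` and `to_csv` are hypothetical and do not come from the existing Perl code or the Hive feed.

```python
import csv
import io
from collections import Counter

# Hypothetical request records: (month, browser, is_html, is_app).
# Field layout and sample values are illustrative only.
REQUESTS = [
    ("2015-10", "Chrome", True, False),
    ("2015-10", "Chrome", False, False),
    ("2015-10", "Firefox", True, False),
    ("2015-10", "Chrome", True, True),
    ("2015-11", "Firefox", True, False),
]


def monthly_breakdown(requests, html_only=False, include_apps=True):
    """Count requests per (month, browser), honoring the two Y/N filters."""
    counts = Counter()
    for month, browser, is_html, is_app in requests:
        if html_only and not is_html:
            continue  # "HTML requests only" is on: skip non-HTML requests
        if not include_apps and is_app:
            continue  # "apps" is off: skip requests coming from apps
        counts[(month, browser)] += 1
    return counts


def to_csv(counts):
    """Serialize the counts as CSV text, sorted by month, then browser."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["month", "browser", "requests"])
    for (month, browser), n in sorted(counts.items()):
        writer.writerow([month, browser, n])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(monthly_breakdown(REQUESTS, html_only=True, include_apps=False)))
```

The same counts could be written as JSON instead; the point is only that the filters and the fixed column layout are decided once, so the file can be regenerated every month without touching presentation code.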

So for revitalizing the current reports I'm thinking only of the geo breakdowns, which also present demographics (currently mined from Wikipedia) and have a somewhat richer UI. Not that these reports can't be improved, but they have fewer reliability issues and can be brought back to life relatively quickly. Then, in the medium term, with experience gained from replacing the non-geo reports, these could also be rethought and improved. So for now I propose to revive these geo reports myself and isolate the presentation code into a separate module.

Didn't read the code, but did a detailed review of the report.

My view is summarized in the "Pageviews Per Country" section (the first one) of that etherpad.