Wed, Aug 21
@Yair_rand your examples show ingenuity, yet they also seem somewhat contrived. Suppose some malicious, geeky and rather obsessed user would go to such lengths to 'exploit a weakness' in the privacy protection, and they learn the country of a wikimedian who doesn't want to reveal themselves: how much damage could be done? Say China, with its enormous resources, finds out that 16 active editors on a small wiki all edit from Taiwan. How much would they have learned then? Taiwan has a population of 23+ million. That geeky detective could probably also learn from text analysis (English isn't spoken the same in different countries), from analysis of edit times (where waking hours are a proxy for time zone), and from edits being spaced wider apart in countries with low bandwidth. I admit these are all contrived examples as well, only effective in combination, and in the hands of a very geeky and obsessed malicious user with infinite resources. It's probably easier for such a geek to infiltrate our security by social engineering, placing a mole, and what have you.
May 6 2019
IIRC the criterion was rather which wikis to exclude explicitly. Those were wikis which are not publicly editable or not even readable. An example that comes to mind is the board wiki.
Apr 2 2019
That's OK. Cheers
Dec 19 2017
Ah, sorry I mixed up two tasks then. Just copied my comments to https://phabricator.wikimedia.org/T182960
For the record, I'm copying from a mail exchange with @Milimetric :
My preference would be the second option: Just when a color is lighter than #999999 add a thin black outline.
But using lighter colors all over would help as well.
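A minimal sketch of the outline rule from the second option, in Python. It assumes "lighter than #999999" means a higher sum of the R, G and B channels; the function names `hex_to_rgb` and `needs_outline` are made up for illustration and are not part of Wikistats.

```python
def hex_to_rgb(color):
    """Convert a '#RRGGBB' string into a tuple of three integers (0-255)."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def needs_outline(color, threshold="#999999"):
    """Return True when the color is strictly lighter than the threshold.

    Assumption: 'lighter' is measured as a higher sum of the three channels.
    """
    return sum(hex_to_rgb(color)) > sum(hex_to_rgb(threshold))


# Example: decide per chart color whether to draw a thin black outline.
for chart_color in ("#FFFFFF", "#CCCCCC", "#999999", "#336699"):
    print(chart_color, needs_outline(chart_color))
```

A perceptual measure (for example relative luminance) could be swapped in for the channel sum if the simple rule misjudges saturated colors.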
No, I'm fine with second bullet point, I just meant to say 'there is too much happening when I click the radiobutton'. But subdivide on the chosen metric makes sense.
Dec 18 2017
@Catrope sorry, I added subscribers and project
Dec 13 2017
- For a Wikipedia to be shown, it has to have a minimum of 0.1% of all traffic in pageviews.
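The 0.1% rule above could be sketched as follows, in Python. The function name `wikis_to_show` and the sample counts are hypothetical, for illustration only; the real reports may compute the share differently.

```python
def wikis_to_show(pageviews, min_share=0.001):
    """Return the wikis whose share of all pageviews is at least min_share.

    pageviews maps a wiki code to its pageview count; 0.001 stands for 0.1%.
    """
    total = sum(pageviews.values())
    if total == 0:
        return []
    return sorted(wiki for wiki, views in pageviews.items()
                  if views / total >= min_share)


# Made-up counts: 'xx' has 0.05% of all traffic and is left out.
sample = {"en": 9000, "de": 995, "xx": 5}
print(wikis_to_show(sample))
```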
Reports with new announcement are being generated. The English Wikipedia dump is still being parsed so I will update reports again later this week.
Dec 12 2017
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Wikistats briefly mentions the data lake and edits, but 90% of the page is about pageviews and traffic. Those are different data streams (no relation with dumps), different reports.
Yes I can update the text tomorrow.
Dec 6 2017
Here is a first draft: https://stats.wikimedia.org/wikinews/EN/draft/TablesWikipediaES.htm Please comment.
Dec 5 2017
I see recent R charts again! It was an elusive bug, hard to replicate.
Nov 28 2017
@mpopov Thanks, I totally rely on Andrew for this. I don't have root access, which is fine with me, so I can't mess up ;-) And server migrations are rare anyway.
@fdans WiViVi doesn't use a threshold because the aggregation level is so high that individuals don't stand out from the crowd, except for fringe cases (and even then ... how serious is that?)
WiViVi reports monthly request counts, broken down by originating country and target wiki.
Yes one person can account for all page requests for a very small wiki from a very small country (fringe case).
I have been working on the premise this is not a privacy hazard.
script stat1005:/home/ezachte/wikistats/dumps/bash/collect_edits.sh has been adapted to stat1005
Thanks, will do.
Hmm, I could migrate all of Wikistats to stat1004 (I prefer to keep it all on one machine; also, charts are part of the overall Wikistats job).
Is stat1004 machine equivalent to stat1005?
Does 'still Jessie' imply that stat1004 will be upgraded at some point and the same issue will reoccur?
Nov 27 2017
script stat1005:/home/ezachte/wikistats/dumps/bash/extract_dump.sh has been adapted to stat1005
Nov 19 2017
@fdans Yes it's still happening.
Nov 13 2017
Nov 9 2017
That looks odd, indeed, also for other languages.
Oct 24 2017
Yes the basic principle has changed a bit, albeit longer ago, start 2017.
Oct 23 2017
Another comparison between Wikistats 1 and 2: this time edit counts per user.
I compared edit counts for users with 5000+ edits on af.wikipedia.org, namespace 0.
I collected feedback in https://phabricator.wikimedia.org/T178591 (I don't know how to link it here as a subtask, I have never done that)
Oct 19 2017
There is page_is_redirect_latest, I imagine it could be very useful to also have a field to which page id or page title the redirect goes. For example for combining pageview counts. Not that Wikistats 1 has such, but still..
There are columns event_user_is_bot_by_name and user_is_bot_by_name, but not event_user_is_bot or user_is_bot. Wouldn't that make sense to have those as well?
Question: with deleted revisions still somewhere in the database, as column revision_is_deleted suggests: should these be shielded from the public once this database is opened for public access?
Building on the previous comment (about page deletions):
I see revision_is_deleted, but how about page_is_deleted?
Oct 13 2017
@Milimetric thanks for caring!
Oct 12 2017
In some places I see K/M/B for thousand/million/billion.
In other places k/m/b. Maybe make a general formatting routine for this (preferably language sensitive).
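A rough sketch of what such a general formatting routine might look like, in Python. The name `format_count` and the `suffixes` parameter are assumptions for illustration; language sensitivity is only hinted at by letting the caller pass a different suffix set.

```python
def format_count(value, suffixes=("", "K", "M", "B")):
    """Format a number with one suffix style everywhere, e.g. 1500 -> '1.5K'.

    suffixes holds the labels for units, thousands, millions and billions;
    a caller could pass a localized tuple instead (assumption, not an existing API).
    """
    sign = "-" if value < 0 else ""
    n = abs(value)
    index = 0
    # Scale down by 1000 until the rounded number is below 1000
    # or there are no larger suffixes left.
    while round(n, 1) >= 1000 and index < len(suffixes) - 1:
        n /= 1000
        index += 1
    if index == 0:
        return f"{sign}{round(n)}"
    return f"{sign}{n:.1f}{suffixes[index]}"


for number in (999, 1500, 2_300_000, 7_000_000_000):
    print(number, format_count(number))
```

Routing every label through one function like this keeps the K/M/B versus k/m/b choice in a single place.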
Oct 11 2017
I made a few more edits last night.
Oct 9 2017
bash file datamaps_views.sh has been migrated to stat1005, so monthly updates to WiViVi can now be generated
script datamaps_views.sh, for updating WiViVi data, has been adapted to stat1005
viz. now shows data for Sep 2017
ad-blocker uBlock blocks WiViVi (with 3 reports on this from a small audience, this had better be solved before we publish the blog post)
Highest range of values for map "Wikipedia pageviews, percentage to language ...." (map in red-orange-yellow) has been split into two ranges, on user request. So range 50%-100% is now range 50%-80% and range 80%-100%
Sep 26 2017
Sep 22 2017
Major production jobs + visualisations developed by Erik Zachte which are still in use
Sep 15 2017
Data have been relayed to Tony Sep 7.
Thanks @Aklapper for quick response.
Sep 14 2017
Please do. Tomorrow any time till 12 AM PDT works for me. Preferably a bit earlier.
Sep 13 2017
Totally agree, a non-zero-based y-axis doesn't give a sense of scale, and can make any slight breeze be misinterpreted as a hurricane.
I never understood why Tufte was so keen on these, and was applauded for it, when he so often calls for being unambiguous.
Area coloring on a chart with one line looks nice to me.
Sep 12 2017
Ah I forgot to close this task. Scripts are running again.
Sep 11 2017
@Nuria Ah I missed these. Sorry about that. A good example of why restructuring my mailbox was dearly needed, so I finally fine-tuned Gmail filters this weekend. So I could start vetting later this week. Shall we do a hangout?
Sep 10 2017
USE wmf ;
SELECT s.*, count(*) AS count
FROM (
  SELECT access_method, geocoded_data['country'] country, day
  FROM webrequest
  WHERE year=2017 AND month=9 AND day < 9
    AND uri_host LIKE "%meta%"
    AND agent_type = "user"
    AND referer LIKE "%wikipedia%"
    AND uri_query LIKE "%BannerLoader%"
    AND uri_query LIKE "%wlm_2017%"
) s
GROUP BY s.country, s.access_method, s.day
ORDER BY s.country, s.access_method, s.day
LIMIT 1000000 ;
Not knowing of P4040 I first ran code to count image impressions of the thumbnail in the banner.
Later when Leila pointed to P4040 I ran those queries (again for test case South Korea) and compared with my thumb counts: they matched quite well.
Sep 4 2017
As for data that Wikistats can supply: (2/2)
As for data that Wikistats can supply: (1/2)
After a long outage (partly caused by unexpected server migration) I revived Wikistats scripts and parsed dumps in the last 10 days. So I can partially answer the question.
Aug 21 2017
that's fine with me, thanks
right now I am updating all Wikistats scripts to make them more uniform, better organized, a bit more documented, with better logging, with better backups
in preparation for productification, which may require further updates, but at least all scripts will then be more intelligible (for me as well)
I'll make a note to bring this up again end of calendar year
Aug 7 2017
Several readers report issues with ad blocker uBlock. Needs to be deactivated to see viz.
Aug 3 2017
Hyvä Suomi! Most readers for any Wikipedia per capita, if I read correctly. It would have been nice to confirm by reading in the table, but the numbers displayed in the table were different from what was displayed in the choropleth map.
(me: Great point, thanks. I need to look into that.)
Aug 2 2017
Viz published today at https://stats.wikimedia.org/wikimedia/animations/pageviews/wivivi.html
Aug 1 2017
data files: https://stats.wikimedia.org/wikimedia/animations/pageviews/data.html (to do: registering on repository)
Working on it. Andrew helped with git problem, also file structure is different on new server.
Jul 31 2017
Current version: https://meta.wikimedia.org/wiki/WiViVi#Welcome_panel
Jul 29 2017
Yes, no Wikistats cron jobs on stat1005 yet. I'll look at it Monday.
Jul 27 2017
hmm in Chrome
https://stats.wikimedia.org/newviz still redirects to https://stats.wikimedia.org/wikimedia/animations/pageviews/datamaps-views-v03.html
https://stats.wikimedia.org/newviz.html redirects to https://stats.wikimedia.org/wikimedia/animations/pageviews/datamaps-views-v06.html
https://stats.wikimedia.org/newviz.html now redirects to newest beta: https://stats.wikimedia.org/wikimedia/animations/pageviews/datamaps-views-v06.html
Jul 25 2017
viz now works in Edge browser (with workaround for Edge bug where mouseover fails)
added splash screen, for basic introduction
cleaned html with W3C validator
Jul 19 2017
data files have been described at https://stats.wikimedia.org/wikimedia/animations/pageviews/data.html
Jul 17 2017
I coined the viz. WiViVi, for Wikipedia Views Visualized.
Jul 13 2017
I just published a new version at https://stats.wikimedia.org/newviz
Jul 9 2017
Will do so. I'm still tweaking the UI to make it adapt to small screen/window