User Details
- User Since
- Jan 6 2016, 1:53 PM (436 w, 1 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- Erik Zachte
Mar 18 2020
Removing myself, I'm no longer involved
In fact this may have been resolved, I'm not sure. I certainly engaged with Dschungelfan
removing myself, I'm no longer involved
Jan 24 2020
@Nuria, thanks for the link. I will look at the task lists in more depth later this weekend.
I started a page on Wikistats 1 at https://meta.wikimedia.org/wiki/User:Erik_Zachte/Wikistats%201. Unlike earlier overview pages for Wikistats, this one focuses on where we stand with Wikistats 1 now, in light of the migration to Wikistats 2. What in Wikistats 1 still works? (Several crucial data streams.) What has been disabled? (Some of it prematurely, I would say.) What is outside the scope of earlier surveys, but would be a pity to lose altogether? (Some of the visualizations.) I give credit to new developments, but also make some critical remarks at the end.
Jan 9 2020
@elukey thanks for continuing a constructive dialogue.
@Nuria are you saying that a fix that might take an hour, if not less, will not be done because another issue might pop up in the future? It's not as if you're committing to maintain Wikistats 1 for eternity. Maintaining the Perl scripts has never been expected; I've always been open about these being maintenance-unfriendly.
Jan 5 2020
We are leaning towards turning these jobs off because they've been broken for almost a year and nobody has complained so far.
Dec 18 2019
Second issue:
@Milimetric thanks for heads-up.
Dec 17 2019
It seems I added @Error inadvertently. Is that a bot? Or a playful nickname?
Dec 16 2019
high level folders > 1GB:
A 270G ../wikistats_data/dammit
B 203G ../wikistats_data/dumps
C 138G ../wikistats_backup/
D 120G ../wikistats_data/squids
E 2G ../wikistats_data/mediacounts
Dec 15 2019
So I looked first into the cron processes that are still enabled on home/ezachte. There are two.
Dec 13 2019
@elukey Hi! I'll get to this in coming days. Thanks for your patience.
Nov 15 2019
I have just reapplied for server access with John Bond
I was supposed to add the new public key myself at https://phabricator.wikimedia.org/T215790, but I can't even view that ticket as Erik_Zachte (ezachte).
Once I'm back online I will review the folders mentioned here, and comment.
Oct 23 2019
@Milimetric thanks for bringing this to completion
Aug 21 2019
@Yair_rand your examples show ingenuity, yet they also seem somewhat contrived. Suppose some malicious, geeky and rather obsessed user went to such lengths to 'exploit a weakness' in the privacy protection, and learned the country of a Wikimedian who doesn't want to reveal themselves: how much damage could be done? Say China, with its enormous resources, finds out that 16 active editors on a small wiki all edit from Taiwan. How much would they have learned then? Taiwan has a population of 23+ million. That geeky detective could probably also learn from text analysis (English isn't spoken the same way in different countries), from analysis of edit times (where waking hours are a proxy for time zone), and from edits being spaced wider apart for countries with low bandwidth. I admit these are all contrived examples as well, only effective in combination, and only in the hands of a very geeky and obsessed malicious user with infinite resources. It's probably easier for such a geek to breach our security by social engineering, placing a mole, and what have you.
May 6 2019
IIRC the criterion was rather which wikis to exclude explicitly. Those were wikis which are not publicly editable or not even readable. An example that comes to mind is the board wiki.
Apr 2 2019
That's OK. Cheers
Dec 19 2017
Ah, sorry I mixed up two tasks then. Just copied my comments to https://phabricator.wikimedia.org/T182960
For the record, I'm copying from a mail exchange with @Milimetric :
My preference would be the second option: Just when a color is lighter than #999999 add a thin black outline.
But using lighter colors all over would help as well.
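The outline rule above could be sketched as follows. This is only an illustration: the source does not say how "lighter" is measured, so the sketch assumes a weighted luma (Rec. 601 weights), and the function names are hypothetical.

```python
# Sketch: decide whether a color needs a thin black outline.
# Assumption: "lighter than #999999" is judged by weighted luma
# (Rec. 601 weights); the original comment does not specify a measure.

def luma_x1000(hex_color: str) -> int:
    """Return the luma of an '#RRGGBB' color, scaled by 1000 (integer math)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 299 * r + 587 * g + 114 * b

THRESHOLD = luma_x1000("#999999")

def needs_outline(hex_color: str) -> bool:
    """True when the color is strictly lighter than #999999."""
    return luma_x1000(hex_color) > THRESHOLD
```

With this measure a saturated yellow such as #FFFF00 gets an outline, while #999999 itself and anything darker does not.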
No, I'm fine with second bullet point, I just meant to say 'there is too much happening when I click the radiobutton'. But subdivide on the chosen metric makes sense.
Dec 18 2017
@Catrope sorry, I added subscribers and project
Dec 13 2017
- For a Wikipedia to be shown, it has to have a minimum of 0.1% of all traffic in pageviews.
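A minimal sketch of such a threshold filter, assuming the input is a mapping from wiki code to pageview count (the function name and data shape are hypothetical, not taken from the Wikistats scripts):

```python
# Sketch: keep only wikis with at least 0.1% of all pageviews.
# Assumption: 'views' maps a wiki code to its pageview count.

def wikis_to_show(views: dict[str, int]) -> list[str]:
    total = sum(views.values())
    # Integer comparison avoids floating point issues at the boundary:
    # count / total >= 0.001  is equivalent to  count * 1000 >= total
    return [wiki for wiki, count in views.items() if count * 1000 >= total]
```

The comparison is inclusive, matching "a minimum of 0.1%".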
Reports with new announcement are being generated. The English Wikipedia dump is still being parsed so I will update reports again later this week.
Dec 12 2017
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Wikistats briefly mentions the data lake and edits, but 90% of the page is about pageviews and traffic. Those are different data streams (no relation with dumps), different reports.
Yes I can update the text tomorrow.
Dec 6 2017
Here is a first draft: https://stats.wikimedia.org/wikinews/EN/draft/TablesWikipediaES.htm Please comment.
Dec 5 2017
I see recent R charts again! It was an elusive bug, hard to replicate.
Nov 28 2017
@mpopov Thanks, I totally rely on Andrew for this. I don't have root access, which is fine with me, so I can't mess up ;-) And server migrations are rare anyway.
@fdans WiViVi doesn't use a threshold because the aggregation level is so high that individuals don't stand out from the crowd, except in fringe cases (and even then ... how serious is that?)
WiViVi reports monthly request counts, broken down by originating country and target wiki.
Yes one person can account for all page requests for a very small wiki from a very small country (fringe case).
I have been working on the premise this is not a privacy hazard.
script stat1005:/home/ezachte/wikistats/dumps/bash/collect_edits.sh has been adapted to stat1005
Thanks, will do.
Hmm, I could migrate all of Wikistats to stat1004 (I prefer to keep everything on one machine; also, charts are part of the overall Wikistats job).
Is stat1004 machine equivalent to stat1005?
Does 'still Jessie' imply that stat1004 will be upgraded at some point and the same issue will reoccur?
Nov 27 2017
script stat1005:/home/ezachte/wikistats/dumps/bash/extract_dump.sh has been adapted to stat1005
Nov 19 2017
@fdans Yes it's still happening.
Nov 13 2017
Nov 9 2017
That looks odd, indeed, also for other languages.
Oct 24 2017
Yes, the basic principle has changed a bit, albeit a while ago, at the start of 2017.
Oct 23 2017
Another comparison between Wikistats 1 and 2: this time edit counts per user.
I compared edit counts for users with 5000+ edits on af.wikipedia.org, namespace 0.
I collected feedback in https://phabricator.wikimedia.org/T178591 (I don't know how to link it here as a subtask, I have never done that)
Oct 19 2017
There is page_is_redirect_latest, I imagine it could be very useful to also have a field to which page id or page title the redirect goes. For example for combining pageview counts. Not that Wikistats 1 has such, but still..
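To illustrate the pageview use case: if a redirect target were available, counts for redirects could be folded into their target page. The sketch below assumes a plain title-to-target mapping; no such field exists in the table discussed here, and all names are hypothetical.

```python
# Sketch: combine pageview counts of redirects with their target page.
# Assumption: 'redirects' maps a redirect title to the title it points to.
from collections import defaultdict

def resolve(title: str, redirects: dict[str, str]) -> str:
    """Follow redirects until a non-redirect page is reached (cycle-safe)."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title

def combine_pageviews(views: dict[str, int], redirects: dict[str, str]) -> dict[str, int]:
    totals: dict[str, int] = defaultdict(int)
    for title, count in views.items():
        totals[resolve(title, redirects)] += count
    return dict(totals)
```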
There are columns event_user_is_bot_by_name and user_is_bot_by_name, but not event_user_is_bot or user_is_bot. Wouldn't that make sense to have those as well?
Question: with deleted revisions still somewhere in the database, as column revision_is_deleted suggests: should these be shielded from the public once this database is opened for public access?
Building on the previous comment (about page deletions):
I see revision_is_deleted, but how about page_is_deleted?
Oct 13 2017
@Milimetric thanks for caring!
Oct 12 2017
In some places I see K/M/B for thousand/million/billion.
In other places k/m/b. Maybe make a general formatting routine for this (preferably language sensitive).
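Such a routine might look like the sketch below. It is not taken from Wikistats 2: the function name, the suffix table and its language entries are illustrative assumptions.

```python
# Sketch: one shared routine for compact number formatting.
# Assumption: suffixes are looked up per language code; the table below
# is illustrative only and falls back to English.
SUFFIXES = {
    "en": ("K", "M", "B"),
    "de": ("Tsd.", "Mio.", "Mrd."),
}

def format_compact(value: int, lang: str = "en", decimals: int = 1) -> str:
    thousand, million, billion = SUFFIXES.get(lang, SUFFIXES["en"])
    for divisor, suffix in ((10**9, billion), (10**6, million), (10**3, thousand)):
        if abs(value) >= divisor:
            return f"{value / divisor:.{decimals}f}{suffix}"
    return str(value)
```

Routing every display through one function like this keeps the K/M/B casing consistent across the interface.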
Oct 11 2017
I made a few more edits last night.
Oct 9 2017
bash file datamaps_views.sh has been migrated to stat1005, so monthly updates to WiViVi can now be generated
script datamaps_views.sh, for updating WiViVi data, has been adapted to stat1005
viz. now shows data for Sep 2017
https://stats.wikimedia.org/wikimedia/animations/wivivi/wivivi.html
ad-blocker uBlock blocks WiViVi (with 3 reports on this from a small audience, this had better be solved before we publish the blog post)
Highest range of values for map "Wikipedia pageviews, percentage to language ...." (map in red-orange-yellow) has been split into two ranges, on user request. So range 50%-100% is now range 50%-80% and range 80%-100%
Sep 22 2017
Major production jobs + visualisations developed by Erik Zachte which are still in use
Sep 15 2017
Data have been relayed to Tony Sep 7.
Thanks @Aklapper for quick response.
Sep 14 2017
Please do. Tomorrow any time till 12 AM PDT works for me. Preferably a bit earlier.
Sep 12 2017
Ah I forgot to close this task. Scripts are running again.
Sep 11 2017
@Nuria Ah I missed these. Sorry about that. A good example of why restructuring my mailbox was dearly needed, so I finally fine-tuned Gmail filters this weekend. So I could start vetting later this week. Shall we do a hangout?
Sep 10 2017
USE wmf;
SELECT s.*, count(*) AS count
FROM (
  SELECT access_method, geocoded_data['country'] country, day
  FROM webrequest
  WHERE year = 2017 AND month = 9 AND day < 9
    AND uri_host LIKE "%meta%"
    AND agent_type = "user"
    AND referer LIKE "%wikipedia%"
    AND uri_query LIKE "%BannerLoader%"
    AND uri_query LIKE "%wlm_2017%"
) s
GROUP BY s.country, s.access_method, s.day
ORDER BY s.country, s.access_method, s.day
LIMIT 1000000;
Sep 4 2017
As for data that Wikistats can supply: (2/2)
As for data that Wikistats can supply: (1/2)
After a long outage (partly caused by unexpected server migration) I revived Wikistats scripts and parsed dumps in the last 10 days. So I can partially answer the question.
Aug 21 2017
that's fine with me, thanks
right now I am updating all Wikistats scripts to make them more uniform, better organized, a bit more documented, with better logging, with better backups
in preparation for productification, which may require further updates, but at least all scripts will then be more intelligible (for me as well)
I'll make a note to bring this up again end of calendar year
Aug 7 2017
from Facebook:
Several readers report issues with ad blocker uBlock. Needs to be deactivated to see viz.
Aug 3 2017
Hyvä Suomi! Most readers for any Wikipedia per capita, if I read correctly. It would have been nice to confirm by reading in the table, but the numbers displayed in the table were different from what was displayed in the choropleth map.
(me: Great point, thanks. I need to look into that.)