
Erik_Zachte (ezachte)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 6 2016, 1:53 PM (436 w, 1 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Erik Zachte [ Global Accounts ]

Recent Activity

Mar 18 2020

Erik_Zachte placed T178891: German derivative of Wikistats report shows marked difference for new editors in Aug vs Sep up for grabs.

Removing myself, I'm no longer involved
In fact this may have been resolved, I'm not sure. I certainly engaged with Dschungelfan

Mar 18 2020, 6:22 PM · Data-Engineering, Analytics-Radar, Data-Engineering-Wikistats
Erik_Zachte placed T180118: X-axis is at odds with stated period in header of trend charts for 'total articles' for a wiki up for grabs.

removing myself, I'm no longer involved

Mar 18 2020, 6:19 PM · Analytics-Radar, Chinese-Sites, Data-Engineering-Wikistats

Jan 24 2020

Erik_Zachte added a comment to T238243: Archive /home/ezachte data on stat1007.

@Nuria, thanks for the link. I will look in more depth at the task lists later this weekend.

Jan 24 2020, 6:21 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24)
Erik_Zachte updated subscribers of T238243: Archive /home/ezachte data on stat1007.

I started a page on Wikistats 1 at https://meta.wikimedia.org/wiki/User:Erik_Zachte/Wikistats%201. Unlike earlier overview pages for Wikistats, this one focuses on where we stand with Wikistats 1 now, in light of the migration to Wikistats 2. What in Wikistats 1 still works? (Several crucial data streams.) What has been disabled? (Some of it prematurely, I would say.) What is outside the scope of earlier surveys, but would be a pity to lose altogether? (Some of the visualizations.) I give credit to new developments, but also make some critical remarks at the end.

Jan 24 2020, 4:39 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24)

Jan 9 2020

Erik_Zachte added a comment to T238243: Archive /home/ezachte data on stat1007.

@elukey thanks for continuing a constructive dialogue.

Jan 9 2020, 6:22 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24)
Erik_Zachte added a comment to T238243: Archive /home/ezachte data on stat1007.

@Nuria are you saying that a fix that might take an hour, if not less, will not be done because another issue might pop up in the future? It's not as if you're committing to uphold Wikistats 1 for eternity. Maintaining the Perl scripts has never been expected; I've always been open about these being maintenance-unfriendly.

Jan 9 2020, 8:05 AM · Data-Platform-SRE (2024.03.04 - 2024.03.24)

Jan 5 2020

Erik_Zachte added a comment to T238243: Archive /home/ezachte data on stat1007.

We are leaning towards turning these jobs off because they've been broken for almost a year and nobody has complained so far.

Jan 5 2020, 5:24 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24)

Dec 18 2019

Erik_Zachte added a comment to T238243: Archive /home/ezachte data on stat1007.

Second issue:

Dec 18 2019, 4:56 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24)
Erik_Zachte added a comment to T238243: Archive /home/ezachte data on stat1007.

@Milimetric thanks for heads-up.

Dec 18 2019, 3:11 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24)

Dec 17 2019

Erik_Zachte updated subscribers of T238243: Archive /home/ezachte data on stat1007.

It seems I added @Error inadvertently. Is that a bot? Or a playful nickname?

Dec 17 2019, 6:24 AM · Data-Platform-SRE (2024.03.04 - 2024.03.24)

Dec 16 2019

Erik_Zachte added a comment to T238243: Archive /home/ezachte data on stat1007.

high level folders > 1GB:
A 270G ../wikistats_data/dammit
B 203G ../wikistats_data/dumps
C 138G ../wikistats_backup/
D 120G ../wikistats_data/squids
E 2G ../wikistats_data/mediacounts

Dec 16 2019, 3:50 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24)
Erik_Zachte added a comment to T238243: Archive /home/ezachte data on stat1007.

high level folders > 1GB:
A 270G ../wikistats_data/dammit
B 203G ../wikistats_data/dumps
C 138G ../wikistats_backup/
D 120G ../wikistats_data/squids
E 2G ../wikistats_data/mediacounts

Dec 16 2019, 3:28 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24)

Dec 15 2019

Erik_Zachte updated subscribers of T238243: Archive /home/ezachte data on stat1007.

So I looked first into the cron processes that are still enabled on home/ezachte. There are two.

Dec 15 2019, 12:35 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24)

Dec 13 2019

Erik_Zachte added a comment to T238243: Archive /home/ezachte data on stat1007.

@elukey Hi! I'll get to this in the coming days. Thanks for your patience.

Dec 13 2019, 5:44 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24)

Nov 15 2019

Erik_Zachte added a comment to T238243: Archive /home/ezachte data on stat1007.

I have just reapplied for server access with John Bond
I was supposed to add the new public key myself at https://phabricator.wikimedia.org/T215790, but I can't even view that ticket as Erik_Zachte (ezachte).
Once I'm back online I will review the folders mentioned here, and comment.

Nov 15 2019, 11:22 AM · Data-Platform-SRE (2024.03.04 - 2024.03.24)

Oct 23 2019

Erik_Zachte added a comment to T131280: Make aggregate data on editors per country per wiki publicly available.

@Milimetric thanks for bringing this to completion

Oct 23 2019, 4:00 AM · Product-Analytics, Analytics-Kanban

Aug 21 2019

Erik_Zachte added a comment to T131280: Make aggregate data on editors per country per wiki publicly available.

@Yair_rand your examples show ingenuity, yet they also seem somewhat contrived. Suppose some malicious, geeky and rather obsessed user went to such lengths to 'exploit a weakness' in the privacy protection, and learned the country of a Wikimedian who doesn't want to reveal themselves: how much damage could be done? Say China, with its enormous resources, finds out that 16 active editors on a small wiki all edit from Taiwan. How much would they have learned then? Taiwan has a population of 23+ million. That geeky detective could probably also learn from text analysis (English isn't spoken the same in different countries), from analysis of edit times (where waking hours are a proxy for time zone), and from edits being spaced wider apart in countries with low bandwidth. I admit these are all contrived examples as well, only effective in combination, and only in the hands of a very geeky and obsessed malicious user with infinite resources. It's probably easier for such a geek to infiltrate our security by social engineering, placing a mole, and what have you.

Aug 21 2019, 10:17 AM · Product-Analytics, Analytics-Kanban

May 6 2019

Erik_Zachte added a comment to T222655: Formalize the concept of countable wikis.

IIRC the criterion was rather which wikis to exclude explicitly. Those were wikis which are not publicly editable, or not even readable. An example that comes to mind is the board wiki.

May 6 2019, 9:31 PM · Movement-Insights, Movement-Metrics

Apr 2 2019

Erik_Zachte added a comment to T176478: Renovation of Wikistats production jobs.

That's OK. Cheers

Apr 2 2019, 2:09 PM · Analytics-Radar, Research, Data-Engineering-Wikistats

Dec 19 2017

Erik_Zachte added a comment to T179530: Wikistats Bug: Menu to select projects doesn't work (sometimes?).

Ah, sorry I mixed up two tasks then. Just copied my comments to https://phabricator.wikimedia.org/T182960

Dec 19 2017, 4:54 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats
Erik_Zachte updated subscribers of T182960: When searching for a project language, display a full list of languages.

For the record, I'm copying from a mail exchange with @Milimetric :

Dec 19 2017, 4:53 PM · Analytics, Data-Engineering-Wikistats
Erik_Zachte updated subscribers of T179530: Wikistats Bug: Menu to select projects doesn't work (sometimes?).

For the record, I'm copying from a mail exchange with @Milimetric :

Dec 19 2017, 4:30 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats
Erik_Zachte added a comment to T183184: Make the colors used the line charts in Wikistats 2 more easy to recognize..

My preference would be the second option: Just when a color is lighter than #999999 add a thin black outline.
But using lighter colors all over would help as well.
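
For illustration only (this is not part of the original comment): a minimal Python sketch of the "lighter than #999999" test. It assumes that "lighter" can be approximated by Rec. 601 luma; the function names `luma` and `needs_outline` are hypothetical and do not come from the Wikistats 2 code.

```python
def luma(hex_color: str) -> float:
    """Approximate perceived brightness (0-255) of an '#rrggbb' color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    # Rec. 601 weights; an assumption, other brightness measures would also work.
    return 0.299 * r + 0.587 * g + 0.114 * b


# Brightness of the reference gray named in the comment.
THRESHOLD = luma("#999999")


def needs_outline(hex_color: str) -> bool:
    """True when the color is lighter than #999999 and should get a thin outline."""
    return luma(hex_color) > THRESHOLD
```

Under this assumption a light yellow such as `#ffff00` would get an outline, while a dark blue such as `#000080` would not.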

Dec 19 2017, 4:18 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats, Analytics
Erik_Zachte added a comment to T183185: Display of radio buttons in Wikistats 2 is somewhat confusing.

No, I'm fine with second bullet point, I just meant to say 'there is too much happening when I click the radiobutton'. But subdivide on the chosen metric makes sense.

Dec 19 2017, 4:15 PM · Analytics-Kanban, Analytics, Patch-For-Review, Data-Engineering-Wikistats

Dec 18 2017

Erik_Zachte created T183192: Please add download option 'as csv file' to Wikistats 2.
Dec 18 2017, 9:18 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats
Erik_Zachte added a comment to T183181: Consistently preserve settings when a user switches to a new metric (especially on the same page)..

@Catrope sorry, I added subscribers and project

Dec 18 2017, 9:08 PM · Analytics-Kanban, Data-Engineering-Wikistats, Analytics
Erik_Zachte assigned T183181: Consistently preserve settings when a user switches to a new metric (especially on the same page). to Milimetric.
Dec 18 2017, 9:08 PM · Analytics-Kanban, Data-Engineering-Wikistats, Analytics
Erik_Zachte assigned T183183: Present Wikistats 2 charts for the period selected by the user. to Milimetric.
Dec 18 2017, 9:06 PM · Analytics-Kanban, Analytics, Data-Engineering-Wikistats
Erik_Zachte assigned T183184: Make the colors used the line charts in Wikistats 2 more easy to recognize. to Milimetric.
Dec 18 2017, 9:06 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats, Analytics
Erik_Zachte assigned T183185: Display of radio buttons in Wikistats 2 is somewhat confusing to Milimetric.
Dec 18 2017, 9:05 PM · Analytics-Kanban, Analytics, Patch-For-Review, Data-Engineering-Wikistats
Erik_Zachte created T183188: Link to 'more info' doesn't always work.
Dec 18 2017, 9:04 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats
Erik_Zachte created T183185: Display of radio buttons in Wikistats 2 is somewhat confusing.
Dec 18 2017, 8:58 PM · Analytics-Kanban, Analytics, Patch-For-Review, Data-Engineering-Wikistats
Erik_Zachte created T183184: Make the colors used the line charts in Wikistats 2 more easy to recognize..
Dec 18 2017, 8:49 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats, Analytics
Erik_Zachte created T183183: Present Wikistats 2 charts for the period selected by the user..
Dec 18 2017, 8:39 PM · Analytics-Kanban, Analytics, Data-Engineering-Wikistats
Erik_Zachte created T183181: Consistently preserve settings when a user switches to a new metric (especially on the same page)..
Dec 18 2017, 8:28 PM · Analytics-Kanban, Data-Engineering-Wikistats, Analytics
Erik_Zachte created T183180: roadmap of migration to Wikistats 2.
Dec 18 2017, 8:12 PM · Data-Engineering-Wikistats, Analytics

Dec 13 2017

Erik_Zachte added a comment to T181508: Privacy pageview threshold for map report.
  • For a Wikipedia to be shown, it has to have a minimum of 0.1% of all traffic in pageviews.
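
For illustration only (not part of the original comment): a minimal Python sketch of the 0.1% rule stated above. The function name, the input shape (a mapping from wiki to pageview count) and the inclusive comparison are assumptions, not the actual report code.

```python
def wikis_above_threshold(pageviews: dict[str, int],
                          min_share: float = 0.001) -> list[str]:
    """Return the wikis whose share of all pageviews is at least min_share."""
    total = sum(pageviews.values())
    if total == 0:
        return []  # no traffic at all, so nothing qualifies
    return sorted(wiki for wiki, views in pageviews.items()
                  if views / total >= min_share)
```

With made-up counts of 9,975 views for one wiki, 20 for a second and 5 for a third, only the first two reach the 0.1% share.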
Dec 13 2017, 9:38 PM · Data-Engineering-Wikistats, Analytics-Kanban
Erik_Zachte added a comment to T182001: Add link to new wikistats 2.0 to wikistats 1.0 pages.

Reports with new announcement are being generated. The English Wikipedia dump is still being parsed so I will update reports again later this week.

Dec 13 2017, 4:59 PM · Analytics-Kanban, Data-Engineering-Wikistats

Dec 12 2017

Erik_Zachte added a comment to T182001: Add link to new wikistats 2.0 to wikistats 1.0 pages.

https://wikitech.wikimedia.org/wiki/Analytics/Systems/Wikistats briefly mentions the data lake and edits, but 90% of the page is about pageviews and traffic. Those are different data streams (no relation with dumps), different reports.

Dec 12 2017, 10:59 PM · Analytics-Kanban, Data-Engineering-Wikistats
Erik_Zachte added a comment to T182001: Add link to new wikistats 2.0 to wikistats 1.0 pages.

Yes I can update the text tomorrow.

Dec 12 2017, 10:51 PM · Analytics-Kanban, Data-Engineering-Wikistats

Dec 6 2017

Erik_Zachte added a comment to T182001: Add link to new wikistats 2.0 to wikistats 1.0 pages.

Here is a first draft: https://stats.wikimedia.org/wikinews/EN/draft/TablesWikipediaES.htm Please comment.

Dec 6 2017, 11:33 PM · Analytics-Kanban, Data-Engineering-Wikistats

Dec 5 2017

Erik_Zachte closed T174946: R execution on stat1005 -> 'stack smashing error' as Resolved.

I see recent R charts again! It was an elusive bug, hard to replicate.

Dec 5 2017, 5:38 PM · Analytics

Nov 28 2017

Erik_Zachte added a comment to T174946: R execution on stat1005 -> 'stack smashing error'.

@mpopov Thanks, I totally rely on Andrew for this. I don't have root access, which is fine with me, so I can't mess up ;-) And server migrations are rare anyway.

Nov 28 2017, 6:52 PM · Analytics
Erik_Zachte added a comment to T181508: Privacy pageview threshold for map report.

@fdans WiViVi doesn't use a threshold because the aggregation level is so high that individuals don't stand out from the crowd, except for fringe cases (and even then... how serious is that?)
WiViVi reports monthly request counts, broken down by originating country and target wiki.
Yes, one person can account for all page requests for a very small wiki from a very small country (fringe case).
I have been working on the premise this is not a privacy hazard.

Nov 28 2017, 6:14 PM · Data-Engineering-Wikistats, Analytics-Kanban
Erik_Zachte added a comment to T176478: Renovation of Wikistats production jobs.

script stat1005:/home/ezachte/wikistats/dumps/bash/collect_edits.sh has been adapted to stat1005

Nov 28 2017, 5:42 PM · Analytics-Radar, Research, Data-Engineering-Wikistats
Erik_Zachte added a comment to T174946: R execution on stat1005 -> 'stack smashing error'.

Thanks, will do.

Nov 28 2017, 4:29 PM · Analytics
Erik_Zachte added a comment to T174946: R execution on stat1005 -> 'stack smashing error'.

Hmm, I could migrate all of Wikistats to stat1004 (I prefer to keep everything on one machine; also, charts are part of the overall Wikistats job).
Is stat1004 machine equivalent to stat1005?
Does 'still Jessie' imply that stat1004 will be upgraded at some point and the same issue will reoccur?

Nov 28 2017, 12:16 AM · Analytics

Nov 27 2017

Erik_Zachte updated the task description for T178084: Alpha release: Wikistats 2 UI feedback From Erik Z.
Nov 27 2017, 5:51 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats
Erik_Zachte added a comment to T176478: Renovation of Wikistats production jobs.

script stat1005:/home/ezachte/wikistats/dumps/bash/extract_dump.sh has been adapted to stat1005

Nov 27 2017, 3:39 PM · Analytics-Radar, Research, Data-Engineering-Wikistats

Nov 19 2017

Erik_Zachte raised the priority of T174946: R execution on stat1005 -> 'stack smashing error' from Low to Medium.

@fdans Yes it's still happening.

Nov 19 2017, 1:34 PM · Analytics

Nov 13 2017

Nemo_bis awarded T178084: Alpha release: Wikistats 2 UI feedback From Erik Z a Doubloon token.
Nov 13 2017, 4:02 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats

Nov 9 2017

Erik_Zachte added a comment to T180118: X-axis is at odds with stated period in header of trend charts for 'total articles' for a wiki.

That looks odd, indeed, also for other languages.

Nov 9 2017, 11:34 AM · Analytics-Radar, Chinese-Sites, Data-Engineering-Wikistats
Erik_Zachte created T180118: X-axis is at odds with stated period in header of trend charts for 'total articles' for a wiki.
Nov 9 2017, 11:31 AM · Analytics-Radar, Chinese-Sites, Data-Engineering-Wikistats

Oct 24 2017

Erik_Zachte added a comment to T178891: German derivative of Wikistats report shows marked difference for new editors in Aug vs Sep.

Yes the basic principle has changed a bit, albeit longer ago, start 2017.

Oct 24 2017, 1:08 PM · Data-Engineering, Analytics-Radar, Data-Engineering-Wikistats
Erik_Zachte created T178891: German derivative of Wikistats report shows marked difference for new editors in Aug vs Sep.
Oct 24 2017, 12:17 PM · Data-Engineering, Analytics-Radar, Data-Engineering-Wikistats

Oct 23 2017

Erik_Zachte added a comment to T178591: Feedback on hive table mediawiki_history by Erik Z.

Another comparison between Wikistats 1 and 2: this time edit counts per user.
I compared edit counts for users with 5000+ edits on af.wikipedia.org, namespace 0.

Oct 23 2017, 5:01 PM · Data-Engineering-Wikistats, Analytics
Erik_Zachte added a comment to T153923: vet edit data on the data lake .

I collected feedback in https://phabricator.wikimedia.org/T178591 (I don't know how to link it here as a subtask; I have never done that)

Oct 23 2017, 4:35 PM · Analytics-Radar

Oct 19 2017

Erik_Zachte added a comment to T178591: Feedback on hive table mediawiki_history by Erik Z.

There is page_is_redirect_latest. I imagine it could be very useful to also have a field stating which page id or page title the redirect points to, for example for combining pageview counts. Not that Wikistats 1 has such a field, but still...

Oct 19 2017, 5:30 PM · Data-Engineering-Wikistats, Analytics
Erik_Zachte added a comment to T178591: Feedback on hive table mediawiki_history by Erik Z.

There are columns event_user_is_bot_by_name and user_is_bot_by_name, but not event_user_is_bot or user_is_bot. Wouldn't that make sense to have those as well?

Oct 19 2017, 5:27 PM · Data-Engineering-Wikistats, Analytics
Erik_Zachte added a comment to T178591: Feedback on hive table mediawiki_history by Erik Z.

Question: with deleted revisions still somewhere in the database, as column revision_is_deleted suggests: should these be shielded from the public once this database is opened for public access?

Oct 19 2017, 5:20 PM · Data-Engineering-Wikistats, Analytics
Erik_Zachte added a comment to T178591: Feedback on hive table mediawiki_history by Erik Z.

Building on the previous comment (about page deletions):

Oct 19 2017, 5:10 PM · Data-Engineering-Wikistats, Analytics
Erik_Zachte added a comment to T178591: Feedback on hive table mediawiki_history by Erik Z.

I see revision_is_deleted, but how about page_is_deleted?

Oct 19 2017, 4:44 PM · Data-Engineering-Wikistats, Analytics
Erik_Zachte created T178591: Feedback on hive table mediawiki_history by Erik Z.
Oct 19 2017, 4:38 PM · Data-Engineering-Wikistats, Analytics

Oct 13 2017

Erik_Zachte added a comment to T178084: Alpha release: Wikistats 2 UI feedback From Erik Z.

@Milimetric thanks for caring!

Oct 13 2017, 4:29 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats

Oct 12 2017

Erik_Zachte added a comment to T178084: Alpha release: Wikistats 2 UI feedback From Erik Z.

no-zero-line.PNG (277×326 px, 13 KB)
No nitpicking this time. A y-axis that doesn't start at zero, in combination with no numbers along the y-axis, makes it anybody's guess what the chart is telling us. And most people will guess wrong.

Oct 12 2017, 4:58 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats
Erik_Zachte added a comment to T178084: Alpha release: Wikistats 2 UI feedback From Erik Z.

In some places I see K/M/B for thousand/million/billion.
In other places k/m/b. Maybe make a general formatting routine for this (preferably language-sensitive).
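
For illustration only (not part of the original comment): a minimal Python sketch of what such a shared formatting routine might look like. The name `abbreviate` and the `suffixes` parameter are hypothetical; passing a different tuple is one simple way a caller could make the output language-sensitive.

```python
def abbreviate(value: float, suffixes=("", "K", "M", "B")) -> str:
    """Format a count with one consistent set of magnitude suffixes."""
    magnitude = 0
    scaled = float(value)
    # Divide by 1000 until the number fits or we run out of suffixes.
    while abs(scaled) >= 1000 and magnitude < len(suffixes) - 1:
        scaled /= 1000.0
        magnitude += 1
    # Keep one decimal, but drop a trailing ".0" (so 2.0 is shown as 2).
    text = f"{scaled:.1f}".rstrip("0").rstrip(".")
    return f"{text}{suffixes[magnitude]}"
```

Routing every label through one function like this would keep the casing uniform across the UI; for example, `abbreviate(1500)` and `abbreviate(1500, suffixes=("", "k", "m", "b"))` differ only in the suffix set supplied.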

Oct 12 2017, 4:41 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats
Erik_Zachte added a comment to T178084: Alpha release: Wikistats 2 UI feedback From Erik Z.

numbers_alignment.PNG (338×145 px, 3 KB)
In general: can we have numbers right-align? E.g. https://stats.wikimedia.org/v2/#/am.wiktionary.org/reading/total-pageviews but also elsewhere.

Oct 12 2017, 4:37 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats
Erik_Zachte renamed T178084: Alpha release: Wikistats 2 UI feedback From Erik Z from Alpha release: Wikistats 2 UI feedback to Wikistats 2 UI feedback.
Oct 12 2017, 4:31 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats
Erik_Zachte assigned T178084: Alpha release: Wikistats 2 UI feedback From Erik Z to Nuria.
Oct 12 2017, 4:27 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats
Erik_Zachte updated the task description for T178084: Alpha release: Wikistats 2 UI feedback From Erik Z.
Oct 12 2017, 4:25 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats
Erik_Zachte created T178084: Alpha release: Wikistats 2 UI feedback From Erik Z.
Oct 12 2017, 4:23 PM · Patch-For-Review, Analytics-Kanban, Data-Engineering-Wikistats

Oct 11 2017

Erik_Zachte added a comment to T169530: Dataviz blog post .

I made a few more edits last night.

Oct 11 2017, 2:39 PM · Wikimedia-Blog-Content

Oct 9 2017

Erik_Zachte added a comment to T144607: Visualization of Wikimedia traffic by language, country and region.

bash file datamaps_views.sh has been migrated to stat1005, so monthly updates to WiViVi can now be generated

Oct 9 2017, 5:08 PM · Research-Archive, Epic, Data-release
Erik_Zachte added a comment to T176478: Renovation of Wikistats production jobs.

script datamaps_views.sh, for updating WiViVi data, has been adapted to stat1005
viz. now shows data for Sep 2017
https://stats.wikimedia.org/wikimedia/animations/wivivi/wivivi.html

Oct 9 2017, 5:07 PM · Analytics-Radar, Research, Data-Engineering-Wikistats
Erik_Zachte added a comment to T144607: Visualization of Wikimedia traffic by language, country and region.

Ad blocker uBlock blocks WiViVi (with 3 reports on this from a small audience, this had better be solved before we publish the blog post)

Oct 9 2017, 4:24 PM · Research-Archive, Epic, Data-release
Erik_Zachte added a comment to T144607: Visualization of Wikimedia traffic by language, country and region.

Highest range of values for map "Wikipedia pageviews, percentage to language ...." (map in red-orange-yellow) has been split into two ranges, on user request. So range 50%-100% is now range 50%-80% and range 80%-100%

Oct 9 2017, 4:16 PM · Research-Archive, Epic, Data-release

Sep 22 2017

Erik_Zachte added a comment to T176478: Renovation of Wikistats production jobs.

Major production jobs + visualisations developed by Erik Zachte which are still in use

Sep 22 2017, 3:02 PM · Analytics-Radar, Research, Data-Engineering-Wikistats
Erik_Zachte created T176478: Renovation of Wikistats production jobs.
Sep 22 2017, 12:44 PM · Analytics-Radar, Research, Data-Engineering-Wikistats

Sep 15 2017

Erik_Zachte closed T174950: Provide yearly update of stats for audit report as Resolved.

Data have been relayed to Tony Sep 7.
Thanks @Aklapper for quick response.

Sep 15 2017, 2:40 PM · Data-Engineering-Wikistats, Analytics

Sep 14 2017

Erik_Zachte added a comment to T153923: vet edit data on the data lake .

Please do. Tomorrow any time till 12 AM PDT works for me. Preferably a bit earlier.

Sep 14 2017, 9:05 PM · Analytics-Radar

Sep 12 2017

Erik_Zachte closed T172032: Pagecounts-ez not generating as Resolved.

Ah I forgot to close this task. Scripts are running again.

Sep 12 2017, 10:29 AM · Analytics-Radar
Erik_Zachte closed T172032: Pagecounts-ez not generating, a subtask of T152712: Replacement of stat1002 and stat1003, as Resolved.
Sep 12 2017, 10:29 AM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters

Sep 11 2017

Erik_Zachte added a comment to T153923: vet edit data on the data lake .

@Nuria Ah I missed these. Sorry about that. A good example of why restructuring my mailbox was dearly needed, so I finally fine-tuned Gmail filters this weekend. So I could start vetting later this week. Shall we do a hangout?

Sep 11 2017, 8:34 PM · Analytics-Radar

Sep 10 2017

Erik_Zachte added a comment to T175493: collect banner impressions for WLM2017 and see if counts are lower than expected.

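USE wmf;

-- Count WLM 2017 banner loads per country, access method and day (Sep 1-8, 2017).
SELECT
    s.*,
    count(*) AS count
FROM
    (SELECT
         access_method,
         geocoded_data['country'] AS country,
         day
     FROM
         webrequest
     WHERE
             year = 2017
         AND month = 9
         AND day < 9                            -- days 1 through 8
         AND uri_host LIKE "%meta%"             -- host name contains 'meta'
         AND agent_type = "user"                -- only requests classified as 'user'
         AND referer LIKE "%wikipedia%"         -- referred from a Wikipedia page
         AND uri_query LIKE "%BannerLoader%"    -- banner load requests
         AND uri_query LIKE "%wlm_2017%"        -- WLM 2017 banners only
    ) s
GROUP BY
    s.country,
    s.access_method,
    s.day
ORDER BY
    s.country,
    s.access_method,
    s.day
LIMIT 1000000;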
Sep 10 2017, 7:42 PM · Wiki-Loves-Monuments
Erik_Zachte added a comment to T175493: collect banner impressions for WLM2017 and see if counts are lower than expected.

Introduction:
Not knowing of P4040 I first ran code to count image impressions of the thumbnail in the banner.
Later when Leila pointed to P4040 I ran those queries (again for test case South Korea) and compared with my thumb counts: they matched quite well.

Sep 10 2017, 7:29 PM · Wiki-Loves-Monuments
Erik_Zachte created T175493: collect banner impressions for WLM2017 and see if counts are lower than expected.
Sep 10 2017, 6:30 PM · Wiki-Loves-Monuments

Sep 4 2017

Erik_Zachte added a comment to T174950: Provide yearly update of stats for audit report.

As for data that Wikistats can supply: (2/2)

Sep 4 2017, 6:09 PM · Data-Engineering-Wikistats, Analytics
Erik_Zachte added a comment to T174950: Provide yearly update of stats for audit report.

As for data that Wikistats can supply: (1/2)

Sep 4 2017, 3:03 PM · Data-Engineering-Wikistats, Analytics
Erik_Zachte added a comment to T174950: Provide yearly update of stats for audit report.

After a long outage (partly caused by unexpected server migration) I revived Wikistats scripts and parsed dumps in the last 10 days. So I can partially answer the question.

Sep 4 2017, 2:49 PM · Data-Engineering-Wikistats, Analytics
Erik_Zachte created T174950: Provide yearly update of stats for audit report.
Sep 4 2017, 2:44 PM · Data-Engineering-Wikistats, Analytics
Erik_Zachte created T174946: R execution on stat1005 -> 'stack smashing error'.
Sep 4 2017, 2:32 PM · Analytics

Aug 21 2017

Erik_Zachte closed T173724: yearly folder in hdfs for mediacounts does not have write access for Wikistats by default (patched manually for earlier years) as Resolved.
Aug 21 2017, 3:59 PM · Analytics, Data-Engineering-Wikistats
Erik_Zachte added a comment to T173724: yearly folder in hdfs for mediacounts does not have write access for Wikistats by default (patched manually for earlier years).

that's fine with me, thanks
right now I am updating all Wikistats scripts to make them more uniform, better organized, a bit more documented, with better logging, with better backups
in preparation for productification, which may require further updates, but at least all scripts will then be more intelligible (for me as well)
I'll make a note to bring this up again end of calendar year

Aug 21 2017, 3:53 PM · Analytics, Data-Engineering-Wikistats
Erik_Zachte updated the task description for T173724: yearly folder in hdfs for mediacounts does not have write access for Wikistats by default (patched manually for earlier years).
Aug 21 2017, 2:08 PM · Analytics, Data-Engineering-Wikistats
Erik_Zachte created T173724: yearly folder in hdfs for mediacounts does not have write access for Wikistats by default (patched manually for earlier years).
Aug 21 2017, 2:07 PM · Analytics, Data-Engineering-Wikistats

Aug 7 2017

Erik_Zachte added a comment to T144607: Visualization of Wikimedia traffic by language, country and region.

from Facebook:

Aug 7 2017, 9:10 AM · Research-Archive, Epic, Data-release
Erik_Zachte added a comment to T144607: Visualization of Wikimedia traffic by language, country and region.

Several readers report issues with ad blocker uBlock. Needs to be deactivated to see viz.

Aug 7 2017, 9:01 AM · Research-Archive, Epic, Data-release

Aug 3 2017

Erik_Zachte added a comment to T144607: Visualization of Wikimedia traffic by language, country and region.

Hyvä Suomi! Most readers for any Wikipedia per capita, if I read correctly. It would have been nice to confirm by reading in the table, but the numbers displayed in the table were different from what was displayed in the choropleth map.
(me: Great point, thanks. I need to look into that.)

Aug 3 2017, 7:32 AM · Research-Archive, Epic, Data-release

Aug 2 2017

Erik_Zachte renamed T172304: WiViVi Broken in Firefox 50 (Linux only) from WiViVi Broken in Firefox 50 to WiViVi Broken in Firefox 50 (Linux only).
Aug 2 2017, 6:32 PM · Data-Engineering, Analytics-Radar, Data-Engineering-Wikistats
Erik_Zachte added a comment to T144607: Visualization of Wikimedia traffic by language, country and region.

Viz published today at https://stats.wikimedia.org/wikimedia/animations/pageviews/wivivi.html

Aug 2 2017, 4:34 PM · Research-Archive, Epic, Data-release