
MGerlach
User

Projects

User Details

User Since
Sep 9 2019, 9:50 AM (9 w, 5 d)
Availability
Available
LDAP User
MGerlach
MediaWiki User
MGerlach (WMF) [ Global Accounts ]

Recent Activity

Wed, Nov 13

MGerlach added a comment to T235445: Schedule regular office hours for Wikimedia-Research.

Latest version below [1]
To do: discuss with Analytics whether to hold joint office hours (see Leila's email from 2019-11-09).

Wed, Nov 13, 2:53 PM · Research

Mon, Nov 11

MGerlach added a comment to T234188: Taxonomy of new user reading patterns.

*Replicated the analysis for 6 different wikis: en, de, fr, ar, cs, ko (see Meta-Wiki)
*Added a summary of the main findings (see Meta-Wiki)

Mon, Nov 11, 3:53 PM · Analytics, Research

Thu, Nov 7

MGerlach added a comment to T235445: Schedule regular office hours for Wikimedia-Research.

@leila
Based on your feedback, I revised the announcement.
Which channels should we use to distribute it?

  • mailing lists: wiki-research-l, wikimedia-l (?), foundation-optional
  • WikiResearch on Twitter
  • ...?
Thu, Nov 7, 3:30 PM · Research

Wed, Nov 6

MGerlach reopened T234893: Understanding the effect of talk-page interactions as "Open".
Wed, Nov 6, 4:42 PM · Research
MGerlach reopened T234893: Understanding the effect of talk-page interactions , a subtask of T229259: Martin: System and Programs onboarding, as Open.
Wed, Nov 6, 4:42 PM · Research
MGerlach closed T234893: Understanding the effect of talk-page interactions , a subtask of T229259: Martin: System and Programs onboarding, as Resolved.
Wed, Nov 6, 4:42 PM · Research
MGerlach closed T234893: Understanding the effect of talk-page interactions as Resolved.

Finally got around to doing some exploratory analysis.
I looked at the following question:
Does a user who interacts on a talk page (via an edit) also contribute more edits to article pages?
In short: for users with few edits, each additional interaction on a talk page translates into a disproportionately large increase in the number of edits to article pages. This suggests that talk-page interactions play a crucial role in activity (and perhaps even productivity) on article pages.
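A minimal sketch of the kind of per-user aggregation behind this comparison, assuming each edit is available as a (user, namespace) pair, with namespace 0 for articles and namespace 1 for article talk pages (the MediaWiki defaults). The input format and the function name `mean_article_edits_by_talk_edits` are hypothetical; the actual analysis may have been set up differently.

```python
from collections import Counter, defaultdict

ARTICLE_NS = 0  # main (article) namespace in MediaWiki
TALK_NS = 1     # article talk namespace in MediaWiki

def mean_article_edits_by_talk_edits(edits):
    """Group users by their number of talk-page edits and return, per group,
    the mean number of article edits per user.

    `edits` is an iterable of (user, namespace) pairs (hypothetical format).
    """
    article = Counter()
    talk = Counter()
    users = set()
    for user, ns in edits:
        users.add(user)
        if ns == ARTICLE_NS:
            article[user] += 1
        elif ns == TALK_NS:
            talk[user] += 1
    # Collect the article-edit counts of all users with the same talk-edit count.
    groups = defaultdict(list)
    for user in users:
        groups[talk[user]].append(article[user])
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}

# Toy data for illustration only.
edits = [("a", 0), ("a", 0), ("b", 0), ("b", 1), ("b", 0), ("b", 0), ("c", 1)]
print(mean_article_edits_by_talk_edits(edits))  # {0: 2.0, 1: 1.5}
```

Plotting this mapping (talk-page edits on the x-axis, mean article edits on the y-axis) is one way to see whether article activity grows faster than linearly for users with few edits.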

Wed, Nov 6, 4:42 PM · Research

Mon, Nov 4

MGerlach added a comment to T234188: Taxonomy of new user reading patterns.

Added a summary of the results and approach on Meta: https://meta.wikimedia.org/wiki/Research:New_user_reading_patterns

Mon, Nov 4, 6:05 PM · Analytics, Research

Thu, Oct 24

MGerlach added a comment to T234188: Taxonomy of new user reading patterns.

Almost 2/3 of new users use the desktop version.
In contrast, regular reading sessions take place preferentially on the mobile-web version.
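A sketch of how such shares can be computed, assuming each reading session has already been labelled with a single access method (wmf.webrequest records an `access_method` with values such as "desktop", "mobile web" and "mobile app"). The one-label-per-session input and the function name are hypothetical.

```python
from collections import Counter

def access_method_shares(session_access_methods):
    """Return the fraction of sessions per access method.

    `session_access_methods` holds one label per session (hypothetical input
    format), e.g. "desktop", "mobile web" or "mobile app".
    """
    counts = Counter(session_access_methods)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {method: n / total for method, n in counts.items()}

# Toy data for illustration only; not the actual measurement.
shares = access_method_shares(["desktop", "desktop", "mobile web"])
print(shares)
```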

Thu, Oct 24, 1:05 PM · Analytics, Research

Wed, Oct 23

MGerlach added a comment to T235445: Schedule regular office hours for Wikimedia-Research.

--What is the office hour for?--

  • Enable better communication with the Wikimedia community around research on Wikimedia projects
    • direct point of contact with members of the Research team
    • lower barrier to interaction
    • centralized, open, and archived discussion
  • We welcome research-related questions from anyone, researchers and participants in the Wikimedia movement alike, including volunteers, developers, affiliates, and beyond.
Wed, Oct 23, 1:03 PM · Research

Tue, Oct 22

ppelberg awarded T234893: Understanding the effect of talk-page interactions an Insectivore token.
Tue, Oct 22, 10:37 PM · Research

Mon, Oct 21

MGerlach added a comment to T234188: Taxonomy of new user reading patterns.

Aim: Query the reading sessions of users who did not create a new account (i.e., users who did not log in at any point during the time window of interest).
The number of these users is much larger, so we want to subsample them to a set of the same (or at least comparable) size as the new-user data.
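One way to draw such a subsample is a uniform random sample without replacement whose size matches the new-user set. The sketch below uses Python's `random.sample` with a fixed seed for reproducibility. The function name and the user-ID lists are hypothetical. On the cluster, a Spark DataFrame's `sample` method with a fraction of roughly (number of new users) / (number of other users) would be an analogous step, though it yields only an approximately sized sample.

```python
import random

def subsample_users(candidate_users, target_size, seed=0):
    """Uniformly sample `target_size` users without replacement.

    If there are fewer candidates than `target_size`, all of them are returned.
    """
    candidates = sorted(set(candidate_users))  # deduplicate; sort for reproducibility
    rng = random.Random(seed)
    k = min(target_size, len(candidates))
    return rng.sample(candidates, k)

# Hypothetical user IDs for illustration.
new_users = ["n1", "n2", "n3"]
other_users = [f"u{i}" for i in range(1000)]
sample = subsample_users(other_users, len(new_users))
print(len(sample))  # 3
```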

Mon, Oct 21, 5:03 PM · Analytics, Research

Oct 16 2019

MGerlach added a comment to T234188: Taxonomy of new user reading patterns.

That is fantastic, @JAllemandou!
I suspected something along these lines but was not sure where or how to track those changes.
Should be possible to fix now.
Thanks a lot.

Oct 16 2019, 4:06 PM · Analytics, Research
MGerlach awarded T234188: Taxonomy of new user reading patterns a Yellow Medal token.
Oct 16 2019, 4:02 PM · Analytics, Research
MGerlach added a comment to T234188: Taxonomy of new user reading patterns.

When looking at the number of registration events over time, I find between 100 and 200 events per hour.
However, at some point on 2019-07-23, this number drops to exactly 0.
See plot here:
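A drop like this can also be located without a plot, by counting events per hour and listing the hours that have no events. The sketch below assumes registration timestamps are available as `datetime` objects. The function names are hypothetical, and the timestamps in the example are toy values, not real registration data.

```python
from collections import Counter
from datetime import datetime, timedelta

def floor_hour(ts):
    """Truncate a datetime to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

def zero_count_hours(timestamps, start=None, end=None):
    """Return the hours in [start, end] that contain no events.

    `start` and `end` default to the hours of the first and last event. Pass
    `end` explicitly to detect a drop to zero that lasts until the end of
    the observation window.
    """
    counts = Counter(floor_hour(ts) for ts in timestamps)
    if not counts and (start is None or end is None):
        return []
    current = floor_hour(start) if start is not None else min(counts)
    last = floor_hour(end) if end is not None else max(counts)
    gaps = []
    while current <= last:
        if counts[current] == 0:  # Counter returns 0 for missing keys
            gaps.append(current)
        current += timedelta(hours=1)
    return gaps

# Toy timestamps for illustration only.
events = [
    datetime(2019, 1, 1, 22, 15),
    datetime(2019, 1, 1, 23, 5),
    datetime(2019, 1, 2, 1, 40),
]
print(zero_count_hours(events))  # [datetime.datetime(2019, 1, 2, 0, 0)]
```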

Oct 16 2019, 2:58 PM · Analytics, Research
MGerlach closed T234473: Requesting access to analytics cluster for Djellel Difallah as Resolved.

Sorry, I didn't see it was already done. Closing.

Oct 16 2019, 10:05 AM · Research, Operations, SRE-Access-Requests
MGerlach reopened T234473: Requesting access to analytics cluster for Djellel Difallah as "Open".

@DED, who just joined the Research team, is having the same issue.
I think he needs to be added to the LDAP group.
@elukey, could you help here?
Thanks

Oct 16 2019, 10:05 AM · Research, Operations, SRE-Access-Requests
MGerlach reopened T199736: Help accessing SWAP for research collaborators, a subtask of T198656: [CDP-3 1.1] A map of verifiability of information in Wikimedia projects, as Open.
Oct 16 2019, 9:47 AM · Research
MGerlach reopened T199736: Help accessing SWAP for research collaborators as "Open".
Oct 16 2019, 9:47 AM · Research
MGerlach updated subscribers of T199736: Help accessing SWAP for research collaborators.

@elukey
@DED, who just joined the Research team, is having the same issue (cluster access task for him: T234473).
I guess he needs to be added to the LDAP group. Could you add him?
Thanks

Oct 16 2019, 9:45 AM · Research
MGerlach awarded T234188: Taxonomy of new user reading patterns a Like token.
Oct 16 2019, 7:32 AM · Analytics, Research

Oct 14 2019

MGerlach moved T235445: Schedule regular office hours for Wikimedia-Research from Staged to Services on the Research board.
Oct 14 2019, 5:01 PM · Research
MGerlach created T235445: Schedule regular office hours for Wikimedia-Research.
Oct 14 2019, 5:00 PM · Research

Oct 11 2019

MGerlach updated the task description for T234893: Understanding the effect of talk-page interactions .
Oct 11 2019, 1:42 PM · Research
MGerlach added a comment to T232707: Requesting access to analytics cluster for Martin Gerlach.

That solved it. Thanks.

Oct 11 2019, 11:57 AM · Analytics, Operations, SRE-Access-Requests
MGerlach added a comment to T232707: Requesting access to analytics cluster for Martin Gerlach.

@MoritzMuehlenhoff I am reopening this since I can no longer access the cluster, e.g. via 'ssh mgerlach@stat1007.eqiad.wmnet'.
This happened after I reinstalled Ubuntu (and everything else) on my WMF laptop. I kept all the SSH config files and keys that worked before (the entire contents of the .ssh folder).

Oct 11 2019, 7:05 AM · Analytics, Operations, SRE-Access-Requests

Oct 8 2019

MGerlach triaged T234893: Understanding the effect of talk-page interactions as High priority.
Oct 8 2019, 8:28 AM · Research
MGerlach moved T234893: Understanding the effect of talk-page interactions from Staged to In Progress on the Research board.
Oct 8 2019, 8:27 AM · Research
MGerlach added a subtask for T229259: Martin: System and Programs onboarding: T234893: Understanding the effect of talk-page interactions .
Oct 8 2019, 8:26 AM · Research
MGerlach added a parent task for T234893: Understanding the effect of talk-page interactions : T229259: Martin: System and Programs onboarding.
Oct 8 2019, 8:26 AM · Research
MGerlach created T234893: Understanding the effect of talk-page interactions .
Oct 8 2019, 8:26 AM · Research
MGerlach added a comment to T231688: Define October-December 2019 goals.

@leila Yes, this looks good to me. Happy to discuss the other item this week.

Oct 8 2019, 7:49 AM · Research-management, Research

Oct 7 2019

MGerlach updated subscribers of T234188: Taxonomy of new user reading patterns.

@JAllemandou @Ottomata
I would love to get your feedback on the Spark code I wrote to query the reading patterns of new users (notebook attached).

Oct 7 2019, 5:30 PM · Analytics, Research

Oct 2 2019

MGerlach updated the task description for T234188: Taxonomy of new user reading patterns.
Oct 2 2019, 10:46 AM · Analytics, Research

Sep 30 2019

MGerlach added a comment to T232123: Parse wikidumps and extract redirect information for 1 small wiki, romanian .

The historical redirect table is extracted from wmf.mediawiki_wikitext_history.
The above code extracts, for each revision_id, the redirect command (say #redirect, #REDIRECT or #Weiterleitung) and the redirect page (i.e., the page it redirects to).
My aim was to write code that joins this information into the wmf.mediawiki_history table for a single snapshot of a given wiki project (see the notebook).
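The join described here amounts to a left join on `revision_id`: every history row is kept, and rows whose revision is a redirect receive the redirect target. The plain-Python sketch below illustrates only that logic. The row layout is a simplified, hypothetical stand-in for wmf.mediawiki_history, which has many more columns. In Spark, the equivalent would be a left join of the two DataFrames on `revision_id`.

```python
def join_redirects(history_rows, redirect_rows):
    """Left-join redirect targets onto history rows by revision_id.

    history_rows: list of dicts with at least a "revision_id" key.
    redirect_rows: iterable of (revision_id, redirect_page) pairs.
    Returns new dicts with an added "redirect_page" key (None if not a redirect);
    the input dicts are not modified.
    """
    target_by_rev = dict(redirect_rows)
    return [
        {**row, "redirect_page": target_by_rev.get(row["revision_id"])}
        for row in history_rows
    ]

# Hypothetical rows for illustration.
history = [
    {"page_id": 1, "revision_id": 10},
    {"page_id": 1, "revision_id": 11},
    {"page_id": 2, "revision_id": 20},
]
redirects = [(11, "București")]
joined = join_redirects(history, redirects)
print(joined)
```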

Sep 30 2019, 3:29 PM · Research, Analytics
MGerlach updated the task description for T234188: Taxonomy of new user reading patterns.
Sep 30 2019, 10:53 AM · Analytics, Research
MGerlach added a comment to T234188: Taxonomy of new user reading patterns.

If we want to understand reading patterns, we need to use wmf.webrequest.
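Reading sessions are commonly reconstructed from individual requests by ordering each client's requests in time and starting a new session after a period of inactivity. The sketch below assumes the requests have already been reduced to (client_id, timestamp) pairs and uses a hypothetical 30-minute inactivity cutoff. Neither the client identifier nor the cutoff is specified in this task.

```python
from datetime import datetime, timedelta
from itertools import groupby

def sessionize(requests, gap=timedelta(minutes=30)):
    """Split each client's requests into sessions.

    requests: iterable of (client_id, timestamp) pairs (hypothetical format).
    Returns a list of (client_id, [timestamps, ...]) sessions. A new session
    starts when consecutive requests of the same client are more than `gap` apart.
    """
    sessions = []
    ordered = sorted(requests)  # sort by client, then by time
    for client, group in groupby(ordered, key=lambda r: r[0]):
        current = []
        previous = None
        for _, ts in group:
            if previous is not None and ts - previous > gap:
                sessions.append((client, current))
                current = []
            current.append(ts)
            previous = ts
        sessions.append((client, current))
    return sessions

# Toy requests for illustration only.
reqs = [
    ("x", datetime(2019, 10, 1, 9, 0)),
    ("x", datetime(2019, 10, 1, 9, 10)),
    ("x", datetime(2019, 10, 1, 11, 0)),
    ("y", datetime(2019, 10, 1, 9, 5)),
]
result = sessionize(reqs)
print([(client, len(ts)) for client, ts in result])  # [('x', 2), ('x', 1), ('y', 1)]
```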

Sep 30 2019, 9:53 AM · Analytics, Research
MGerlach updated the task description for T234188: Taxonomy of new user reading patterns.
Sep 30 2019, 9:49 AM · Analytics, Research
MGerlach moved T234188: Taxonomy of new user reading patterns from Staged to In Progress on the Research board.
Sep 30 2019, 9:47 AM · Analytics, Research
MGerlach added a subtask for T229259: Martin: System and Programs onboarding: T234188: Taxonomy of new user reading patterns.
Sep 30 2019, 9:47 AM · Research
MGerlach added a parent task for T234188: Taxonomy of new user reading patterns: T229259: Martin: System and Programs onboarding.
Sep 30 2019, 9:47 AM · Analytics, Research
MGerlach created T234188: Taxonomy of new user reading patterns.
Sep 30 2019, 9:46 AM · Analytics, Research

Sep 24 2019

MGerlach added a comment to T232123: Parse wikidumps and extract redirect information for 1 small wiki, romanian .

Memory error persists

@JAllemandou
Main problem: memory errors for large (and even moderately sized) wikis such as frwiki.
I implemented some of your suggestions from today's discussion with Andrew:

  • processing a single query
  • keeping only the minimal amount of text (substrings of the redirect command and the redirect-page title)
  • not converting to pandas, but simply applying count() to see how many results we get.

Attached is a new notebook (executed with pyspark - YARN (large)).
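Two of the memory-saving ideas listed above, keeping only a short prefix of each revision's text and counting matches instead of collecting rows, can be illustrated in plain Python with a generator that holds at most one prefix at a time. This is only an analogue of the Spark job, not the notebook's code. The 200-character prefix, the English-only `#redirect` pattern and the helper names are assumptions. The approach relies on redirects being declared at the very beginning of the wikitext.

```python
import re

PREFIX_CHARS = 200  # assumption: long enough to hold "#REDIRECT [[target]]"

# Matches an English redirect command at the start of the text (illustrative only).
REDIRECT_RE = re.compile(r"^\s*#redirect\s*:?\s*\[\[", re.IGNORECASE)

def count_redirects(texts, prefix_chars=PREFIX_CHARS):
    """Count texts that start with a redirect, inspecting only a short prefix.

    `texts` can be any iterable (e.g. a generator reading from disk), so the
    full collection never needs to be materialized.
    """
    prefixes = (text[:prefix_chars] for text in texts)  # lazily truncate each text
    return sum(1 for p in prefixes if REDIRECT_RE.match(p))

# Toy wikitext for illustration only.
texts = ["#REDIRECT [[Foo]]", "x" * 1000, "  #redirect:[[Bar]]"]
print(count_redirects(texts))  # 2
```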

Sep 24 2019, 3:58 PM · Research, Analytics

Sep 23 2019

MGerlach added a comment to T219903: Keep research.wikipedia.org landing page updated.

@Isaac: please add Martin to the team members. If you explain how, I can do that myself. Thanks.

Sep 23 2019, 3:11 PM · Research
MGerlach added a comment to T232123: Parse wikidumps and extract redirect information for 1 small wiki, romanian .

Thanks for the feedback @JAllemandou

Sep 23 2019, 1:30 PM · Research, Analytics

Sep 20 2019

MGerlach added a comment to T232123: Parse wikidumps and extract redirect information for 1 small wiki, romanian .

@JAllemandou
I came up with a first solution in Spark (see attached notebooks; I ran this on the notebook server).
This creates a dataframe with all revision entries that are identified as redirects based on their content (page_id, revision_id, redirect_page).
I tested it on rowiki and it runs very quickly.
I extract the redirect aliases automatically, so in principle it could be applied to any wiki.
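The following sketch shows the extraction step. It assumes the wiki's redirect aliases have already been obtained, for example from the `magicwords` property of the MediaWiki siteinfo API, where the entry named `redirect` lists them. The task does not say how the aliases were actually extracted, and the alias list, row format and function names below are illustrative only.

```python
import re

def build_redirect_regex(aliases):
    """Compile a regex matching '<alias> [[Target]]' at the start of wikitext.

    Matching is case-insensitive here (an assumption), and an optional colon
    after the alias is allowed. Longer aliases are tried first.
    """
    alternatives = "|".join(
        re.escape(a) for a in sorted(aliases, key=len, reverse=True)
    )
    return re.compile(
        r"^\s*(?P<command>" + alternatives + r")\s*:?\s*\[\[(?P<target>[^\]|#]+)",
        re.IGNORECASE,
    )

def extract_redirects(revisions, pattern):
    """Yield (page_id, revision_id, redirect_page) for revisions whose text is a redirect.

    `revisions` is an iterable of (page_id, revision_id, text) tuples.
    Section anchors ("#...") and link labels ("|...") are dropped from the target.
    """
    for page_id, revision_id, text in revisions:
        match = pattern.match(text)
        if match:
            yield page_id, revision_id, match.group("target").strip()

# Hypothetical aliases and revisions for illustration.
pattern = build_redirect_regex(["#REDIRECT", "#WEITERLEITUNG"])
revs = [
    (1, 10, "#WEITERLEITUNG [[Berlin]]"),
    (2, 20, "'''Berlin''' ist die Hauptstadt ..."),
    (3, 30, "#redirect [[Paris#Geschichte]]"),
]
print(list(extract_redirects(revs, pattern)))  # [(1, 10, 'Berlin'), (3, 30, 'Paris')]
```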

Sep 20 2019, 2:45 PM · Research, Analytics
MGerlach added a member for Research: MGerlach.
Sep 20 2019, 9:40 AM

Sep 19 2019

MGerlach closed T232707: Requesting access to analytics cluster for Martin Gerlach as Resolved.

@MoritzMuehlenhoff Added separate key for Cloud VPS.

Sep 19 2019, 9:25 AM · Analytics, Operations, SRE-Access-Requests

Sep 18 2019

MGerlach closed T232707: Requesting access to analytics cluster for Martin Gerlach as Resolved.

@elukey thanks, it works now. Closing this task.

Sep 18 2019, 1:41 PM · Analytics, Operations, SRE-Access-Requests
MGerlach reopened T232707: Requesting access to analytics cluster for Martin Gerlach as "Open".

Thanks,
I can ssh into production servers.
However, I cannot access SWAP following this documentation [1]. According to this, it seems that I haven't been added to the wmf LDAP group (as requested above).
Could you add me so that I have SWAP access? Sorry if I am missing something.

Sep 18 2019, 10:20 AM · Analytics, Operations, SRE-Access-Requests

Sep 17 2019

MGerlach added a comment to T232123: Parse wikidumps and extract redirect information for 1 small wiki, romanian .
  1. Language-dependent redirect codes
Sep 17 2019, 1:57 PM · Research, Analytics

Sep 12 2019

JAllemandou awarded T232707: Requesting access to analytics cluster for Martin Gerlach a Stroopwafel token.
Sep 12 2019, 4:10 PM · Analytics, Operations, SRE-Access-Requests
MGerlach claimed T232123: Parse wikidumps and extract redirect information for 1 small wiki, romanian .

Martin will work on this project as part of his onboarding

Sep 12 2019, 9:03 AM · Research, Analytics
MGerlach created T232707: Requesting access to analytics cluster for Martin Gerlach.
Sep 12 2019, 8:45 AM · Analytics, Operations, SRE-Access-Requests