Wed, Nov 13
Latest version below 
Todo: Discuss with analytics whether to do joint office hours (see Leila's mail from 2019-11-09)
Mon, Nov 11
Thu, Nov 7
Based on your feedback, I iterated on the announcement.
Channels to distribute?
- mailing lists: wiki-research-l, wikimedia-l (?), foundation-optional
- WikiResearch on twitter
Wed, Nov 6
Finally getting around to doing some exploratory analysis.
I look at the following question:
Does a user who interacts on a talk-page (via an edit) also contribute more edits to article-pages?
In short: For users with few edits, any additional interaction on a talk-page translates into a disproportionately large increase in the number of edits to article-pages. This suggests that talk-page interactions play a crucial role in activity (and perhaps even productivity) on article-pages.
Mon, Nov 4
added summary of results and approach on meta: https://meta.wikimedia.org/wiki/Research:New_user_reading_patterns
Thu, Oct 24
For new users, almost 2/3 use the desktop version.
In contrast, regular reading sessions preferentially take place via the mobile-web version.
Wed, Oct 23
--What is the office hour for?--
- Enable better communication with the Wikimedia community around research on Wikimedia projects
- direct point of contact with members of the research team
- lower barrier for interaction
- centralized, open, and archived discussion
- We welcome research-related questions from anyone, researchers and participants in the Wikimedia movement alike, including volunteers, developers, affiliates, and beyond.
Tue, Oct 22
Mon, Oct 21
Aim: Query the reading sessions of users who did not create a new account (i.e. those users who did not log in at any point during the time window of focus).
Obviously, the number of these users is much larger. Therefore, we want to sample a subset of the same (or at least comparable) size as the new-user data.
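A minimal sketch of the subsampling step, assuming (hypothetically) that the non-registered users are available as a Python list and that we match the size of the new-user data; in practice this would more likely happen inside the query itself:

```python
import random

def sample_comparable(population, target_size, seed=0):
    """Draw a uniform random subset of comparable size to a reference group.

    `population` is the (much larger) group of non-registered users;
    `target_size` is the size of the new-user data we want to match.
    Names here are illustrative, not the real pipeline.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    if len(population) <= target_size:
        return list(population)
    return rng.sample(population, target_size)

# Hypothetical example: 10,000 anonymous readers matched to 500 new users.
anonymous = [f"reader_{i}" for i in range(10_000)]
subset = sample_comparable(anonymous, 500)
```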
Oct 16 2019
That is fantastic @JAllemandou
I suspected something along these lines but was not sure where/how to track those changes.
Should be possible to fix now.
Thanks a lot.
When looking at the number of registration events over time, I find that there are between 100 and 200 events per hour.
However, on 2019-07-23 this number drops to exactly 0.
See plot here:
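As a minimal illustration (with hypothetical timestamps, not the real event data), hours with exactly 0 events can be flagged by bucketing timestamps per hour and scanning the full hour range:

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical registration timestamps; in practice these would come from
# the event table queried above.
events = [
    datetime(2019, 7, 23, 10, 15),
    datetime(2019, 7, 23, 10, 42),
    datetime(2019, 7, 23, 12, 5),   # note: no events in the 11:00 hour
]

# Count events per hour bucket.
counts = Counter(e.replace(minute=0, second=0, microsecond=0) for e in events)

# Walk every hour between the first and last bucket; Counter returns 0
# for missing keys, so empty hours are detected directly.
gaps = []
hour = min(counts)
while hour <= max(counts):
    if counts[hour] == 0:
        gaps.append(hour)
    hour += timedelta(hours=1)
# gaps now lists every hour with exactly 0 registration events
```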
Sorry, I didn't see it was already done. Closed.
Oct 14 2019
Oct 11 2019
That solved it. Thanks.
@MoritzMuehlenhoff opening this again since I cannot access the cluster anymore, e.g. via 'ssh firstname.lastname@example.org'
This happened after I reinstalled Ubuntu (and everything else) on my WMF laptop. I kept all the SSH config files and keys that worked before (all content from the .ssh folder).
Oct 8 2019
@leila Yes, this looks good to me. Happy to discuss the other item this week.
Oct 7 2019
Oct 2 2019
Sep 30 2019
The historical redirect table is extracted from wmf.mediawiki_wikitext_history
The above code extracts, for each revision_id, the redirect-command (e.g. #redirect, #REDIRECT, or #Weiterleitung) and the redirect-page (i.e. where it redirects to).
My aim was to write code that could join that information into the wmf.mediawiki_history table for a single snapshot of a given wikiproject (see the notebook).
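The extraction logic can be sketched as a small parser; the regex below is a simplified assumption about the redirect syntax (command, then target in double brackets) and is not the exact expression used in the notebook:

```python
import re

# Simplified redirect syntax: a redirect-command (e.g. #REDIRECT,
# #redirect, #Weiterleitung) followed by the target page in [[...]].
REDIRECT_RE = re.compile(
    r"^\s*(#(?:REDIRECT|Weiterleitung))\s*\[\[([^\]\|#]+)",
    re.IGNORECASE,
)

def parse_redirect(wikitext):
    """Return (redirect_command, target_page), or None if not a redirect."""
    m = REDIRECT_RE.match(wikitext or "")
    if m is None:
        return None
    return m.group(1), m.group(2).strip()

print(parse_redirect("#REDIRECT [[Main Page]]"))        # ('#REDIRECT', 'Main Page')
print(parse_redirect("#Weiterleitung [[Hauptseite]]"))  # ('#Weiterleitung', 'Hauptseite')
```

In Spark this function would be applied to the revision text column (e.g. as a UDF) to produce the (page_id, revision_id, redirect_page) rows described below.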
If we want to understand reading patterns, we want to use wmf.webrequests.
Sep 24 2019
Memory error persists
Main problem: memory errors for large (and even not especially large) wikis such as frwiki.
I implemented some of your suggestions from today's discussion with Andrew:
- processing a single query
- only keeping minimal amount of text (substrings of the redirect command and the redirect-page-title)
- not saving as pandas, but simply applying the count() function to see how many results we get.
Attached is a new notebook (executed with '''pyspark - YARN (large)''').
Sep 23 2019
@Isaac : add Martin to team members. If you explain to me, I can do that too. Thanks.
Thanks for the feedback @JAllemandou
Sep 20 2019
I came up with a first solution on spark (see attached notebooks; I ran this on the notebook-server).
This creates a dataframe with all revision-entries that are identified as redirects based on the content (page_id, revision_id, redirect_page).
I tested on rowiki and it runs in no time.
I extract the redirect-aliases automatically, so in principle could be applied to any wiki.
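A sketch of the per-wiki alias idea: build the matching pattern from whatever aliases a wiki defines instead of hard-coding them. The alias lists below are illustrative placeholders, not the actual values extracted from the wikis:

```python
import re

# Illustrative per-wiki redirect aliases; in the real pipeline these are
# extracted automatically, and #REDIRECT works on every wiki.
ALIASES = {
    "rowiki": ["#REDIRECT", "#redirecteaza"],
    "dewiki": ["#REDIRECT", "#WEITERLEITUNG"],
}

def redirect_pattern(wiki):
    """Build a case-insensitive regex matching any redirect alias of `wiki`."""
    alts = "|".join(re.escape(a) for a in ALIASES[wiki])
    return re.compile(r"^\s*(?:%s)\s*\[\[([^\]\|#]+)" % alts, re.IGNORECASE)

pat = redirect_pattern("dewiki")
m = pat.match("#Weiterleitung [[Hauptseite]]")  # matches via the dewiki alias
```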
Sep 19 2019
@MoritzMuehlenhoff Added separate key for Cloud VPS.
Sep 18 2019
@elukey thanks, works now. Closing this task.
I can ssh into production servers.
However, I cannot access SWAP following this documentation. It seems that I haven't been added to the wmf-LDAP group (as requested above), according to this.
Could you add me so that I have SWAP access? Sorry if I am missing something.
Sep 17 2019
- Language-dependent redirect codes
Sep 12 2019
Martin will work on this project as part of his onboarding