Fri, Jan 17
The first joint office hours will take place next week.
To reach as large an audience as possible, we will send out a reminder announcing the office hours via mailing lists, WikiResearch on Twitter, etc.
Let me know if you would like to add, remove, or edit anything.
Office hours are scheduled, so I am closing this task. Ongoing efforts will be tracked in the follow-up task.
Mon, Jan 13
Weekly update: Scanned literature on the measurement of gender gaps (link). Got in touch with @GoranSMilovanovic about the wikidata-concept monitor and how it could be used for measuring content gaps more generally.
Weekly update: Finished Research Brief sketching project.
Fri, Jan 3
Status Update 2019-12-03
Dec 18 2019
Added to WMF staff calendar (via Janna Layton).
Dec 16 2019
In the discussion with Peter on finalizing from 2019-12-11 we agreed on the following refinement:
also posted to wikimedia-l.
it is also visible in the calendar on wmf-labs.
posted on the following mailing lists
Dec 13 2019
Posted on discuss-space
Dec 9 2019
Created page on mediawiki.
Announcement will be sent out this week (after Analytics' December office hours on 2019-12-09 to avoid confusion).
Dec 6 2019
Nov 28 2019
I would also like to request access. username: mgerlach
Nov 21 2019
Contacted Analytics about joint office hours. @Milimetric's response:
- he thinks it is a great idea to restart as not many people showed up to Analytics' office hours
- he also volunteers to co-lead the first time
- also announce in tech-newsletter
- make sure announcement is enough in advance
- easy scheduling such as 'first monday every month'
Nov 19 2019
Nov 18 2019
Only considering the first 90 days of all editors who registered from 2018-01-01 to 2019-01-01, to better focus on junior editors.
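A minimal sketch of this filter, assuming edits are available as (user, timestamp) pairs and registrations as a mapping from user to registration time. The names `early_edits`, `registrations`, and `edits` are hypothetical, the toy data is made up, and treating the registration window as half-open (2019-01-01 excluded) is an assumption.

```python
from datetime import datetime, timedelta

# Registration window from the note above; the end date is treated as
# exclusive here, which is an assumption.
REG_START = datetime(2018, 1, 1)
REG_END = datetime(2019, 1, 1)
WINDOW = timedelta(days=90)


def early_edits(registrations, edits):
    """Keep edits made within 90 days of registration by in-window editors.

    registrations: dict mapping user -> registration datetime (hypothetical layout)
    edits: iterable of (user, edit datetime) pairs (hypothetical layout)
    """
    kept = []
    for user, ts in edits:
        reg = registrations.get(user)
        # Skip users with unknown registration or registered outside the window.
        if reg is None or not (REG_START <= reg < REG_END):
            continue
        # Keep only edits from the first 90 days after registration.
        if reg <= ts < reg + WINDOW:
            kept.append((user, ts))
    return kept


# Toy data, made up for illustration.
registrations = {"A": datetime(2018, 3, 1), "B": datetime(2017, 12, 31)}
edits = [
    ("A", datetime(2018, 3, 15)),  # within A's first 90 days
    ("A", datetime(2018, 7, 1)),   # more than 90 days after A registered
    ("B", datetime(2018, 1, 5)),   # B registered before the window
]
result = early_edits(registrations, edits)
```

In this toy example only the first edit by user A is kept.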
Nov 13 2019
Latest version below 
Todo: Discuss with Analytics whether to hold joint office hours (see Leila's mail from 2019-11-09)
Nov 11 2019
Nov 7 2019
Based on your feedback, I did an iteration on the announcement.
Channels to distribute?
- mailing lists: wiki-research-l, wikimedia-l (?), foundation-optional
- WikiResearch on twitter
Nov 6 2019
Finally getting around to doing some exploratory analysis.
I look at the following question:
Does a user who interacts on a talk-page (via an edit) also contribute more edits to article-pages?
In short: For users with few edits, any additional interaction on a talk-page translates into a disproportionately large increase in the number of edits to article-pages. This suggests a crucial role of talk-page interactions for activity (and perhaps even productivity) on article-pages.
Nov 4 2019
added summary of results and approach on meta: https://meta.wikimedia.org/wiki/Research:New_user_reading_patterns
Oct 24 2019
For new users, almost 2/3 use the desktop version.
In contrast, regular reading sessions preferentially take place via the mobile-web version.
Oct 23 2019
--What is the office hour for?--
- Enable better communication with the Wikimedia community around research on Wikimedia projects
- direct point of contact with members of research team
- lower barrier for interaction
- centralized, open, and archived discussion
- We welcome research-related questions from anyone, researchers and participants in the Wikimedia movement alike, including volunteers, developers, affiliates, and beyond.
Oct 22 2019
Oct 21 2019
Aim: Query the reading sessions of users that did not create a new account (i.e. those users who did not log in at any point during the time-window of focus).
Obviously, the number of these users is much larger. Therefore, we want to draw a subsample of the same (or at least comparable) size as the new-user data.
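One way to draw such a subsample is uniform random sampling without replacement with a fixed seed. This is only a sketch: the helper `subsample_sessions`, the session identifiers, and the target size of 50 are hypothetical and not taken from the actual query.

```python
import random


def subsample_sessions(session_ids, target_size, seed=0):
    """Return a uniform random subsample of at most target_size sessions."""
    ids = list(session_ids)
    # If the pool is not larger than the target, there is nothing to subsample.
    if target_size >= len(ids):
        return ids
    # A dedicated, seeded generator makes the draw reproducible.
    rng = random.Random(seed)
    return rng.sample(ids, target_size)


# Toy pool of anonymous sessions, made up for illustration.
anonymous = [f"s{i}" for i in range(1000)]
sample = subsample_sessions(anonymous, 50)
```

Fixing the seed means the same subsample can be regenerated later for comparison with the new-user data.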
Oct 16 2019
That is fantastic @JAllemandou
I suspected something along these lines but was not sure where/how to track those changes.
Should be possible to fix now.
Thanks a lot.
When looking at the number of registration events over time, I find that there are between 100 and 200 events per hour.
However, on 2019-07-23 this number drops to exactly 0.
See plot here:
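A drop like this can also be located without a plot by counting events per hour and looking for the first hour with no events. The sketch below assumes the registration events are available as a list of timestamps; the helper names and the toy timestamps are hypothetical and do not reproduce the real counts.

```python
from collections import Counter
from datetime import datetime, timedelta


def floor_to_hour(ts):
    """Truncate a datetime to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_counts(timestamps):
    """Count events per hour bucket."""
    return Counter(floor_to_hour(ts) for ts in timestamps)


def first_empty_hour(timestamps, end=None):
    """Return the first hour with zero events, or None if there is none.

    The scan runs from the first observed hour to the last observed hour,
    or to `end` if given (useful when events stop and never resume).
    """
    counts = hourly_counts(timestamps)
    if not counts:
        return None
    hour = min(counts)
    last = max(counts) if end is None else floor_to_hour(end)
    while hour <= last:
        # A Counter returns 0 for hours that never occurred.
        if counts[hour] == 0:
            return hour
        hour += timedelta(hours=1)
    return None


# Toy timestamps, made up for illustration.
events = [
    datetime(2019, 7, 22, 22, 10),
    datetime(2019, 7, 22, 22, 40),
    datetime(2019, 7, 22, 23, 5),
    datetime(2019, 7, 23, 1, 30),
]
gap = first_empty_hour(events)
```

For the toy data, `gap` is the hour starting at midnight on 2019-07-23, the only hour in the range without an event.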
Sorry, didn't see it was already done. Closed.
Oct 14 2019
Oct 11 2019
That solved it. Thanks.
@MoritzMuehlenhoff opening this again since I cannot access the cluster anymore, e.g. via 'ssh firstname.lastname@example.org'
This happened after I reinstalled Ubuntu (and everything else) on my wmf-laptop. I kept all the ssh-config files and keys which worked before (all content from the .ssh-folder).
Oct 8 2019
@leila Yes, this looks good to me. Happy to discuss the other item this week.
Oct 7 2019
Oct 2 2019
Sep 30 2019
The historical redirect table is extracted from wmf.mediawiki_wikitext_history
The above code extracts for each revision_id the redirect-command (say #redirect or #REDIRECT or #Weiterleitung) and the redirect-page (i.e. where it redirects to).
My aim was to write code that could join that information into the wmf.mediawiki_history table for a single snapshot of a given wikiproject (see the notebook).
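The extraction step described here can be illustrated with a small regular-expression sketch in plain Python. It is not the notebook code: the alias list, the helper names, and the handling of an optional colon, pipe labels, and section anchors are assumptions made for illustration.

```python
import re

# Hypothetical alias list; in practice the aliases depend on the wiki's language.
REDIRECT_ALIASES = ["#REDIRECT", "#WEITERLEITUNG"]


def build_redirect_pattern(aliases):
    """Compile a case-insensitive pattern for '<alias> [[Target...]]'."""
    alternatives = "|".join(re.escape(a) for a in aliases)
    # Group 1: the redirect command as written; group 2: the target title,
    # cut off at a pipe label, section anchor, or closing bracket.
    return re.compile(
        r"^\s*(" + alternatives + r")\s*:?\s*\[\[([^\]\|#]+)",
        re.IGNORECASE,
    )


def parse_redirect(wikitext, pattern):
    """Return (redirect_command, redirect_page) or None if not a redirect."""
    match = pattern.match(wikitext)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


pattern = build_redirect_pattern(REDIRECT_ALIASES)
examples = [
    parse_redirect("#REDIRECT [[Foo bar]]", pattern),
    parse_redirect("#weiterleitung [[Ziel|label]]", pattern),
    parse_redirect("#redirect[[Target#Section]]", pattern),
    parse_redirect("Plain article text", pattern),
]
```

Applied per revision, a function like `parse_redirect` yields the command and target that would then be joined to the history table by revision_id.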
If we want to understand reading patterns, we want to use wmf.webrequests.
Sep 24 2019
Memory error persists
Main problem: memory error for large (and even not super large) wikis such as frwiki.
I implemented some of your suggestions from the discussion today with Andrew:
- processing a single query
- only keeping minimal amount of text (substrings of the redirect command and the redirect-page-title)
- not saving as pandas, but simply applying the count() function to see how many results we get.
Attached is a new notebook (executed with '''pyspark - YARN (large)''').
Sep 23 2019
@Isaac: add Martin to team members. If you explain how, I can do that too. Thanks.
Thanks for the feedback @JAllemandou
Sep 20 2019
I came up with a first solution on spark (see attached notebooks; I ran this on the notebook-server).
This creates a dataframe with all revision-entries that are identified as redirects based on the content (page_id, revision_id, redirect_page).
I tested on rowiki and it runs in no time.
I extract the redirect-aliases automatically, so in principle this could be applied to any wiki.
Sep 19 2019
@MoritzMuehlenhoff Added separate key for Cloud VPS.
Sep 18 2019
@elukey thanks, works now. Closing this task.
I can ssh into production servers.
However, I cannot access SWAP following this documentation. It seems that I haven't been added to the wmf-LDAP group (as requested above), according to this.
Could you add me such that I have SWAP-access? Sorry if I am missing something.
Sep 17 2019
- Language dependent Redirect Codes
Sep 12 2019
Martin will work on this project as part of his onboarding