Profile Information
Name: Abel Serrano Juste
IRC nickname on Freenode: Quasipodo / Akronix
Web Profile: https://www.akronix.es/
Resume (optional):
Location (country or state): Spain
Typical working hours (include your timezone): 17:00-21:00 (UTC+02:00) Mon - Fri
Synopsis
- Short summary describing your project and how it will benefit Wikimedia projects
Inequality metrics are a way to measure how work is distributed among the members of a community. Knowing whether the work is heavily concentrated in a few people can point us to further issues such as a lack of accessibility for newcomers, missing neutrality and diversity (potentially conflicting with Wikipedia's Neutral Point of View and democratic principles), or inefficient communication and collaboration within that community, among others. On the other hand, it could also mean that a few people are strongly committed to that community and are keeping it alive, and that the community needs some attention to grow. Either way, inequality indicators are a good starting point for a better understanding of collaborative communities, and of wikis in particular, as has been shown in previous research [1][2].
My project would consist of implementing a subset of validated inequality metrics in Wikistats v2. The request for those metrics, and the discussion about them, has already started at: https://phabricator.wikimedia.org/T195033.
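To make the idea concrete, here is a minimal sketch of one such metric, the Gini coefficient, computed over per-editor contribution counts. This is only a toy illustration of the kind of indicator meant above; the actual metrics, their definitions, and their dimensions are still to be discussed in the Phabricator task.

```python
def gini(counts):
    """Gini coefficient of non-negative contribution counts:
    0.0 means perfectly equal; values near 1.0 mean the work is
    concentrated in very few contributors."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Closed-form expression over the rank-ordered values
    weighted = sum((i + 1) * v for i, v in enumerate(values))
    return (2 * weighted) / (n * total) - (n + 1) / n

print(gini([10, 10, 10, 10]))  # 0.0: every editor contributed equally
print(gini([0, 0, 0, 100]))    # 0.75: one editor did all the work
```

A community where the second pattern dominates month after month would be a candidate for the kind of follow-up analysis described above.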
The whole project involves going through the following technical steps:
- Use Sqoop to pull the needed data from the database replicas into HDFS
- Write an Oozie job to compute the metrics
- Load the output into Cassandra
- Write an API endpoint to serve the metrics over the dimensions we decide on
- Configure the Wikistats frontend to connect to the API
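The computation step in the middle of this pipeline can be illustrated with a small stand-in for the aggregation the Oozie job would run on the cluster: group revisions by wiki and month, count edits per editor, and reduce each group to one inequality value per output row. The function name and tuple layout here are hypothetical, not the actual Data Lake schema.

```python
from collections import Counter, defaultdict

def edits_per_editor(revisions):
    """Group revisions by (wiki, month) and count edits per editor.
    revisions: iterable of (wiki, month, editor) tuples."""
    counts = defaultdict(Counter)
    for wiki, month, editor in revisions:
        counts[(wiki, month)][editor] += 1
    return counts

revisions = [
    ("eswiki", "2020-07", "A"), ("eswiki", "2020-07", "A"),
    ("eswiki", "2020-07", "B"), ("eswiki", "2020-08", "C"),
]
for key, per_editor in edits_per_editor(revisions).items():
    # One row per (wiki, month), ready to feed an inequality
    # metric and then be loaded into Cassandra
    print(key, dict(per_editor))
```

In production this aggregation would of course be expressed as a Hive/Spark query scheduled by Oozie rather than plain Python, but the shape of the computation is the same.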
- Possible Mentor(s): @Milimetric and @MGerlach
- Have you contacted your mentors already?: Yes.
Deliverables
Days/Dates | Milestone/Deadline/Subtask
---|---
May 4 - Jun 1 | Community bonding period: spend time interacting with the Analytics team at Wikimedia. Understand common practices and norms |
Jun 1 - Jun 14 | Learn about the technologies & infrastructure being used within the Analytics team. Read and understand documentation: [3] [4] |
Jun 15 - Jun 28 | Investigate metrics, discuss options, inspect the available data. Discuss metric ideas and define related requirements |
Jun 15 - Jun 28 | Draft some metrics and validate them |
Jun 29 - Jul 3 | Phase 1 Evaluations |
Jul 6 - Jul 17 | Get data with sqoop. Write metrics with oozie. Load output into Cassandra. |
Jul 20 - Jul 24 | Write tests for previous metrics code |
Jul 27 - Jul 31 | Phase 2 Evaluations |
Aug 3 - Aug 14 | Write API endpoints to serve metrics. Write tests. |
Aug 3 - Aug 14 | Work on the Wikistats frontend to connect it to the API. |
Aug 14 - Aug 21 | Test all the above. |
Aug 14 - Aug 21 | Final deploy. |
Aug 24 - Aug 31 | Last refinements: Document whatever is not documented, clean-up code, etc. |
Aug 31 - Sept 7 | Final Evaluation |
This is possibly a pessimistic plan, though; I think I will be able to free up some days along the way.
Participation
- Technical communication will happen in the corresponding Phabricator tasks or subtasks.
- Available on IRC, at least, during my working hours ( 17:00-21:00 UTC+02:00, Mon - Fri ).
- Communication through e-mail and through the Analytics team mailing list for longer, more thoughtful discussions.
- Weekly updates will be posted on my Meta-Wiki user page to give a better overview of the project's current status.
- Bi-weekly blog reports will serve as a communication channel both within the Wikimedia community and to the outside world.
- Source code will be uploaded and published to the corresponding Gerrit instance, following the team's Git workflow.
About Me
I graduated in 2015 with a Bachelor's in Computer Science from the Universidad Complutense de Madrid (UCM). During my time at university, I founded a student association supporting free & open-source software called LibreLabUCM. I also spent one academic year abroad in Cyprus under the Erasmus program. After university, I worked for a while doing tutoring, web development, IT support, and system administration.
From March 2017 to spring 2019 I held a position as an assistant researcher at the university, supporting research on online collaborative communities, in particular wikis. During this time I published several research papers; did data processing, analysis, and visualization with wiki data; and attended related events such as the WMF Hackathons 2018 and 2019 and OpenSym 2018.
I'm currently finishing my Master's in Data Science and should be done by the beginning of June. The university I'm enrolled in is an online university, which has allowed me to move around during my studies. Lately, I have experienced a bit of the life of a digital nomad.
I heard about Google Summer of Code while I was studying at university, and I saw it as a very good opportunity to get started with real open-source projects, develop professional skills (such as remote working and good coding practices), and gain valuable knowledge and experience.
This project gives me the opportunity to do something with real impact that is, by extension, useful to society. I also want to learn how a big organization works, both technically and organizationally. Finally, I value the Wikimedia Foundation's knowledge-for-everyone mission, and I admire and applaud the fact that it is built entirely upon volunteers and private donations.
I should also highlight that this summer I'd like to live in some sort of intentional community, which entails some collective work and other tasks that would occupy me part-time. Nevertheless, I believe that my proposed dedication is enough to get the job done well and, of course, I will look for a place with a good internet connection and commit to at least the working hours I presented in this proposal.
Past Experience
During my time at the university I mostly downloaded, processed, and analyzed MediaWiki data. I also led the development of WikiChron, a web app to visualize metrics and networks of wiki communities based on their historical data. As a result of those two years working as an assistant researcher at the university, I published four articles and presented my work both at scientific conferences [5] and at other events (such as the Wikimedia Hackathon 2018).
I have also made some small contributions to other open-source projects from time to time (you can see some of my contributions on my GitHub profile). My most notable involvement right now is in the Trustroots travelers' social network.
Regarding code owned directly by Wikimedia itself, I can only recall this very small fix I made to the docs of mediawiki-utilities: https://github.com/mediawiki-utilities/python-mwtypes/pull/2.
Lastly, I want to add that I have been an editor of Wikipedia since 2013 and am an active editor of other wikis.
Any Other Info
I have already implemented some inequality metrics in Python and packaged them into a library available on pip: https://github.com/Grasia/inequality_coefficients/
I have also written some scripts to download, process, and filter wiki data: https://github.com/Grasia/wiki-scripts.
References:
[1]: F. Ortega, J. M. Gonzalez-Barahona, and G. Robles. 2008. On the Inequality of Contributions to Wikipedia. In Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS 2008). 304–304.
[2]: Abel Serrano, Javier Arroyo, and Samer Hassan. 2018. Participation Inequality in Wikis: A Temporal Analysis Using WikiChron. In Proceedings of the 14th International Symposium on Open Collaboration (OpenSym '18). ACM, New York, NY, USA, Article 12, 7 pages. DOI: 10.1145/3233391.3233536.
[3]: Documentation for the Analytics technology set
[4]: Wikistats 2 documentation
[5]: See the publications section on my website to refer to them: https://www.akronix.es/publications.html