
Misc. improvements to MediaWikiAnalysis (which is part of the MetricsGrimoire toolset)
Closed, Declined (Public)

Description

MediaWikiAnalysis is a tool to collect statistics from MediaWiki sites, via the MediaWiki API. It is a part of the MetricsGrimoire toolset, and it is currently used for getting information from the MediaWiki.org site, among others.

The stats currently collected by MediaWikiAnalysis are only a part of what is feasible to collect, and the tool itself could be improved. Some possible directions:

  1. Explore in detail the MediaWiki API and extract as much information from it as possible.
  2. Improve efficiency and support incremental retrieval of data.
  3. Propose (and if possible, implement) changes to the MediaWiki API if needed, to support advanced collection of data.
  4. Use SQLAlchemy instead of MySQLdb for managing the MediaWikiAnalysis database.
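A minimal sketch of what incremental retrieval (direction 2) could look like, assuming the collector remembers the timestamp of the newest revision it has already stored. The query parameters (`action=query`, `prop=revisions`, `rvprop`, `rvlimit`, `rvdir`, `rvstart`) and the `continue` mechanism belong to the MediaWiki Action API; the function names and the injected `fetch` callable are hypothetical and are not taken from MediaWikiAnalysis.

```python
from typing import Callable, Dict, Iterator, Optional

Params = Dict[str, str]


def revision_params(title: str, since: Optional[str] = None) -> Params:
    """Build api.php parameters for the revisions of one page, oldest first."""
    params: Params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment|size",
        "rvlimit": "max",
        "rvdir": "newer",  # enumerate from older to newer revisions
    }
    if since is not None:
        # Start at the newest timestamp already stored. The revision at that
        # exact timestamp may be returned again, so callers should skip
        # revids they already hold.
        params["rvstart"] = since
    return params


def iter_revisions(fetch: Callable[[Params], dict], title: str,
                   since: Optional[str] = None) -> Iterator[dict]:
    """Yield revision dicts for one page, following API continuation.

    `fetch` is a hypothetical callable: it sends the parameters to api.php
    and returns the decoded JSON response.
    """
    base = revision_params(title, since)
    cont: Params = {}
    while True:
        data = fetch({**base, **cont})
        # With format=json (formatversion 1), "pages" maps page ids to pages.
        for page in data.get("query", {}).get("pages", {}).values():
            for rev in page.get("revisions", []):
                yield rev
        if "continue" not in data:
            break
        # Send the continuation values back unchanged in the next request.
        cont = data["continue"]
```

In practice `fetch` could be a thin wrapper such as `lambda p: requests.get(api_url, params=p).json()`. Injecting it keeps the sketch testable offline and leaves HTTP concerns (user agent, retries, throttling) to the caller.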

Optionally, candidates can also develop a library, using Python and Pandas, for analyzing the resulting database and computing the most relevant metrics. The current GrimoireLib can be an inspiration for this line of development.

When preparing their proposals, candidates are urged to analyze the problems that may arise while working on the proposed lines and to specify how they are going to deal with them. In general, proposals should describe the approach to be followed to improve efficiency and incremental collection, and to find out which modifications to the API would be useful.

  • Mentors: Alvaro del Castillo, Daniel Izquierdo, Jesus M. Gonzalez-Barahona
  • Primary mentor: @jgbarah
  • Co-mentor: @Dicortazar
  • Other mentors: (optional, Phabricator username)
  • Skills: Python, SQL, PHP (only if proposing changes to the MediaWiki API)
  • Estimated project time for a senior contributor: 2-3 weeks
  • Microtasks: T114437 T114439 T114440 T116509

Event Timeline

Qgil triaged this task as Low priority (Feb 10 2015, 2:23 PM).
Qgil added a project: wikimedia.biterg.io.
Qgil set Security to None.
Qgil added subscribers: Acs, Dicortazar.
Qgil added a subscriber: Qgil.

@Dicortazar, @Acs, are you still interested in proposing this project for GSoC / OPW? A new round is starting.

Qgil moved this task from Backlog to Need Discussion on the Possible-Tech-Projects board.
Qgil added a comment (Feb 11 2015, 1:44 PM).

Wikimedia will apply to Google Summer of Code and Outreachy on Tuesday, February 17. If you want this task to become a featured project idea, please follow these instructions.

Proposing this task for the ECT-March-2015 sprint

Qgil raised the priority of this task from Low to Normal (Mar 2 2015, 8:28 AM).

GSoC / Outreachy are around the corner, and we need to decide whether this possible tech project is in or out.

Qgil lowered the priority of this task from Normal to Lowest (Mar 2 2015, 9:59 AM).
Qgil removed a project: ECT-March-2015.

Not for this round.

prnk28 added a subscriber: prnk28 (Sep 11 2015, 3:20 PM).
Qgil added a comment (Sep 23 2015, 9:07 AM).

This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Outreachy-Round-11 is around the corner. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

Qgil added a comment (Sep 23 2015, 9:35 AM).

This is a message sent to all Possible-Tech-Projects. The new round of Wikimedia Individual Engagement Grants is open until 29 Sep. For the first time, technical projects are within scope, thanks to the feedback received at Wikimania 2015, before, and after (T105414). If someone is interested in obtaining funds to push this task, this might be a good way.

Hi, I would like to know more about this project. I will go through the links in the description to get a rough idea of what needs to be done. I think I would like to work on this as a part of @Outreachy-Round-11 once I have a better understanding of what is expected of it, and if not through Outreachy, I would be more than happy to contribute outside of it as well. Thanks.

fbstj awarded a token (Sep 29 2015, 8:50 AM).
Qgil added a subscriber: jgbarah (Sep 29 2015, 9:47 AM).

@Acs, @Dicortazar, @jgbarah, do you still want to propose this project? Are you able to mentor it?

> Hi, I would like to know more about this project.

Hi @Anmolkalia, thanks for your interest in contributing! If you have specific questions about this task, please don't hesitate to explicitly ask them in this task!
And for general information on how to get started working on code, please see How to become a MediaWiki hacker. Thanks!

Hi @Aklapper. That sounds very encouraging :) I recently made a small contribution to the Android Wikipedia app, so I do have some idea of how to start with the development work. I am especially interested in this project because I am very interested in working on Information Retrieval. However, I am not able to find documentation for the tools in question, so I am not sure where to begin. Please let me know if you have some pointers on how I should start. Thanks.

> @Acs, @Dicortazar, @jgbarah, do you still want to propose this project? Are you able to mentor it?

Yes, I think so. I'm going to comment on the specific questions that @Anmolkalia is asking.

jgbarah added a comment (edited, Sep 29 2015, 9:16 PM).

Hi, @Anmolkalia. Thanks a lot for your interest.

You could start by browsing the MetricsGrimoire website to get some context. You probably already know where the code for MediaWikiAnalysis is. The code is not complex; most of it is in mediawiki_analysis.py.

The program is intended to retrieve the whole history of the MediaWiki site of interest and produce a database with it, organized in a way similar to other MetricsGrimoire tools and easy to query for calculating parameters of interest. The data stored, or to be stored, includes all changes to all pages (edits, renames, etc.), with all the available information for each change (author, date, kind of change, etc.). Right now the retrieved data is not complete; more data can be obtained.

A good starting point could be to analyze the MediaWiki API, characterize all the information that can be obtained from it, and identify which new information could be interesting for MediaWikiAnalysis.

Anmolkalia added a comment (edited, Sep 30 2015, 9:48 AM).

@jgbarah, thank you for the guidance. I am on it. Thanks.

Qgil updated the task description (Sep 30 2015, 2:23 PM).

Thank you @jgbarah! Please update the description and propose some microtasks. When this is done, we will promote this project idea to Outreachy-Round-11 candidates.

Hi @jgbarah, I went through the MediaWiki API documentation and the code in mediawiki_analysis.py, and I was able to understand most of what I read. Since it was a lot of information to absorb, I am considering going through both once more to document exactly what information mediawiki_analysis.py is capable of mining, as well as what can potentially be mined from the API, and sharing it here.
Also, as for organizing the data in a database, I noticed that the code creates a relational database containing three tables: wiki_pages_revs, wiki_pages, and people. The schema we finally choose should depend on what information we consider useful and end up extracting, so it will also follow from the documentation process suggested above.
Do you have anything else in mind or should I go ahead with this? Thanks.

jgbarah updated the task description (Oct 1 2015, 9:49 PM).
jgbarah updated the task description (Oct 1 2015, 9:58 PM).
jgbarah updated the task description (Oct 1 2015, 10:01 PM).
jgbarah updated the task description.

> Thank you @jgbarah! Please update the description and propose some microtasks. When this is done, we will promote this project idea to Outreachy-Round-11 candidates.

Done!

> Also, as for organizing the data in a database, I noticed that the code creates a relational database containing three tables: wiki_pages_revs, wiki_pages, and people. The schema we finally choose should depend on what information we consider useful and end up extracting, so it will also follow from the documentation process suggested above.

The idea is to extract as much information as possible, since it is difficult to know what somebody is going to need. If needed, we could have options for letting users avoid the most time-consuming retrievals.
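One way such an option could look, as a sketch only: the flag `--with-content`, the positional `url` argument and both functions are hypothetical, while `rvprop` and its values (`ids`, `timestamp`, `user`, `comment`, `size`, `content`) are real MediaWiki API parameters. Downloading the full wikitext of every revision is usually what dominates transfer volume (and, as far as I know, the API also returns smaller batches when content is requested), so it is a natural candidate to make opt-in.

```python
import argparse

# Revision properties that are cheap to fetch (metadata only).
METADATA_PROPS = ["ids", "timestamp", "user", "comment", "size"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command line for a MediaWiki collector."""
    parser = argparse.ArgumentParser(
        description="Collect revision history from a MediaWiki site")
    parser.add_argument("url", help="api.php URL of the wiki to analyze")
    parser.add_argument(
        "--with-content", action="store_true",
        help="also download the wikitext of every revision (much slower)")
    return parser


def rvprop_for(args: argparse.Namespace) -> str:
    """Return the rvprop value to request, honoring the user's choice."""
    props = list(METADATA_PROPS)
    if args.with_content:
        # Opt-in only: the expensive retrieval is skipped by default.
        props.append("content")
    return "|".join(props)
```

Keeping the fast path as the default means a first run over a large wiki finishes quickly, and users who need page text can ask for it explicitly.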

Usually, we try to organize the information in a format close to how the API provides it, but with some ideas about how it is going to be queried. The current schema, though simple, tries to follow this approach.
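A sketch of that idea, assuming the three tables the current code creates (wiki_pages, wiki_pages_revs, people) but with hypothetical column names chosen to mirror the fields of a `prop=revisions` API response. It uses the standard-library `sqlite3` module only so that it runs without a database server; it is neither the current MySQLdb code nor an SQLAlchemy model. The final query shows how such a layout stays easy to use for a typical metric (edits per person).

```python
import sqlite3

SCHEMA = """
CREATE TABLE people (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE wiki_pages (
    page_id INTEGER PRIMARY KEY,        -- "pageid" from the API
    title   TEXT NOT NULL
);
CREATE TABLE wiki_pages_revs (
    rev_id    INTEGER PRIMARY KEY,      -- "revid" from the API
    page_id   INTEGER NOT NULL REFERENCES wiki_pages(page_id),
    user_id   INTEGER REFERENCES people(id),
    timestamp TEXT NOT NULL,            -- ISO 8601 string, as returned
    comment   TEXT,
    size      INTEGER
);
"""

EDITS_PER_PERSON = """
SELECT p.name, COUNT(*) AS edits
FROM wiki_pages_revs r JOIN people p ON p.id = r.user_id
GROUP BY p.name
ORDER BY edits DESC, p.name
"""


def person_id(conn: sqlite3.Connection, name: str) -> int:
    """Return the id for a user name, inserting it on first sight."""
    conn.execute("INSERT OR IGNORE INTO people (name) VALUES (?)", (name,))
    row = conn.execute("SELECT id FROM people WHERE name = ?", (name,))
    return row.fetchone()[0]


def store_page(conn: sqlite3.Connection, page: dict) -> None:
    """Store one entry of query.pages; re-storing the same data is a no-op."""
    conn.execute("INSERT OR IGNORE INTO wiki_pages (page_id, title) "
                 "VALUES (?, ?)", (page["pageid"], page["title"]))
    for rev in page.get("revisions", []):
        # Revisions whose user has been hidden carry no "user" key.
        uid = person_id(conn, rev["user"]) if "user" in rev else None
        conn.execute(
            "INSERT OR IGNORE INTO wiki_pages_revs "
            "(rev_id, page_id, user_id, timestamp, comment, size) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (rev["revid"], page["pageid"], uid, rev["timestamp"],
             rev.get("comment"), rev.get("size")))
```

With a file-backed database, the caller would also call `conn.commit()` after each batch; using the API's own ids as primary keys is what makes repeated (incremental) runs idempotent.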

> Do you have anything else in mind or should I go ahead with this? Thanks.

I wrote some microtasks; maybe you can start by selecting one of them. But I'm open to suggestions.

@jgbarah, I went through the microtasks. They sound pretty good to me and sum up all of what we aim to do. So let us start with the one that can be completed within a month, because to be eligible for Outreachy I need to finish at least one microtask by November 2.

Qgil added a comment (Oct 2 2015, 10:23 AM).

@jgbarah, a co-mentor is required. Are Alvaro del Castillo and Daniel Izquierdo (mentioned in the description) on board?

jgbarah updated the task description (Oct 2 2015, 3:49 PM).

> @jgbarah, a co-mentor is required. Are Alvaro del Castillo and Daniel Izquierdo (mentioned in the description) on board?

Added @Dicortazar; maybe Álvaro or somebody else will also join.

> @jgbarah, I went through the microtasks. They sound pretty good to me and sum up all of what we aim to do. So let us start with the one that can be completed within a month, because to be eligible for Outreachy I need to finish at least one microtask by November 2.

Depending on your skills and previous experience, any of them could be completed in much less than a month. Of course, I suggest that you select the one you find most appealing and, if possible, the one that is simplest for you given your past experience and skills.

This looks interesting, and I can help with exploring and working with the MediaWiki API.

@jgbarah @Dicortazar

Hi,

I am Ashita Prasad and I am interested in pursuing this project for @Outreachy-Round-11.
I am an avid Pythonista and have experience in web development (HTML/CSS/JS/PHP) and databases/data-warehouses (MySQL/Postgres/Netezza/Teradata).
Since I am new to this project, could you please guide me so that I can gain some traction and contribute to it?

Regards,
Ashita Prasad

> Can you please guide me to gain some traction and contribute to this project.

Hi @ashitaprasad. Thanks for your interest! Please see https://www.mediawiki.org/wiki/Outreach_programs/Life_of_a_successful_project#Answering_your_questions for information on best audiences and successful questions. Thank you! :)

> Can you please guide me to gain some traction and contribute to this project.

> Hi @ashitaprasad. Thanks for your interest! Please see https://www.mediawiki.org/wiki/Outreach_programs/Life_of_a_successful_project#Answering_your_questions for information on best audiences and successful questions. Thank you! :)

Thanks, @Aklapper.

Hi, @ashitaprasad. If you're interested, ask me any question you may have about the description of this project, have a look at the source code of MediaWikiAnalysis and at the MediaWiki API, and then select one of the microtasks to move ahead.

> This looks interesting, and I can help with exploring and working with the MediaWiki API.

Please, go ahead!

@Tmalhotra, this is another project you can look at for contributing.

ashitaprasad added a comment (edited, Oct 16 2015, 12:43 PM).

@jgbarah Thanks for the reply and for giving me a head start.
I will explore the project further, select a microtask, and let you know soon.

01tonythomas added a subscriber: 01tonythomas.

I am shifting this to Outreachy-Round-11, as the project description has two mentors and microtasks, and looks ready for the 11th edition of Outreachy (Dec 2015 - Mar 2016). Potential candidates should start by submitting their proposals as blockers for this task, by November 2.

Feel free to revert this if the task has issues that might block its completion in this round of Outreachy.

> I am shifting this to Outreachy-Round-11, as the project description has two mentors and microtasks, and looks ready for the 11th edition of Outreachy (Dec 2015 - Mar 2016). Potential candidates should start by submitting their proposals as blockers for this task, by November 2.
> Feel free to revert this if the task has issues that might block its completion in this round of Outreachy.

Perfect from my side, thanks.

jgbarah updated the task description (Oct 19 2015, 9:26 PM).
Anmolkalia updated the task description (Oct 25 2015, 3:06 AM).
Anmolkalia updated the task description (Oct 27 2015, 10:19 AM).
Aklapper renamed this task from Improving MediaWikiAnalysis to Misc. improvements to MediaWikiAnalysis (which is part of the MetricsGrimoire toolset) (Nov 9 2015, 11:26 AM).
Sumit added a subscriber: Sumit (Feb 19 2016, 8:16 PM).
NOTE: Outreachy round 12 applications are now open and GSoC 2016 is around the corner. This project was featured for Outreachy round 11 and has a well-defined scope. Are you ready to mentor the project this season? If yes, we'll feature it for Outreachy round 12 and GSoC 2016 as well. Please reply in the comments.

Hey there,

I'm interested in doing this as a GSoC project.
@jgbarah @Dicortazar are you still interested in being a mentor?

prnk28 removed a subscriber: prnk28 (Feb 22 2016, 4:15 PM).

Hi. Sorry for the delay in answering.

We're currently developing some new tools that will replace MediaWikiAnalysis. For now, it is unlikely that we will find a way of proposing Outreachy tasks for them, at least until the software is a bit more mature.

Thanks a lot for your interest, anyway.

In that case, I recommend declining this task. I'm removing Possible-Tech-Projects for now.