Jan 26 2016
Nov 8 2015
Oct 29 2015
Oct 28 2015
Oct 27 2015
Oct 25 2015
Oct 20 2015
@jgbarah, I hope this is what you were looking for. https://github.com/anmolkalia/MediaWikiAnalysis/tree/new
Oct 18 2015
I'll get that done by tomorrow.
Oct 17 2015
So, I was able to correct the code. It is working fine now; I have solved all the bugs I came across.
I hope this works fine. Do let me know if something else is required.
I found a not so elegant solution to it. So the code is working fine now. I will try to come up with a better solution to the problem.
Alright, so that encoding problem is also there in the old code. I am trying to figure out why.
Okay, I was able to fix that. The databases are exactly the same. So, there is no problem with storage of the data.
Hi @jgbarah. I compared the databases obtained with the original code and the one I updated. The values stored in the two databases are the same. The difference is that when I run "SELECT * FROM mwdb.wiki_pages WHERE title NOT IN (SELECT title FROM mdb.Wiki_pages);", I get "ERROR 1267 (HY000): Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_unicode_ci,IMPLICIT) for operation '='". So the title columns in the two databases use different collations. I will try to find a workaround.
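One possible workaround, sketched here under assumptions rather than taken from the actual fix: give the outer title column an explicit collation so that MySQL no longer has to reconcile two implicit ones. The helper name missing_titles_query is hypothetical, and the choice of utf8_unicode_ci as the target collation is an assumption; only the table names come from the query above.

```python
def missing_titles_query(left_table, right_table, collation="utf8_unicode_ci"):
    """Build the comparison query with an explicit collation on the outer title.

    Hypothetical helper: an explicit COLLATE clause takes precedence over the
    implicit collation of the column on the other side of the comparison.
    """
    return (
        "SELECT * FROM {left} "
        "WHERE title COLLATE {coll} NOT IN "
        "(SELECT title FROM {right});"
    ).format(left=left_table, right=right_table, coll=collation)


# Table names are the ones used in the comment above.
query = missing_titles_query("mwdb.wiki_pages", "mdb.Wiki_pages")
print(query)
```

A more permanent alternative would be to convert one of the tables so that both title columns share a collation, for example with ALTER TABLE ... CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci, but that changes the stored schema and should be checked against the rest of the tooling first.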
Oct 15 2015
Hi, @jgbarah, I get that error when I run the code a second time, that is, when the database already contains the values being retrieved. In other words, the print statement in the except block of the insert_page function is what raises this error. I'll go through what you sent and get back to you. Thank you for the help.
Oct 11 2015
There seems to be a problem with the print statement in the insert_page function, line 101. It works if I use only pageid in the print statement.
Hi, @jgbarah, I made the changes in the files. I am getting the error "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)" when I run "./mediawiki_analysis.py --database mdb --db-user root --url https://www.wikipedia.org/w". I am trying to figure out why. It has something to do with the UTF-8 encoding.
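For illustration, this kind of error appears whenever UTF-8 encoded bytes are decoded with the ascii codec, which is what happens implicitly when byte strings and text are mixed in a print statement under Python 2. The sketch below reproduces the message in Python 3 and shows one way around it. The sample title and the helper to_text are hypothetical and are not taken from mediawiki_analysis.py.

```python
def to_text(value, encoding="utf-8"):
    """Return text for a value that may be bytes or already text (hypothetical helper)."""
    if isinstance(value, bytes):
        # Decode explicitly instead of relying on an implicit ascii decode.
        return value.decode(encoding, errors="replace")
    return value


# Hypothetical page title, UTF-8 encoded; byte 0xc3 sits at position 2.
raw_title = b"Ni\xc3\xb1o"

ascii_error = None
try:
    raw_title.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

title = to_text(raw_title)
print(ascii_error)
print(title)
```

Decoding once at the boundary where data enters the program, and working with text everywhere else, avoids having to special-case individual print statements.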
Oct 7 2015
Hi, @jgbarah, I am facing a problem running the original mediawiki_analysis.py. I run "./mediawiki_analysis.py --database mwdb --db-user root --url https://en.wikipedia.org/w" in the terminal and get "./mediawiki_analysis.py:141: Warning: Out of range value for column 'date' at row 1". What should I do about it? The values in the table seem to be getting updated, though.
Oct 6 2015
Hi, @jgbarah, this is what the mapping file looks like. Let me know if this is fine.
Hi, I have a question. The DBMS we will be using in the backend is still MySQL, right? So, I should continue using MySQL datatypes? Sibyl seems to be using MySQL.
Oct 5 2015
Right, I am on it :)
Oct 4 2015
Oct 3 2015
@jgbarah, I'll start with this one. Sounds engaging and I will get to learn plenty from it. So my task here would be to change the code in mediawiki_analysis.py to replace the usage of MySQLdb with SQLAlchemy?
Oct 2 2015
@jgbarah, I went through the microtasks. They sound pretty good to me and sum up all of what we aim to do. So let us start with the one that can be completed in a month's time, because in order to be eligible for Outreachy, I need to finish at least one microtask by 2nd November.
@jgbarah, since I am supposed to complete at least one microtask before the application deadline, I think this one can be done in that much time. Do you think I should start with this one?
Oct 1 2015
Hi. I am new to image recognition, so I think this could be a great learning opportunity for me. I would really appreciate it if I could get some help understanding the scope of this project and also help me find mentors, or suggest how I could get in touch with possible mentors. I am interested in applying for this project in Outreachy 11, but if that doesn't work out, I think this would still be a really great learning opportunity for me and I would like to contribute to it nonetheless. Thanks.
Hi @jgbarah, I went through the MediaWiki API documentation and the code in mediawiki_analysis.py, and I was able to understand most of what I read. Since it was a lot of information to absorb, I am considering going through both once more, documenting exactly what information mediawiki_analysis.py is capable of mining and what else can potentially be mined from the API, and sharing it here.
Also, as for organizing the data in a database, I noticed that the code creates a relational database containing three tables: wiki_pages_revs, wiki_pages, and people. The schema we finally choose for our database should depend on what information we think is useful and actually extract. So that decision will also follow the documentation process suggested above.
Do you have anything else in mind or should I go ahead with this? Thanks.
Sep 30 2015
@jgbarah, thank you for the guidance. I am on it. Thanks.
Sep 29 2015
Hi @Aklapper. That sounds very encouraging :) I recently made a small contribution to the Android Wikipedia app, so I do have some idea of how to start with the development work. I am especially interested in this project because I am very interested in working on Information Retrieval. But I am not able to find documentation pertaining to the tools in question, so I am not sure where to begin. Please let me know if you have some pointers on how I should start. Thanks.
Hi. I think this project is very interesting. I want to work on this in the upcoming round of Outreachy and I want to know what is expected out of me in terms of the skills and background knowledge required. Please let me know how I can contribute. Thanks.
Hi. This sounds pretty interesting and I would like to contribute to this project in the upcoming round of Outreachy. Please let me know what material I can look up for background knowledge. I have functional knowledge of Python so I feel I will be able to make some worthwhile contributions to this task. Thanks.
Hi. I would like to contribute to this in the upcoming round of Outreachy and hence I would like to understand more about basic background knowledge and skills required for this project. Please let me know how I can contribute. Thanks.
Hi, I would like to know more about this project. I will go through the links in the description to gain a rough background of what needs to be done. I think I would like to work on this as a part of @Outreachy-Round-11 once I have a better understanding of what is expected of it, and if not through Outreachy, I would be more than happy to contribute outside of it as well. Thanks.
Hi, I would like to know more about this project and what goals it wishes to accomplish. I think I would like to work on it as a part of @Outreachy11 once I have a better understanding of what is expected of it. Thanks.
Sep 11 2015
Sep 9 2015
Hi @Dbrant. I was caught up in organising the tech-fest in my college the past few days and never got a chance to submit the code on gerrit. I'll get it done by tonight. Really sorry for the inconvenience.
Aug 24 2015
@Dbrant, what should I do next? Submit the code on gerrit? Thanks.
Aug 21 2015
Gradle sync is failing for me, and I am unable to figure out why. I have also adjusted the proxy settings, but there still seems to be a problem.
@Dbrant, I am getting these errors http://i.imgur.com/l4mOpAJ.png when I am importing. I am working behind a proxy. I just simply imported the project and imported the ide proxy settings to this project, but it is showing the same errors.
Aug 19 2015
@Dbrant, I am facing these errors on building. Here is a screenshot. I went through http://i.imgur.com/l4mOpAJ.png. Where do I get information regarding setting up the development environment? Thank you very much for your help.
Aug 18 2015
Hi @Deskana. I was able to find that string and replace it, but I am facing errors such as "Failed to resolve: junit:junit:4.12", so I haven't been able to test the app. I will try to resolve these, but I guess simply replacing the value of the string "error_network_error" with "Can't connect to the internet" should serve the purpose.
Aug 17 2015
@Deskana, I am on it :) I am facing a problem cloning over ssh with "git clone ssh://<USERNAME>@gerrit.wikimedia.org:29418/apps/android/wikipedia". I am getting the following error: "ssh: Could not resolve hostname gerrit.wikimedia.org: Name or service not known". I am trying to resolve it. I'll let you know once it is done. Thanks.
Aug 15 2015
Hi. I am a new contributor and I would like to contribute to this. Since I am new around here, I would need a bit of a background as to what is exactly required and what tools need to be used for the same. Please let me know how to proceed further. Thank you.
Aug 12 2015
I am facing the same problem. Were you able to find a solution?