
Weekly Reports for Accuracy Review of Wikipedia
Closed, Resolved (Public)


Weekly progress made on the GSoC project T129536 - Accuracy Review of Wikipedia

Community Bonding Period: April 22 - May 22


  • The detailed Bonding Period report can be viewed at T135665
  • Discussed the project plan with the mentor
  • Initiated Village Pump and Administrators' Noticeboard discussions to identify backlog categories that our flagging algorithms could address
  • Began taking the course CS224d - Stanford lectures on 'Deep Learning for Natural Language Processing' for help with implementing a neural net
  • Went over chapters 9 and 10 from the book Flask Web Development by Miguel Grinberg (I had completed the first 8 chapters during the application period)
  • Held a meeting with the mentor on Google Hangouts

Meeting 1: 10th May 2016 - UTC 5pm

The following was the agenda of the meeting. Details can be viewed in the 'Meeting Minutes' section of T135665

  1. Architecture of a neural net we may want to implement
  2. Wikipedia Backlogs
  3. Citation Hunt
  4. Top 3 flagging concerns that we want to address
  5. Administrative tasks for GSoC

Week 1: May 23 - May 29


  • T136290 - Refactored the wikiwho API code to work from the shell rather than the browser. It takes an article's name and outputs the revision IDs in JSON format.
  • T136292 - Defined roles (reviewer and admin) for the usage of the app. Added necessary permissions for various actions such as comment, review and administer. Still need to fix errors arising during db migration.
  • Held a meeting with the mentor on Google Hangouts
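
The role and permission split from T136292 can be sketched as a bitmask. This is only an illustration: the report names the roles (reviewer, admin) and the actions (comment, review, administer), but the bit values, the `ROLES` table and the `can()` helper below are assumptions, not the app's actual code.

```python
from enum import IntFlag


class Permission(IntFlag):
    # Bit values are illustrative assumptions, not the app's real values.
    COMMENT = 0x01
    REVIEW = 0x02
    ADMINISTER = 0x80


# Hypothetical role table: reviewers comment and review, admins can do everything.
ROLES = {
    "reviewer": Permission.COMMENT | Permission.REVIEW,
    "admin": Permission.COMMENT | Permission.REVIEW | Permission.ADMINISTER,
}


def can(role_name, permission):
    """Return True if the named role grants every bit in `permission`."""
    granted = ROLES.get(role_name, Permission(0))
    return (granted & permission) == permission
```

With this layout, `can("reviewer", Permission.ADMINISTER)` is false while `can("admin", Permission.ADMINISTER)` is true, and an unknown role is granted nothing.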

Meeting 2: 28th May 2016 - UTC 4am

The following was the agenda of the meeting. Details can be viewed in T136536

  1. Useful tools during the coding phase
  2. Generating article dumps
  3. Creating user roles and profile pages
  4. Clarity on the blueprint version of the bot and project schedule

Weeks 2 & 3: May 30 - June 12


  • T136292 - Worked on fixing the db issues caused by SQLite. Might rename tables for now; a better solution is needed in the future
  • T137638 - Created reviewer profiles
    • Created a landing page for each reviewer
    • Added functionality to view and edit a profile
    • Admins additionally have permission to edit the profiles of other users
    • Added functionality for users to update their profile picture
  • Held the biweekly meeting with the mentor
  • Added the Apache 2.0 license for the app
  • Updated the blog post here

Meeting 3: 11th June 2016 - UTC 4am

The following was the agenda of the meeting. Details can be viewed in T137639

  1. Discuss and resolve db issues
  2. Format of the review system
  3. License for the app
  4. DataFlowBot

Weeks 4 & 5: June 13 - June 26


  • Updated the db schema doc here
  • Went over the vlerm documentation
  • Coded the review system to work from the command line. The design of the initial version is here and the code is here.
  • Held a meeting with the mentor on Google Hangouts
  • Mid-term evaluation

Meeting 4: 26th June 2016 - UTC 1 pm

The following was the agenda of the meeting.

  1. New version of minireview which makes use of a single directory for file storage
  2. Performing datetime analysis
  3. Setting up a labs instance

Week 6: June 27 - July 3

  • T136290 - Updated the wikiwho code to extract dates from revids. The extracted dates are in ISO 8601 format, e.g. 2014-05-31T00:14:13Z
  • T138953 - Converted the extracted dates to Julian date format to help in the analysis. Found the mean date of an article. Code is here.
  • T139077 - Wrote a script to obtain all article names along with urls from a given WP:Backlog category. The script also creates a question file in the 'records' directory for each article in the given backlog category and month. The file contents are of the form: <category_name>\n<article_title>\n<url>
  • T139078 - Worked on displaying urls/diffs/permalinks in iframes. Code is here.
  • T139261 - Added url hyperlinking functionality in the texts. Code is here.
  • Held Meeting 5 with the mentor on Google Hangouts on 3rd July 2016 at UTC 1 pm
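
The date conversion from T138953 can be sketched as follows. This is a minimal sketch rather than the project's code: the function names are hypothetical, and it assumes the timestamps are UTC strings in the format shown above.

```python
from datetime import datetime

# Julian Date that corresponds to ordinal 0 at 00:00 UTC
# (datetime.toordinal() counts 0001-01-01 as day 1).
JD_ORDINAL_OFFSET = 1721424.5


def to_julian_date(timestamp):
    """Convert a 'YYYY-MM-DDTHH:MM:SSZ' UTC timestamp to a Julian Date."""
    dt = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second
    return dt.toordinal() + JD_ORDINAL_OFFSET + seconds / 86400.0


def mean_julian_date(timestamps):
    """Average the Julian Dates of a list of revision timestamps."""
    dates = [to_julian_date(ts) for ts in timestamps]
    if not dates:
        raise ValueError("no timestamps given")
    return sum(dates) / len(dates)
```

Working with a single floating-point day count makes the mean a plain arithmetic average, which is harder to do directly on calendar dates.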

Weeks 7 & 8: July 4 - July 17

  • Worked on the inspect() function and inspect.html templates to display summary statistics for each file type
  • Fixed errors in ratio and recommend.html
  • T141295 - Refactored the templates to use Bootstrap. Commit is here. The design and style still need improvement; this will be done later.
  • Made wikiwho search for the word 'recent' in article dumps, using a new function findWord() (we later switched to the search function for this purpose).
  • T140559 - Used Wikipedia's Special Search to get articles containing the word 'recent' or 'recently'. Then used wikiwho to find the date when these words were inserted. Generated questions with article title, link and date. Code is here.
  • T140568 - Worked on getting the page views per article for the past month from the pageviews API. Code is here.
  • T140560 - Wrote a script for creating questions from low Flesch-Kincaid readability scores and high pageview counts. Almost complete. Fixing a few issues.
  • Discussed the action plan for the next stage of the project with the mentor. List of tasks to be completed has been updated and documented.
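
The pageview lookup from T140568 could look roughly like the sketch below, which only builds the request URL for the Wikimedia pageviews REST API and sums the daily counts in a response. The helper names are hypothetical, and the `all-access` / `user` filters and daily granularity are assumptions; the report does not say which were used.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, start, end, project="en.wikipedia"):
    """Build a per-article pageviews URL; start and end are YYYYMMDD strings."""
    # Article titles use underscores for spaces and must be URL-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/all-access/user/{article}/daily/{start}/{end}"


def total_views(response_json):
    """Sum the 'views' field over all items in a decoded API response."""
    return sum(item["views"] for item in response_json.get("items", []))
```

Fetching the URL and decoding the JSON body is left out so the sketch stays free of network calls.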

Weeks 9 & 10: July 18 - July 31

  • Wrote a new blog post describing the new system
  • T141840 - Began work on the project report and completed a major portion. The hypothesis testing part remains.
  • Fixed the UnicodeDecode error in /answer
  • T140560 - Finished the script to generate questions from poor readability scores. Combined standardized fk-scores with pageviews to get the final score for ranking an article
  • T140684 - Tried various methods to handle caching, such as setting headers and turning off autocomplete. This still needs to be resolved satisfactorily.
  • Work was disrupted for a few days due to bandwidth outages
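
One way to combine standardized readability scores with pageviews (T140560) is sketched below. The exact formula used in the project is not given in this report, so this is an assumption: it treats the fk-score as a reading-ease value (lower means harder to read), converts both inputs to z-scores, and ranks by pageview z-score minus readability z-score. The function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores for a list of numbers (all zeros if there is no spread)."""
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rank_articles(articles):
    """articles maps title -> (reading_ease, pageviews).

    Returns titles ordered from highest to lowest priority, where hard-to-read,
    heavily viewed articles come first.
    """
    titles = list(articles)
    ease_z = standardize([articles[t][0] for t in titles])
    views_z = standardize([articles[t][1] for t in titles])
    scores = {t: views_z[i] - ease_z[i] for i, t in enumerate(titles)}
    return sorted(titles, key=lambda t: scores[t], reverse=True)
```

Standardizing first keeps raw pageview counts, which are orders of magnitude larger than readability scores, from dominating the combined score.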

Week 11: Aug 1 - Aug 7

  • T141842 - Wrote help docs, added a new endpoint /help, also linked specific help docs from each endpoint
  • T142038 - Added custom error pages for errors 404 and 500
  • T140785 - Completed the optional registration system
    • Wrote the register() function and register.html templates. Added functionality for sending a salted SHA-512 hash of the email address to registered reviewers, which they can use as a token while answering questions
    • Added a new directory called 'registered', with one file per user named after the user's token. Each file contains the user's info and the files he/she worked on
    • Set the token in the session key, appended tokens to files when a token is set in the session key, and added checking and setting of tokens in a new endpoint /token
    • Added username and email validation, including a check for clashes with existing registered users
  • Used the Categorymembers API to get articles from different backlog categories. Integrated it with the pageviews API to extract the top 20% of articles by pageviews.
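
The reviewer token from T140785 can be illustrated with the standard library. This is a sketch under assumptions: the report only says the token is a salted SHA-512 of the email address, so the salt length, the salt-then-email ordering, the email normalization and the `make_token()` name are all hypothetical.

```python
import hashlib
import secrets


def make_token(email, salt):
    """Return the hex SHA-512 digest of salt + normalized email."""
    # Normalizing case and whitespace is an assumption, not from the report.
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.sha512(salt + normalized).hexdigest()


# A random per-deployment salt; it must be stored to recompute tokens later.
salt = secrets.token_bytes(16)
token = make_token("reviewer@example.org", salt)
```

The resulting token is a 128-character hex string, which is safe to use as a filename in a directory such as 'registered'.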

Week 12: Aug 8 - Aug 14

  • T140781 - Added a logging and archiving mechanism for all GET and POST requests to /ask, /answer and /recommend
  • T141899 - Generated questions from student edits, taking edit size and pageviews into account for ranking
  • T142041 - Integrated pageviews for all questions
  • T140683 - Added POSTing of questions to /ask
  • T141841 - Decided on initial set of 500 questions each for the ToolLabs and PythonAnywhere instances


@prnk28 : Please keep your weekly reports in sync.

@prnk28 : Please update your status report for the week! It's better not to leave it to the end of the week, and to do it on a daily basis :)

@prnk28 : Please keep your status updates in sync. I would strongly recommend that you update them regularly from today :)

Thank you for the weekly reports. Feel free to close this task, as the program has just ended.