Community Bonding Period Report for Accuracy Review of Wikipedia
Closed, ResolvedPublic

Description

Bonding Period Report (April 22nd - May 22nd)

Work Done

  • Discussed the project plan with the mentor
  • Initiated Village Pump and Administrators' Notice Board discussions to identify appropriate backlog categories that our flagging algorithms can address. Discussions can be viewed here and here respectively.
  • Began taking the Stanford course CS224d, 'Deep Learning for Natural Language Processing', to help with implementing a neural net
  • Went over chapters 9 and 10 from the book Flask Web Development by Miguel Grinberg (I had completed the first 8 chapters during the application period)

Meeting

  • 1st meeting held on Google Hangouts on 10th May 2016, 5pm UTC
  • Meeting Minutes can be viewed below

Lessons Learnt

  • Create the meeting minutes report no more than two days after the actual meeting

Problems Faced

  • Communication - Poor video call quality (due to low bandwidth?), had to switch over to chat

Communication Plan

  • Communication with mentors via emails daily or every two days
  • Biweekly meetings on Hangouts
  • Weekly progress to be reported on the tracking task

Meeting Minutes

Date and Time

  • 10th May 2016, 5pm UTC

Agenda

  1. Architecture of a neural net we may want to implement
  2. Wikipedia Backlogs
  3. Citation Hunt
  4. Top 3 flagging concerns that we want to address
  5. Administrative tasks for GSoC

Details

1. Architecture of a Neural Net

  • Discussed the machine learning algorithms that we may look at implementing
  • Analyzed the ORES wp10 classifier.
  • Discussed the possibility of using LSTMs or RNNs for our neural net
  • Data collection from the categories in WP:backlogs

2. Wikipedia Backlogs

  • Discussed the backlog categories that we need to first start working on
  • A few candidate categories were shortlisted:
    • Lacking references
    • Accuracy - broken/outdated citations, needs updating, has original research
    • Content - copied and pasted
    • Links - has dead links
    • Style - needs copy edit
  • Of these, outdated content was chosen as our top priority, followed by articles needing copy edits, provided we are able to implement a fairly robust grammar checker

3. Citation Hunt

  • Discussed the possibility of integrating with Citation Hunt
  • Check out their API

4. Top 3 flagging concerns

  • Finalize the list by 21st May

5. Administrative tasks for GSoC

  • Create 'Wikipedias-Accuracy-Review' Project on Phabricator
  • Create Meeting Minutes Report
  • Initiate discussions on Village Pump and Administrators' Notice Board
  • Create Bonding Period Report