Bonding Period Report (April 22nd - May 22nd)
Work Done
- Discussed the project plan with the mentor
- Initiated discussions on the Village Pump and the Administrators' Notice Board to identify backlog categories that our flagging algorithms could address. Discussions can be viewed here and here respectively.
- Began taking Stanford's CS224d course, 'Deep Learning for Natural Language Processing', to help with implementing a neural net
- Went over chapters 9 and 10 from the book Flask Web Development by Miguel Grinberg (I had completed the first 8 chapters during the application period)
Meeting
- 1st meeting held on Google Hangouts on 10th May 2016 at 5pm UTC
- Meeting Minutes can be viewed below
Lessons Learnt
- Create the meeting minutes report no more than two days after the meeting
Problems Faced
- Communication - Poor video call quality (possibly due to low bandwidth), so we had to switch over to chat
Communication Plan
- Communication with mentors via emails daily or every two days
- Biweekly meetings on Hangouts
- Weekly progress to be reported on the tracking task
Meeting Minutes
Date and Time
- 10th May 2016, 5pm UTC
Agenda
- Architecture of a neural net we may want to implement
- Wikipedia Backlogs
- Citation Hunt
- Top 3 flagging concerns that we want to address
- Administrative tasks for GSoC
Details
1. Architecture of a Neural Net
- Discussed the machine learning algorithms that we may look at implementing
- Analyzed the ORES wp10 classifier
- Discussed the possibility of using LSTMs or RNNs for our neural net
- Data collection from the categories in WP:backlogs
2. Wikipedia Backlogs
- Discussed the backlog categories that we need to first start working on
- A few candidate categories were shortlisted:
- Lacking references
- Accuracy - broken/outdated citations, needs updating, has original research
- Content - copied and pasted
- Links - has dead links
- Style - needs copy edit
- Of these, outdated content was chosen as our top priority, followed by articles needing copy editing, provided we are able to implement a fairly robust grammar checker
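As a rough illustration of how data collection for a shortlisted category might begin, the sketch below builds a MediaWiki API query for the members of a backlog category and extracts page titles from the JSON response. It is only a minimal sketch and not something agreed at the meeting: the helper names (`category_members_url`, `titles_from_response`) are hypothetical, and the category name used in the usage note is an assumption about which maintenance category would correspond to the "has dead links" item.

```python
from urllib.parse import urlencode

# English Wikipedia's MediaWiki API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def category_members_url(category, limit=50):
    """Build a query URL listing pages in a category (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",  # category name without the prefix
        "cmlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def titles_from_response(payload):
    """Extract page titles from a decoded categorymembers JSON response."""
    members = payload.get("query", {}).get("categorymembers", [])
    return [member["title"] for member in members]
```

For example, `category_members_url("Articles with dead external links", limit=10)` returns a URL that can be fetched with any HTTP client, and the decoded JSON can then be passed to `titles_from_response`. The category name here is illustrative; the categories actually used would come from the finalized list of flagging concerns.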
3. Citation Hunt
- Discussed the possibility of integrating with Citation Hunt
- Check out their API
4. Top 3 flagging concerns
- Finalize the list by 21st May
5. Administrative tasks for GSoC
- Create 'Wikipedias-Accuracy-Review' Project on Phabricator
- Create Meeting Minutes Report
- Initiate discussions on Village Pump and Administrators' Notice Board
- Create Bonding Period Report