
Unified language proofing tools integration framework (GSoC'15 Proposal)
Closed, Duplicate · Public

Description

This is a GSoC/Outreachy project proposal for integrating language proofing support in VisualEditor.

Profile Information

Name : Ankita Kumari
Email : kumariankita002@gmail.com
IRC nick : ankita-ks
Location : India
Time Zone : UTC + 5:30
Typical working hours : 8PM to 3AM (Weekdays), 11PM to 3AM (Weekends)

Project :
Integrate support for language proofing in VisualEditor using an external open source language proofing system (LanguageTool).

Possible Mentor : Amir Aharoni (@Amire80)
Co Mentor : @eranroz

Abstract

Wikipedia is expanding its horizons and every day millions of new users engage with it. Many people who contribute, or want to, may not be comfortable with the wikitext source editor. VisualEditor provides a brilliant WYSIWYG interface, and with support for language proofing it should reduce the number of grammatical and spelling errors that are often overlooked across repeated edits. So far it does not have an integrated tool for language proofing. There are isolated bots or gadgets in some Wikipedia communities, but there is no aggregated, uniform implementation. From a developer’s perspective, I feel this integration will enhance the user experience of VisualEditor significantly.

Design Details :

I plan to add a button to the VisualEditor toolbar. Once a user is done editing the document, they can click the button to scan for possible grammatical or spelling errors.
Here are a couple of mockups to illustrate the idea.

Implementation Details

Approach
I am going to listen to the changes on the document.
Once the user presses the 'check' button, all the text is collected and sent to the LanguageTool server running locally.
Once the response from LanguageTool server is received, the text is annotated accordingly.
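The "collect and send" step above could look roughly like the sketch below. It only builds the request, it does not send it. The endpoint URL and the function name buildCheckRequest are assumptions made for illustration; `language` and `text` are the form parameters LanguageTool's HTTP server accepts.

```javascript
// Assumed address of the locally running LanguageTool server.
// The proposal only says the server runs locally; adjust as needed.
const LT_ENDPOINT = 'http://localhost:8081/';

// Build (but do not send) a POST request asking LanguageTool to check `text`.
// `language` is a language code such as 'en-US'.
function buildCheckRequest(text, language) {
  const params = new URLSearchParams();
  params.set('language', language);
  params.set('text', text);
  return {
    url: LT_ENDPOINT,
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    // URLSearchParams takes care of escaping spaces, '&', '=' and so on.
    body: params.toString()
  };
}

const request = buildCheckRequest('This are a test.', 'en-US');
console.log(request.method, request.url);
console.log(request.body);
```

Keeping request construction separate from the network call makes this step easy to test without a running server.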

Project Architecture

The LanguageTool server will run in the backend, similar to Parsoid. The data model (DM) is the layer above it, with the UI and ContentEditable (CE) layers on top. Any text rendered in CE will be sent to the LanguageTool server for proofing, and the server's response will be passed back to the editor.

  • ve.ui : Toolbars and Inspectors (User Interface)
  • ve.ce : Rendering, selection and Input (Content Editable)
  • ve.dm : Linear model and Transaction System (Data Model)

Development Plan

Phase 1 :

  • Set up basic infrastructure (shouldn’t take much time as I have set up both LanguageTool and MediaWiki with VisualEditor successfully)
  • Set up the LanguageTool server inside MediaWiki

Phase 2 :

  • Support within ve.ce to extract text
  • Support within ve.ce to query LT server
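A minimal sketch of the text extraction step, under stated assumptions: the input is a simplified stand-in for a linear document model (plain characters, [character, annotations] pairs, and opening/closing element objects), not the actual VisualEditor data structures. The function name extractText and the returned offset map are hypothetical. The map records where each extracted character came from, so positions reported by the proofing server can later be traced back to the document.

```javascript
// Flatten a simplified linear model into plain text plus an offset map.
// offsets[i] is the index in `linearData` that produced text[i].
function extractText(linearData) {
  let text = '';
  const offsets = [];
  const append = (chunk, index) => {
    text += chunk;
    // One map entry per UTF-16 code unit, so text.length === offsets.length.
    for (let k = 0; k < chunk.length; k++) {
      offsets.push(index);
    }
  };
  linearData.forEach((item, index) => {
    if (typeof item === 'string') {
      append(item, index); // plain character
    } else if (Array.isArray(item)) {
      append(item[0], index); // annotated character: [char, annotations]
    } else if (item.type.startsWith('/') && text !== '' && !text.endsWith('\n')) {
      // Closing element: add a line break so blocks do not run together.
      append('\n', index);
    }
  });
  return { text, offsets };
}

const sample = [
  { type: 'paragraph' }, 'H', 'i', { type: '/paragraph' },
  { type: 'paragraph' }, ['y', ['bold']], 'o', { type: '/paragraph' }
];
console.log(JSON.stringify(extractText(sample)));
```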

Phase 3 :

  • Support within ve.ce to process the response from LT server to annotate text
  • Add a toolbar button to VisualEditor to turn language proofing on or off.
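Processing the response for annotation might be sketched as follows. This assumes the server response has already been parsed into plain objects with hypothetical fields `offset` and `length` (character positions in the text that was sent), and that `offsets[i]` gives the document position of character i of that text. Both names, and matchesToRanges itself, are illustrative assumptions.

```javascript
// Convert matches reported against the plain text into document ranges.
// Matches that fall outside the text are dropped rather than guessed at.
function matchesToRanges(matches, offsets) {
  return matches
    .filter((m) => m.length > 0 && m.offset >= 0 &&
      m.offset + m.length <= offsets.length)
    .map((m) => ({
      start: offsets[m.offset],
      // Exclusive end: one past the document position of the last character.
      end: offsets[m.offset + m.length - 1] + 1,
      match: m
    }))
    .sort((a, b) => a.start - b.start);
}

const ranges = matchesToRanges(
  [{ offset: 3, length: 2 }, { offset: 0, length: 2 }],
  [1, 2, 3, 5, 6, 7]
);
console.log(ranges.map((r) => [r.start, r.end]));
```

Sorting by start position lets the annotation step walk the document once from beginning to end.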

Parts that might require extensive work :

  • Annotating text according to the response from LT server
  • UI integration
  • Once the integration is functional, an optimized algorithm to send text to the LT server would be required. Sending data on every update would be very expensive and unnecessary.
  • Extensive testing to see that all the supported languages function smoothly
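One possible way to avoid sending everything on every update is to re-check only the paragraphs whose text changed since the last check. The sketch below is an assumption about how that could work, not a chosen design; the paragraph ids and the function name paragraphsToCheck are hypothetical.

```javascript
// Return the paragraphs whose text differs from what was last checked.
// `cache` is a Map from paragraph id to last-checked text, updated in place.
function paragraphsToCheck(cache, paragraphs) {
  const changed = [];
  const seen = new Set();
  for (const { id, text } of paragraphs) {
    seen.add(id);
    if (cache.get(id) !== text) {
      cache.set(id, text);
      changed.push({ id, text });
    }
  }
  // Forget paragraphs that no longer exist in the document.
  for (const id of [...cache.keys()]) {
    if (!seen.has(id)) {
      cache.delete(id);
    }
  }
  return changed;
}

const cache = new Map();
paragraphsToCheck(cache, [{ id: 'p1', text: 'Hello.' }, { id: 'p2', text: 'Wrold.' }]);
const second = paragraphsToCheck(cache, [{ id: 'p1', text: 'Hello.' }, { id: 'p2', text: 'World.' }]);
console.log(second.map((p) => p.id));
```

This could be combined with waiting for a pause in typing before checking, so that a burst of keystrokes results in a single request.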

Components/modules the proposed work will modify or create
I am going to work on three aspects :

  • Setting up the LanguageTool server inside MediaWiki and integrating it with ve.ce
  • Integration with the UI
  • Testing

Tentative Project Timeline

May 25 - June 3 : Implement LanguageTool server in MediaWiki
June 4 - June 14 : Extraction of text nodes from ve.ce
June 15 - June 25 : Querying LT server
June 26 - July 10 : Processing of response from LT server
July 11 - July 31 : Annotation of text marked in the XML response generated above
August 1 - August 10 : Adding toolbar button to the Editor View.
August 11 - August 21 : Extensive testing, documentation and clean up of code.

Deliverables at mid-term evaluation

  • Partially integrated language proofing : VisualEditor will be able to query the LanguageTool server and receive its XML response.
  • Tests for all the modules implemented so far.

Final Deliverables

  • VisualEditor with integrated LanguageTool. The toolbar will have an additional button for proofing. When the button is clicked, grammatical mistakes will be highlighted in green and spelling mistakes in red. When the user clicks on a highlighted word, they are shown a list of alternative options and example use cases of the word or phrase.
  • Documentation of the code detailing all the steps taken and changes made to the original codebase.
  • Testing modules for the new feature so that it can be tested across a large number of wikis in supported languages.
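The highlighting rule in the first deliverable (green for grammar, red for spelling, with a list of alternatives on click) could be sketched like this. The match shape with `type` and `replacements` fields, and the function name describeMatch, are hypothetical. Treating unknown types as grammar issues is an assumption of this sketch.

```javascript
// Colours taken from the deliverable: spelling in red, grammar in green.
const HIGHLIGHT_COLOURS = { spelling: 'red', grammar: 'green' };

// Decide how a match is shown: its highlight colour and the alternatives
// offered when the user clicks the highlighted text.
function describeMatch(match, maxSuggestions = 5) {
  return {
    // Assumption: anything that is not a spelling issue is shown as grammar.
    colour: HIGHLIGHT_COLOURS[match.type] || HIGHLIGHT_COLOURS.grammar,
    suggestions: match.replacements.slice(0, maxSuggestions)
  };
}

console.log(describeMatch({ type: 'spelling', replacements: ['World', 'Would'] }));
```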

Work done so far

I have MediaWiki and VisualEditor set up both locally and on Vagrant. I contributed some documentation for setting up MediaWiki on Vagrant from behind a proxy.
I have been tinkering around with VisualEditor to get familiar with the codebase. I submitted a patch to fix the empty transclusion box problem.
I have successfully set up LanguageTool as a network service on my system. I used it to add language proofing support to a locally hosted website. I also added support for Hindi to LanguageTool. This was done as a proof of concept to determine if language proofing can be extended to new languages with ease.

Skills relevant for the project :
I have elementary experience with JavaScript and PHP, which are mostly used in VisualEditor. I am very comfortable with Java, which is what LanguageTool is based on.
Apart from this, I am a quick learner and can easily adapt to new languages and frameworks.

Please describe any relevant projects that you have worked on previously and what knowledge you gained from working on them :

I have had some experience working with Wikidata in the past. One of the relevant projects was building a search engine over 40GB of Wikipedia data, which provides efficient indexing of documents and retrieval of results for search queries. This project gave me a fairly detailed idea of the structure of text in Wikipedia pages, which will come in handy while extracting text for language proofing.
I also worked on a project which detected subtopics related to an entity in Tweets. As a part of this project I became fairly familiar with text parsing and tools like the TagMe API and Lucene. This knowledge will help me work my way around LanguageTool.

About Me

Education : Computer Science and Engineering undergraduate student at International Institute of Information Technology
Commitments : As of now I have no prior commitments between 25th May and 25th August. I intend to contribute about 35-40 hours per week towards the project.
What drives me to do this?
I have been using open source software for quite some time now, but I am a beginner when it comes to contributing to open source. I am a Linux enthusiast and I love scripting little tasks that make my life easier on my Linux box.
I chose Wikimedia Foundation as an organization because I strongly believe that knowledge should be free for the use of all. I am also very interested in products that promote social engagement. I wouldn’t exactly call myself a grammar nazi, but tiny grammatical mistakes do annoy me a little. So I feel that by doing this, I would be doing my bit towards promotion of education and knowledge for all.

Event Timeline

Ankita-ks claimed this task.
Ankita-ks raised the priority of this task from to Needs Triage.
Ankita-ks updated the task description.
Ankita-ks added subscribers: Ankita-ks, Amire80, eranroz.
Restricted Application added a subscriber: Aklapper. · Mar 24 2015, 3:16 PM
Ankita-ks renamed this task from GSoC'15/Outreachy Proposal for Newsletter Extension Project to GSoC'15/Outreachy Proposal for Unified language proofing tools integration framework. · Mar 24 2015, 3:17 PM
Ankita-ks set Security to None.
Ankita-ks updated the task description. · Mar 24 2015, 3:21 PM
Ankita-ks updated the task description. · Mar 24 2015, 3:46 PM
Ankita-ks updated the task description. · Mar 24 2015, 3:54 PM
Ankita-ks updated the task description.
Ankita-ks updated the task description. · Mar 24 2015, 3:56 PM
Ankita-ks updated the task description. · Mar 24 2015, 4:27 PM
Ankita-ks updated the task description.
Ankita-ks updated the task description. · Mar 24 2015, 4:36 PM
Ankita-ks updated the task description.
Ankita-ks updated the task description. · Mar 24 2015, 5:31 PM
Ankita-ks updated the task description. · Mar 24 2015, 6:44 PM

@Ankita-ks please mention the microtask(s) you have worked/are working on clearly. I see that you have one open patch. Add reviewers to it so folks can review it and get it merged.

dnaber added a subscriber: dnaber. · Mar 26 2015, 9:57 PM
Ankita-ks updated the task description. · Mar 27 2015, 4:37 PM
Qgil triaged this task as Normal priority. · Apr 30 2015, 10:03 AM
Ankita-ks renamed this task from GSoC'15/Outreachy Proposal for Unified language proofing tools integration framework to Unified language proofing tools integration framework (GSoC'15 Proposal). · May 1 2015, 6:28 PM