This is a GSoC/Outreachy project proposal for integrating language proofing support into VisualEditor.
Name : Ankita Kumari
Email : email@example.com
IRC nick : ankita-ks
Location : India
Time Zone : UTC + 5:30
Typical working hours : 8PM to 3AM (Weekdays), 11PM to 3AM (Weekends)
Integrate support for language proofing in VisualEditor using an external open source language proofing system (LanguageTool).
Wikipedia keeps expanding its horizons, and every day millions of people engage with it. Many of the people who contribute, or want to, may not be comfortable with the wikitext source editor. VisualEditor provides an excellent WYSIWYG interface, and with language proofing support it should reduce the number of grammatical and spelling errors that are often overlooked even across several re-edits. So far it has no integrated language proofing tool. Some Wikipedia communities have isolated bots or gadgets, but there is no uniform, shared implementation. From a developer’s perspective, I feel this integration would significantly improve the user experience of VisualEditor.
Design Details :
I plan to add a button to the VisualEditor toolbar. Once users are done editing the document, they can simply click the button to scan it for possible grammatical or spelling errors.
Here are a couple of mockups illustrating the idea.
I am going to listen for changes to the document.
When the user presses the 'check' button, all the text is collected and sent to the LanguageTool server running locally.
Once the response from the LanguageTool server is received, the text is annotated accordingly.
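The collection step has to remember where each piece of text came from, otherwise the offsets in LanguageTool's response cannot be mapped back onto the document. The sketch below illustrates one way to do that. It assumes a hypothetical flat list of `TextSegment`s (a node id plus its plain text) rather than VisualEditor's real ve.dm/ve.ce node API, and assumes LanguageTool's offsets count characters the same way JavaScript string indices do.

```typescript
// Hypothetical stand-in for a text-bearing node; real ve.dm/ve.ce nodes differ.
interface TextSegment {
  nodeId: string; // hypothetical id of the node the text came from
  text: string;
}

// Where a segment sits inside the joined string sent to LanguageTool.
interface SegmentSpan {
  nodeId: string;
  start: number; // index of the segment's first character
  end: number; // index one past the segment's last character
}

interface JoinedText {
  text: string;
  spans: SegmentSpan[];
}

interface LocalRange {
  nodeId: string;
  localOffset: number;
  length: number;
}

// Join segments with a blank line between them so LanguageTool sees
// paragraph boundaries, recording each segment's span as we go.
function joinSegments(segments: TextSegment[], separator: string = "\n\n"): JoinedText {
  const spans: SegmentSpan[] = [];
  let text = "";
  segments.forEach((segment, i) => {
    if (i > 0) {
      text += separator;
    }
    const start = text.length;
    text += segment.text;
    spans.push({ nodeId: segment.nodeId, start, end: text.length });
  });
  return { text, spans };
}

// Map a range [offset, offset + length) in the joined text back to a range
// inside a single node. Ranges that cross a segment boundary return null.
function locate(joined: JoinedText, offset: number, length: number): LocalRange | null {
  for (const span of joined.spans) {
    if (offset >= span.start && offset + length <= span.end) {
      return { nodeId: span.nodeId, localOffset: offset - span.start, length };
    }
  }
  return null;
}
```

Sending one joined string keeps the number of requests low, while the span table is what lets the annotation step find the right node for each reported error.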
The LanguageTool server will run in the backend, similar to Parsoid. The data model (DM) is the layer above it, with the UI and ContentEditable (CE) layers on top. Text rendered in CE will be sent to the LanguageTool server for proofing, and the server's response will be passed back to the editor.
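The round trip to the server could look roughly like the sketch below. It assumes the JSON `/v2/check` endpoint offered by newer LanguageTool releases (a POST with `text` and `language` parameters, answered with a `matches` array) rather than the XML output mentioned elsewhere in this proposal, and a server assumed to listen on `localhost:8081`. `PostForm` is a hypothetical transport, so the sketch does not commit to `fetch` or any MediaWiki API.

```typescript
// Fields this sketch reads from one entry of the assumed "matches" array.
interface LtMatch {
  message: string;
  offset: number;
  length: number;
  replacements: { value: string }[];
  rule: { id: string; issueType?: string };
}

// Hypothetical transport: POSTs a form-encoded body and resolves to parsed JSON.
type PostForm = (url: string, body: string) => Promise<unknown>;

// Assumed location of the locally running LanguageTool server.
const LT_CHECK_URL = "http://localhost:8081/v2/check";

// Form-encode the two parameters the (assumed) check endpoint needs.
function buildCheckBody(text: string, language: string): string {
  return "text=" + encodeURIComponent(text) + "&language=" + encodeURIComponent(language);
}

// Keep only entries that carry every field the later steps rely on.
function parseMatches(json: unknown): LtMatch[] {
  if (typeof json !== "object" || json === null) {
    return [];
  }
  const matches = (json as { matches?: unknown }).matches;
  if (!Array.isArray(matches)) {
    return [];
  }
  return matches.filter(
    (m): m is LtMatch =>
      typeof m === "object" && m !== null &&
      typeof m.message === "string" &&
      typeof m.offset === "number" &&
      typeof m.length === "number" &&
      Array.isArray(m.replacements) &&
      typeof m.rule === "object" && m.rule !== null
  );
}

// Send the text for checking and resolve to the validated matches.
function checkText(text: string, language: string, post: PostForm): Promise<LtMatch[]> {
  return post(LT_CHECK_URL, buildCheckBody(text, language)).then(parseMatches);
}
```

Validating the response before use means a malformed or partial reply from the server degrades to "no highlights" instead of breaking the editor.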
- ve.ui : Toolbars and Inspectors (User Interface)
- ve.ce : Rendering, selection and Input (Content Editable)
- ve.dm : Linear model and Transaction System (Data Model)
Phase 1 :
- Set up the basic infrastructure (this shouldn’t take long, as I have already set up both LanguageTool and MediaWiki with VisualEditor successfully)
- Set up the LanguageTool server inside MediaWiki
Phase 2 :
- Support within ve.ce to extract text
- Support within ve.ce to query LT server
Phase 3 :
- Support within ve.ce to process the response from LT server to annotate text
- Add a toolbar button to VisualEditor to turn language proofing on or off.
Parts that might require extensive work :
- Annotating text according to the response from LT server
- UI integration
- Once the integration is functional, a smarter strategy for deciding when and what text to send to the LT server will be required. Sending the whole text on every update would be expensive and unnecessary.
- Extensive testing to ensure that all the supported languages work smoothly
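One possible strategy for not re-sending everything is to remember the text of each paragraph as it was when last checked and send only paragraphs that differ. The sketch below uses a hypothetical `ProofreadCache` keyed by paragraph id; how ids would be assigned to VisualEditor nodes is left open, and the check itself could additionally be debounced so that a burst of keystrokes triggers a single request.

```typescript
// Tracks which paragraphs changed since they were last proofread, so only
// those need to be re-sent to the LanguageTool server.
class ProofreadCache {
  // Hypothetical paragraph id -> text as it was when last checked.
  private checked = new Map<string, string>();

  // Return ids of paragraphs whose current text differs from the last check
  // (paragraphs never checked before count as changed).
  changedSince(paragraphs: Map<string, string>): string[] {
    const changed: string[] = [];
    paragraphs.forEach((text, id) => {
      if (this.checked.get(id) !== text) {
        changed.push(id);
      }
    });
    return changed;
  }

  // Record that a paragraph was checked with this exact text.
  markChecked(id: string, text: string): void {
    this.checked.set(id, text);
  }

  // Forget paragraphs that no longer exist in the document.
  prune(paragraphs: Map<string, string>): void {
    const stale: string[] = [];
    this.checked.forEach((_text, id) => {
      if (!paragraphs.has(id)) {
        stale.push(id);
      }
    });
    stale.forEach((id) => this.checked.delete(id));
  }
}
```

Comparing whole paragraph strings is simple and cheap at typical paragraph sizes; a hash could replace the stored text if memory became a concern.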
Which components/modules will the proposed work modify or create?
I am going to work on the following aspects :-
- Setting up the LanguageTool server inside MediaWiki and its integration with ve.ce
- Integration with the UI
Tentative Project Timeline
| Dates | Task |
| May 25 - June 3 | Set up the LanguageTool server in MediaWiki |
| June 4 - June 14 | Extract text nodes from ve.ce |
| June 15 - June 25 | Query the LT server |
| June 26 - July 10 | Process the response from the LT server |
| July 11 - July 31 | Annotate the text marked in the XML response |
| August 1 - August 10 | Add the toolbar button to the editor view |
| August 11 - August 21 | Extensive testing, documentation and code clean-up |
Deliverables at mid-term evaluation
- Partially integrated language proofing : VisualEditor will be able to query the LanguageTool server and receive its XML response.
- Tests for all the modules implemented so far.
Deliverables at final evaluation
- VisualEditor with integrated LanguageTool. The toolbar will have an additional button for proofing. When the button is clicked, grammatical mistakes will be highlighted in green and spelling mistakes in red. When the user clicks a highlighted word, they are shown a list of alternative suggestions and example uses of the word or phrase.
- Documentation of the code detailing all the steps taken and changes made to the original codebase.
- Tests for the new feature so that it can be verified across a large number of wikis in the supported languages.
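The red/green split described in the deliverables could be driven by the `rule.issueType` field that LanguageTool attaches to each match in its JSON output, where spelling rules report `misspelling`. The sketch below works under that assumption and treats every other issue type as grammar. `ProofMatch`, `ProofHighlight` and `toHighlight` are hypothetical names, not part of VisualEditor or LanguageTool.

```typescript
// Subset of a LanguageTool match used for highlighting (assumed JSON shape).
interface ProofMatch {
  offset: number;
  length: number;
  message: string;
  replacements: { value: string }[];
  rule: { id: string; issueType?: string };
}

type HighlightKind = "spelling" | "grammar";

// What the annotation layer would need to draw and explain one error.
interface ProofHighlight {
  start: number;
  end: number;
  kind: HighlightKind;
  color: "red" | "green";
  message: string;
  suggestions: string[];
}

// Spelling errors (issueType "misspelling") are drawn in red; every other
// issue type is treated as grammar and drawn in green.
function toHighlight(m: ProofMatch, maxSuggestions: number = 5): ProofHighlight {
  const isSpelling = m.rule.issueType === "misspelling";
  return {
    start: m.offset,
    end: m.offset + m.length,
    kind: isSpelling ? "spelling" : "grammar",
    color: isSpelling ? "red" : "green",
    message: m.message,
    // Cap the list so the popup shown on click stays short.
    suggestions: m.replacements.slice(0, maxSuggestions).map((r) => r.value),
  };
}
```

This only covers the message and replacement list; the example uses of a word mentioned above would need a separate source.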
Work done so far
I have MediaWiki and VisualEditor set up both locally and on Vagrant. I contributed some documentation on setting up MediaWiki on Vagrant from behind a proxy.
I have been tinkering around with VisualEditor to get familiar with the codebase. I submitted a patch to fix the empty transclusion box problem.
I have successfully set up LanguageTool as a network service on my system and used it to add language proofing support to a locally hosted website. I also added support for Hindi to LanguageTool, as a proof of concept to determine whether language proofing can be extended to new languages easily.
Skills relevant to the project :
Apart from this, I am a quick learner and can easily adapt to new languages and frameworks.
Please describe any relevant projects that you have worked on previously and what knowledge you gained from working on them :
I have had some experience working with Wikidata in the past. One of the relevant projects was building a search engine over 40GB of Wikipedia data, which provides efficient document indexing and retrieval for search queries. This project gave me a fairly detailed idea of how text is structured in Wikipedia pages, which will come in handy when extracting text for language proofing.
I also worked on a project that detected subtopics related to an entity in tweets. As part of this project I became fairly familiar with text parsing and with tools such as the TagMe API and Lucene. This knowledge will help me find my way around LanguageTool.
Education : Computer Science and Engineering undergraduate student at International Institute of Information Technology
Commitments : I currently have no other commitments between 25 May and 25 August. I intend to devote about 35 to 40 hours per week to the project.
What drives me to do this?
I have been using open source software for quite some time now, but I am a beginner when it comes to contributing to open source. I am a Linux enthusiast and I love scripting little tasks that make my life easier on my Linux box.
I chose the Wikimedia Foundation as an organization because I strongly believe that knowledge should be free for all to use. I am also very interested in products that promote social engagement. I wouldn’t exactly call myself a grammar nazi, but tiny grammatical mistakes do annoy me a little. So I feel that by doing this, I would be doing my bit to promote education and knowledge for all.