This is a GSoC/Outreachy project proposal for integrating language proofing support in VisualEditor.
== Profile Information ==
**Name** : Ankita Kumari
**Email** : kumariankita002@gmail.com
**IRC nick** : ankita-ks
**Location** : Hyderabad, India
**Time Zone** : UTC+5:30
**Typical working hours** : 8PM to 3AM (Weekdays), 11PM to 3AM (Weekends)
**[[ https://phabricator.wikimedia.org/T89107 | Project ]]** : Integrate support for language proofing in VisualEditor using an external open source language proofing system (LanguageTool).
**Possible project Mentor** : Amir Aharoni (@Amire80), @eranroz
== Why implement Language Proofing in VisualEditor? ==
Wikipedia is expanding its horizons, and every day millions of new users engage with it. Many of the people who contribute, or want to, might not be comfortable with the source wiki editor. VisualEditor provides a brilliant WYSIWYG interface, and with language proofing it should reduce the number of grammatical and spelling errors that are often overlooked across several re-edits. Wikipedia also has very extensive language support through projects like Content Translation, Universal Language Selector, Translate, etc. But so far it does not have an integrated tool for language proofing in any of the editors. There are isolated tools in some Wikipedia communities, but there is no aggregated, uniform implementation. This project aims to provide one:
Integrate support for language proofing in VisualEditor using an external open source language proofing system (LanguageTool).
**Why am I interested in doing this?**
I am very interested in products that promote social engagement. I feel the WYSIWYG interface of VisualEditor is an effective way to let more people add content to Wikipedia.
I have come across many wiki pages with small grammatical errors that could have been avoided if the editor provided basic proofing support. This problem is solvable. From a developer’s perspective, I feel this integration will enhance the user experience of VisualEditor significantly.
== Abstract ==
Wikipedia is expanding its horizons, and every day millions of new users engage with it. Many of the people who contribute, or want to, might not be comfortable with the source wiki editor. VisualEditor provides a brilliant WYSIWYG interface, and with support for language proofing it should reduce the number of grammatical and spelling errors that are often overlooked across several re-edits. So far it does not have an integrated tool for language proofing. There are isolated bots or gadgets in some Wikipedia communities, but there is no aggregated, uniform implementation. From a developer’s perspective, I feel this integration will enhance the user experience of VisualEditor significantly.
== Please describe your experience with the organization's product as a user and as a contributor : ==
I feel the user interface is simple and minimalistic while providing the required functionality. Everything is accessible from the toolbar at the top of the editing area and does not interfere with editing.
However, the toolbar does not offer any language proofing support. Proofreading is a handy feature in any editor. Spell checks supported by browsers and operating systems are not sophisticated enough to cater to the complex needs of Wikipedia content contributors. Thus I look forward to contributing a whole new feature to VisualEditor and to assisting the community with any other tasks in whatever way I can.
The release of VisualEditor is much awaited: the beta version has been available for quite some time now, but it has some bugs, many of them related to UI design. For instance, the Transclusion Box was empty when no template had been added ([[ https://phabricator.wikimedia.org/T52281 | T52281 ]]), which might confuse a new user as to what to do next. I sent a [[ https://gerrit.wikimedia.org/r/#/c/198131/ | patch ]] to fix this problem.
I also have some experience setting up MediaWiki on Vagrant, and I have contributed some [[ https://www.mediawiki.org/w/index.php?title=MediaWiki-Vagrant&action=history | documentation ]] for the same.
== Please describe your experience with any other FOSS projects as a user and as a contributor: ==
I have had some experience with LanguageTool as well. It is an excellent open source tool and allows support for new languages to be added. I added [[ https://github.com/languagetool-org/languagetool/pull/246 | support for Hindi Language ]] as a proof of concept. This was part of determining whether adding new languages is feasible once LanguageTool is integrated with VisualEditor.
I have also set up LanguageTool as a network service. I used it to [[ https://github.com/languagetool.org/languagetool/commit/0e54aee0c733c12c5aad94e7930987460e48c040 | add language proofing support ]] to a website hosted locally on my system. This task was done to determine whether the LanguageTool server can be run locally without any glitch.
I would like to see Language Proofing grow into a whole new project in itself because of the immense scope it holds in terms of providing support for many different languages on Wikipedia. I hope to implement this as part of my internship.
= Please describe the details and the timeline of the work you plan to accomplish on the project you are most interested in: =
== Design Details : ==
I plan to add a button to the VisualEditor toolbar. Once a user is done editing the document, they can click the button to scan for possible grammatical or spelling errors.
Here are a couple of mockups illustrating the idea.
{F106088}
{F106090}
= Implementation Details =
**Approach**
I am going to listen to the changes on the document.
Once the User presses the 'check' button, all the text is collected and sent to the LanguageTool server running locally.
Once the response from LanguageTool server is received, the text is annotated accordingly.
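The annotation step above can be sketched as follows. This is only a minimal illustration in Python, not VisualEditor code (the real integration would live in VisualEditor's JavaScript). The match dictionaries with `offset`, `length`, `message` and `kind` keys are an assumed, simplified shape for a parsed LanguageTool response, and `Annotation`, `annotate` and `render` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Annotation:
    start: int    # index of the first flagged character
    end: int      # index one past the last flagged character
    kind: str     # "spelling" or "grammar" (assumed labels)
    message: str  # explanation to show to the user


def annotate(text: str, matches: List[Dict]) -> List[Annotation]:
    """Turn parsed proofing matches into annotations over `text`.

    Matches that fall outside the text are skipped, because the document
    may have changed while the request was in flight.
    """
    annotations = []
    for match in matches:
        start = match["offset"]
        end = start + match["length"]
        if start < 0 or match["length"] <= 0 or end > len(text):
            continue
        annotations.append(
            Annotation(start, end, match["kind"], match["message"])
        )
    return sorted(annotations, key=lambda a: a.start)


def render(text: str, annotations: List[Annotation]) -> str:
    """Mark flagged ranges as [kind: ...] so the result is easy to inspect."""
    out, cursor = [], 0
    for ann in annotations:
        if ann.start < cursor:
            continue  # overlapping matches are ignored in this simple sketch
        out.append(text[cursor:ann.start])
        out.append(f"[{ann.kind}: {text[ann.start:ann.end]}]")
        cursor = ann.end
    out.append(text[cursor:])
    return "".join(out)
```

In the editor itself, `render` would be replaced by code that applies highlight annotations to the document rather than building a string.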
== Project Architecture ==
{F103590}
The LanguageTool server will run in the backend, similar to Parsoid. The DM is the layer above it, with UI and CE on top. Any text rendered in CE will be sent to the LanguageTool server for proofing, and the server's response is communicated back.
- ve.ui : Toolbars and Inspectors (User Interface)
- ve.ce : Rendering, selection and Input (Content Editable)
- ve.dm : Linear model and Transaction System (Data Model)
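To show how text could be pulled out of the data model while keeping track of where it came from, here is a minimal sketch in Python. The list used here is a simplified stand-in for the linear model, assumed to hold plain characters, `[character, annotations]` pairs, and dictionaries for opening and closing elements. `extract_text` is a hypothetical helper, not an existing VisualEditor function.

```python
from typing import List, Tuple, Union

# A model item is a plain character, a [character, annotations] pair,
# or a dict describing an opening/closing element (assumed shape).
Item = Union[str, list, dict]


def extract_text(linear_model: List[Item]) -> Tuple[str, List[int]]:
    """Return the plain text and, for each character, its model offset.

    The offset list lets an error position reported for the plain text
    be mapped back to a position in the model.
    """
    chars: List[str] = []
    offsets: List[int] = []
    for index, item in enumerate(linear_model):
        if isinstance(item, str):
            chars.append(item)
            offsets.append(index)
        elif isinstance(item, list):
            chars.append(item[0])  # annotated character: keep only the text
            offsets.append(index)
        elif isinstance(item, dict) and item.get("type", "").startswith("/"):
            # A closing element ends a block; emit one newline per boundary.
            if chars and chars[-1] != "\n":
                chars.append("\n")
                offsets.append(index)
    return "".join(chars), offsets
```

With the offset list, a match at character 1 of the plain text can be traced back to `offsets[1]` in the model, which is where the annotation would be applied.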
== Development Plan ==
**Phase 1 :**
- Set up basic infrastructure (shouldn’t take much time as I have set up both LanguageTool and MediaWiki with VisualEditor successfully)
- Set up LanguageTool server inside MediaWiki
**Phase 2 :**
- Support within ve.ce to extract text
- Support within ve.ce to query LT server
**Phase 3 :**
- Support within ve.ce to process the response from LT server to annotate text
- Add toolbar button to VisualEditor to turn LanguageProofing on or off.
**Parts that might require extensive work :**
- Annotating text according to the response from LT server
- UI integration
- Once the integration is functional, an optimized algorithm to send text to LT server would be required. Sending data on every update would be very expensive and unnecessary.
- Extensive testing to see that all the supported languages function smoothly
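One possible way to avoid resending unchanged text is to cache results per sentence, so that only new or edited sentences reach the server. The sketch below is in Python and is only an illustration of that idea. `ProofingCache` is a hypothetical name, and the `check` callable stands in for whatever function actually queries the LT server.

```python
import hashlib
from typing import Callable, Dict, List


class ProofingCache:
    """Remember proofing results per sentence to avoid repeated requests."""

    def __init__(self, check: Callable[[str], List[dict]]):
        self._check = check  # stand-in for the call to the LT server
        self._results: Dict[str, List[dict]] = {}
        self.requests_sent = 0

    def check_sentences(self, sentences: List[str]) -> List[List[dict]]:
        """Return results for every sentence, querying only unseen ones."""
        results = []
        for sentence in sentences:
            key = hashlib.sha1(sentence.encode("utf-8")).hexdigest()
            if key not in self._results:
                self._results[key] = self._check(sentence)
                self.requests_sent += 1
            results.append(self._results[key])
        return results
```

The trade-off is that rules spanning several sentences would not see their full context, so the unit of caching (sentence or paragraph) would need to be chosen during the project.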
**Components/modules the proposed work will modify or create**
I am going to work on three aspects:
- Setting up the LanguageTool server inside MediaWiki and integrating it with ve.ce
- Integration with the UI
- Testing
**How am I going to do it?**
I am going to listen to the changes on the document.
If the update is a character insertion, I retrieve the inserted character. If the inserted character is a space, I collect all the text from that position back to the beginning of the sentence and send it to the LanguageTool server running locally. If text was pasted, I send the entire pasted text to the LanguageTool server.
Once the response from LanguageTool server is received, the text is annotated accordingly.
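The "collect the text back to the beginning of the sentence" step can be sketched as below. This is a deliberately naive illustration in Python: it treats `.`, `!` and `?` as sentence boundaries, so abbreviations such as "e.g." would split a sentence early, and other scripts would need their own terminators. `sentence_before` is a hypothetical helper name.

```python
SENTENCE_END = ".!?"  # assumed terminators; real languages need more care


def sentence_before(text: str, cursor: int) -> str:
    """Return the sentence that ends at `cursor`.

    `cursor` is the position just after the character the user typed.
    """
    # Ignore the whitespace that triggered the check.
    end = cursor
    while end > 0 and text[end - 1].isspace():
        end -= 1
    # Step over the terminator of the sentence that was just finished,
    # so that it is not mistaken for the previous boundary.
    scan = end
    while scan > 0 and text[scan - 1] in SENTENCE_END:
        scan -= 1
    # Walk backwards to the previous terminator, if there is one.
    start = 0
    for i in range(scan - 1, -1, -1):
        if text[i] in SENTENCE_END:
            start = i + 1
            break
    return text[start:end].strip()
```

Only the returned sentence would be sent for proofing, which keeps each request small.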
== Tentative Project Timeline ==
| May 25 - June 3 | Implement LanguageTool server in MediaWiki
| June 4 - June 14 | Extraction of text nodes from ve.ce
| June 15 - June 25 | Querying LT server
| June 26 - July 10 | Processing of response from LT server
| July 11 - July 31 | Annotation of text marked in the xml response generated above
| August 1 - August 10 | Adding toolbar button to the Editor View.
| August 11 - August 21 | Extensive testing, documentation and clean up of code.
== Deliverables at mid-term evaluation ==
- Partially integrated Language Proofing : VisualEditor will be able to query the LanguageTool server and receive its XML response.
- Testing modules for all the modules implemented so far.
== Final Deliverables ==
- VisualEditor with integrated LanguageTool. The toolbar will have an additional button to provide for proofing. When the button is clicked, grammatical mistakes will be highlighted in green and spelling mistakes in red. When the user clicks on a highlighted word, they are shown a list of alternative options and example use cases of the word or phrase.
- Documentation of the code detailing all the steps taken and changes made to the original codebase.
- Testing modules for the new feature so that it can be tested across a large number of wikis in supported languages.
== Work done so far ==
I have MediaWiki and VisualEditor set up both locally and on Vagrant. I contributed some [[ https://www.mediawiki.org/w/index.php?title=MediaWiki-Vagrant&action=history | documentation ]] for setting up MediaWiki on Vagrant from behind a proxy.
I have been tinkering around with VisualEditor to get familiar with the codebase. I submitted a [[ https://gerrit.wikimedia.org/r/#/c/198131/ | patch ]] to fix the [[ https://phabricator.wikimedia.org/T52281 | empty transclusion box problem ]].
I have successfully set up LanguageTool as a network service on my system. I used it to add language proofing support to a locally hosted website. I also added support for Hindi to LanguageTool. This was done as a proof of concept to determine whether language proofing can be extended to new languages with ease.
**Skills relevant for the project :**
I have elementary experience with JavaScript and PHP, which are the main languages used in VisualEditor. I am very comfortable with Java, which is what LanguageTool is based on.
Apart from this I am a quick learner and can easily adapt to new languages and frameworks.
== Please describe any relevant projects that you have worked on previously and what knowledge you gained from working on them : ==
I have had some experience working with Wikidata in the past. One of the relevant projects was building a search engine over 40GB of Wikipedia data, which provides efficient indexing of documents and retrieval of search queries. This project gave me a fairly detailed idea of the structure of text in Wikipedia pages. This would come in handy while extracting text for language proofing.
I also worked on a [[ http://www.slideshare.net/ankitasingh002 | project ]] which detected subtopics related to an entity in Tweets. As a part of this project I got fairly familiar with text parsing and various tools like TagMe API and Lucene. This knowledge will help me work my way around LanguageTool.
== About Me ==
**Education** : Computer Science and Engineering undergraduate student at International Institute of Information Technology. I am a 4th year student and I graduate in September 2015.
**Commitments** : As of now I have no prior commitments between 25th May and 25th August. I intend to contribute about 35 to 40 hours per week towards the project.
**What drives me to do this?**
I have been using open source software for quite some time now, but I am a beginner when it comes to contributing to open source. I am a Linux enthusiast and I love scripting little tasks that make my life easier on my Linux box.
I chose the Wikimedia Foundation as an organization because I strongly believe that knowledge should be free for the use of all. I am also very interested in products that promote social engagement. I wouldn’t exactly call myself a grammar nazi, but tiny grammatical mistakes do annoy me a little. So I feel that by doing this, I would be doing my bit towards the promotion of education and knowledge for all.
== How did you hear about the program? ==
A very active Open Source Development Group exists in my university. I found out about this program from them.
== Are you applying for Google Summer of Code and, if so, with what organization(s)? ==
Yes, I have applied for Google Summer of Code 2015 too, for MediaWiki.