This is a GSoC/Outreachy project proposal for integrating language proofing support into VisualEditor.
== Profile Information ==
**Name** : Ankita Kumari
**Email** : kumariankita002@gmail.com
**IRC nick** : ankita-ks
**Location** : Hyderabad, India
**Time Zone** : UTC+5:30
**Typical working hours** : 8PM to 3AM (Weekdays), 11PM to 3AM (Weekends)
**Project** : Integrate support for language proofing in VisualEditor using an external open source language proofing system (LanguageTool).
**Possible project mentors** : Amir Aharoni (@Amire80), @eranroz
== Why implement Language Proofing in VisualEditor? ==
Wikipedia keeps expanding its horizons, and millions of users engage with it every day. Many of the people who contribute, or want to, may not be comfortable with the wikitext source editor. VisualEditor provides a WYSIWYG interface, and adding language proofing to it should reduce the grammatical and spelling errors that are often overlooked across repeated edits. Wikipedia also has extensive language support through projects such as Content Translation, Universal Language Selector and Translate, but so far none of the editors has an integrated language proofing tool. Some Wikipedia communities have isolated tools, but there is no uniform, shared implementation.
**Why am I interested in doing this?**
I am very interested in products that promote social engagement. I feel the WYSIWYG interface of VisualEditor is an effective way to let more people add content to Wikipedia.
I have come across many wiki pages with small grammatical errors that could have been avoided if the editor provided basic proofing support. This problem is solvable. From a developer's perspective, I feel this integration will significantly enhance the user experience of VisualEditor.
== Please describe your experience with the organization's product as a user and as a contributor : ==
I feel the user-interface is simple and minimalistic while providing the required functionalities. Everything is accessible from the toolbar at the top of the editing area and does not interfere with the editing.
But the toolbar does not contain any language support. Proofreading is a handy feature in any editor, and the spell checkers built into browsers and operating systems are not sophisticated enough for the complex needs of Wikipedia contributors. I therefore want to contribute a whole new feature to VisualEditor and also assist the community with other tasks in whatever way I can.
The release of VisualEditor is much awaited: the beta version has been available for quite some time, but it still has some bugs, many of them related to the user interface. For instance, the transclusion box was empty when no template had been added ([[ https://phabricator.wikimedia.org/T52281 | T52281 ]]), which could leave a new user unsure what to do next. I sent a [[ https://gerrit.wikimedia.org/r/#/c/198131/ | patch ]] to fix this problem.
I also have some experience setting up MediaWiki with MediaWiki-Vagrant, and I have contributed some [[ https://www.mediawiki.org/w/index.php?title=MediaWiki-Vagrant&action=history | documentation ]] for it.
== Please describe your experience with any other FOSS projects as a user and as a contributor: ==
I have some experience with LanguageTool as well. It is an excellent open source tool that allows support for new languages to be added. I added [[ https://github.com/languagetool-org/languagetool/pull/246 | support for the Hindi language ]] as a proof of concept, to determine whether adding new languages will be feasible once LanguageTool is integrated with VisualEditor.
I have also set up LanguageTool as a network service. I used it to [[ https://github.com/languagetool.org/languagetool/commit/0e54aee0c733c12c5aad94e7930987460e48c040 | add language proofing support ]] to a website hosted locally on my system, to determine whether the LanguageTool server can be run locally without problems.
I would like to see Language Proofing spiral into a whole new project in itself because of the immense scope it holds in terms of providing support for many different languages on Wikipedia.
I hope to implement this as part of my internship.
= Please describe the details and the timeline of the work you plan to accomplish on the project you are most interested in: =
== Project Architecture ==
{F103590}
- ve.ui : Toolbars and Inspectors (User Interface)
- ve.ce : Rendering, selection and Input (Content Editable)
- ve.dm : Linear model and Transaction System (Data Model)
== Development Plan ==
**Phase 1 :**
- Set up basic infrastructure (this shouldn't take much time, as I have already set up both LanguageTool and MediaWiki with VisualEditor successfully)
- Set up the LanguageTool server inside MediaWiki
**Phase 2 :**
- Support within ve.ce to extract text
- Support within ve.ce to query LT server
**Phase 3 :**
- Support within ve.ce to process the response from LT server to annotate text
- Add a toolbar button to VisualEditor to turn language proofing on or off
**Parts that might require extensive work :**
- Annotating text according to the response from LT server
- UI integration
- Once the integration is functional, an optimized algorithm for sending text to the LT server will be required. Sending data on every update would be very expensive and unnecessary.
- Extensive testing to see that all the supported languages function smoothly
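
One possible way to avoid querying the LT server on every update is to debounce the checks, so that text is only sent once the user has paused typing. The sketch below is an illustration under stated assumptions: `createDebouncedChecker`, `sendToServer` and the injectable `timers` object are hypothetical names, not part of VisualEditor or LanguageTool, and the delay would need tuning.

```javascript
// Minimal debounce sketch (hypothetical helper, not a VisualEditor API).
// `sendToServer` is an assumed callback that would query the LT server.
// `timers` is injectable so the behaviour can be exercised without real delays.
function createDebouncedChecker(sendToServer, delayMs, timers) {
  const t = timers || {
    set: function (fn, ms) { return setTimeout(fn, ms); },
    clear: function (id) { clearTimeout(id); }
  };
  let pending = null;

  return function scheduleCheck(text) {
    // A newer edit supersedes any check that has not fired yet.
    if (pending !== null) {
      t.clear(pending);
    }
    pending = t.set(function () {
      pending = null;
      sendToServer(text);
    }, delayMs);
  };
}
```

With a delay of, say, 500 ms, calling the returned function three times in quick succession would result in a single request carrying only the latest text.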
**Components/modules the proposed work will modify or create**
I am going to work on three aspects:
- Setting up the LanguageTool server inside MediaWiki and integrating it with ve.ce
- Integration with the UI
- Testing
**How am I going to do it?**
I am going to watch for updates on the surface of the editor through `ve.ce.Surface.prototype.onModelDocumentUpdate()`.
If the update is a character insertion, I get the inserted character via `ve.ce.Surface.prototype.onDocumentKeyPress()`. If the inserted character is a space, I collect all the text from that position back to the beginning of the sentence and send it to the LanguageTool server running locally. If text was pasted, I retrieve it via `ve.ce.Surface.prototype.onPaste()` and send the entire pasted text to the LanguageTool server.
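
The space-triggered step could look roughly like the sketch below. It works on a plain string rather than VisualEditor's linear model, treats `.`, `!` or `?` followed by whitespace as a sentence boundary, and assumes the server accepts form-encoded `language` and `text` parameters. `sentenceBeforeCursor` and `buildCheckBody` are hypothetical names introduced only for illustration.

```javascript
// Sketch only: operates on a plain string, not on the ve.dm linear model.

// Return the text from the start of the current sentence up to the cursor,
// together with the offset at which that sentence starts.
function sentenceBeforeCursor(text, cursor) {
  const before = text.slice(0, cursor);
  const boundary = /[.!?]\s+/g;
  let start = 0;
  let match;
  while ((match = boundary.exec(before)) !== null) {
    const end = match.index + match[0].length;
    // Ignore a terminator at the very end: that sentence has just been
    // finished and is the one we want to check.
    if (end < before.length) {
      start = end;
    }
  }
  return { start: start, text: before.slice(start).trimEnd() };
}

// Build a form-encoded request body. The `language` and `text` parameter
// names are assumptions; verify them against the LT server in use.
function buildCheckBody(languageCode, text) {
  return new URLSearchParams({ language: languageCode, text: text }).toString();
}
```

Keeping the sentence's start offset alongside the text makes it possible to map positions reported by the server back onto the document later.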
Once the response from LanguageTool server is received, the text is annotated accordingly.
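
As a rough illustration of the annotation step, the sketch below turns reported errors into document ranges. It assumes the server response has already been parsed into an array of matches, each with an `offset` and `length` relative to the text that was sent; these field names and the `matchesToRanges` helper are illustrative, and the actual annotation would go through VisualEditor's own data model.

```javascript
// Sketch only: `matches` is an assumed, already-parsed form of the response.
// `sentenceStart` is the document offset of the text that was sent, and
// `sentenceLength` is the length of that text.
function matchesToRanges(matches, sentenceStart, sentenceLength) {
  return matches
    .filter(function (m) {
      // Drop anything that does not fit inside the text that was sent.
      return m.length > 0 && m.offset >= 0 &&
        m.offset + m.length <= sentenceLength;
    })
    .map(function (m) {
      return {
        start: sentenceStart + m.offset,
        end: sentenceStart + m.offset + m.length,
        message: m.message,
        suggestions: m.replacements || []
      };
    });
}
```

Each resulting range identifies the span of document text to mark, along with the message and suggestions to show the user.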
== Tentative Project Timeline ==
| May 25 - June 3 | Implement LanguageTool server in MediaWiki
| June 4 - June 14 | Extraction of text nodes from ve.ce
| June 15 - June 25 | Querying LT server
| June 26 - July 10 | Processing of response from LT server
| July 11 - July 31 | Annotation of the text marked in the XML response processed in the previous step
| August 1 - August 10 | Adding a toolbar button to the editor view
| August 11 - August 21 | Extensive testing, documentation and clean up of code.
== Deliverables at mid-term evaluation ==
- Partially integrated language proofing: VisualEditor will be able to query a LanguageTool server and receive its XML response.
- Tests for all the modules implemented so far.
**Skills relevant for the project :**
I have elementary experience with JavaScript and PHP, the languages mostly used in VisualEditor. I am very comfortable with Java, which is what LanguageTool is based on.
Apart from this I am a quick learner and can easily adapt to new languages and frameworks.
== Please describe any relevant projects that you have worked on previously and what knowledge you gained from working on them : ==
I have some experience working with Wikipedia data. One relevant project was building a search engine over 40 GB of Wikipedia data, with efficient document indexing and query retrieval.
I also worked on a [[ http://www.slideshare.net/ankitasingh002 | project ]] which detected subtopics related to an entity in Tweets.
== Previous Open Source Experience : ==
I have been using open source software for quite some time now, but I am a beginner when it comes to contributing to open source. I am a Linux enthusiast, and I love scripting little tasks that make my life easier on my Linux box.
== Summer Plans : ==
As of now, I have no other commitments between 25 May and 25 August. I intend to contribute about 35 to 40 hours per week to this internship.
== Education completed or in progress (include university, major/concentration, degree level, and graduation year): ==
I am a final year student of Computer Science at International Institute of Information Technology.
== How did you hear about the program? ==
A very active Open Source Development Group exists in my university. I found out about this program from them.
== Are you applying for Google Summer of Code and, if so, with what organization(s)? ==
Yes, I have also applied for Google Summer of Code 2015 with MediaWiki.