Unified language proofing tools integration framework
Open, Lowest, Public

Description

See https://www.mediawiki.org/wiki/Extension:LanguageTool

Wikipedia communities in some languages have developed automatic or semi-automatic tools to improve the quality of language or typography. Some examples are:

  1. The Wikificator tool in the Russian Wikipedia (similar tools exist in Ukrainian, Belarusian and possibly other Wikipedias)
  2. The Checkty gadget in the Hebrew Wikipedia, a semi-automatic script for fixing common grammar mistakes, as well as another list of automatic replacements usually performed by a bot.
  3. The orthography converter in the Belarusian-Taraškievica Wikipedia.
  4. Extra edit buttons in the Persian Wikipedia.
  5. ... and many other tools in other languages.

These tools are written as bots, gadgets or user scripts, and each project implements them in a different internal framework and with a different UI. It would be useful to unify at least some of these tools into a single internal framework - for example (but not necessarily) to store the replacement rules as a uniform JSON data structure rather than disparate JavaScript variables. Using external open source software, such as LanguageTool, is acceptable as well, as long as the functionality that the different language communities are using is preserved. Finally, this framework should have a single interface that would be usable with both the wiki syntax source editor and the VisualEditor.
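As a rough illustration of the "uniform JSON data structure" idea, here is a minimal sketch of replacement rules stored as JSON and applied to a piece of text. The field names (`id`, `pattern`, `flags`, `replacement`) and the helpers `loadRules` and `applyRules` are hypothetical, made up for this example; they are not the schema of any existing gadget or of LanguageTool.

```javascript
// Hypothetical rule set stored as JSON. The field names are illustrative
// assumptions, not an existing schema.
const rulesJson = `[
  {"id": "double-space", "pattern": " {2,}", "flags": "g", "replacement": " "},
  {"id": "space-before-comma", "pattern": " +,", "flags": "g", "replacement": ","}
]`;

// Parse the JSON and compile each pattern string into a RegExp.
function loadRules(json) {
  return JSON.parse(json).map((rule) => ({
    id: rule.id,
    regex: new RegExp(rule.pattern, rule.flags),
    replacement: rule.replacement,
  }));
}

// Apply the rules in order; each rule sees the output of the previous one.
function applyRules(text, rules) {
  return rules.reduce(
    (current, rule) => current.replace(rule.regex, rule.replacement),
    text
  );
}

const compiledRules = loadRules(rulesJson);
console.log(applyRules("Hello ,  world", compiledRules)); // prints "Hello, world"
```

Keeping the rules as plain data rather than as JavaScript variables is what would let a bot, a gadget and an editor plugin share one rule set.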

  • Skills: JavaScript, regular expressions, data abstraction. Knowledge of the (human) languages in question is not required, but can be helpful.
  • Suggested micro-task: Fix a bug related to a VisualEditor toolbar button.
  • Primary mentor: Amir E. Aharoni (@Amire80)
  • Co-mentor: @eranroz

Event Timeline

@eranroz Would you be interested in mentoring this project along with Amir? :)

I can mentor this project with Amir.

Hi, everyone.
I would like to try this project for the GSoC/Outreachy program. As a newcomer to the FOSS world, I need some advice on how to get started. Can anyone help me? Thanks.

Hi, I'd like to work on this project. Could you please give me some pointers about it? Thanks.

@blackops0057, @Xi1919: I suggest you read the description of the task and explore the tools it mentions to get a better understanding of it. Ask here if you have specific questions.

@blackops0057, @Xi1919: Most language-checking tools use regular expressions, with a set of predefined rules for detecting errors.
You can read the details of the tools from the links in the description, or take https://languagetool.org/ as an example.
This project will unify the language proofing tools by providing dedicated storage for rules (patterns), which different tools can easily access (using AJAX) and which users can extend with their own rules.

Different kinds of tools can use this infrastructure in different ways:

  1. Bots - for actively fixing patterns of errors
  2. Gadgets/User scripts - For fixing patterns that require human intervention or exposing user interface for fixing errors (integration with visual editor, or a "typo fixing game" etc)

An interesting future direction would be extending this set of patterns (for example, by using machine learning on common fixes made by Wikipedia editors).

For getting started:

  1. Take a look at the various tools.
  2. Think about the common requirements of such tools: which patterns are automatic and which are semi-automatic? What properties define a pattern? Is it just a pattern and a fix (for automatic rules), a pattern and a hint (for semi-automatic rules), exceptions (cases where the pattern isn't an error), etc.?
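To make the second question more concrete, here is one possible sketch of how a rule record could carry these properties. Everything in it is an assumption for illustration: the field names (`mode`, `fix`, `hint`, `exceptions`), the `check` function, and the choice to treat an exception as a regular expression whose match "protects" the text it covers. In stored form the patterns would be strings rather than regex literals.

```javascript
// Hypothetical rule records; every field name here is an assumption.
const proofingRules = [
  // Automatic rule: a pattern and a fix, applied without asking.
  { id: "double-space", mode: "auto", pattern: / {2,}/g, fix: " " },
  // Semi-automatic rule: a pattern, a suggested fix, a hint for the user,
  // and exceptions (here: the word appears inside double quotes).
  {
    id: "teh",
    mode: "semi",
    pattern: /\bteh\b/g,
    fix: "the",
    hint: 'Did you mean "the"?',
    exceptions: [/"teh"/g],
  },
];

function check(text, rules) {
  // 1. Apply all automatic rules directly.
  let fixedText = text;
  for (const rule of rules.filter((r) => r.mode === "auto")) {
    fixedText = fixedText.replace(rule.pattern, rule.fix);
  }

  // 2. Collect suggestions for semi-automatic rules, skipping any match
  //    that lies entirely inside a span matched by one of its exceptions.
  const suggestions = [];
  for (const rule of rules.filter((r) => r.mode === "semi")) {
    const protectedSpans = (rule.exceptions || []).flatMap((exception) =>
      [...fixedText.matchAll(exception)].map((m) => [
        m.index,
        m.index + m[0].length,
      ])
    );
    for (const m of fixedText.matchAll(rule.pattern)) {
      const start = m.index;
      const end = start + m[0].length;
      const isException = protectedSpans.some(
        ([s, e]) => start >= s && end <= e
      );
      if (!isException) {
        suggestions.push({ ruleId: rule.id, start, end, hint: rule.hint, fix: rule.fix });
      }
    }
  }
  return { text: fixedText, suggestions };
}

const result = check('I saw  teh cat, spelled "teh" in the note.', proofingRules);
console.log(result.text);
console.log(result.suggestions);
```

In this sketch a bot would only need the `auto` rules, while a gadget could present the returned `suggestions` to the user for confirmation.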

Hello everyone!
Could you provide links to specific bugs for the micro-task?

@Mdshekh: Hi, https://www.mediawiki.org/wiki/Annoying_little_bugs lists some general L10N engineering starter bugs under "Language Engineering". Hope that helps?

@Aklapper Hi, the micro-task above is to fix a bug related to a VisualEditor toolbar button, but the starter bugs under "Language Engineering" aren't related to VisualEditor.

We're holding an IRC meeting on March 25 at 17:00 UTC for prospective GSoC and Outreachy participants with Wikimedia, on the #wikimedia-office channel. Do join us!

Hello! The IRC meeting tomorrow has been moved to the #wikimedia-ect channel. Looking forward to seeing you there. :)

@Amire80, @eranroz, you need to sign up as mentors urgently, and you need to rate your proposals there: https://www.google-melange.com/gsoc/homepage/google/gsoc2015

Amire80 raised the priority of this task from Lowest to High. Jul 8 2015, 12:37 PM
Amire80 moved this task from Backlog to Project administration on the VisualEditor-LanguageTool board.

I set up a proper board for the project as a general MediaWiki extension project: VisualEditor-LanguageTool.

It has columns by topic. The most important ones for the GSoC are the tasks in the "Frontend features" column, because this is the core functionality. This column will probably grow in the coming weeks ;)

I am adding the most essential tasks as blockers of this one, both from the feature side and the GSoC administration side.

Hello!

End of GSoC is fast approaching. 17 August is "Suggested pencils down" deadline and 21 August is "Firm pencils down" deadline. It is expected that you don't dive into new features which might take longer than two weeks to complete and instead work on polishing up your project, testing thoroughly and getting your code merged into the main branch. I hope this project is almost complete so you can merge it and make it available to everyone as quickly as possible. :)

A few questions (for both mentors and student):

  • Are you confident in completing the project on time?
  • By when do you think you can merge the code, if at all?
  • Are there any major blockers or important missing features?

We are looking for projects which are (nearly) complete to feature on our post on Wikimedia and Google OSPO's blogs (for example: http://google-opensource.blogspot.in/2015/02/google-summer-of-code-wrap-up-processing.html). If you're interested in getting yours up there, hurry up and get this finished!

The hard deadline for getting code merged is September. See T101393: Goal: All completed GSoC and Outreachy projects have code merged and deployed by September for details.

We'll be asking the students to demo their projects towards the end of the program as well.

Good luck!

Hi, I have associated two blocked-by tasks with this project.

For the student:

  1. Please go through the checklist in the end-term evaluation and fill out the fields which require any links. The checkboxes are for the mentor(s) only. Adding information on the past projects page is your task.
  2. Ensure that you have completed all the items listed in the end-term evaluation task. If there's a strong reason about why a particular item was not completed, please comment on the task and we shall look into it.
  3. Wrap-up report is mandatory and so is a demo-able link to the project (either in production or in a demo server).
  4. If you want your project to be featured in the blogpost on the Google OSPO blog, kindly comment back with a short, catchy description of the project along with a screenshot.

What is the situation with LanguageTool? Are you planning to deploy it in Wikimedia for VisualEditor users?

@Qgil : Yes, but there are too many bugs right now. I am working on setting up some test wikis in different languages so that the bugs can be pruned out.

No problem! Since your GSoC project passed and your extension is functional in Labs, I think we can consider this project as "merged" as it can be at this point. Deployment to Wikimedia is a different story, and a more demanding one for new extensions; there is a task to track it: T105153: Make the LanguageTool deployable to the Wikimedia cluster

It would be good to identify the bugs that are still blocking this task here, if any: bugs important enough to keep this task unresolved, or important enough that someone installing this extension on their own wiki should be aware of them. Then, it would be good to identify the bugs that are blocking the deployment to Wikimedia servers, and make them blockers of T105153.

Just for reference, we are doing something similar at MediaWiki-extensions-Newsletter.

Was there any progress in October? Is the deployment to Wikimedia still planned?

There has not been any progress in October. I haven't been able to find time. The work is still stuck at setting up test servers for finding more bugs. I need to finish that. Hopefully I will be able to do that during the upcoming holidays. :|

Hi @Ankita-ks, did you find some time? :-/

Deskana lowered the priority of this task from High to Lowest. Feb 23 2018, 5:04 PM

This task has been open and assigned for a long time. All its subtasks are resolved. Could someone explain what is left to do here?
@Ankita-ks: Do you still plan to work on this? :)

No reply, hence assuming this task can be closed as resolved. If I'm wrong, please set the status of this report back to "Open" via the Add Action...Change Status dropdown and explain what is left to do. Thanks!

Amire80 removed Ankita-ks as the assignee of this task.

Not totally done. Some day.