We added around 400,000 pages to the Tamil Wikisource site (http://ta.wikisource.org), using text from Google OCR of Tamil public domain books.
The OCR text contains many recurring spelling errors.
We are collecting the errors and their correct spellings here, as comma-separated lines:
https://ta.wikisource.org/s/2ojl
I wrote a program that reads this page, then searches for and replaces those words in the Tamil Wikisource pages.
The code is here: https://github.com/tshrinivasan/tools-for-wiki/tree/master/fix_spellerrors_tawikisource
It uses pywikibot.
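
For readers who do not want to open the repository, here is a minimal sketch of the idea. It is not the code from the repository above. It assumes each line of the list page has the form wrong,correct and that plain substring replacement is acceptable. The function names parse_pairs, apply_fixes and fix_page and the edit summary are made up for illustration.

```python
def parse_pairs(page_text):
    """Parse 'wrong,correct' lines into a list of (wrong, correct) tuples."""
    pairs = []
    for line in page_text.splitlines():
        if "," not in line:
            continue  # skip blank lines and anything without a separator
        wrong, correct = (part.strip() for part in line.split(",", 1))
        if wrong and correct and wrong != correct:
            pairs.append((wrong, correct))
    return pairs


def apply_fixes(text, pairs):
    """Replace every occurrence of each wrong spelling with its correction."""
    # Longest first, so a longer wrong spelling is fixed before any
    # shorter entry that happens to be contained in it.
    for wrong, correct in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        text = text.replace(wrong, correct)
    return text


def fix_page(title, pairs):
    """Apply the fixes to one Tamil Wikisource page (assumed workflow)."""
    import pywikibot  # imported here so the helpers above work without it

    site = pywikibot.Site("ta", "wikisource")
    page = pywikibot.Page(site, title)
    fixed = apply_fixes(page.text, pairs)
    if fixed != page.text:  # save only when something actually changed
        page.text = fixed
        page.save(summary="Fix recurring OCR spelling errors")
```

The parsing and replacing helpers are kept separate from the pywikibot call so they can be tried on sample text without a bot account.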
We are thinking of collecting the wrong words and their fixes from the public.
For that, we need a way to collect the error words and fixes.
As there are many words (more than 5,000), putting everything on a single wiki page is not practical.
Is there any way to create a simple form on the wiki for users to enter data? The submitted data should update a wiki page, and duplicate entries should be avoided.
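
To make the duplicate requirement concrete, here is a rough sketch of the check I have in mind. It does not answer the form question itself. It only shows the logic that a bot or script could run before saving a submitted entry, assuming the page still holds wrong,correct lines. The name merge_entry is hypothetical, and normalizing to NFC is my own assumption so that the same Tamil word typed in two different ways compares equal.

```python
import unicodedata


def _clean(word):
    """Trim whitespace and normalize to NFC so equal words compare equal."""
    return unicodedata.normalize("NFC", word.strip())


def merge_entry(existing_lines, wrong, correct):
    """Return (lines, added); append 'wrong,correct' unless already listed."""
    wrong, correct = _clean(wrong), _clean(correct)
    # Reject empty fields and commas, which would break the line format.
    if not wrong or not correct or "," in wrong or "," in correct:
        return existing_lines, False
    # Wrong spellings that already have an entry on the page.
    known = {
        _clean(line.split(",", 1)[0])
        for line in existing_lines
        if "," in line
    }
    if wrong in known:
        return existing_lines, False
    return existing_lines + [f"{wrong},{correct}"], True
```

A submission whose wrong word is already listed is dropped, so the first recorded correction wins. Whether that is the right policy is open to discussion.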
Thanks.