For a long time, Indic-language Wikisource projects depended entirely on manual proofreading, which cost a lot of time and energy. Recently Google released OCR software for more than 20 Indic languages. This software is far better and more accurate than previous OCR engines, but it has many limitations. Uploading the same large file twice (once for Google OCR and once to Commons) is not an easy solution for most contributors, as Internet connections in India are slow. What I suggest is to develop a tool that can feed PDF or DjVu files already uploaded to Commons directly to Google OCR, so that uploading them twice can be avoided. -- Bodhisattwa (talk) 13:50, 10 November 2015 (UTC)
This card tracks a proposal from the 2015 Community Wishlist Survey: https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey
This proposal received 39 support votes, and was ranked #25 out of 107 proposals. https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Wikisource#Tool_to_use_Google_OCRs_in_Indic_language_Wikisource
'''Update'''
A Python script has been developed that downloads a DjVu or PDF file from Commons, splits it into individual pages based on the number of columns, uploads them to Google Drive for OCR, downloads the Google-OCRed text, and uploads the text to the corresponding pages of the file. The script currently works only on GNU/Linux.
* [[ https://github.com/tshrinivasan/OCR4wikisource|OCR4Wikisource script]]: more than 900,000 pages have been OCRed in Bengali and Tamil Wikisource using this script. It has been tested and is used in Bengali and Tamil Wikisource on a daily basis.
https://github.com/tshrinivasan/OCR4wikisource
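The workflow described in the update can be sketched roughly as follows. This is a minimal illustration, not the code of the OCR4Wikisource script: the function names are hypothetical, the `ocr` and `save_page` callables stand in for the Google Drive OCR step and the wiki edit step, and column-based splitting is left out. It assumes the file is reachable through the `Special:FilePath` redirect on Commons and that page text lives under titles of the form `Page:<filename>/<page number>`.

```python
from typing import Callable, Dict, Iterable
from urllib.parse import quote

# Assumption: Special:FilePath on Commons redirects to the original file.
COMMONS_FILEPATH = "https://commons.wikimedia.org/wiki/Special:FilePath/"


def commons_download_url(filename: str) -> str:
    """Build a download URL for a file hosted on Commons."""
    return COMMONS_FILEPATH + quote(filename.replace(" ", "_"))


def page_title(filename: str, page_number: int) -> str:
    """Title of the Wikisource page that holds the text of one scanned page."""
    return f"Page:{filename}/{page_number}"


def run_pipeline(
    filename: str,
    page_images: Iterable[bytes],
    ocr: Callable[[bytes], str],
    save_page: Callable[[str, str], None],
) -> Dict[str, str]:
    """OCR every page image and store the text on its Wikisource page.

    `ocr` and `save_page` are hypothetical hooks: in practice the first
    would send the image to an OCR service and the second would edit the wiki.
    """
    results: Dict[str, str] = {}
    for number, image in enumerate(page_images, start=1):
        title = page_title(filename, number)
        text = ocr(image)
        save_page(title, text)
        results[title] = text
    return results
```

Passing the OCR and save steps in as callables keeps the sketch independent of any particular OCR service or wiki client, which may also make a future Windows or Tool Labs version easier to test.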
; Enhancement required
* Windows version: [[http://tools.wmflabs.org/ws-google-ocr/ |ws-google-ocr]] tool on Tool Labs
* [[Wikisource:Google OCR/script.js |Google OCR JavaScript]] on Multilingual Wikisource
* Tool Labs hosting
== Hackathon coordination ==
=== Volunteers ===
Volunteers confirmed:
* (@YourPhabricatorUsername, #YourIRCUsername, interest & skills volunteered to this project, at the event or remote online?)
Volunteers interested (if you are not yet sure, or want to learn more):
* (@YourPhabricatorUsername, #YourIRCUsername, interest & skills volunteered to this project, at the event or remote online?)
=== Progress report ===
* (Write down achievements, blockers requiring attention, and other important updates. This information will be priceless at the hackathon showcase and after the event!)