Name: Sandeep Subramanian
Location: Berkeley, CA
Working Hours: 8 AM to 1 PM and 4 PM to 7 PM (PDT)
I am a second-year undergraduate at the University of California, Berkeley (UCB), intending to study chemistry and computer science.
I have been a geography and language fanatic since childhood. I have always believed that anyone should be able to advance their knowledge regardless of language, and the exchange of information over the Internet is an extension of this principle. I can read and write over a dozen scripts and have won several national geography competitions.
By chance, I was introduced to programming during my second semester of college and absolutely loved it. I am now eager to use my newfound love of programming to remove language barriers from the web and make knowledge available to all.
Wikipedia is a home to me, one that doesn’t need a physical roof. I love contributing to Wikipedia, especially to pages on languages, geography, and Indian music. I recently started using Pywikibot, and I am building bots that keep airport destination pages and their maps in sync.
Wikipedia is beyond doubt my constant source of intellectual engagement, and nothing would motivate and interest me more than developing tools that let people around the world interact with Wikipedia as I do. I’ve been looking for an opportunity to bring my internationalization and localization ideas to life, and Wikimedia’s GSoC program is the perfect opportunity to pursue my interests and make a meaningful impact. I would like to work with any of the mentors involved in internationalization, such as Alolita Sharma, Amir Aharoni, or Santosh Thottingal.
If we want everyone to use Wikipedia, we need to make it usable for everyone. As a language fanatic, I know from first-hand experience that people who could contribute a great deal to our digital knowledge bank are unable to do so because their scripts and locales are unsupported, and as someone who wants all human knowledge readily available at his fingertips, I find this frustrating. That’s why I want to make this the most awesome wiki enhancement ever, so that all of human knowledge is eventually just one click away. I am interested in getting more involved with Wikimedia’s internationalization projects, and I see this project as a great leap toward that goal. Making it happen means many more people can contribute to Wikipedia in the way they prefer, which makes me happy and inspires me to do more.
Universalizing access to digital information requires a wide variety of internationalization projects. I have identified five types of localization, listed below. I will try to implement one example of each, in order of increasing complexity, so that a framework exists for (hopefully) quicker implementation of future internationalization tasks in each category.
1) Font Variation: Farsi/Urdu: Naskh-Nastaliq
Readers should be able to view content in the script style (font) they prefer.
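As a rough illustration of the font-switching idea, the Python sketch below maps a reader's style choice to a CSS rule scoped by language. It assumes the freely licensed Noto Nastaliq Urdu and Noto Naskh Arabic fonts and a hypothetical helper `font_css`; inside MediaWiki the real mechanism would likely live in PHP/JavaScript and CSS, so this only shows the mapping logic.

```python
# Hypothetical mapping from a reader's chosen script style to a CSS font stack.
# Noto Nastaliq Urdu and Noto Naskh Arabic are free, open-source fonts; the
# generic "serif" fallback keeps text readable if neither font is available.
FONT_STACKS = {
    "nastaliq": '"Noto Nastaliq Urdu", serif',
    "naskh": '"Noto Naskh Arabic", serif',
}


def font_css(lang_code: str, style: str) -> str:
    """Return a CSS rule applying the chosen style to elements in lang_code."""
    if style not in FONT_STACKS:
        raise ValueError(f"unknown style: {style!r}")
    # Doubled braces produce literal { and } in the f-string output.
    return f":lang({lang_code}) {{ font-family: {FONT_STACKS[style]}; }}"


# Example: a reader of the Urdu Wikipedia picks Nastaliq from the menu.
print(font_css("ur", "nastaliq"))
# prints :lang(ur) { font-family: "Noto Nastaliq Urdu", serif; }
```

Keeping the style names in one table means adding another font choice for a different language Wikipedia is a one-line change.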
2) Script Transliteration: Malay/Indonesian: Rumi-Jawi
The same article should be available in multiple scripts used for the same language.
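Jawi spelling is not a simple letter-for-letter mapping from Rumi, so one plausible starting point is dictionary-based, word-by-word transliteration that flags unknown words for human review. The sketch below assumes a hypothetical `RUMI_TO_JAWI` lexicon (with only two sample entries) and a hypothetical `rumi_to_jawi` function.

```python
import re

# Hypothetical sample lexicon; a usable system needs a full Rumi-Jawi dictionary.
RUMI_TO_JAWI = {
    "bahasa": "بهاس",
    "melayu": "ملايو",
}


def rumi_to_jawi(text: str) -> tuple[str, list[str]]:
    """Transliterate Rumi text word by word; also return words not in the lexicon."""
    unknown: list[str] = []

    def convert(match: re.Match) -> str:
        word = match.group(0)
        jawi = RUMI_TO_JAWI.get(word.lower())
        if jawi is None:
            unknown.append(word)  # flag for human review
            return word  # keep the Rumi form rather than guess
        return jawi

    return re.sub(r"[A-Za-z]+", convert, text), unknown


result, missing = rumi_to_jawi("Bahasa Melayu moden")
print(result)
print(missing)
```

Returning the unknown words alongside the output gives community reviewers a worklist for growing the lexicon instead of silently shipping guesses.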
3) Simple Version Control: Punjabi: Gurmukhi-Shahmukhi
Some languages use multiple scripts but already have separate content in each script; Punjabi, for example, has separate Gurmukhi and Shahmukhi Wikipedias. Since that content cannot simply be merged or modified, I can give users the option to view an article in the script of their choice and render the corresponding article. The advantage of this unified page is that users who can read both scripts and understand finer dialectal nuances can easily translate articles from one script to the other. This also opens an easier pathway to dialect-friendly machine transliteration, which can be explored if time permits.
4) Dialectal Variation: English: American-British-Australian-Indian
For multinational languages, multiple user communities can exist with different spelling and numerical conventions. For example, British spellings differ from American ones, and Indian English far more often counts large numbers in lakhs and crores than in millions and billions. Presenting a single set of conventions on a page shared by all these communities prevents the localization needed to match what each group of readers is familiar with. I plan to provide localized versions of each page that follow conventions agreed upon by the respective user communities.
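Both conventions mentioned above can be handled by small rendering rules. This Python sketch assumes hypothetical helpers: `group_indian` formats integers with Indian digit grouping (three digits, then pairs, so one crore is 1,00,00,000), and `to_british` applies a tiny, illustrative US-to-British spelling table. A real system would rely on community-maintained word lists.

```python
import re


def group_indian(n: int) -> str:
    """Format an integer with Indian digit grouping, e.g. 1234567 -> 12,34,567."""
    if n < 0:
        return "-" + group_indian(-n)
    digits = str(n)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    # Left of the last three digits, Indian grouping uses pairs of digits.
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return ",".join(groups) + "," + tail


# Hypothetical, deliberately tiny US-to-British spelling table.
US_TO_GB = {"color": "colour", "center": "centre", "favorite": "favourite"}


def to_british(text: str) -> str:
    """Replace lowercase US spellings found in US_TO_GB with British ones."""
    return re.sub(
        r"\b[a-z]+\b", lambda m: US_TO_GB.get(m.group(0), m.group(0)), text
    )


print(group_indian(10_000_000))  # one crore: 1,00,00,000
print(to_british("my favorite color"))  # my favourite colour
```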
5) Form Variation: Arabic: Diacritics-No Diacritics
Letting readers toggle the display of vowel diacritics can help them identify or pronounce an unfamiliar word and look it up in a dictionary. The same mechanism can also display diacritics for Arabic-based scripts in which they are obligatory.
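Toggling diacritics off can be as simple as removing the Arabic harakat code points, U+064B through U+0652 (tanwin, short vowels, shadda, and sukun). The function names below are hypothetical, and the range deliberately leaves out other marks such as U+0670 (superscript alef).

```python
import re

# Arabic harakat, U+064B to U+0652: fathatan, dammatan, kasratan,
# fatha, damma, kasra, shadda, sukun. Other marks are left untouched.
HARAKAT = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Remove short-vowel and related marks from Arabic text."""
    return HARAKAT.sub("", text)


def render(text: str, show_diacritics: bool) -> str:
    """Return the text as stored, or without harakat if the reader hid them."""
    return text if show_diacritics else strip_diacritics(text)


word = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba, fully vowelled
print(render(word, show_diacritics=False))  # the bare consonants k-t-b
```

Storing the fully vowelled text and stripping at display time keeps one source of truth; the reverse direction (adding diacritics) is the much harder prediction problem.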
And if time permits:
- Simple Version Control: German: German-Alemannic-Luxembourgish-Plattdeitsch-etc.
- Dialectal Translation: Chinese: Mandarin-Yue-Hakka-Minnan-Wu-Classical
I will publish code on my GitHub account, sandsub95. I will ask members of the Wikimedia GSoC community, including mentors and other students, for help with questions I cannot resolve through Stack Overflow or other resources. I will communicate with my mentor at least weekly to ensure satisfactory progress.
Timeline (11 Weeks):
Goals: Learn PHP; Find Naskh/Nastaliq Fonts for Urdu/Farsi Across Platforms; Implement System to Change Font on Page from Menu
Deliverables: Naskh-Nastaliq Rendering Activated in Farsi/Urdu Wikipedias; Free, Open-Source Font Choice for other Language Wikipedias
Goals: Implement Transliteration between Rumi and Jawi; If Time Permits, Implement Transliteration between Javanese Script and Rumi for the Basa Jawa Wikipedia
Deliverables: Rumi & Jawi Options Activated in Bahasa Melayu and Bahasa Indonesia Wikipedias from Drop-Down Menu
Goals: Collect Gurmukhi & Shahmukhi articles with mappings between them, and combine them into a version control system.
Deliverables: New Punjabi Wikipedia with Drop-Down Menu for Shahmukhi & Gurmukhi
Goals: Develop a tool that allows users to specify required localizations for different English Wikipedia user communities; once the localizations are approved, implement the changes.
Deliverables: Organized Localization Request System & Community Page for Reviewing Submissions
Goals: Develop a tool that predicts Arabic diacritics from undiacritized words: collect a database of Arabic words used on Wikipedia, map each to the diacritized forms it could represent, and choose among them contextually based on part of speech, correlation tags, etc. Also set up version control as in the earlier phases to allow user contributions.
Deliverables: Tool that adds diacritics to Arabic articles (unlikely to be finished; this is a substantial project)
User Feedback, Bug Fixes & Slip Week
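For the diacritic-prediction goal in the timeline, a simple baseline (swapped in here for the contextual, part-of-speech-based approach described in the goals) is to pick the most frequent diacritized form seen for each undiacritized word. The sketch assumes a corpus of already-diacritized words, which most Wikipedia text is not, and its function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Arabic harakat, U+064B to U+0652 (tanwin, short vowels, shadda, sukun).
HARAKAT = re.compile("[\u064B-\u0652]")


def strip_diacritics(word: str) -> str:
    """Reduce a word to its undiacritized (lookup) form."""
    return HARAKAT.sub("", word)


def build_lexicon(diacritized_words):
    """Map each undiacritized form to a count of the diacritized forms seen."""
    lexicon = defaultdict(Counter)
    for word in diacritized_words:
        lexicon[strip_diacritics(word)][word] += 1
    return lexicon


def predict(word, lexicon):
    """Return the most frequent diacritized form, or the word unchanged if unseen."""
    candidates = lexicon.get(word)
    if not candidates:
        return word
    return candidates.most_common(1)[0][0]


corpus = [
    "\u0643\u064e\u062a\u064e\u0628\u064e",  # kataba (he wrote)
    "\u0643\u064e\u062a\u064e\u0628\u064e",
    "\u0643\u064f\u062a\u064f\u0628",  # kutub (books)
]
lexicon = build_lexicon(corpus)
print(predict("\u0643\u062a\u0628", lexicon))  # kataba, seen twice vs once
```

A contextual model could replace the frequency lookup in `predict` with a choice conditioned on neighboring words or part-of-speech tags while keeping the same lexicon structure.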