Overview
This is one of two Google Summer of Code 2024 projects for Scribe, an organization that makes keyboard applications for second language learners using Wikidata and other Wikimedia projects as the basis for their data. This project focuses on Scribe-Data (T358064); the other focuses on Scribe-iOS (T358063). The goal of Scribe is to give users everything they need for their second language in any app, without having to leave their keyboard to look up grammar or other information.
We are a Wikimedia Project for New Developers and would love to work with you!
Project Goals
- Scribe-Data is our Wikidata lexemes data process, which supplies the data for the other applications that the Scribe team makes
- As of now, the process is specific to Scribe applications
- We want to turn this into a general service that anyone can use to easily get language data from Wikidata
- For example: all German verbs in various formats, all French nouns, and so on
- The same should be possible for all languages and word types
- The main task is devising how best to achieve this and making the changes needed to give the project broader usability for the greater Wikimedia and open-source community
- Once that's done, more Wikidata query and formatting processes will be written to expand the reach of the service to non-Scribe languages
- There's also the potential to work on deploying the service via Docker as well as developing the test suite for Scribe-Data
- This new version of the data process will later be used to make API calls via our in-development Scribe-Server, and the student is welcome to work with us on this after GSoC
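To make the "all German verbs" example above concrete, here is a minimal Python sketch of how such a request could be turned into a SPARQL query for the Wikidata Query Service and how the results could be written out in more than one format. This is not Scribe-Data's actual code: the function names, the small QID lookup tables, and the User-Agent string are all assumptions made for illustration, and a real service would need to cover many more languages, word types, and grammatical forms.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative lookup tables (assumption: a real service would cover far more).
LANGUAGE_QIDS = {"German": "Q188", "French": "Q150"}
WORD_TYPE_QIDS = {"verb": "Q24905", "noun": "Q1084"}


def build_lexeme_query(language, word_type, limit=10):
    """Build a SPARQL query for lexemes of one language and word type."""
    language_qid = LANGUAGE_QIDS[language]
    word_type_qid = WORD_TYPE_QIDS[word_type]
    return f"""
SELECT ?lexeme ?lemma WHERE {{
  ?lexeme dct:language wd:{language_qid} ;
          wikibase:lexicalCategory wd:{word_type_qid} ;
          wikibase:lemma ?lemma .
}}
LIMIT {int(limit)}
""".strip()


def parse_lemmas(response):
    """Extract lemma strings from a SPARQL JSON results document."""
    return [row["lemma"]["value"] for row in response["results"]["bindings"]]


def fetch_lemmas(language, word_type, limit=10):
    """Run the query against the Wikidata Query Service (needs network access)."""
    params = urllib.parse.urlencode(
        {"query": build_lexeme_query(language, word_type, limit), "format": "json"}
    )
    request = urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        # Hypothetical User-Agent; replace with one that identifies your tool.
        headers={"User-Agent": "lexeme-query-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request, timeout=60) as http_response:
        return parse_lemmas(json.load(http_response))


def format_lemmas(lemmas, output_format="json"):
    """Serialize lemmas as JSON or as a one-column CSV with a header row."""
    if output_format == "json":
        return json.dumps(lemmas, ensure_ascii=False)
    if output_format == "csv":
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(["lemma"])
        writer.writerows([lemma] for lemma in lemmas)
        return buffer.getvalue()
    raise ValueError(f"Unsupported format: {output_format}")
```

With network access, `format_lemmas(fetch_lemmas("German", "verb"), "csv")` would return a small CSV of German verb lemmas. Keeping query building, fetching, and formatting in separate functions is one possible design that lets new languages, word types, or output formats be added without touching the rest.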
Project Specifics
- Size: 350 hours
- Rating: Intermediate
- Languages: Python, SPARQL query language, maybe Docker
- What you'll learn: Wikidata, Wikidata Query Service, ETL processes, maybe Docker development
- Prerequisites: prior experience with Python and data analytics is a plus
Mentors
- Primary: Will Yoshida (@wkyoshida)
- Secondary: Andrew McAllister (@AndrewTavis)
- Tertiary: Henrik Thomasson (@Henrikt93)
Community
All issues related to this project will be tracked on GitHub. Please join our community Matrix spaces to chat with the team and learn more about Scribe! Specifically, we have a room for GSoC and another for data processes. During the program, your mentors will be happy to communicate with you on GitHub or via Matrix. You'll also be invited to the Scribe bi-weekly developer calls, where you'll have time to present your progress and work with the team on any problems. Calls and check-ins outside of the syncs can also happen if needed :)