
(GSoC 2024) Scribe-Data: Refactor into a multi-purpose Wikidata language pack CLI tool
Open, Needs Triage, Public



This is one of two Google Summer of Code 2024 projects for Scribe, an organization that makes keyboard applications for second-language learners using Wikidata and other Wikimedia projects as the basis for their data. Scribe's goal is to give users everything they need to help them with their second language in any app, without having to leave their keyboard to look up grammar or other information. One project focuses on Scribe-iOS (T358063), and the other on Scribe-Data (T358064).

We are a Wikimedia Project for New Developers and would love to work with you!

Project Goals

  • Scribe-Data is our Wikidata lexeme data process that gets the data for the other applications the Scribe team makes
  • As of now the process is specific to Scribe applications
  • We want to turn this into a general service that anyone can use to easily get language data from Wikidata
    • For example: all German verbs in various formats, all French nouns...
    • And so on for all languages and word types...
  • The main task is devising how best to achieve this and making the changes so that the project is more broadly usable by the greater Wikimedia/open-source community
  • Once that's done, more Wikidata query and formatting processes will be written to extend the service to non-Scribe languages
  • There's also the potential to work on deploying the service via Docker, as well as developing the test suite for Scribe-Data
  • This new version of the data process will later be used to serve API calls via our in-development Scribe-Server, and the student is welcome to keep working with us on this after GSoC
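As a rough illustration of the kind of request the goals above describe (for example, "all German verbs"), the following Python sketch builds a SPARQL query for Wikidata lexemes of one language and one lexical category, and can optionally run it against the public Wikidata Query Service. This is not Scribe-Data's actual code: the function names, the small name-to-ID mapping, and the User-Agent string are hypothetical. Q188 (German), Q150 (French), Q1084 (noun), and Q24905 (verb) are Wikidata item IDs.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

# Hypothetical, illustrative mapping from plain-language names to Wikidata item IDs.
LANGUAGE_QIDS = {"german": "Q188", "french": "Q150"}
WORD_TYPE_QIDS = {"nouns": "Q1084", "verbs": "Q24905"}

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_lexeme_query(language: str, word_type: str, limit: Optional[int] = None) -> str:
    """Build a SPARQL query listing lexemes and their lemmas for one language and word type."""
    lang_qid = LANGUAGE_QIDS[language.lower()]  # raises KeyError for unmapped languages
    type_qid = WORD_TYPE_QIDS[word_type.lower()]
    query = f"""SELECT ?lexeme ?lemma WHERE {{
  ?lexeme dct:language wd:{lang_qid} ;
          wikibase:lexicalCategory wd:{type_qid} ;
          wikibase:lemma ?lemma .
}}"""
    if limit is not None:
        query += f"\nLIMIT {limit}"
    return query


def fetch_lemmas(query: str) -> List[str]:
    """Run a query against the public Wikidata Query Service (requires network access)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # WDQS asks clients to send a descriptive User-Agent; this one is a placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "lexeme-query-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [row["lemma"]["value"] for row in data["results"]["bindings"]]
```

For example, `fetch_lemmas(build_lexeme_query("German", "verbs", limit=10))` would return up to ten German verb lemmas. Keeping query building separate from fetching is one way to make it easy to add new languages, word types, or output formats later.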

Project Specifics



All issues related to this project will be tracked on GitHub. Please join our community Matrix spaces to chat with the team and learn more about Scribe! Specifically, we have a room for GSoC and another for data processes. During the program your mentors will be happy to communicate with you on GitHub or via Matrix. You'll also be invited to the Scribe bi-weekly developer calls, where you'll have time to present your progress and work with the team on any problems. Calls and check-ins outside of these syncs can also happen if needed :)

Event Timeline

Hi @AndrewTavis It sounds like a great project and is perfect for GSoC!

I have added this project to our GSoC 2024 MediaWiki page:

Kindly share your project via Wikitech by replying to this thread:

If this is your first time mentoring via GSoC, I recommend reviewing this guide for mentors: Additionally, I'll add you to our Zulip chat where you can connect with fellow mentors for ongoing support and collaboration.

Hi, can I comment here? I'm new to GSoC and very interested in this project. I'm currently working on a good first issue in the Scribe-Data repository. After finishing it, can I work on the proposal, and could you guide me? Thanks! @AndrewTavis

Hi @Linfye 👋 Thanks for writing and your interest in working with us on Scribe-Data. Great to have you on Matrix as well. Feel free to write there, GitHub or here in this Phabricator task as needed :)

Happy to support you with the issue you picked out. That would be the appropriate one to start with! Kindly write back in the GitHub issue with a bit more information about your background, and from there we'll know better how to support you. Specifically, you're looking at a Python issue, so let us know your experience with Python, and as it involves machine translation, let us know if you've done anything similar. If you haven't had any experience with this, no stress: happy to write out some pseudocode to help!

Beyond this, happy to support on the proposal! Let's get one or two Scribe-Data issues done so the team and I have a better understanding of your skills, and then we can schedule some time to work on the proposal together 😊

Kindly share your project via Wikitech by replying to this thread:

Hi @Maryann-Onyinye 👋 I included both projects in one email to the thread mentioned above :) Thanks so much!

Hi @AndrewTavis, I'm new to GSoC and interested in this project. I added information about my background to the good first issue I'm looking to work on in the Scribe-Data repository.

Hi @AndrewTavis, I'm a Python and Django developer. This is my current stack at the company where I intern, and I'm very familiar with Python programming. I would love to learn more about Wikidata while working on this task.

Great to have so much interest in the project, and welcome to all! Please find a good first issue, which for many people will be one of the machine translation issues. We can work on those a bit later, once one version has been completed, so that everyone is producing something consistent :)

Please reach out here or on Matrix via the links in the task description if you need help! 😊

Hi @AndrewTavis

It's my first time participating in GSoC. I mostly work with Python and JavaScript, and I would love to work, learn and collaborate on Scribe-Data.

As you mentioned above, I'm looking through Scribe-Data's GitHub repo for issues I could work on! 🥰

P.S. I wanted to ask how I can join Wikimedia for this year's GSoC, and what best practices would maximize my chances of being selected by Wikimedia.

with love,
Hasan :)

Hey @HasanCoder 👋

Let us know in the issue when you find one you want to work on!

A big thing for GSoC is to check out other Wikimedia projects and apply to those as well. You can submit multiple GSoC applications, which increases your chances of being accepted :) All the Wikimedia projects are subtasks of T354734: Coordinate Wikimedia's participation in Google Summer of Code 2024 and Outreachy Round 28, so that would be a good place to start.