(GSoC 2024) Scribe-Data: Refactor into a multi purpose Wikidata language pack CLI tool
Open, Needs Triage, Public
Assigned To

None

Authored By

	AndrewTavis
	Feb 20 2024, 11:37 PM

Description

Overview

This is one of two Google Summer of Code 2024 projects for the Scribe organization, which makes keyboard applications for second language learners using Wikidata and other Wikimedia projects as the basis for its data. The goal of Scribe is to give users everything they need to help them with their second language in any app, without leaving their keyboard to look up grammar or other information. One project focuses on Scribe-iOS (T358063), and the other, described here, on Scribe-Data (T358064).

We are a Wikimedia Project for New Developers and would love to work with you!

Project Goals

Scribe-Data is our Wikidata lexemes data process that gets the data for the other applications that the Scribe team makes
As of now, the process is specific to Scribe applications
We want to turn it into a general service that anyone can use to easily get language data from Wikidata
- For example: all German verbs in various formats, all French nouns...
- And so on for all languages and word types
The main task is to work out how best to achieve this, and to make the changes needed so that the project is more broadly usable by the wider Wikimedia and open-source community
Once that's done, more Wikidata query and formatting processes will be written to expand the reach of the service to non-Scribe languages
There's also the potential to work on deploying the service via Docker, as well as on developing the test suite for Scribe-Data
This new version of the data process will later be used to serve API calls via our in-development Scribe-Server, and the student is welcome to work with us on this after GSoC
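
To make the "all German verbs in various formats" goal above more concrete, here is a minimal sketch of what a general query-and-format step could look like. It is an illustration only, not Scribe-Data's actual implementation: the function names (build_lexeme_query, format_bindings) and the two lookup tables are hypothetical, and only a handful of Wikidata QIDs are included. The query relies on the dct:, wd: and wikibase: prefixes that the Wikidata Query Service predefines, and the formatter assumes the standard SPARQL JSON results layout (a list of bindings with a "value" per variable).

```python
import csv
import io
import json

# Illustrative subset of Wikidata QIDs; a real tool would cover many more.
LANGUAGE_QIDS = {"german": "Q188", "french": "Q150"}
WORD_TYPE_QIDS = {"nouns": "Q1084", "verbs": "Q24905"}


def build_lexeme_query(language: str, word_type: str, limit: int = 100) -> str:
    """Build a SPARQL query for the lemmas of one language and word type.

    Hypothetical helper: raises KeyError for languages or word types
    that are not in the lookup tables above.
    """
    lang_qid = LANGUAGE_QIDS[language.lower()]
    type_qid = WORD_TYPE_QIDS[word_type.lower()]
    return (
        "SELECT ?lexeme ?lemma WHERE {\n"
        f"  ?lexeme dct:language wd:{lang_qid} ;\n"
        f"          wikibase:lexicalCategory wd:{type_qid} ;\n"
        "          wikibase:lemma ?lemma .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def format_bindings(bindings: list, output_format: str) -> str:
    """Convert SPARQL JSON result bindings into a JSON or CSV string."""
    rows = [
        {
            # Keep only the lexeme ID (e.g. "L1") from the full entity URI.
            "lexeme": b["lexeme"]["value"].rsplit("/", 1)[-1],
            "lemma": b["lemma"]["value"],
        }
        for b in bindings
    ]
    if output_format == "json":
        return json.dumps(rows, ensure_ascii=False, indent=2)
    if output_format == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=["lexeme", "lemma"], lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"Unsupported output format: {output_format}")


if __name__ == "__main__":
    # Print the query for German verbs; sending it to the query service
    # and handling errors and rate limits is left out of this sketch.
    print(build_lexeme_query("German", "verbs", limit=10))
```

Keeping query construction separate from output formatting means each new language or word type only needs a new lookup entry, while each new export format only needs a new branch in the formatter. That separation is one possible design, and the GSoC project itself is about deciding how best to structure this.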

Project Specifics

Size: 350 hours
Rating: Intermediate
Languages: Python, SPARQL query language, maybe Docker
What you'll learn: Wikidata, Wikidata Query Service, ETL processes, maybe Docker development
Prerequisites: prior experience with Python and data analytics is a plus

Mentors

Primary: Will Yoshida (@wkyoshida)
Secondary: Andrew McAllister (@AndrewTavis)
Tertiary: Henrik Thomasson (@Henrikt93)

Community

All issues related to this project will be tracked on GitHub. Please join our community Matrix spaces to chat with the team and learn more about Scribe! Specifically, we have a room for GSoC and another for data processes. During the program, your mentors will be happy to communicate with you on GitHub or via Matrix. You'll also be invited to the Scribe bi-weekly developer calls, where you'll have time to present your progress and work with the team on any problems. Calls and check-ins outside of the syncs can also happen if needed :)

Related Objects

Status	Subtype	Assigned	Task
Open		Maryann-Onyinye	T354734 Coordinate Wikimedia's participation in Google Summer of Code 2024 and Outreachy Round 28
Open		None	T358064 (GSoC 2024) Scribe-Data: Refactor into a multi purpose Wikidata language pack CLI tool
Open		Jacob4code	T358527 GSoC '24 Proposal: Refactor Scribe-Data into a multi purpose Wikidata language pack CLI tool
Open		Linfye	T361417 [Proposal] Srribe-Data: A Wikidata language pack CLI tool
Duplicate	Feature	None	T361433 Refactor into a multi purpose Wikidata language pack CLI tool
Open	Feature	Evads0	T361441 Project Proposal: Multi-Purpose Wikidata Language Pack CLI Tool for Scribe-Data
Open		Mhmohona	T361464 GSoC'24 Proposal - Scribe-Data: Refactor into a Multi-Purpose Wikidata Language Pack CLI Tool
Open		Shashankmittaliitbhu	T361474 (Project Proposal): Refactor Scribe-Data into a multi purpose Wikidata language pack CLI tool

Event Timeline

AndrewTavis created this task. Feb 20 2024, 11:37 PM

Restricted Application added a subscriber: Aklapper. Feb 20 2024, 11:37 PM

AndrewTavis mentioned this in T358063: (GSoC 2024) Scribe-iOS: Add multilingual translation and internationalized interfaces. Feb 20 2024, 11:38 PM

AndrewTavis updated the task description.

AndrewTavis added a parent task: T354734: Coordinate Wikimedia's participation in Google Summer of Code 2024 and Outreachy Round 28. Feb 20 2024, 11:40 PM

AndrewTavis updated the task description. Feb 20 2024, 11:58 PM

Jacob4code subscribed. Feb 21 2024, 5:18 PM

Maryann-Onyinye moved this task from Backlog to Project Proposals on the Google-Summer-of-Code (2024) board. Feb 22 2024, 11:09 AM

Hi @AndrewTavis, this sounds like a great project and is perfect for GSoC!

I have added this project to our GSoC 2024 MediaWiki page: https://www.mediawiki.org/wiki/Google_Summer_of_Code/2024#Ideas_for_projects

Kindly share your project via Wikitech by replying to this thread: https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/Y7PRNX3SMKLTT6ABLGYADTLT2NQ7MKJE/

If this is your first time mentoring via GSoC, I recommend reviewing this guide for mentors: https://www.mediawiki.org/wiki/Google_Summer_of_Code/Mentors. Additionally, I'll add you to our Zulip chat where you can connect with fellow mentors for ongoing support and collaboration.

Hi, can I comment here? I'm a beginner in GSoC and I am very interested in this project. I am currently working on a good-first-issue in the Scribe-Data repository. After finishing it, can I work on the proposal, and can you guide me? Thanks! @AndrewTavis

Mhmohona subscribed. Feb 26 2024, 11:07 AM

Hi @Linfye 👋 Thanks for writing and for your interest in working with us on Scribe-Data. Great to have you on Matrix as well. Feel free to write there, on GitHub or here in this Phabricator task as needed :)

Happy to support you with working on the issue you picked out. That would be the appropriate one to start with! Kindly write back in the GitHub issue with a bit more information about your background, and from there we'll know better how to support you. Specifically, you're looking at a Python issue, so let us know what your experience is, and as it's machine translation, let us know what you've done that might be similar. If you haven't had any experience with this, no stress - happy to write out some pseudocode to help!

Beyond this, happy to support on the proposal! Let's get one or two Scribe-Data issues done so the team and I have a better understanding of your skills, and then we can schedule some time to work on the proposal together 😊

Hi @Maryann-Onyinye 👋 I included both projects in one email to the thread mentioned above :) Thanks so much!

Shashankmittaliitbhu subscribed. Mar 3 2024, 12:17 PM

Hi @AndrewTavis, I am a beginner in GSoC and I am interested in this project. I added information about my background to the good-first-issue I'm looking to work on in the Scribe-Data repository.

Hi @AndrewTavis, I am a Python and Django developer. This is my current stack at the company where I intern, and I am very familiar with Python programming. I would love to learn more about Wikidata while working on this task.

Great to have so much interest in the project, and welcome to all! Please find a good first issue, which for a lot of people will be one of the machine translation issues. We can work on these a bit later, once one version has been completed, so that everyone is producing something consistent :)

Please reach out here or on Matrix via the links in the task description if you need help! 😊

Hi @AndrewTavis

It's my first time participating in GSoC. I mostly work with Python and JavaScript, and I would love to work, learn and collaborate on this project for Scribe-Data.

As you mentioned above, I am looking through Scribe-Data's GitHub repo for any issues I could work on! 🥰

P.S. I wanted to ask how I can join Wikimedia for this year's GSoC, and what best practices would maximise my chances of being selected by Wikimedia for this year's GSoC.

with love,
Hasan :)

Hey @HasanCoder 👋

Let us know in the issue when you find one you want to work on!

Big thing for GSoC is that you should check other projects from Wikimedia and apply to those as well. You can do multiple GSoC applications, which increases your chances of being accepted :) All the Wikimedia projects are subtasks of T354734: Coordinate Wikimedia's participation in Google Summer of Code 2024 and Outreachy Round 28, so that would be a good place to start.

Jacob4code added a parent task: T358527: GSoC '24 Proposal: Refactor Scribe-Data into a multi purpose Wikidata language pack CLI tool. Mar 29 2024, 6:43 PM

AndrewTavis removed a parent task: T358527: GSoC '24 Proposal: Refactor Scribe-Data into a multi purpose Wikidata language pack CLI tool. Mar 29 2024, 7:03 PM

AndrewTavis added a subtask: T358527: GSoC '24 Proposal: Refactor Scribe-Data into a multi purpose Wikidata language pack CLI tool.

AndrewTavis moved this task from Incoming to In progress on the affects-scribe-org board. Mar 29 2024, 7:49 PM

Linfye added a subtask: T361417: [Proposal] Srribe-Data: A Wikidata language pack CLI tool. Mar 30 2024, 1:34 AM

Evads0 mentioned this in T361433: Refactor into a multi purpose Wikidata language pack CLI tool. Mar 30 2024, 7:35 PM

AndrewTavis added a subtask: T361433: Refactor into a multi purpose Wikidata language pack CLI tool. Mar 31 2024, 12:19 PM

Evads0 mentioned this in T361441: Project Proposal: Multi-Purpose Wikidata Language Pack CLI Tool for Scribe-Data. Mar 31 2024, 3:03 PM

AndrewTavis added a subtask: T361441: Project Proposal: Multi-Purpose Wikidata Language Pack CLI Tool for Scribe-Data. Mar 31 2024, 3:29 PM

AndrewTavis closed subtask T361433: Refactor into a multi purpose Wikidata language pack CLI tool as Invalid.

Shashankmittaliitbhu added a subtask: T361474: (Project Proposal): Refactor Scribe-Data into a multi purpose Wikidata language pack CLI tool. Apr 1 2024, 2:21 PM