
Proposal: Integrate Wikimedia Ecosystem within BUB2 tool
Open, Needs Triage, Public

Description

Proposal for https://phabricator.wikimedia.org/T346386

Profile

Name: Okereke Chinweotito
Email: chinweotitookereke@gmail.com
IRC nickname: Okereke
GitHub: https://github.com/okerekechinweotito
Location: Nigeria
Typical Working Hours: 11 am to 10 pm (UTC+1)

Synopsis

The BUB2 tool helps community members upload books, newspapers, magazines, gazettes, and similar works from public libraries such as Google Books, Panjab Digital Library, and Trove Digital Library to the Internet Archive. It has helped the Wikisource community import texts from these libraries into Wikisource. Currently, Wikisource volunteers use BUB2 to upload texts to the Internet Archive and then use the IA Upload tool to import them to Wikimedia Commons. The IA Upload step could be made unnecessary by integrating the Wikimedia ecosystem (Commons, Wikidata, and Wikisource) into BUB2, so that volunteers can upload texts directly to Wikisource and cite them correctly in the corresponding Wikidata items.

Mentors
@wassan.anmol
@PMenon-WMF

Timeline

| Period | Task |
| --- | --- |
| Nov 21 to Dec 4 | Community bonding period. |
| Dec 4 to Dec 15 | Research and explore the Internet Archive API to understand how to retrieve .djvu files. Discuss with mentors how to approach the task, and receive guidance on API links. Blog post. |
| Dec 16 to Jan 5 | Set up a polling mechanism in the BUB2 codebase to periodically check for .djvu files on the Internet Archive. Implement the logic to download the .djvu files from the Internet Archive using the polling mechanism. Blog post. Phase I evaluation. |
| Jan 6 to Jan 26 | Implement the authentication and authorization mechanisms needed to interact with the MediaWiki API. Integrate the MediaWiki API into the BUB2 codebase to upload the downloaded .djvu files and their metadata to Commons. Blog post. |
| Jan 27 to Feb 16 | Integrate the Wikidata API into the BUB2 codebase to cite the metadata correctly for the uploaded .djvu files so that it can be reused in Wikisource. Implement the logic to link the uploaded files with relevant Wikidata items. Blog post. Phase II evaluation. |
| Feb 17 to March 1 | Implement the logic to automatically create pages on Wikisource using the uploaded .djvu files and their metadata. Test and debug the entire integration process to ensure smooth functionality. Phase III evaluation. |
| March 1 | Mentors submit final student evaluations. |
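
The polling step planned for Dec 16 to Jan 5 could look roughly like the sketch below. It is illustrative only and written in Python for brevity; it is not BUB2 code. It assumes the Internet Archive metadata endpoint (`https://archive.org/metadata/<identifier>`) lists an item's files under a `files` key and that a `.djvu` file eventually appears there, which depends on the item. The function names, retry count, and delay are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Public Internet Archive endpoints; the identifier is the item name chosen at upload.
IA_METADATA_URL = "https://archive.org/metadata/{identifier}"
IA_DOWNLOAD_URL = "https://archive.org/download/{identifier}/{filename}"


def fetch_metadata_http(identifier: str) -> dict:
    """Fetch an item's metadata as a dict (performs a network request)."""
    url = IA_METADATA_URL.format(identifier=urllib.parse.quote(identifier))
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def find_djvu_file(metadata: dict) -> Optional[str]:
    """Return the name of the first .djvu file listed in the metadata, if any."""
    for entry in metadata.get("files", []):
        name = entry.get("name", "")
        if name.lower().endswith(".djvu"):
            return name
    return None


def poll_for_djvu(
    identifier: str,
    fetch_metadata: Callable[[str], dict] = fetch_metadata_http,
    attempts: int = 5,            # hypothetical default
    delay_seconds: float = 60.0,  # hypothetical default
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until a .djvu file is listed; return its download URL, or None."""
    for attempt in range(attempts):
        name = find_djvu_file(fetch_metadata(identifier))
        if name is not None:
            return IA_DOWNLOAD_URL.format(
                identifier=urllib.parse.quote(identifier),
                filename=urllib.parse.quote(name),
            )
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before checking again
    return None
```

Passing `fetch_metadata` and `sleep` as parameters keeps the loop testable without network access. In BUB2 itself the same idea would more likely be a scheduled job in the existing queue.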

Deliverables

  • Set up a polling mechanism to get .djvu files from the Internet Archive
  • Integrate the MediaWiki API to upload the .djvu file and metadata to Commons
  • Integrate the Wikidata API to cite the metadata correctly so that it can be reused in Wikisource and other places
  • Create pages on Wikisource automatically with the .djvu file and metadata
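
The three Wikimedia-facing deliverables each come down to one API request. The sketch below only builds the request parameters, in Python for brevity, and is not BUB2 code. `action=upload`, `action=wbcreateclaim`, and `action=edit` are existing MediaWiki and Wikibase API modules. The helper names, the default upload comment, the use of property P996 (believed to be "document file on Wikimedia Commons"), and the `Index:` title prefix are assumptions to confirm with the mentors. Obtaining the CSRF token, for example through OAuth, is out of scope here.

```python
import json

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def commons_upload_params(filename, wikitext, csrf_token, comment="Uploaded via BUB2"):
    """Form fields for action=upload; the file bytes go in a multipart 'file' field."""
    return {
        "action": "upload",
        "format": "json",
        "filename": filename,
        "text": wikitext,    # initial content of the file description page
        "comment": comment,  # hypothetical default upload summary
        "token": csrf_token,
    }


def wikidata_file_claim_params(item_id, filename, csrf_token, property_id="P996"):
    """Fields for action=wbcreateclaim linking a Wikidata item to a Commons file."""
    return {
        "action": "wbcreateclaim",
        "format": "json",
        "entity": item_id,
        "property": property_id,  # assumed property; verify before use
        "snaktype": "value",
        "value": json.dumps(filename),  # the value must be JSON-encoded
        "token": csrf_token,
    }


def wikisource_index_params(filename, wikitext, csrf_token):
    """Fields for action=edit creating an index page for the uploaded file."""
    return {
        "action": "edit",
        "format": "json",
        "title": f"Index:{filename}",  # assumed namespace prefix
        "text": wikitext,
        "createonly": "1",  # fail instead of overwriting an existing page
        "token": csrf_token,
    }
```

Each dictionary would be sent as a POST body to the matching `api.php` endpoint. For Wikisource, that endpoint is on the language-specific domain.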

About me

I am a full-stack engineer (MERN stack). I previously interned at HNG and held frontend developer positions at IBCSCORP and CORESIGHT RESEARCH. I am currently strengthening my backend skills. In 2022, I graduated with a Bachelor of Science in Computer Science from Imo State University, Owerri, Nigeria. I have been a Wikimedia user for a long time and have always been fascinated by how it operates, which inspired me to start contributing to this project.
I am a quick learner and a team player, and I look forward to a great learning experience during this internship.

Microtasks completed/In progress

Issues

  • [[ T315134 | Add search bar in queue ]]
  • [[ T338267 | Use API:EmailUser to send Emails to the users ]]
  • [[ T344116 | Fix peer dependencies and remove deprecation warnings ]]
  • [[ T348188 | For PDL, download and stream the PDF if available ]]
  • [[ T348412 | Handle PDL library failures ]]
  • [[ T348413 | Redesign HTML template as per Codex design ]]
  • [[ T348600 | Title not being shown and placeholder text is getting cut ]]
  • [[ T349567 | QueueTable Footer Pagination inconsistent style ]]
  • [[ T348192 | Add max character limit while creating identifier in Internet Archive and remove some special characters ]]
  • [[ T348186 | Author is not being sent to Internet Archive for Google Books ]]