
GSoC 2023 Proposal: Wikidocumentaries to import images from the web to Structured Data on Commons


Profile Information

Name : Zexi Gong
Time zone : UTC+08:00
Github :
Location : China
Working hours : 8:00 am to 4:00 pm UTC+08:00


The project aims to establish a complete workflow for retrieving media related to the topic currently viewed in Wikidocumentaries from a given media repository and uploading it to Wikimedia Commons with structured data statements. The project includes the following tasks:
• Develop or modify the API script for the intended media repository.
• Format the retrieved information and present it properly in Wikidocumentaries.
• Enable the user to choose and upload images.
• Authenticate with Wikimedia Commons.
• Upload the selected media files to Wikimedia Commons and categorize them based on the available information.
• Generate Structured Data statements by utilizing the information obtained from both the corresponding Wikidata item and the original source.
The successful completion of this workflow will enable the creation of further tools to enrich the data of the uploaded content. In summary, this project intends to provide a more streamlined and user-friendly way for users to find and contribute open content to Wikimedia.
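The last task in the list above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes statements are written through the Wikibase `wbeditentity` API module, that the file's MediaInfo ID is "M" followed by its Commons page ID, and that "depicts" (P180) is the property being set. The names `mediaInfoId`, `itemStatement`, and `buildEditParams` are hypothetical.

```typescript
// Sketch only: builds the request parameters for a `wbeditentity` call that
// adds "depicts" (P180) statements to a Commons file's MediaInfo entity.
// All function and type names are hypothetical, not taken from the project.

interface ItemValue {
  "entity-type": "item";
  "numeric-id": number;
  id: string;
}

interface Statement {
  type: "statement";
  rank: "normal";
  mainsnak: {
    snaktype: "value";
    property: string;
    datavalue: { type: "wikibase-entityid"; value: ItemValue };
  };
}

// Assumption: a MediaInfo ID is "M" followed by the file page ID.
function mediaInfoId(pageId: number): string {
  return `M${pageId}`;
}

// Build one statement whose value is a Wikidata item such as "Q1757".
function itemStatement(property: string, qid: string): Statement {
  if (!/^Q[1-9]\d*$/.test(qid)) {
    throw new Error(`Not a Wikidata item ID: ${qid}`);
  }
  return {
    type: "statement",
    rank: "normal",
    mainsnak: {
      snaktype: "value",
      property,
      datavalue: {
        type: "wikibase-entityid",
        value: {
          "entity-type": "item",
          "numeric-id": Number(qid.slice(1)),
          id: qid,
        },
      },
    },
  };
}

// Assemble the form parameters; sending them (with a valid CSRF token
// obtained after authentication) is left to the caller.
function buildEditParams(
  pageId: number,
  depicts: string[],
  csrfToken: string
): Record<string, string> {
  const claims = depicts.map((qid) => itemStatement("P180", qid));
  return {
    action: "wbeditentity",
    format: "json",
    id: mediaInfoId(pageId),
    data: JSON.stringify({ claims }),
    token: csrfToken,
  };
}
```

In this sketch, the Wikidata item of the topic being viewed would be passed in `depicts`, and statements derived from the original source (for example creator or inception date) would need further value types that are not shown here.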

Mentors: @TuukkaH @Susannaanas


May 4 to May 28: Community bonding period. Familiarize myself with the APIs of different media repositories, the image upload workflow, and the Wikimedia Commons authentication workflow. Determine which media repositories to include in this project. Design the UI for image upload and Wikimedia Commons authentication. Finish the ongoing microtasks.
May 29 to June 11: Build the UI and the user flow to authenticate with Wikimedia Commons.
June 12 to June 20: Build the UI and the user flow to upload the chosen media files and categorize them using available information.
June 20 to June 25: Make a simple Structured Data statement with one image using information from the original source for the current media repositories (+ Wikimedia Commons).
June 22 to June 25: Make Structured Data statements using information from the corresponding Wikidata item for the current media repositories (+ Wikimedia Commons). First version code complete.
June 26 to June 30: Make a simple Structured Data statement using information from the corresponding Wikidata item for the current media repositories (+ Wikimedia Commons). First version code complete.
July 1 to July 10: Testing Round 1: Do sanity testing on the first version. Write automated tests for API requests at both frontend and backend. Write related documentation. Fix bugs found in testing.
July 10 to July 14: Midterm evaluation.
July 15 to July 25: Improve the first version of the code, make the current UI more user-friendly, and add more Structured Data statements from Wikibase repositories.
July 26 to July 31: Testing Round 2: Do sanity testing and exploratory testing on the second version. Write automated tests for new API requests at the backend. Write related documentation. Fix bugs found in testing.
August 1 to August 5: Investigate further tools to enrich the data of the uploaded content, and determine whether to let the user upload multiple images at one time.
August 6 to August 15: Implement the tools and other improvements to the current version identified in the investigation.
August 16 to August 20: Remove unused code from the codebase. Improve code quality (e.g. centralize all API functions at the frontend, optimize the ImageViewer component) to increase readability and modifiability.
August 21 to August 28: Final week: Submit final work product and final mentor evaluation. Freeze the code. Fix existing bugs. Write documentation and instructions.
August 28 to September 4: Mentors submit final student evaluations.
September 5: Initial results of Google Summer of Code 2023 announced.


• Early design of the UI and backend architecture.
• UI for image selection.
• UI and user flow for authentication with Wikimedia Commons.
• UI and user flow for image upload.
• Structured Data statements for the current media repositories.
• First version for the current media repositories
• New automation tests
• Design of new backend architecture
Midterm evaluation
• API script with formatting and structured data statements for new media repositories
• Second version for more new media repositories
• New automation tests
• Further tools to enrich the data of the uploaded content
• Code cleanup and optimization
• Documentation and instructions
Final evaluation


• I will submit commits to wikidocumentaries-ui and wikidocumentaries-api on GitHub. Code will be pushed to the dev branch periodically and merged into the master branch once review and testing are done.
• I will be online during my working hours (8:00 am to 4:00 pm UTC+08:00) to collaborate with the mentors.
• I will use Phabricator for managing bugs and subtasks.
• Outside working hours, I can be reached by Gmail when needed.

About Me

I am currently pursuing a master’s degree in computer science at Northeastern University in San Francisco. During the GSoC period, I will be on summer vacation, so I can commit fully to this project and guarantee at least 30 hours of work per week. Although this is my first time contributing to an open-source community, I am excited about the opportunity to take on this project and am prepared to invest the time, effort, and resources necessary to ensure its success. With my skills and expertise, I am confident that I am well-suited for the task at hand.

Past Experience

I am proficient in several programming languages, including Python, Java, HTML, CSS, and JavaScript/TypeScript. I am interested in full-stack development and have built on that interest by completing a college project using Vue. In addition, I gained familiarity with the MediaWiki API, Wikidata, and Structured Data on Commons while completing microtasks. My background also extends to data science and machine learning, with a minor in data science from the University of California, Berkeley.

Microtasks carried out
T330179: Image viewer for article images:
• Query all the images with links in the Wikipedia article, suppress the original image click action (opening the Wikipedia image page), and add the new click action.
• Call the backend API to get the image URL and metadata for all images, extract the required metadata from the response, fill it into the ImageViewer items list, and open the image viewer that lists all images.
• Save the metadata locally at the first image click for future clicks.
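The caching step in the last bullet can be sketched as below. This is an illustrative sketch under assumptions, not the code submitted for the microtask: the `ImageMetadata` fields, the `Fetcher` signature, and the `MetadataCache` class are all hypothetical, and the real backend response may look different.

```typescript
// Sketch only: fetch metadata for all article images on the first click
// and reuse the same result for every later click.

// Hypothetical metadata shape; the real backend response may differ.
interface ImageMetadata {
  title: string;
  url: string;
  license?: string;
}

// Hypothetical stand-in for the call to the backend API.
type Fetcher = (titles: string[]) => Promise<ImageMetadata[]>;

class MetadataCache {
  private readonly titles: string[];
  private readonly fetcher: Fetcher;
  private pending: Promise<Map<string, ImageMetadata>> | null = null;

  constructor(titles: string[], fetcher: Fetcher) {
    this.titles = titles;
    this.fetcher = fetcher;
  }

  // The first call triggers the request; later calls return the same promise.
  load(): Promise<Map<string, ImageMetadata>> {
    if (this.pending === null) {
      this.pending = this.fetcher(this.titles)
        .then((items) => {
          const byTitle = new Map<string, ImageMetadata>();
          for (const item of items) {
            byTitle.set(item.title, item);
          }
          return byTitle;
        })
        .catch((err) => {
          // Forget a failed request so that the next click can retry.
          this.pending = null;
          throw err;
        });
    }
    return this.pending;
  }
}
```

A click handler would then call `load()`, look up the clicked image by title, and open the viewer. Storing the promise rather than the resolved data means that two quick clicks still result in a single request.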

Event Timeline

@TuukkaH @Susannaanas This is my first draft of GSoC 2023 proposal. Could you take a look and give me some comments or suggestions? Thanks!

Hi @Zexi_Gong, as the deadline for GSoC is quickly approaching in less than 48 hours (April 4th, 2023, 18:00 UTC), it's crucial that you submit your proposal on Phabricator and Google's program website in the recommended format as soon as possible. To avoid any potential last-minute rushes or server failures, we highly recommend that you submit your proposal early and keep updating it as needed before the deadline. Once you have submitted your proposal, please move it from the "Proposals in Progress" column to the "Proposals Submitted" column on the Phabricator workboard by simply dragging it. If you have any inquiries, please do not hesitate to ask. Good luck with your application!

Hi! Please consider resolving this task and moving any pending items to a new task, as GSoC/Outreachy rounds are now over, and this workboard will soon be archived.