
Wikidocumentaries to import images from the web to Structured Data on Commons
Open, Needs Triage · Public

Assigned To
None
Authored By
Susannaanas
Feb 7 2023, 10:17 AM

Description

IMPORTANT: Make sure to read the GSoC participant instructions and communication guidelines thoroughly before commenting on this task. This space is for project-specific questions, so avoid asking questions about getting started, setting up Gerrit, etc. When in doubt, ask your question on Zulip first!

Brief summary

Wikidocumentaries is a website aggregating Wikimedia content and integrating it with content from other open media repositories. It provides a language-independent way of browsing Wikimedia projects based on Wikidata items. The idea of Wikidocumentaries is to allow the users to find relevant open content and contribute it to the Wikimedia projects by using the content for their purposes. The name of the project, Wikidocumentaries, refers to media compilations that the project will eventually allow the users to create from the materials they find.

The goal of the GSoC project is to establish the entire process for retrieving media from a given media repository related to the currently viewed topic in Wikidocumentaries and uploading it to Wikimedia Commons, adding structured data statements to it.

  • Create or update the API script for the desired media repository.
  • Format the retrieved information so that it can be displayed in Wikidocumentaries.
  • Allow the user to select images they want to upload.
  • Authenticate with Wikimedia Commons.
  • Upload the chosen media files and categorize them using available information.
  • Make Structured Data statements using information from both the corresponding Wikidata item and the original source.
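The upload and categorization steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names and the shape of the `image` object are hypothetical, the license section of the description page is left out, and whether `action=upload` accepts a `url` depends on the wiki's configuration and the user's rights.

```javascript
// Hypothetical helper: builds the wikitext of the file description page,
// including one category link per category name.
function buildDescriptionWikitext(image, categories) {
  const categoryLinks = categories.map((name) => `[[Category:${name}]]`);
  return [
    "== {{int:filedesc}} ==",
    "{{Information",
    `|description=${image.description}`,
    `|source=${image.sourceUrl}`,
    `|author=${image.creator}`,
    "}}",
    "",
    ...categoryLinks,
  ].join("\n");
}

// Hypothetical helper: assembles the parameters for a MediaWiki
// action=upload request. The caller is assumed to have obtained a CSRF
// token after authenticating with Wikimedia Commons.
function buildUploadParams(image, categories, csrfToken) {
  return {
    action: "upload",
    format: "json",
    filename: image.filename,
    url: image.fileUrl, // upload-by-URL; assumes the wiki allows it
    text: buildDescriptionWikitext(image, categories),
    comment: "Uploaded from Wikidocumentaries",
    token: csrfToken,
  };
}
```

The returned object would then be sent as a POST request to the Commons API endpoint by whatever HTTP client the project uses.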

Once this workflow is complete, further tools for enriching the data of the uploaded content can be made available. The work can be expanded to cover one such tool, depending on the interests of the intern.

Skills required

The UI code is written with Vue, and the API code is JavaScript. The work focuses on Structured Data on Commons, so an understanding of the MediaWiki API, Wikidata and Structured Data on Commons is needed.

Possible mentor(s)

TuukkaH, Susannaanas

Links

Microtasks

We are adding suitable microtasks to T329256.

Event Timeline

It sounds like a great project and is perfect for GSoC! We ran out of slots for Outreachy this year, but with GSoC, there is no limit. The only thing that I'd like to ensure is that there is at least one mentor in your team with a technical background. It would also be helpful to clarify more what you mean by "external media repositories". Could you share some examples perhaps? Same for the intended workflow, could you give one example of reading media, contributing to Commons, and then pulling it back in Wikidocumentaries (if I got the flow right)? A bit more visual explanation might be helpful here.

After your conversation w/ Platform Product folks, as shared in the email:

If it is your first time mentoring via GSoC, you could read through this guide https://www.mediawiki.org/wiki/Google_Summer_of_Code/Mentors :) Once we have a final list and we get accepted as an organization, I'll add you to our Zulip chat w/ fellow mentors for continuous support.

Thank you for the encouraging feedback!

@TuukkaH is the lead developer for the project, and he has agreed to mentor the intern. In addition, we are seeking to have a support network to assist with questions related to the MediaWiki platform and the Wikimedia Cloud environment.

In regard to the workflow:

There is existing code that reads openly licensed media for a given Wikidata item from open repositories through their open APIs and formats the input to be displayed on Wikidocumentaries. We have kept the code for Finna, the Finnish national aggregator for GLAM materials, most up to date, but the same scenario applies to other similar repositories. These could be, for example, Europeana, Internet Archive, Flickr and many others around the world. There are several aggregators of open content that consolidate input from different sources in the same way, and they could be used instead or in addition.
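As a rough illustration of formatting input from several repositories into one shape, here is a minimal sketch. The raw field names, the adapter names and the license labels are assumptions made up for the example; they are not the actual response schemas of Finna, Flickr or any other repository, and this is not the existing Wikidocumentaries code.

```javascript
// One adapter per repository. Each maps a raw record to a common shape.
// The raw field names are illustrative assumptions only.
const adapters = {
  finna: (raw) => ({
    id: raw.id,
    title: raw.title,
    imageUrl: raw.imageUrl,
    license: raw.license,
  }),
  flickr: (raw) => ({
    id: String(raw.photoId),
    title: raw.name,
    imageUrl: raw.url,
    license: raw.licenseName,
  }),
};

// Hypothetical list of license labels treated as open.
const OPEN_LICENSES = new Set(["CC0", "PDM", "CC BY 4.0", "CC BY-SA 4.0"]);

// Converts raw records from one source to the common shape and keeps
// only those with an open license.
function normalizeResults(source, rawRecords) {
  const adapt = adapters[source];
  if (!adapt) {
    throw new Error(`No adapter for source: ${source}`);
  }
  return rawRecords
    .map((raw) => ({ source, ...adapt(raw) }))
    .filter((record) => OPEN_LICENSES.has(record.license));
}
```

With this layout, supporting a new repository would mean adding one adapter, while the display code keeps working with the common shape.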

When the images are presented on a Wikidocumentaries page for a given topic, we have the data about the topic from Wikidata and the metadata for the image from the institution. In the simplest scenario, the user chooses an image that correctly represents the topic they are browsing and uploads it to Wikimedia Commons with a single click. The results contain many false hits, which the workflow needs to take into account.

Based on the data available from Wikidata, the image can be placed in a correct category in Wikimedia Commons and Structured Data on Commons statements can be added to it, at minimum the "depicts" statement stating that the image depicts the Wikidata topic in question, the same topic that is the topic of the Wikidocumentaries page.
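The minimum "depicts" statement could be expressed along these lines. This is a sketch under assumptions: the function name is hypothetical, it uses the Wikibase `wbcreateclaim` API module with the MediaInfo entity ID formed as `M` plus the file's page ID, and it assumes the caller already has a CSRF token. The actual implementation may well use a different module or a client library.

```javascript
// Hypothetical helper: builds the parameters for a wbcreateclaim request
// that adds "depicts" (P180) pointing at the given Wikidata item.
function buildDepictsClaimParams(pageId, qid, csrfToken) {
  if (!/^Q[1-9]\d*$/.test(qid)) {
    throw new Error(`Not a Wikidata item ID: ${qid}`);
  }
  return {
    action: "wbcreateclaim",
    format: "json",
    entity: `M${pageId}`, // MediaInfo ID of the uploaded file
    property: "P180", // depicts
    snaktype: "value",
    // The value is passed as a JSON-encoded string.
    value: JSON.stringify({
      "entity-type": "item",
      "numeric-id": Number(qid.slice(1)),
      id: qid,
    }),
    token: csrfToken,
  };
}
```

The Wikidata item ID passed in would be the one for the topic of the current Wikidocumentaries page.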

Once the image is in Wikimedia Commons, the image metadata can be manipulated and enriched in many ways. If there's time and based on the intern's interests, some of these opportunities can be explored.

The images below show the simple user interaction in the workflow. They show the image section for images that have been read from repositories outside Wikimedia Commons.

{F36817569}

Arkistokuvia (4).png (440×961 px, 583 KB)

Arkistokuvia (5).png (440×961 px, 609 KB)

{F36817580}

This YouTube video was made for WikidataCon 2021: https://www.youtube.com/watch?v=RVymVs6Avek&t=1s

@Susannaanas
Hello, I am Bessong and I am interested in contributing to this project for GSoC.
I absolutely want to be part of this community and have set up a development environment for the project, so could someone please guide me on how to participate?

@Juniorbesong8: Hi and thank you for your interest! Please check the red box at the top of this page and thoroughly read all its links. Thanks a lot!

Hi, I am Naomi and I am interested in this project for GSoC.
I have my environment set up for the development of this project and I really wish to be part of this community, so I would appreciate being guided.

I plan to work on this project at the Wikimedia-Hackathon-2023 and it seems relevant here: T335910: Improve Structured Data on Commons data models (and their documentation) as a basis for Lua Infobox templates

Basically, I want to help improve data modeling conventions on Commons so that batch upload (with combined SDC and Wikitext) becomes more simple, benefitting any batch tool/process.

Feedback will be very welcome!

At this moment, I'd be interested to see which types of images / files would typically (most often) be uploaded to Commons, so that data modeling for these can perhaps be prioritized. I assume (looking at the screenshots above) that's mainly historical photographs?

Here are some outstanding issues that we have detected. They can be any kind of observations, not necessarily something that the internship was meant to cover.

  • All of the metadata can be displayed in the upload popup.
  • Newly added and missing information must be added to SDC.
  • Style fixes needed: The modal disappears when focus is lost. The modal from the ImageViewer inherits some styles from the containing menu (white text, right-align). The disappearing modal is caused by a similar issue.
  • Text fields need to be made editable.
  • The original value from the institution should also be added to Wikidata as a qualifier.

T346117

  • The value of the page topic might be best suited for depicts / creator / location. The user should be able to choose.
  • The value of the page topic may be wrong. The user should be able to remove or change it. The workflow and interface need to be explored; there are many options.
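One possible way to model the choice in the two points above is a mapping from the role the user picks to a property. This is only a sketch: the role names and the function are hypothetical, and using P1071 (location of creation) for "location" is an assumption, since another location property may fit the data better.

```javascript
// Hypothetical mapping from a user-chosen role to a property ID.
const ROLE_PROPERTIES = new Map([
  ["depicts", "P180"], // depicts
  ["creator", "P170"], // creator
  ["location", "P1071"], // location of creation (assumed choice)
]);

// Returns the statement to add for the page topic, or null when the user
// has removed the topic because it does not apply to the image.
function statementForRole(role, qid) {
  if (role === "none") {
    return null;
  }
  const property = ROLE_PROPERTIES.get(role);
  if (!property) {
    throw new Error(`Unknown role: ${role}`);
  }
  return { property, itemId: qid };
}
```

Changing the topic to a different item would then just mean calling the function with another item ID.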

Hi! Please consider resolving this task and moving any pending items to a new task, as GSoC/Outreachy rounds are now over, and this workboard will soon be archived.