- Title of session: Improving public domain information with Wikidata and Paulina
- Session description: In this session we will talk about the Paulina tool and the project "Strengthening the Latin American public domain with Wikidata".
Paulina is a Wikidata-based tool, developed in Python, that provides a user-friendly search interface to discover authors, identify the copyright status of their works in different jurisdictions, and access public domain works from Wikisource, Commons and other sites. It's available at: https://paulina.toolforge.org/ The code repository is available at: https://gitlab.wikimedia.org/toolforge-repos/paulina
The tool is part of a larger project to improve data about the public domain in Wikidata and provide a user-friendly search and query interface. Among other things, the project seeks to increase the representation of data about authors and cultural works from the Global South.
The project also seeks to promote the creation of national, regional or thematic implementations of the Paulina tool, which allow the focus to be placed on the needs of each community. A first local adaptation of the tool is being developed in Uruguay. A very preliminary version already exists at: https://dominiopublico.uy/
The aim of the session is to share our work, obtain feedback to improve the tool, call for volunteers who want to collaborate with code, and also explore other types of collaboration.
- Username for contact: Pepe_piton
- Session duration (up to 90min): 40 min
- Session type (presentation, workshop, discussion, etc.): presentation
- Language of session (English, Arabic, etc.): English
- Prerequisites (some Python, etc.): Interest in Wikidata, Python and/or GLAM
Wikimedia Hackathon 2025 - Istanbul, Turkey
Etherpad link: https://etherpad.wikimedia.org/p/WMHack25__Public_Domain_Info_Wikidata
Notes
*INTRO Paulina: Wikidata, GLAM and technology in the region
- It is a Wikidata-based tool, named Paulina after a Uruguayan feminist. It uses Wikidata data to identify copyright status and public domain works, and is aimed at GLAM projects, teachers, and people interested in digital cultural heritage.
*The problem we address: The enormous complexity of copyright is a barrier to identifying and accessing works in the public domain.
- Very long copyright terms
- Lack of clear and reliable data about the public domain in many countries
**This problem affects GLAM institutions as well as users
**How can we use Wikidata to solve this problem?
- Wikidata's advantages are useful here: it is a global database, it is collaborative, and it is connected to other databases
- Whatever your topic of interest or specialization, you can contribute
**Wikidata has some limitations.
- It is general purpose, and not specifically focused on public domain
- The interface is not designed to access the information about the public domain
- Wikidata has huge information gaps about cultural heritage, especially from the Global South. We shouldn't be under the illusion that everything is in Wikidata. A lot of work is still required to keep adding data about cultural heritage in Latin America, Africa, etc.
**We can't approach this problem only from a technology POV. It requires people who know where to find information about cultural heritage from their region. Paulina tries to help other projects and to make other efforts around cultural heritage in the public domain visible.
- Paulina tries to help overcome the Wikidata limitations described above. With it you can discover authors and works and learn about the public domain. It's organized by countries, which have different copyright terms. There are filters for advanced search (we've been improving them at this Hackathon!), and we're working on making filter-only search possible. There are also author profiles with information about copyright clearance: Paulina tries to determine whether the author is in the public domain by looking at the relevant property in Wikidata, and does the math based on date of birth if that property doesn't exist. You can also query works by that author. There are profiles for each work as well, with data from Wikidata organized in a useful way. Work profiles also add identifiers from Internet Archive, Project Gutenberg, etc. to help find the full work, and include a link to edit the Wikidata item. Paulina has a multilingual interface: we have invited people to translate the app into more languages, and right now there are 10.
- Technologies used are the Wikidata APIs, Toolforge, Python, Flask, and Babel.
- Methods used to access Wikidata's data are:
- MediaWiki Action API
- Wikibase REST API
- Wikidata Query Service, for complex queries
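To make the three access methods above more concrete, here is a minimal Python sketch that only builds the request URLs for each of them, without sending anything over the network. It is not taken from Paulina's code: the function names, the choice of `props`, and the example query (writers with a given country of citizenship who died before a given year) are assumptions made for illustration. The identifiers used are P106 (occupation), Q36180 (writer), P27 (country of citizenship) and P570 (date of death).

```python
from urllib.parse import urlencode

ACTION_API = "https://www.wikidata.org/w/api.php"
REST_API = "https://www.wikidata.org/w/rest.php/wikibase/v1"
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def action_api_url(item_id: str) -> str:
    """MediaWiki Action API: fetch labels and claims of one item."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels|claims",
        "format": "json",
    }
    return f"{ACTION_API}?{urlencode(params)}"


def rest_api_url(item_id: str) -> str:
    """Wikibase REST API: one item as a single JSON document."""
    return f"{REST_API}/entities/items/{item_id}"


def authors_query(country_qid: str, died_before_year: int, limit: int = 50) -> str:
    """Hypothetical SPARQL query: writers from a country who died before a year."""
    return f"""
SELECT ?author ?authorLabel ?death WHERE {{
  ?author wdt:P106 wd:Q36180 ;
          wdt:P27 wd:{country_qid} ;
          wdt:P570 ?death .
  FILTER(YEAR(?death) < {died_before_year})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "es,en". }}
}}
LIMIT {limit}
""".strip()


def sparql_url(query: str) -> str:
    """Wikidata Query Service: GET request returning JSON results."""
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"
```

For example, `sparql_url(authors_query("Q77", 1955))` gives the URL for a query about writers from Uruguay (Q77) who died before 1955. The first two functions suit lookups of a single known item, while the query service is the one that can combine several conditions in one request.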
- At the hackathon, we're working on advanced search improvements, improving the design, redesigning the country page with new statistics and exploration filters, and creating a staging environment to improve the collaboration workflow. We also want to work on translating the app into more languages, improving copyright status information about works and authors in Wikidata, and allowing editing of Wikidata directly from Paulina, e.g. when you detect vandalism through Paulina. Right now it sends you back to Wikidata to do that; ideally it would have an interface to edit Wikidata directly. We're not sure that's important to develop right now, though, because Paulina can interact with and complement other tools instead.
- How to collaborate:
- Contribute code on GitLab, request a feature or report a bug on Phabricator, or help translate the tool (see links in the slides). Please tell us what you need!
- Collaboration beyond Paulina:
- Latin American affiliates and volunteers are collaborating to strengthen the public domain with Wikidata. We received a grant to do training and campaigns within the LatAm region.
- We are part of WikiProject Dominio Público en América Latina, where we share tips about copyright status and about which identifiers from different countries in the region are useful
- Project: Strengthening the Latin American public domain with Wikidata
- Paulina is just one piece within this broader effort.
- We're interested in knowing what you're doing with Paulina and GLAM, and how Paulina can help
Questions
- What is the feedback you got from GLAM community in Uruguay?
- Last year we did some training sessions with people from the University in Uruguay: students and some professionals, including librarians. They liked it and made a lot of suggestions. There are tradeoffs in making it accessible to a wider range of users, because some of them (e.g. librarians) have very specific needs. We improved some features based on their feedback. Generally they are open to this. They were very enthusiastic - it's a great entry point for them and an incentive to edit Wikidata. They immediately see how helpful it is to input the data, without needing to know how to write a query in the query service. It's something they can understand and share with others.
- What sparked your interest to come to the session?
- This looks great, but it's also duplicating Wikisource author and index pages. I'm curious - you're developing this as a separate tool, which means people often won't see it. Is there a reason you've chosen to do that versus embedding this information on Wikisource? What's the motivation behind doing it off wiki?
- The idea is that this tool facilitates making very common queries: applying filters in the search, or using e.g. the countries page for statistics about authors in different fields. We are taking works from Wikisource but also from other sources. In some ways it overlaps. But the important thing is that it's a tool for checking public domain status, and for that you need all works and all authors - those who are in the public domain, those who are not, and those who are entering soon. In Wikisource you have public domain (and open content) only. E.g. I have a digitization project and need to check public domain status in order to upload or transcribe this content for Wikisource in the future, or to check if it's possible to digitize, or to visualize the gaps in Wikisource. It facilitates research for contributing to Wikisource or Commons. Paulina gives you an overview for a specific author or country, or for the intersection of these filters. It's a tool for contributing to the projects, and it has links so you can easily access Wikisource, Wikipedia, and related projects. In a few words, the main difference is that it tries to leverage the possibilities of Wikidata.
- I'd love to see lists like this directly on Wikisource. You're doing really nice work that would feed into that. I'd love it if you thought about whether you could do this on wiki, and what the blockers are.
- The modeling of each jurisdiction is really complex, and there's no public software that does that. Once we model that, maybe we can take it to Wikisource. It's a hard problem.
- I'm not a heavy Wikisource user, more a Wikidata user. The first thing we thought of was working on Wikidata in the Wikimedia ecosystem. You make a good case that it would be useful to integrate with Wikisource.
- I have searched on Paulina and found information that I could not find on Wikisource. If you don't know what you're looking for, Paulina really helps by giving you links and much more granular information, which helps you refine your research and resources.
- How do you know if the author is in the public domain, since it's different for some countries?
- It's complex. First we have to differentiate between the copyright status of works and the copyright status of authors. Wikidata has properties for both, and they take qualifiers, e.g. copyright status is public domain, with a qualifier for countries with a term of 70 years post-mortem of the author or less. Then if you go to the item for that group of countries, you have the list of countries. We've been working on modeling that (completing the model based on previous work). The copyright term of each country is classified not just by the specific term but also by ranges, e.g. 70 years or less. There's the possibility to model this on Wikidata, but there are huge gaps. For authors where it's simpler to do the math, we first ask Wikidata whether it has the property; if it doesn't but has the date of death, we fall back on a simple calculator. Ideally we're trying to work on Wikidata so that the information is available there.
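The fallback described in this answer (use the status recorded in Wikidata when it exists, otherwise calculate from the date of death) can be sketched in a few lines of Python. This is an illustration, not Paulina's actual calculator: the function names and the `declared_status` parameter are hypothetical, and the sketch assumes a purely post-mortem term that runs until the end of the calendar year, which is common but not universal.

```python
from datetime import date
from typing import Optional


def public_domain_year(death_year: int, term_years: int) -> int:
    """First calendar year in which the author's works are in the public domain.

    Assumption: the term runs until 31 December of the year that is
    `term_years` after the author's death.
    """
    return death_year + term_years + 1


def author_status(
    declared_status: Optional[str],
    death_year: Optional[int],
    term_years: int,
    today: Optional[date] = None,
) -> str:
    """Prefer the status already recorded in Wikidata, else do the math."""
    if declared_status is not None:
        # Hypothetical input: whatever status the Wikidata item declares.
        return declared_status
    if death_year is None:
        # Without a date of death this simple calculator cannot decide.
        return "unknown"
    current_year = (today or date.today()).year
    if current_year >= public_domain_year(death_year, term_years):
        return "public domain"
    return "in copyright"
```

Under these assumptions, an author who died in 1950 in a jurisdiction with a 70-year term enters the public domain on 1 January 2021, so `author_status(None, 1950, 70, date(2025, 5, 3))` returns `"public domain"`. Jurisdiction-specific exceptions are exactly what such a calculator does not capture, which is why modeling the status in Wikidata itself remains the goal.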