search authors for a specific death date year
Open, Needs Triage | Public | Feature

Description

Feature summary

https://paulina.toolforge.org/ on its advanced search has a checkbox allowing the user to search only for public domain authors.
This feature adds granularity to the search by allowing the user to search for authors who died in a specific year, i.e., whose works entered or will enter the public domain in a specific year.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

I went to Paulina wanting to see a list of Portuguese authors who will enter the public domain next year. But I could also be going there to see a list of Portuguese authors who entered the public domain this year (like we have in https://pt.wikipedia.org/wiki/Lista_de_autores_portugueses_que_entram_em_dom%C3%ADnio_p%C3%BAblico_em_2025 ).

Benefits (why should this be implemented?):

"Recently in public domain" or "soon to be in public domain" works and authors usually attract more interest - this would make Paulina more attractive/useful.

Event Timeline

Hi @Mind_Booster_Noori @Pepe_piton @Nat_WDU

I fixed this issue. Kindly have a look to check if I'm on the right track.

Here is the link to merge request: https://gitlab.wikimedia.org/toolforge-repos/paulina/-/merge_requests/166

I ran into problems while fixing this issue: the search was taking a long time to load. So I had to optimize performance by doing a few things:

The first, replacing YEAR() function with date range filtering
Before: FILTER(YEAR(?deathDate) = {death_year})
After: FILTER(?deathDate >= "{death_year}-01-01T00:00:00Z"^^xsd:dateTime && ?deathDate < "{death_year_plus_one}-01-01T00:00:00Z"^^xsd:dateTime)
Date range comparisons can use indexes; YEAR() requires computing the year for every date
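
A minimal sketch of how the "After" filter above could be generated for a given year. The function name `death_year_filter` and its `var` parameter are hypothetical and not taken from the merge request; the sketch assumes death dates are stored as xsd:dateTime values.

```python
def death_year_filter(death_year: int, var: str = "?deathDate") -> str:
    """Build a half-open [Jan 1 of year, Jan 1 of year + 1) SPARQL date filter.

    Hypothetical helper: the name and signature are illustrative only.
    """
    death_year = int(death_year)  # reject non-numeric input before it reaches the query
    start = f"{death_year:04d}-01-01T00:00:00Z"
    end = f"{death_year + 1:04d}-01-01T00:00:00Z"
    return (
        f'FILTER({var} >= "{start}"^^xsd:dateTime && '
        f'{var} < "{end}"^^xsd:dateTime)'
    )


print(death_year_filter(1954))
```

Using a half-open range (inclusive start, exclusive end) avoids having to spell out the last instant of December 31.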

The second was changing from GET to POST method
Before: SPARQL queries sent via GET with URL encoding
After: SPARQL queries sent via POST in request body

This avoids URL length limits and URL encoding issues
Also, POST is a standard way to send queries under the SPARQL 1.1 Protocol
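
As a rough illustration of the GET-to-POST change, the sketch below builds (but does not send) a URL-encoded POST request using only the standard library. The endpoint shown is the public Wikidata Query Service; whether Paulina calls it this way is an assumption, and `build_sparql_post` and the User-Agent string are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Public Wikidata Query Service endpoint (assumed target, not confirmed by the thread).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_sparql_post(query: str, endpoint: str = WDQS_ENDPOINT) -> Request:
    """Return a POST request carrying the query in the body, not the URL."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
            # Hypothetical identifier; a real tool should send its own.
            "User-Agent": "example-tool/0.1 (hypothetical)",
        },
    )


req = build_sparql_post("ASK {}")
print(req.get_method(), req.full_url, len(req.data))
```

The request could then be passed to `urllib.request.urlopen`; because the query travels in the body, its length no longer counts against URL limits.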

Then, the query structure optimization
Before: Date filter applied after fetching all data
After: Date filter applied early in the query
Filtering early reduces the dataset before the expensive operations run

I also replaced SERVICE wikibase:label with explicit label fetching
Before: SERVICE wikibase:label (was returning Q-numbers as fallbacks)
After: Explicit ?author rdfs:label ?authorLabel with language filtering
Basically, direct label fetching is more efficient and avoids the SERVICE overhead
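
A small sketch of what the explicit label pattern described above might look like when generated from code. `label_pattern` is a hypothetical helper, and the default language code is only an example.

```python
def label_pattern(entity_var: str = "?author", lang: str = "pt") -> str:
    """Build an explicit rdfs:label pattern restricted to one language.

    Hypothetical helper: it stands in for SERVICE wikibase:label.
    """
    label_var = f"{entity_var}Label"
    return (
        f"{entity_var} rdfs:label {label_var} .\n"
        f'FILTER(LANG({label_var}) = "{lang}")'
    )


print(label_pattern())
```

One trade-off to keep in mind: used as a required pattern, this drops authors that have no label in the chosen language, so wrapping it in OPTIONAL { ... } may be preferable if those authors should still be listed.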

The main improvement comes from using date range comparisons instead of the YEAR() function, which can leverage database indexes and significantly reduces computation time.

The next thing I'll work on is the user experience: adding quick options for authors who entered the public domain the previous year and for those who will enter it next year.

I also need to make the input field more intuitive: instead of a dropdown, it should be a type-to-select field.

@Mind_Booster_Noori Thank you for your suggestion about this important feature, and @Oluwatumininu.m @System625 thank you for your contributions! Date of birth and date of death filters are something I've been considering for quite some time, but until now I haven't been able to implement it precisely for performance reasons. Since searches with "haswbstatement" using the Action API don't allow the "date" data type, we have to use SPARQL, which significantly impacts performance. Some weeks ago I left a comment on an old ticket about this, asking for the date data type to be added to haswbstatement: T238498

I'm very interested in studying the solutions you propose. Using date ranges in the SPARQL query, instead of the YEAR function, is highly recommended to save a few seconds, and it's something I already had in mind. But I'd like to take the time to review the details of your solutions and test them in different cases.

The most important case I'm interested in testing is searching for authors with the search box left blank, applying the nationality filter for a country with many author entries in Wikidata (for example, the United States, France, the UK), and then applying the "year of death" filter. This use case is the one raised in the ticket and I'm sure it's quite common, since many people want to obtain lists of authors from a country rather than search for specific author names. But it's a case where performance can be particularly affected.

Thank you very much.

In a situation where a user wants to pull a list of authors from a country for a specific year, will the date range still work?
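
The combination asked about here (a country filter plus a single death year, with no name typed in) can be sketched as one query string. This is an illustration under assumptions, not Paulina's actual query: the function name is hypothetical, and using P27 (country of citizenship), P570 (date of death), and P106 = Q36180 (occupation: writer) to select authors is a guess at how the tool models them. Whether it performs acceptably for large countries would still need to be measured.

```python
def authors_by_country_and_death_year(
    country_qid: str, death_year: int, lang: str = "en", limit: int = 500
) -> str:
    """Build a SPARQL query for authors of one country who died in one year.

    Hypothetical helper; the property choices are assumptions.
    """
    if not (country_qid.startswith("Q") and country_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {country_qid!r}")
    death_year = int(death_year)
    start = f"{death_year:04d}-01-01T00:00:00Z"
    end = f"{death_year + 1:04d}-01-01T00:00:00Z"
    return f"""
SELECT ?author ?authorLabel ?deathDate WHERE {{
  ?author wdt:P27 wd:{country_qid} ;
          wdt:P570 ?deathDate .
  FILTER(?deathDate >= "{start}"^^xsd:dateTime && ?deathDate < "{end}"^^xsd:dateTime)
  ?author wdt:P106 wd:Q36180 .
  OPTIONAL {{
    ?author rdfs:label ?authorLabel .
    FILTER(LANG(?authorLabel) = "{lang}")
  }}
}}
LIMIT {int(limit)}
""".strip()


# Q45 is Portugal on Wikidata.
print(authors_by_country_and_death_year("Q45", 1954, lang="pt"))
```

The date range itself does not depend on the country filter, so the two conditions combine without changes to either; the open question is query time when the country has many entries.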