Don't consider future dates as valid publishing dates
Open, Needs Triage | Public | BUG REPORT

Description

I propose ruling out future dates as valid publishing dates to reduce wrong results. The following examples, which I found on the German Wikipedia, show the need for such a rule:

On this website, a date from the "Upcoming events" section seems to be recognized as the publishing date. (see citoid API request)
The page contains the following tag: <time class="..." datetime="2024-10-19">...</time>

This website provides a wrong date in a meta tag, which is of course not citoid's fault, but citoid could have detected it. (see citoid API request)
The page contains the following tag: <meta property="article:published_time" content="2024-08-24T11:23:47+03:00">
Citoid could have chosen <meta property="article:modified_time" content="2023-03-29T12:04:44+03:00"> as a fallback.

On this website, the date is given in the format DD.MM.YYYY, which is the format typical in Germany, but citoid wrongly interprets it as MM.DD.YYYY. (see citoid API request)
The page contains the following tag: <meta name="DC.Date" content="08.03.2024">
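
To illustrate the ambiguity, here is a minimal Python sketch (not citoid's or Zotero's actual code). The helper `parse_dotted_date` and its `day_first` flag are hypothetical; how a translator would decide on day-first order (for example from the page's language) is an assumption and not shown here.

```python
import re
from datetime import date

# Matches dotted numeric dates such as "08.03.2024".
DOTTED = re.compile(r"^(\d{1,2})\.(\d{1,2})\.(\d{4})$")

def parse_dotted_date(text, day_first=True):
    """Parse a dotted date; return None if it does not form a valid date."""
    match = DOTTED.match(text.strip())
    if not match:
        return None
    first, second, year = (int(group) for group in match.groups())
    # Hypothetical switch: day-first for DD.MM.YYYY, month-first otherwise.
    day, month = (first, second) if day_first else (second, first)
    try:
        return date(year, month, day)
    except ValueError:
        # For example, month 13 when a day-first date is read month-first.
        return None

print(parse_dotted_date("08.03.2024"))                   # 2024-03-08
print(parse_dotted_date("08.03.2024", day_first=False))  # 2024-08-03
```

Read day-first, the value is 8 March 2024; read month-first, it becomes 3 August 2024, which is the kind of shifted date described above.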

On this website, an upcoming event is recognized as the publishing date. (see citoid API request)
The page contains the following tag: <time class="..." datetime="2024-04-09">...</time> while the publishing date, "Originally published: 4 April 2023", is only given as text.
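
The rule I have in mind could look roughly like the following Python sketch. It is only an illustration under assumptions, not citoid's implementation: the function name `pick_publication_date`, the idea of passing candidates in priority order, and the reference date are all hypothetical.

```python
from datetime import datetime, timezone

def pick_publication_date(candidates, now=None):
    """Return the first ISO 8601 candidate that is not in the future, else None.

    `candidates` is assumed to be ordered by priority, e.g. the value of
    article:published_time first, then article:modified_time as a fallback.
    """
    now = now or datetime.now(timezone.utc)
    for raw in candidates:
        try:
            parsed = datetime.fromisoformat(raw)
        except ValueError:
            continue  # skip values that are not ISO 8601
        if parsed.tzinfo is None:
            # Assumption: treat values without an offset as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        if parsed <= now:
            return raw
    return None

# Hypothetical "today", chosen only so that the example is reproducible.
reference = datetime(2023, 6, 1, tzinfo=timezone.utc)
print(pick_publication_date(
    ["2024-08-24T11:23:47+03:00", "2023-03-29T12:04:44+03:00"],
    now=reference,
))  # 2023-03-29T12:04:44+03:00
```

With the two meta tag values from the second example, the future published_time is skipped and the modified_time is used instead. If every candidate is in the future, as with the `<time>` tags above, the sketch returns no date rather than a wrong one.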

Thanks for your work.

Event Timeline

Can anyone tell me whether this is the right place for this, or whether I should move this issue to Zotero?

Zotero doesn't do much validation; normally we end up doing validation in citoid.

However, I'm not sure we should do it in this case. Sometimes things have a publication date but are released online early - I know it's nonsensical in reality, but the point of citation metadata is to positively identify the source, and if the "official" publication date is in the future, potentially it's the one we want.