I propose ruling out future dates as valid publication dates to reduce wrong results. The following examples, which I found in the German Wikipedia, show the need for such a rule:
On this website, a date from the "Upcoming events" section seems to be recognized as the publication date. (see Citoid API request)
The page contains the following tag: <time class="..." datetime="2024-10-19">...</time>
This website provides a wrong date in a meta tag, which is of course not Citoid's fault, but Citoid could have detected it. (see Citoid API request)
The page contains the following tag: <meta property="article:published_time" content="2024-08-24T11:23:47+03:00">
Citoid could have chosen <meta property="article:modified_time" content="2023-03-29T12:04:44+03:00"> as a fallback.
On this website, the date is given in the format DD.MM.YYYY, which is the format typical in Germany, but Citoid wrongly interprets it as MM.DD.YYYY. (see Citoid API request)
The page contains the following tag: <meta name="DC.Date" content="08.03.2024">
On this website, the date of an upcoming event is recognized as the publication date. (see Citoid API request)
The page contains the following tag: <time class="..." datetime="2024-04-09">...</time>, while the actual publication date, "Originally published: 4 April 2023", is only given as text.
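To make the proposal more concrete, here is a minimal sketch of the rule in Python. It is not Citoid's actual code: the function names `parse_candidate` and `pick_publication_date`, the order of the candidates, and the `today` value are all hypothetical and only serve as illustration. The sketch assumes that the candidate strings have already been extracted from the page, and that dotted dates are day-first.

```python
from datetime import date, datetime
from typing import Iterable, Optional


def parse_candidate(raw: str) -> Optional[date]:
    """Parse an ISO 8601 value or a dotted DD.MM.YYYY value; return None otherwise."""
    raw = raw.strip()
    try:
        # Handles values such as "2024-10-19" and "2024-08-24T11:23:47+03:00".
        return datetime.fromisoformat(raw).date()
    except ValueError:
        pass
    try:
        # Assumption: dotted dates are day-first, as is usual in Germany.
        return datetime.strptime(raw, "%d.%m.%Y").date()
    except ValueError:
        return None


def pick_publication_date(candidates: Iterable[str], today: date) -> Optional[date]:
    """Return the first candidate that parses and is not later than `today`."""
    for raw in candidates:
        parsed = parse_candidate(raw)
        if parsed is not None and parsed <= today:
            return parsed
    # Every candidate was unparseable or in the future.
    return None


# Hypothetical request date, chosen only for this example.
today = date(2024, 4, 1)

# Second example: article:published_time is in the future, so the
# article:modified_time value is used as a fallback.
print(pick_publication_date(
    ["2024-08-24T11:23:47+03:00", "2023-03-29T12:04:44+03:00"], today
))  # prints 2023-03-29

# Third example: the DC.Date value is read as 8 March 2024, not 3 August 2024.
print(parse_candidate("08.03.2024"))  # prints 2024-03-08
```

With such a rule, the first and fourth examples would return no date at all instead of the date of an upcoming event, which seems better than a wrong date. A real implementation would probably need some tolerance for time zones when comparing against the current day.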
Thanks for your work.