Internationalise citoid dates
Open, High, Public, 1 Estimated Story Points

Description

  • Put out years in year only format (i.e. YYYY)
  • Put out all dates in a readable format (i.e. May 2010) in the date field to address the polluted data issue ASAP.

This is a possible way forward for internationalising dates:

  • Translate dates on our end.

OR

Note: Discussion is also happening on ENWP here: https://en.wikipedia.org/wiki/Help_talk:Citation_Style_1#ISBNs_in_mw:Citoid

Event Timeline

Mvolz raised the priority of this task from Low to High. May 12 2017, 11:09 PM
This comment was removed by Mvolz.

Come on over to https://en.wikipedia.org/wiki/Help_talk:Citation_Style_1 and start a conversation. It can work: @Whatamidoing-WMF has been consistent in engaging with en.WP about Tidy going away, coming back with updates, and that has built some trust, at least within a small community of gnomes. Efforts like that will go a long way toward improving the relationship between en.WP and WMF.

If you come over and explain what you would like to do and make a reasoned case for your proposed method, that will work a lot better than deploying a tool that has a known bug that delivers incorrect data to citations.

Re: "A local template will be able to do a better job of making these dates the right style for the wiki, better than a random nodejs library will do anyway.": This is not actually true, because (at least on English Wikipedia) multiple date styles are allowed, and "the right style" is the style that is consistent with what has already been used for the other references in the article. The local templates do not generally have this information, any more than citoid does (because it is outside the template parameters and often not recorded with the special "use dmy dates" templates). So the way to make the dates the right style is to ask the user which style to use before inserting them, rather than passing the job off to some other software that doesn't have any better idea than citoid what to do. Jonesey's suggestion above of a radio button would work.

The WMF should provide any of its employees working on this an official copy of ISO 8601 and they should be required to read it. Among other things, they would find that the only correct way to represent the current year is "2017". "2017-00" or "2017-00-00" are just wrong. Likewise, the only correct ways to represent the current year and month are "201705" or "2017-05"; "2017-05-00" is wrong.
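
For illustration only, a minimal Python sketch of the check this comment implies: a year, a year and month, or a full date is accepted, while zero-padded placeholders such as "2017-00" or "2017-05-00" are rejected. The function name is hypothetical and nothing here is citoid code. The unhyphenated "201705" form is deliberately left out, since whether it is valid is disputed later in this thread.

```python
import re

# Hyphenated calendar-date shapes only: YYYY, YYYY-MM, YYYY-MM-DD.
_SHAPE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_plausible_iso_date(text):
    """Return True for YYYY, YYYY-MM or YYYY-MM-DD with non-zero month/day.

    Hypothetical helper: the day is only checked loosely (01-31); month
    lengths and leap years are not validated.
    """
    match = _SHAPE.fullmatch(text)
    if match is None:
        return False
    _, month, day = match.groups()
    if month is not None and not 1 <= int(month) <= 12:
        return False
    if day is not None and not 1 <= int(day) <= 31:
        return False
    return True
```

With this sketch, "2017" and "2017-05" pass, and "2017-00", "2017-00-00" and "2017-05-00" fail.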

Also understand that ISO 8601 in this situation is a one-way protocol. Citoid can produce it internally and use it to produce a cite on Wikipedia, but dates can't go from Wikipedia to other places in ISO 8601 format because Wikipedia contains many Julian calendar dates, and Julian calendar dates are not allowed in ISO 8601. Since ISBNs and that ilk postdate the replacement of the Julian calendar by the Gregorian (c. 1923 or earlier) we can expect the dates in these databases to be Gregorian.

Citation templates can theoretically produce the correct style from these types of data. But it would require a lot of extra work from the template.

The work's (mostly) done (at enwiki and any wiki that's got a semi-current copy of enwiki's templates). The |df= parameter formats ISO into whatever's wanted for an article.

@Trappist_the_monk, do you have any objections to having CS1 support all of the ISO formats? If not, then citoid could report 2017 (which CS1 handles now) for books published this year, and 2017-05 (which CS1 dislikes) for journals published this month, and the template can re-format them however the MOS desires.

It is not for me to object; I have no power there. However, en.wiki Manual of Style does object. The date validation in cs1|2 tries to adhere to what WP:MOS allows. When WP:MOS permits other forms of year initial numeric dates, cs1|2 will support them.

If CS1 accepted the dates and rendered them as written (i.e. 2007-05), then yes, this would violate the manual of style. I don't think anyone is suggesting that.

What I was suggesting is to have the template accept the 2007-05 and display this as "May 2007" on en wiki which would not violate the manual of style; But would require the template to do some extra work.

This is probably the wrong venue for talking about changes to en.WP's MOS or CS1 templates, but the short version is that the "YYYY-MM" format is discouraged because of ambiguity. If an editor (or script) inputs "2004-05" to mean "2004–2005", the template rejects that ambiguous date format rather than convert it to "May 2004".

Thanks, that's helpful. That suggests that maybe the 00-ed out version that wikidata uses i.e. 2007-05-00 might be preferable since there's no way to confuse that with a date range.

... have the template accept the 2007-05 and display this as "May 2007" ...

Changing your example from 2007-05 to 2007-08 makes the latter a form that is almost correct for year ranges; should be 2007–08 with an en dash. I suspect that it is because of this permitted year range that yyyy-mm date forms are not permitted.

In general cs1|2 do not attempt to transform the content of their parameters. Exceptions to that general rule are the automatic conversion of hyphens to en dashes in the page and date parameters. This has caused some antagonism because hyphenated page numbers are perfectly legitimate.

I can imagine a date format that is intentionally intended to be transformed. For example, we might use the correct iso8601 form |date=200708 which cs1|2 could transform to August 2007. Additionally, this standard iso8601 form will support date ranges: |date=200708/200709 which cs1|2 could transform to August–September 2007.

But, this form still doesn't answer the issue that DavidEppstein mentioned at T132308#3258883:

... For instance doi:10.13110/discourse.37.1-2.0003 has an actual date of "Winter/Spring 2015", ...

As far as I know, iso8601 doesn't support seasonal or quarterly dates, nor does it support proper noun dates (Christmas 2015). While I would prefer a solution that adheres to some known standard, perhaps that's not possible. We might 'extend' iso8601 as our own 'standard' for these dates that aren't iso8601 compliant. Perhaps |date=2015.Winter/2015.Spring becomes Winter–Spring 2015; |date=2015.Christmas becomes Christmas 2015. cs1|2 does not support quarterly dates because MOSDATE is mute but I can imagine |date=2015.Q2 rendering as Second Quarter 2015.
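
To make the suggestion concrete, here is a rough Python sketch of how a template-side transform for these forms might look. It is an assumption-laden illustration, not cs1|2 code: `render_date` and its helper are hypothetical names, the dotted `YYYY.<label>` syntax is only the ad hoc extension floated above, and the label is echoed as written (so `2015.Q2` would come out as "Q2 2015", not "Second Quarter 2015").

```python
import re

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def _split_token(token):
    """Turn one token into (label, year), e.g. '200708' -> ('August', '2007')."""
    compact = re.fullmatch(r"(\d{4})(\d{2})", token)
    if compact:
        year, month = compact.groups()
        if not 1 <= int(month) <= 12:
            raise ValueError(f"month out of range in {token!r}")
        return MONTHS[int(month) - 1], year
    # Hypothetical extension from the comment above: YYYY.<label>
    dotted = re.fullmatch(r"(\d{4})\.([A-Za-z]\w*)", token)
    if dotted:
        year, label = dotted.groups()
        return label, year
    raise ValueError(f"unrecognised date token: {token!r}")


def render_date(value):
    """Render a single token or a 'start/end' pair as display text."""
    parts = [_split_token(token) for token in value.split("/")]
    if len(parts) == 1:
        label, year = parts[0]
        return f"{label} {year}"
    if len(parts) != 2:
        raise ValueError(f"expected at most one '/': {value!r}")
    (start_label, start_year), (end_label, end_year) = parts
    if start_year == end_year:
        # Same year: join the labels with an en dash.
        return f"{start_label}\u2013{end_label} {start_year}"
    return f"{start_label} {start_year} \u2013 {end_label} {end_year}"
```

Under these assumptions, `render_date("200708/200709")` gives "August–September 2007" and `render_date("2015.Winter/2015.Spring")` gives "Winter–Spring 2015".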

Change 353706 had a related patch set uploaded (by Mvolz; owner: Marielle Volz):
[mediawiki/services/citoid@master] If only year is provided, only put year in date field

https://gerrit.wikimedia.org/r/353706

These are all great suggestions. I think in iso8601 it is not valid to leave out the dashes except when the time is also included as well though.

Worth noting that if you try this in wikidata, i.e. "Fall 2003" you will get "date is malformed..." so they haven't figured it out either. Since iso8601 allows ranges, I can imagine us translating say, "Fall 2003" to be Sept-Nov 2003, which would be 200309/200311 instead of having the special formats. And converting Christmas to December 25th :) (which we could actually do right now with the current set-up.) And quarter 1 is Jan-March?

Thanks, that's helpful. That suggests that maybe the 00-ed out version that wikidata uses i.e. 2007-05-00 might be preferable since there's no way to confuse that with a date range.

The templates output from Citoid will be mixed with other citations in the article that were typed by hand. Indeed, many of the Citoid generated templates will be imperfect and require manual fixes. Thus, the dates should not be regarded as some hidden format that you can do whatever you want with; rather they should be regarded as human-readable information that must obey the Help:Citation Style 1 documentation (which in turn defers to Wikipedia Manual of Style/Dates and numbers).

Worth noting that if you try this in wikidata, i.e. "Fall 2003" you will get "date is malformed..." so they haven't figured it out either. Since iso8601 allows ranges, I can imagine us translating say, "Fall 2003" to be Sept-Nov 2003, which would be 200309/200311 instead of having the special formats. And converting Christmas to December 25th :) (which we could actually do right now with the current set-up.) And quarter 1 is Jan-March?

We are dealing with seasons and quarters as printed in the publication. Publications located in the southern hemisphere will have different definitions of spring, summer, etc. than northern hemisphere publications. Some publications that mention quarters may be referring to fiscal year quarters, which could be just about anything. The goal should be to allow the reader to look at the Wikipedia article, then look at the publication cover, and determine they are the same, regardless of how seasons or quarters are defined.

These are all great suggestions. I think in iso8601 it is not valid to leave out the dashes except when the time is also included as well though.

Section 4.1.2.3 does say that reduced accuracy year and month dates do require the hyphen.

Still, since the suggestion violates iso8601 for seasonal, quarterly, and proper noun dates, dropping the hyphen is simply further extension or adaptation of the standard to suit our needs. Or we don't bother to refer to this thing as iso8601 at all; it becomes a date interchange format used internally to wmf.

Worth noting that if you try this in wikidata, i.e. "Fall 2003" you will get "date is malformed..." so they haven't figured it out either. Since iso8601 allows ranges, I can imagine us translating say, "Fall 2003" to be Sept-Nov 2003, which would be 200309/200311 instead of having the special formats. And converting Christmas to December 25th :) (which we could actually do right now with the current set-up.) And quarter 1 is Jan-March?

Fall 2003 might be Sept-Nov 2003 in the northern hemisphere, but is spring in the southern.

Dates in citations should, as closely as possible within the constraints of MOSDATE, reflect the dates actually used in the sources. A Christmas issue of some periodical may not have 25 December on the cover. The same is true for quarterly dates.

Thanks all. It seems to me that in a lot of these "dates" that these are actually issue names: like Summer 2003 is the Summer issue from 2003. But Summer sounds date-ish so it ends up in "date" field. So we could try to parse out words like Summer or Q1 into the 'issue' field and put the year in the date field by itself.

Might that work for these odd cases? Can anyone think of a publication Summer 2003 date or similar that has an issue number in addition?

Can anyone think of a publication Summer 2003 date or similar that has an issue number in addition?

Try this insource search at en.wiki: insource:/\| *date *= *Summer/
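
For anyone who wants to try the pattern outside the wiki search box, the same expression can be exercised in Python. The wikitext samples are invented for illustration, and the search engine's regex dialect may differ from Python's in other cases; for this simple pattern the behaviour should be the same.

```python
import re

# Same expression as the insource search, written as a Python pattern.
DATE_SUMMER = re.compile(r"\| *date *= *Summer")

# Invented sample citations, not taken from any article.
samples = [
    "{{cite journal |title=Example |date=Summer 2003 |issue=2}}",
    "{{cite journal |title=Example | date = Summer 1942}}",
    "{{cite journal |title=Example |date=2003}}",
]

# True where a |date= parameter starts with "Summer".
matches = [bool(DATE_SUMMER.search(sample)) for sample in samples]
```

The first sample is the kind of hit the question asks about: a season in |date= alongside a separate |issue= value.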

In fact, the example I already gave had an issue number as well as a "Winter/Spring" date:

Double Exposures: Derrida and Cinema, an Introductory Séance
James Leo Cahill and Timothy Holland
Discourse
Vol. 37, No. 1-2 (Winter/Spring 2015), pp. 3–21
Published by: Wayne State University Press
DOI: 10.13110/discourse.37.1-2.0003

The issue number is the part where it says "No. 1-2". The date is "Winter/Spring 2015". Also note that "Winter/Spring 2015" is ambiguous, even for northern hemisphere dates: does it mean the period beginning in December 2015 and lasting through the spring of the following year, or does it mean the period that ends in Spring 2015? In this case it's the latter but I had to look at the adjacent issue dates to tell. So it would be a mistake to assume that one can always parse these things and turn them into unambiguous ISO date ranges. The dates are what the publishers give as the dates, and if we want to include a date in a citation (instead of just punting and giving only the year) then those are the dates we need to use.

Change 353706 merged by Mobrovac:
[mediawiki/services/citoid@master] If only year is provided, only put year in date field

https://gerrit.wikimedia.org/r/353706

Mentioned in SAL (#wikimedia-operations) [2017-05-15T15:07:52Z] <mobrovac@tin> Started deploy [citoid/deploy@3ed34ef]: Better publishing date extraction support - T132308

Mentioned in SAL (#wikimedia-operations) [2017-05-15T15:10:42Z] <mobrovac@tin> Finished deploy [citoid/deploy@3ed34ef]: Better publishing date extraction support - T132308 (duration: 02m 49s)

We've deployed a fix for the year issue; all dates with only a year should now have just a year.

Please note that we're still working on the partial date issue; I think what we've discovered here is that publishers have a very loose definition of what constitutes a "date" - however, we still have to abide by the style guidelines. We do get back dates which violate CS1 rules like 10-04 and 11/11/2007 so we still need to attempt to validate.

Having our own modified version of ISO that includes seasons and quarters I think increases the burden on citation templates to too great a degree, so as much as it bothers the standards compliant person in me, I think we just have to do this sort of arbitrarily.

however, we still have to abide by the style guidelines

For the record: No, you don't. Software and services that are used on hundreds of wikis are not required to abide by the policies or guidelines of any individual wiki. There is even a policy at the English Wikipedia that acknowledges the unreasonability of devs being expected to customize software to fit each community's ever-changing and sometimes contradictory standards. Devs are only required to have a consensus from the MediaWiki community that their software choices are right for the software.

However, this particular style guideline does contain some information and advice that is not project- or language-specific, and it identifies a number of interesting situations. So of course it would be sensible and probably efficient to learn from it, even though citoid is not technically required to abide by it.

We do get back dates which violate CS1 rules like 10-04 and 11/11/2007 so we still need to attempt to validate.

At en.wiki the rules are Manual of Style rules that cs1|2 adhere to; they are not 'CS1 rules'

Having our own modified version of ISO that includes seasons and quarters I think increases the burden on citation templates to too great a degree, so as much as it bothers the standards compliant person in me, I think we just have to do this sort of arbitrarily.

I'm not at all sure I fully understand what you've written here. How is 'our own modified version of ISO' more burdensome on citation templates than a non-standard 'arbitrary' something? If you accept the part of iso8601 for full dates and year-only dates (en.wiki already supports these) and define a year month version of that which zeros out the days (yyyy-mm-00) it isn't too much of a struggle for en.wiki and others to render that as Month YYYY. That same form works for date ranges in the standard iso8601 form yyyy-mm-00/yyyy-mm-00. The hard part is still seasons and proper-name dates. I've offered one possible solution to that dilemma; there may be other and better solutions. Whatever it is that is chosen, document it, adhere to it, and advertise it so that we all know what it is.
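
As a rough indication of how little work the zeroed-day form would need on the template side, here is a Python sketch. It is not cs1|2 code (which is Lua); `render_interchange` is a hypothetical name, the day-month-year order for full dates is an arbitrary choice made for the example, and only month-to-month ranges are handled.

```python
import re

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def _parts(token):
    """Split 'YYYY' or 'YYYY-MM-DD' into (year, month_name, day)."""
    match = re.fullmatch(r"(\d{4})(?:-(\d{2})-(\d{2}))?", token)
    if match is None:
        raise ValueError(f"unrecognised date token: {token!r}")
    year, month, day = match.groups()
    if month is None:
        return year, None, None
    if not 1 <= int(month) <= 12:
        raise ValueError(f"month out of range in {token!r}")
    return year, MONTHS[int(month) - 1], int(day)


def render_interchange(value):
    """Render a year, a zeroed-day month, a full date, or a month range."""
    tokens = [_parts(token) for token in value.split("/")]
    if len(tokens) == 1:
        year, month, day = tokens[0]
        if month is None:
            return year
        if day == 0:
            # Day "00" means the day is unknown, so show month and year only.
            return f"{month} {year}"
        return f"{day} {month} {year}"
    if len(tokens) == 2 and all(m is not None and d == 0 for _, m, d in tokens):
        (y1, m1, _), (y2, m2, _) = tokens
        if y1 == y2:
            return f"{m1}\u2013{m2} {y1}"
        return f"{m1} {y1} \u2013 {m2} {y2}"
    raise ValueError(f"unsupported range: {value!r}")
```

So `render_interchange("2007-05-00")` yields "May 2007" and `render_interchange("2007-08-00/2007-09-00")` yields "August–September 2007"; seasons and proper-name dates remain unhandled, as noted above.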

Lots of tools and bots examine citations. I'm not really sure which ones examine the rendered HTML, which ones examine the COinS metadata, and which ones examine the wikitext. Anything that examines the wikitext and finds yyyy-mm-00 should reject it as completely non-standard.

If you insist on creating dates that don't follow any standard, I would suggest supplying it in a parameter with a special name, like citoid-date=blahblahblah. But since there is no mechanism to keep human editors from playing with the citoid-date parameter, I don't like this idea.

Anything that examines the wikitext and finds yyyy-mm-00 should reject it as completely non-standard.

which is why I wrote:

In T132308#3266180:
Whatever it is that is chosen, document it, adhere to it, and advertise it so that we all know what it is.

In T132308#3267381, @Jc3s5h wrote:
If you insist on creating dates that don't follow any standard, I would suggest supplying it in a parameter with a special name, like citoid-date=blahblahblah. But since there is no mechanism to keep human editors from playing with the citoid-date parameter, I don't like this idea.

We have these issues:

  1. publishers have non-standard ways of writing publication dates
  2. iso8601 is a standard that is not capable of communicating all dates that publishers commonly write
  3. at en.wiki, Manual of Style dictates which of the myriad available date formats are permissible (presumably this applies to other languages as well)
  4. at en.wiki, editors complain about writing en dashes and therefore often use a hyphen instead
  5. cs1|2 and other citation templates elsewhere must make some sort of sense out of citoid's rendering of publisher's non-standard publication dates

We know that iso8601 cannot represent all dates that publishers write (season, quarter, and proper name are some). Somehow, somewhere, some mechanism must be contrived to allow citoid to do that. We know that the various wikis may have differing notions regarding how certain dates are to be displayed. This is relatively easy when citoid can represent dates in an iso8601 format or a format appropriate to the language but falls apart where the date cannot be represented by iso8601 or, as is most likely, citoid does not (or will not) have support for the plethora of languages (a huge task).

To answer these conflicting issues we can concoct our own standard for dates produced by citoid. Perhaps the first sentence of that standard is:

  • Where possible, dates produced by citoid shall be rendered in accordance with iso8601.

Because of the opening sentence in Our New Standard, editors at en.wiki would need to give up the freedom of writing 2007-08 (with a hyphen) and would need to write 2007–08 (with an en dash) because cs1|2 would need to be modified to render the former as August 2007 (because MOS does not allow for YYYY-MM dates).

Following that, Our New Standard describes how citoid is to render dates that cannot be rendered in a form supported by iso8601. For example, something like this perhaps:

  • Seasonal dates: for single dates: YYYY.<season> for ranges: YYYY.<season>/YYYY.<season>

It would continue to describe what it is that <season> means for an international audience; similarly for <quarter> and <proper name>.

To resolve this date transfer issue, there must be a bit of give and take. It is ok for citoid to depart from iso8601 as long as the departure is itself published and advertised as its own standard. When the iso8601 committee catch up with Wikipedia, Our New Standard becomes obsolete.

I believe that if you went to enwiki and asked the practical question:

"Would you rather that:

  1. we change the Manual of Style to accept more ISO 8601-compliant formats, which will have the side effect of requiring 100% of editors to use en dashes properly for date ranges in all citation templates, even if they don't know what an en dash is or how to type it on their Windows box, or
  2. we use a local standard that doesn't violate the Manual of Style, in which a source published in August 2007 can be unambiguously marked in a citation template as 2007-08-00, the dash-making bots will not mistake it for a date range (the bots could even convert it to August 2007), and the CS1 template will automagically display it as August 2007 so that no reader will ever see the zeroes?"

then their first choice will be that editors type August 2007 by hand, and their second choice will be the "non-standard" 2007-08-00. They will reject the proposal that depends upon every editor using dashes properly.

If you're not going to follow a standard, you should go far away from the standard to avoid confusion. I'd suggest keeping the date and year field just as they are. In the absence of both those fields, the template could look for citoid-month, citoid-year, citoid-day, citoid-season, and whatever else is required.

|date= already doesn't follow ISO 8601. It has never followed ISO 8601. The decision that |date= would not follow ISO 8601 was made years before citoid was created. I'm not sure why "editor doesn't follow ISO 8601 while typing manually" should be separated from "editor still doesn't follow ISO 8601 while using a semi-automated script".

Perhaps this ticket can be split into two tickets. One that ensures that dates such as "2013" or "March 2017" aren't being represented as "2013-01-01" or "2017-03-01", and one ticket that discusses how to format dates such as "Winter 2012" in an MOS/ISO standard way. I don't see why Citoid can't just go to the "lowest common denominator". If only a year is known, then only produce a year; if year and month are known, either produce only year+month, or only the year, but not a day. Whether or not Citoid should do 2013-03 or 2013-03-00 or March 2013 can be discussed, but first, the tool should stop doing 2017-03-01 or 2013-01-01. After the false dates have stopped being produced with Citoid, we can discuss how to make the dates more precise.
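
The "only emit what is known" rule can be sketched in a few lines of Python. This is an illustration rather than citoid's implementation (citoid is a Node.js service); the function name and the `month_style` switch are hypothetical, and the switch exists only because the year+month form is still undecided in this discussion.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_known_parts(year, month=None, day=None, month_style="text"):
    """Format only the components that are actually known.

    month_style picks the year+month form: "text" -> "March 2013",
    "iso" -> "2013-03", "zeroed" -> "2013-03-00".
    """
    if month is None:
        # Year only; never pad with an invented month or day.
        return f"{year:04d}"
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    if day is None:
        if month_style == "text":
            return f"{MONTHS[month - 1]} {year}"
        if month_style == "iso":
            return f"{year:04d}-{month:02d}"
        if month_style == "zeroed":
            return f"{year:04d}-{month:02d}-00"
        raise ValueError(f"unknown month_style: {month_style!r}")
    return f"{year:04d}-{month:02d}-{day:02d}"
```

The point of the sketch is the first branch: a source that gives only "2013" produces "2013", never "2013-01-01".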

|date= already doesn't follow ISO 8601. It has never followed ISO 8601. The decision that |date= would not follow ISO 8601 was made years before citoid was created. I'm not sure why "editor doesn't follow ISO 8601 while typing manually" should be separated from "editor still doesn't follow ISO 8601 while using a semi-automated script".

The |date= parameter has always been supposed to contain some version of the date that would generally be regarded as correct English, and in recent years, has been expected to follow the date formats accepted in MOSNUM. It has never been acceptable to use a format that would never be considered correct English, such as 2017-05-00.

What I wrote was merely a suggestion and should be taken to be just that: a suggestion. It was intended to move the conversation ahead. My point was to show that it is possible to create our own inter-tool date exchange standard so that citoid can transmit dates to the various wikis with their various templates in a consistent and understandable manner. We have already had Editor Jc3s5h declare the YYYY-MM-00 format to be unacceptable; you are suggesting, I think, that requiring that en.wiki editors write an en dash in YYYY–yy year range dates is unacceptable. Are we now stymied? Shall we just up stumps and retire to the pavilion?

I don't understand what you mean by this. At en.wiki, from their earliest days through today, cs1|2 templates have always accepted some form of iso8601 date. Prior to March 2014 cs1|2 simply rendered what they were given so they accepted all forms of iso8601 dates. MOS may not have approved but, in their adolescence, cs1|2 did not care what MOS thought. From March 2014, dates are expected to comply with MOS which still allows the iso8601 yyyy-mm-dd form. And where did the quoted text in your post come from? I can't find it on this ticket.

It has never been acceptable to use a format that would never be considered correct English, such as 2017-05-00.

True, but, until now, we have not considered using such a form sub rosa as an inter-tool date exchange mechanism that unambiguously identifies a month and year date where the template parameter in wiki text is |date=2017-05-00 and where, transformed by the template, the final rendering is May 2017.

Sorry, I do not know how to read the line containing " T132308#3268915" so cannot respond to your question.

[offtopic] @Jc3s5h: If the link does not make your browser jump to that comment, please click "Changes from before your most recent comment are hidden. Show Older Changes" first and then try again. Thanks!

|date= already doesn't follow ISO 8601. It has never followed ISO 8601. The decision that |date= would not follow ISO 8601 was made years before citoid was created. I'm not sure why "editor doesn't follow ISO 8601 while typing manually" should be separated from "editor still doesn't follow ISO 8601 while using a semi-automated script".

Currently, the standard followed by |date= is "use the subset of acceptable English-language dates (including some of ISO 8601 that is allowed in MOS:DATES)". If software is going to create date information that is neither ISO 8601 nor correct English, it won't be following the current standard. If the software-created dates are separated from the human-created dates by giving the software-created dates a different parameter name, the template would transform the software-created dates into the appropriate format for the article before rendering them. If the new parameter and |date= were present in the same citation, the new parameter should be ignored.

Humans should not edit the new parameter. If there is no |date= parameter and the editor knows the new parameter is wrong, the human would create a correct |date= parameter and delete the new parameter.
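
The precedence rule described here is simple enough to sketch. The following Python is only an illustration of the proposal: `citoid-date` is the hypothetical parameter name used earlier in this thread, `effective_date` is an invented helper, and real templates would do this in Lua.

```python
def effective_date(params):
    """Pick which parameter a template should render.

    `params` maps parameter names to raw values. A non-empty human-written
    |date= always wins; the machine-written parameter is used only when
    |date= is absent or blank. Returns (source_name, value), or
    (None, None) when neither is set.
    """
    human = (params.get("date") or "").strip()
    if human:
        return "date", human
    machine = (params.get("citoid-date") or "").strip()
    if machine:
        return "citoid-date", machine
    return None, None
```

For example, `effective_date({"date": "May 2017", "citoid-date": "2017-05-00"})` returns `("date", "May 2017")`, so the machine value is ignored as proposed.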

cs1|2 templates have always accepted some form of iso8601 date.

cs1|2 templates have always accepted some ISO 8601-compliant dates, but perhaps more to the point, enwiki has always rejected some ISO 8601-compliant dates, and enwiki has always accepted some ISO 8601-non-compliant dates. Therefore, "this suggestion doesn't comply with this standard (that we aren't complying with anyway)" [1] does not sound like a logical argument to me.

[1] See, e.g., comments such as "Anything that examines the wikitext and finds yyyy-mm-00 should reject it as completely non-standard." We agree that this isn't the standard presented in ISO 8601. But anything that finds |date=Summer 1942 should equally reject that as completely non-standard, too, because that also isn't the standard presented in ISO 8601.

"yyyy-mm-00" is completely non-standard for the date parameter because it is neither ISO 8601 nor a proper English date. |date=Summer 1942 is standard because it is both accepted in MOS:DATES and it is proper English.

Thanks all! I have decided on a game plan:

  • Put out all dates in a readable format (i.e. May 2010) in the date field to address the polluted data issue ASAP.
  • Possibly translate them on our end as well.

This is a possible way forward for standardising dates which would occur over a longer time scale:

  • Write up a standard format for publishers' dates. Likely in the form 2007-10-00, 2007-summer, 2007-q1.
  • Put the new standard in a new field called publisherDate in citoid for transitioning purposes.
  • See if any wikis are willing to use publisherDate. Each wiki would be able to decide if they want to accept the new format in their 'date' field, or if they prefer to have a separate field for it like 'publisher-date' or 'citoid-date'.
  • Assess if the standard publisherDate is now suitable for the date field and potentially replace it. If not, remove it. This bit is sticky if there is only partial conversion.
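
A parser for the three example shapes in the plan above (2007-10-00, 2007-summer, 2007-q1) could look roughly like the Python sketch below. Everything in it is an assumption for discussion: the format is only described as "likely", the accepted season words are a guess, and `parse_publisher_date` is a hypothetical name rather than anything in citoid.

```python
import re

# Shapes taken from the examples in the plan: YYYY, YYYY-MM-DD (with 00 for
# an unknown day), YYYY-<season>, YYYY-q<1-4>. The season list is a guess.
_PUBLISHER_DATE = re.compile(
    r"(?P<year>\d{4})"
    r"(?:-(?:(?P<month>\d{2})-(?P<day>\d{2})"
    r"|(?P<season>spring|summer|autumn|fall|winter)"
    r"|q(?P<quarter>[1-4])))?"
)


def parse_publisher_date(value):
    """Parse a publisherDate string into a dict of the known components."""
    match = _PUBLISHER_DATE.fullmatch(value.lower())
    if match is None:
        raise ValueError(f"unrecognised publisherDate: {value!r}")
    result = {"year": int(match.group("year"))}
    if match.group("month"):
        month = int(match.group("month"))
        day = int(match.group("day"))
        if not 1 <= month <= 12 or day > 31:
            raise ValueError(f"month or day out of range: {value!r}")
        result["month"] = month
        if day:
            # Day "00" means unknown, so it is left out of the result.
            result["day"] = day
    elif match.group("season"):
        result["season"] = match.group("season")
    elif match.group("quarter"):
        result["quarter"] = int(match.group("quarter"))
    return result
```

The season and quarter are kept as opaque labels rather than converted to month ranges, in line with the earlier point that their meaning depends on the publisher.
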
Mvolz renamed this task from "Figure out how to deal with incomplete dates, i.e. year only or year and month only" to "Deal with incomplete and non standard dates, i.e. year only, year and month only, or season / quarter". May 18 2017, 11:00 AM
Mvolz removed a project: Patch-For-Review.
Mvolz updated the task description.

I have opened up the conversation to the wikicite-discuss group as well: https://groups.google.com/a/wikimedia.org/forum/#!topic/wikicite-discuss/a2kRHayAiyo

Someone there suggested EDTF (extended date time format) https://www.loc.gov/standards/datetime/ - this looks like exactly what we need.

The only issue is that they do represent missing precision in the form YYYY-MM

I know this is ambiguous w.r.t. ranges, but

  1. 2010-11 causes a CS1/2 error and tells people they should use 2002–2003 (with em dash). So this is not valid, and technically CS1/2 could accept it and interpret it as Nov 2010. It won't alert the user if they think they're adding a range, but I feel like they should figure it out when it gets rendered "wrong" - and I think the amount of this kind of error would be low anyway.
  2. If CS1/2 doesn't want to allow them despite point 1, it could go into a different field as previously proposed.

IMO we should use this EDTF because I think the last thing the world needs is any more standard formats, so if there's one out there that looks like it could mostly do what we want, we should use it :).

In response to T132308#3272430, @Mvolz:

Wait a minute, citoid can render all dates in readable format and in the appropriate language? Tell me again why we've been having this conversation?

In T132308#3273017, @Mvolz wrote in part:

Someone there suggested EDTF (extended date time format) https://www.loc.gov/standards/datetime/ - this looks like exactly what we need.

I participated in the discussion that led to EDTF. A modification of this has been proposed for the next version of ISO 8601. I have a draft but am not allowed to make it available to others. My concern about ISO 8601 is that copies are so expensive that few editors who are not employed by a relevant institution will have access to a copy, so there is much misinformation about the contents of ISO 8601. I expect this problem to continue with the new edition.

In T132308#3273017, @Mvolz wrote in part:

I have opened up the conversation to the wikicite-discuss group as well: https://groups.google.com/a/wikimedia.org/forum/#!topic/wikicite-discuss/a2kRHayAiyo

I can't edit google groups because of the settings of the organization that I obtain access to gmail through, so I'll comment here. The question was raised about publication date vs. copyright. An apropos example is the Oxford Companion to the Year which has a copyright year of 1999 but was "reprinted with corrections" in 2003. Both dates are listed in WorldCat. The information about reprinting with corrections comes from my paper copy.

I don't know the details of what year a publisher is expected to place in the copyright notice, but I would speculate that the corrections did not involve sufficient creative effort by the authors to restart the copyright period.

The only issue is that they do represent missing precision in the form YYYY-MM

Is that not handled by §5.2.2 Unspecified? That section reads, in part:

  1. Year and month specified, day unspecified.
    • 1999-01-uu
      • some day in January 1999

That form is much the same as the 1999-01-00 form suggested elsewhere and accomplishes the same thing.

Also missing is support for quarterly dates and for what they have chosen to call holiday dates. These are recognized as issues but that version of EDTF is silent on those topics.

  1. 2010-11 causes a CS1/2 error and tells people they should use 2002–2003 (with em dash). So this is not valid, and technically CS1/2 could accept it and interpret it as Nov 2010. It won't alert the user if they think they're adding a range, but I feel like they should figure it out when it gets rendered "wrong" - and I think the amount of this kind of error would be low anyway.

cs1|2 emits an error message for 2010-11 not so much because of the hyphen but because of ambiguity; is 11 a month or a year? There is no error when the last two digits are outside the range 00-12. Years in a range are separated with en dash not em dash; see the MOS. As I've suggested elsewhere in this ticket, cs1|2 can interpret the YYYY-MM-uu form: |date=2010-11-uu → November 2010. I'll hack the cs1|2 sandbox to demonstrate this in the next day or two.
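
As a rough illustration of the ambiguity rule described above, here is a minimal Python sketch. It is not the cs1|2 code (which is a Lua module); the function name `classify_year_suffix` and the returned labels are hypothetical, and the real validation very likely applies further checks that are not modelled here.

```python
import re

# Four-digit year, hyphen, two more digits (ASCII digits only).
_YEAR_SUFFIX = re.compile(r"([0-9]{4})-([0-9]{2})")


def classify_year_suffix(value: str) -> str:
    """Classify a YYYY-NN string using only the rule stated in the thread.

    Hypothetical helper for illustration; the labels are assumptions.
    """
    match = _YEAR_SUFFIX.fullmatch(value)
    if match is None:
        return "other"  # not of the form YYYY-NN at all
    suffix = int(match.group(2))
    # 00-12 could be read as a month or as a two-digit year: flag it.
    if 0 <= suffix <= 12:
        return "ambiguous"
    # Anything above 12 cannot be a month, so it is read as a year.
    return "year-range"
```

With this sketch, `2010-11` is flagged as ambiguous while `2013-14` is not, which matches the behaviour described in the comment.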

  1. If CS1/2 doesn't want to allow them despite point 1, it could go into a different field as previously proposed.

It isn't cs1|2. The restrictions on date-parameter values in the form YYYY-xx (where xx may be a year or month) are imposed on cs1|2 by the en.wiki MOS. Please stop blaming cs1|2 for limitations that are imposed on it by the en.wiki MOS.

IMO we should use this EDTF because I think the last thing the world needs is any more standard formats, so if there's one out there that looks like it could mostly do what we want, we should use it :).

I've only read it once but am inclined to agree. The numeric values that it uses for seasons are similar to the internal values that cs1|2 uses to represent seasons. I've made a TODO: note in the cs1|2 date validation code to change to the EDTF values. I notice that cs1|2 supports 'Fall' as a synonym of 'Autumn'; EDTF does not.

It isn't cs1|2. The restrictions on date-parameter values in the form YYYY-xx (where xx may be a year or month) are imposed on cs1|2 by the en.wiki MOS. Please stop blaming cs1|2 for limitations that are imposed on it by the en.wiki MOS.

Dates in citations are controlled by Citing sources, not by Manual of Style/Dates and numbers (abbreviated MOS:DATES). By consensus at Help:Citation Style 1, editors decided to use the date portion of MOS:DATES. The editors could decide to make some exceptions, and make the allowable dates for cs1|2 a bit different from the allowable dates in MOS:DATES.

For citation formats other than cs1|2, which are allowed in the English Wikipedia, other date formats, that don't follow MOS:DATES, could be used. For example, if an article followed APA style for citations, the date 1993, September 30 could be used.

I'll hack the cs1|2 sandbox to demonstrate this in the next day or two.

Easier than I thought. Discussion and simple examples at en.wiki Help talk:Citation Style 1.

Change 354249 had a related patch set uploaded (by Mvolz; owner: Marielle Volz):
[mediawiki/services/citoid@master] Relax date validation significantly

https://gerrit.wikimedia.org/r/354249

Change 354249 merged by Mobrovac:
[mediawiki/services/citoid@master] Relax date validation significantly

https://gerrit.wikimedia.org/r/354249

Mentioned in SAL (#wikimedia-operations) [2017-05-31T20:13:22Z] <mobrovac@tin> Started deploy [citoid/deploy@7d69554]: Relaxing date validation - T132308

Mentioned in SAL (#wikimedia-operations) [2017-05-31T20:15:54Z] <mobrovac@tin> Finished deploy [citoid/deploy@7d69554]: Relaxing date validation - T132308 (duration: 02m 32s)

mobrovac changed the task status from Open to Stalled. May 31 2017, 8:31 PM

The patch relaxing date validation is now live on all projects. It incorporates some of the suggestions outlined in this task, so please test it. Setting the task as stalled until further input.

because this conversation has apparently collapsed and died without obvious resolution, I have removed the code that supported edtf transformations from the cs1|2 module sandbox.

Mvolz renamed this task from Deal with incomplete and non standard dates, i.e. year only, year and month only, or season / quarter to Consider using EDTF format to standardise dates. Oct 19 2017, 2:11 PM
Mvolz removed a project: Patch-For-Review.
Mvolz updated the task description.

because this conversation has apparently collapsed and died without obvious resolution, I have removed the code that supported edtf transformations from the cs1|2 module sandbox.

The current status is that we have very weak date validation and no one has commented since then, so I guess the relaxed validation was a satisfactory quick fix. We could leave it at that, for the most part.

We still have the issue that all dates are English language. We still could consider implementing https://www.loc.gov/standards/datetime/ISO_DIS%208601-2.pdf or at least moving towards it cautiously, but since we're receiving the data in a non-standard format to begin with, we have to be careful about transforming things into nonsensical data (like what happened before), and in the end it might make sense to not have a standard and just continue to do weak validation. If we were a publisher that could guarantee the format of our own data, a standard would be best, but it just might not be feasible in this scenario, where we're getting dates back in all sorts of non-standard formats.

Mvolz renamed this task from Consider using EDTF format to standardise dates to Internationalise citoid dates. Oct 19 2017, 2:23 PM
Mvolz updated the task description.

Extended Date/Time Format (EDTF) Specification as of 2019-02-04. I still think that citoid and other tools that pass dates back and forth should adopt IETF or something similar so that there is a single standard for date representation. Let the templates that present the date information worry about month-name and date format presentation according to the rules of the particular wiki.

Most of Extended Date/Time Format (EDTF) Specification has been adopted into ISO 8601-1:2019 and ISO 8601-2:2019. These cost 158 Swiss francs and 178 Swiss francs respectively. I am concerned that due to the widespread use of ISO standards, the Library of Congress specification will be left by the wayside. However, because the volunteer developers and editors at the Wikimedia Foundation won't be able to afford the expensive ISO standards, they will be used, but used incorrectly, because the users will rely on unreliable summaries of the standards.

Aklapper changed the task status from Stalled to Open. Jul 25 2020, 5:18 PM

The patch relaxing date validation is now live on all projects. It incorporates some of the suggestions outlined in this task, so please test it. Setting the task as stalled until further input.

Resetting task status as three years should have been enough time to provide further input (and as tasks should not be stalled forever).

We've been using EDTF (https://www.loc.gov/standards/datetime/ - level 0) for a while now, so I guess in one sense this is resolved, but it still causes citation errors on en wiki. @Trappist_the_monk - any chance of re-opening the discussion on en wiki on accepting these?

@Mvolz you write "we've been using EDTF". Who is we? And what discussion on the English Wikipedia are you referring to.

I'd be inclined to say No! The specification states " Level 0 specifies features of ISO 8601-1". ISO 8601 states it must be used with the Gregorian calendar. In addition to Gregorian, Wikidata needs to support the proleptic Julian calendar. Wikipedia needs to support the proleptic Julian calendar, the Julian calendar as observed in Rome, and the calendar of the Roman Republic.

Citoid has been returning partial dates like 2009-01 or 1888-05 for dates for the last few years. At present this causes a citation error.

Previously it's been suggested on wiki that Module:Citation can interpret these EDTF dates when in the template and display them to the user as "January 2009", or "May 1888." Right now it doesn't recognise them and returns a citation error. This is adding support, not removing it.

I'd be happy to consider switching to a different way of representing partial dates that doesn't require the use of English, but unfortunately no one has suggested a standard that doesn't, that I've noticed. @Trappist_the_monk, you suggested IETF, but I had a look and couldn't find a partial date representation there - did I miss it maybe?

The relevant discussion page is here: https://en.wikipedia.org/wiki/Help_talk:Citation_Style_1#edtf_date_formats_as_cs1.7C2_date_parameter_values

For the YYYY-MM date format to be accepted at en.wiki, you will have to get it accepted at MOS:NUM. Were I you, I would not hold my breath.

Way back when in this conversation it was suggested that we should be using what is now Level 1 §Unspecified digit(s) from the right, specifically item 3: 1985-04-XX. At the time, I modified Module:Citation/CS1/Date_validation/sandbox to render dates in that form as Month YYYY (April 1985). I removed support for YYYY-MM-XX from the cs1|2 module when discussion here did not advance.
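
To make the rendering step concrete, here is a minimal Python sketch of turning a date with an unspecified day into Month YYYY. This is only an illustration, not the sandbox code referred to above (which was Lua and has since been removed); `render_unspecified_day` is a hypothetical name, the English month list stands in for whatever month names the local wiki would supply, and accepting both `XX` and the older `uu` marker is an assumption.

```python
import re
from typing import Optional

# English names for illustration; a wiki would substitute its own.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# YYYY-MM-XX (current EDTF) or YYYY-MM-uu (the older draft form).
_UNSPECIFIED_DAY = re.compile(r"([0-9]{4})-([0-9]{2})-(?:XX|uu)")


def render_unspecified_day(value: str) -> Optional[str]:
    """Return 'Month YYYY' for a year-month date with unspecified day.

    Returns None when the value is not in that form (hypothetical helper).
    """
    match = _UNSPECIFIED_DAY.fullmatch(value)
    if match is None:
        return None
    month = int(match.group(2))
    if not 1 <= month <= 12:
        return None  # e.g. 1985-13-XX is not a valid month
    return f"{MONTHS[month - 1]} {match.group(1)}"
```

Because the trailing `-XX` is required, a bare `1985-04` is deliberately not matched, so the month-or-year ambiguity of YYYY-NN never reaches this function.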

I still think that support for various parts of Level 1 and Level 2 is worth pursuing. The current version of Level 1 supports seasons. Level 2 supports seasons and quarters. Internally, cs1|2 uses the season codes from level 1 and the quarter codes from level 2.

I'd be happy to consider switching to a different way of representing partial dates that doesn't require the use of English, but unfortunately no one has suggested a standard that doesn't, that I've noticed. @Trappist_the_monk, you suggested IETF, but I had a look and couldn't find a partial date representation there - did I miss it maybe?
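
For readers unfamiliar with those codes, here is a small Python sketch of the EDTF sub-year values in question: 21 to 24 for the Level 1 seasons and 33 to 36 for the Level 2 quarters. The function name `render_sub_year` and the English labels are assumptions for illustration only; this does not reproduce the cs1|2 module, and the other Level 2 groupings (hemisphere-specific seasons, quadrimesters, semesters) are left out.

```python
import re
from typing import Optional

# EDTF sub-year codes placed in the month position of YYYY-NN.
SUB_YEAR_LABELS = {
    21: "Spring", 22: "Summer", 23: "Autumn", 24: "Winter",   # Level 1
    33: "First Quarter", 34: "Second Quarter",                # Level 2
    35: "Third Quarter", 36: "Fourth Quarter",
}

_YEAR_CODE = re.compile(r"([0-9]{4})-([0-9]{2})")


def render_sub_year(value: str) -> Optional[str]:
    """Render a YYYY-NN season or quarter code as '<label> YYYY'.

    Returns None for anything else, including plain months.
    """
    match = _YEAR_CODE.fullmatch(value)
    if match is None:
        return None
    label = SUB_YEAR_LABELS.get(int(match.group(2)))
    if label is None:
        return None
    return f"{label} {match.group(1)}"
```

Since every code here is above 12, none of these values collides with the month-or-year ambiguity of strings like 2010-11.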

Isn't that what most of the whole long discussion has been about? The current IETF at Level 1 §Unspecified digit(s) from the right at item 3 reads:

Year and month specified, day unspecified in a year-month-day expression (day precision)
Example 4 ‘1985-04-XX’

Does that not provide a way to do month year dates that don't require English?

That discussion has been archived.
The discussion where edtf was removed is also archived

For the YYYY-MM date format to be accepted at en.wiki, you will have to get it accepted at MOS:NUM.

That may be true for output; but is not true for input. Per Postel's Law, templates should accept YYYY-MM dates (valid in ISO 8601) when entered, regardless of how we decide they should display such values.

Some comments on the status of EDTF.

  • The most recent Library of Congress page about EDTF describes itself as "Official Web Site" and the main heading is "EDTF Background". It states "EDTF functionality has now been integrated into ISO 8601-2019, the latest revision of ISO 8601, published in March 2019." Based on what I've read on the various Library of Congress pages about EDTF, and having contributed to the drafts over time, I think this means that all the things you could do with EDTF can be done with ISO 8601-2019, but the syntax may be different.

The 2019 EDTF specification (https://www.loc.gov/standards/datetime/) mentioned above is, of course, older. It's unclear if it has much of a future, even in terms of staying on the Library of Congress website.

Using ISO 8601-2019 has three problems.

  1. It's expensive, so the volunteer editors and developers associated with the Wikimedia Foundation are very likely to rely on unreliable summaries.
  2. It only supports the Gregorian calendar, so using it for Julian calendar dates would be an approximation. Since the standards discussed are external to the Wikimedia Foundation, we can't modify them to say this approximation is acceptable.
  3. Usages in one area of Wikimedia Foundation stuff tend to creep into other areas. This process of Citoid's non-standard YYYY-MM creeping into the English Wikipedia is an example of this creep. Nobody in this discussion has the power to prevent YYYY-MM-XX being used to represent a Julian calendar death date, where the difference of a dozen or so days is more important than when representing a magazine publication date.

Year and month specified, day unspecified in a year-month-day expression (day precision)
Example 4 ‘1985-04-XX’

Does that not provide a way to do month year dates that don't require English?

1985-04-XX means "an unknown individual day in April 1985"; 1985-04 means "the month of April 1985".

I'd be happy to consider switching to a different way of representing partial dates that doesn't require the use of English, but unfortunately no one has suggested a standard that doesn't, that I've noticed. @Trappist_the_monk, you suggested IETF, but I had a look and couldn't find a partial date representation there - did I miss it maybe?

Isn't that what most of the whole long discussion has been about? The current IETF at Level 1 §Unspecified digit(s) from the right at item 3 reads:

Year and month specified, day unspecified in a year-month-day expression (day precision)
Example 4 ‘1985-04-XX’

Does that not provide a way to do month year dates that don't require English?

Yes, but that quote is from EDTF - That's EDTF level 1. https://www.loc.gov/standards/datetime/

(I was just confused by the IETF statement because I couldn't find anything like that in IETF, i.e. https://tools.ietf.org/html/rfc3339 )

I'm happy to do EDTF level 1 abridged dates instead if that's preferable to the level 0 format.

For the YYYY-MM date format to be accepted at en.wiki, you will have to get it accepted at MOS:NUM.

That may be true for output; but is not true for input. Per Postel's Law, templates should accept YYYY-MM dates (valid in ISO 8601) when entered, regardless of how we decide they should display such values.

Nobody reads Help:Citation Style 1 or related documentation. So editors will inevitably write 2011-12 when they mean 2011-2012. There is no way to tell if 2011-12 means 2011-2012 or December 2011. So it is an error and should always be flagged as such.

Year and month specified, day unspecified in a year-month-day expression (day precision)
Example 4 ‘1985-04-XX’

Does that not provide a way to do month year dates that don't require English?

1985-04-XX means "an unknown individual day in April 1985"; 1985-04 means "the month of April 1985".

In EDTF 2019, 1985-04-XX means the individual day in April 1985 is unspecified, not unknown. In the case of a newer magazine issue, in the form of a PDF, if the cover said it was the April 2021 issue and the PDF properties said it was last changed on 6 March 2021, the correct date, for citation purposes, would be April 2021.

Reputedly ISO 8601-2019 uses the same syntax. I'll tell you exactly what it means in that specification after you buy a copy and send it to me.

(I was just confused by the IETF statement because I couldn't find anything like that in IETF, i.e. https://tools.ietf.org/html/rfc3339 )

Yeah, my error; I type IETF much more often than I type EDTF...

I'm happy to do EDTF level 1 abridged dates instead if that's preferable to the level 0 format.

Let us stick to a single term: 'unspecified digits'; 'abridged' implies a truncation or shortening which isn't the case. The date 2021-02-XX is just as long as 2021-02-24.

There is a very good reason for NOT generating and not accepting dates like 2009-10 in MOS:DATE and in citation template input: Because we don't know whether that means October 2009 or 2009–2010. If the citoid software has a date in this format for which it does know the correct disambiguation, then it is in that software that the conversion to a valid format must be made, before the disambiguation information is lost.
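
One way to picture that producer-side conversion is the Python sketch below, which emits an unambiguous string while the software still knows whether it holds a month or a year range. It is only an illustration, not citoid's code: `encode_known_date` and its parameters are hypothetical, the `YYYY-MM-XX` form is the EDTF Level 1 option discussed earlier in this thread, and the `YYYY/YYYY` interval form is EDTF's slash-separated interval syntax, which nothing in this discussion says cs1|2 accepts.

```python
from typing import Optional


def encode_known_date(
    year: int,
    month: Optional[int] = None,
    end_year: Optional[int] = None,
) -> str:
    """Encode a year, a year-month, or a year range without ambiguity.

    Hypothetical helper; the output forms are assumptions about what
    a consumer would accept.
    """
    if month is not None and end_year is not None:
        raise ValueError("give either a month or an end year, not both")
    if month is not None:
        if not 1 <= month <= 12:
            raise ValueError("month must be in 1..12")
        # Explicit unspecified day: cannot be misread as a year range.
        return f"{year:04d}-{month:02d}-XX"
    if end_year is not None:
        if end_year <= year:
            raise ValueError("end year must be after start year")
        # Four-digit end year and a slash: cannot be misread as a month.
        return f"{year:04d}/{end_year:04d}"
    return f"{year:04d}"
```

Under these assumptions, October 2009 becomes `2009-10-XX` and the range becomes `2009/2010`, so the bare `2009-10` never has to be emitted.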