
IABot overwrites already populated 'archivedate'-parameter with date in format not conforming to the language's locate
Closed, Resolved (Public)

Description

When cite templates are pre-populated with a (forced snapshot) archive URL and date, IAB not only changes the archive-url prefix from "http" to "https" (pointless, but acceptable), it also overwrites the already filled-out 'archivedate' parameter. On the NL-WP (nl.wikipedia.org) this field, like the other date fields, is preferably (and usually) filled out in the date format associated with the language. Even if a slightly different or shorter format was used, it may and should be assumed that the authors of the article filled out all dates within the same article using the *same* (consistent) date format (in cite templates, non-templated cites, or any mix of those), and already populated date fields should never be overwritten by IAB.

When IAB needs to populate the date field because it has not been populated yet, IABot should conform to the locale settings applicable to the language of that wiki. In this case (the Dutch Wikipedia, nl.wikipedia.org), the date format YYYY-MM-DD (the ISO variant, if it even is an ISO format) is the least used format for the Dutch language and is officially not acceptable according to Dutch date and time notation conventions. The preferred date notation for the Dutch language/locale is: "[d]d month yyyy".
Here "[d]d" is the day, never prefixed with a 0; "month" is the name of the month in Dutch, never starting with a capital letter; and "yyyy" is the year number. As a short alternative, "[d]d-[m]m-yyyy" is the next acceptable (but not preferred) format. The third option is 'dd-mm-yyyy', using 0 as a prefix for numbers less than 10. The format 'yyyy-mm-dd' does not conform to Dutch language notation formats.

The point is, when a date-field is already populated, IAB should assume it has been populated in a format that is consistent throughout the entire article, and throughout cite-references specifically, and overwriting existing date-parameter values by IAB is not acceptable, *ever*.

An example is the following edit made by IABot:

which I then repaired as follows:

So in short: please let IABot refrain from overwriting/altering pre-existing parameter values for date fields, and let IAB use the locale associated with the applicable language if it needs to add the archive (url and) date parameters due to a dead link. IAB should never ever alter/overwrite pre-populated date fields, as it introduces deviations, disharmony and inconsistency within the article/article's references-section.

Thanks in advance.

martix - 16 februari 2019 12:14 (CET)

Event Timeline

Apologies, 'locate' in the subject should read "locale"

Do you have an order of preferred formats? IABot attempts to recognize a date format among a defined list of acceptable formats. Please list them from most commonly used to least commonly used so I can update the bot accordingly.

  • [1] Most preferred: mixed numeral/text:
    • Format: 'day_nr full_month_name year'
    • Examples: "16 februari 2019" , "2 juli 2018" , "1 januari 1901"
    • Month names in Dutch: januari , februari , maart , april , mei , juni , juli , augustus , september , oktober , november , december ;
    • Note: no capitals whatsoever in the names of the month
  • [2] Next preferred: numbers only:
    • Format: 'day_nr-month_nr-year'
    • Examples: "16-2-2019" , "2-7-2018" , "1-1-1901"
    • Note: numbers not prefixed/pre-padded with zeroes
  • [3] Next preferred: numbers only, zero-padded:
    • Format: 'dd-mm-yyyy'
    • Examples: "16-02-2019" , "02-07-2018" , "01-01-1901"
    • Note: fixed number of digits for day, month and year numbers (2,2,4), single-digit numbers to be pre-padded with a zero
  • [4] Least preferred, last resort: ISO 8601 (as currently in use):
    • Format: 'yyyy-mm-dd'
    • Example: "2019-02-16"
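
The four formats listed above can be sketched in code as follows. This is only an illustration of the notation, not IABot's implementation: the function name `format_dutch_date` and the numeric `style` argument are made up for this example, and the month names are hardcoded from the list above so that the output does not depend on an installed nl_NL locale.

```python
from datetime import date

# Dutch month names, all lowercase, as listed under [1].
DUTCH_MONTHS = [
    "januari", "februari", "maart", "april", "mei", "juni",
    "juli", "augustus", "september", "oktober", "november", "december",
]


def format_dutch_date(d: date, style: int = 1) -> str:
    """Format d in one of the four listed styles (1 = most preferred).

    Hypothetical helper for illustration only.
    """
    if style == 1:
        # day number without padding, full lowercase month name, year
        return f"{d.day} {DUTCH_MONTHS[d.month - 1]} {d.year}"
    if style == 2:
        # numbers only, no zero padding
        return f"{d.day}-{d.month}-{d.year}"
    if style == 3:
        # numbers only, fixed widths 2-2-4 with zero padding
        return f"{d.day:02d}-{d.month:02d}-{d.year:04d}"
    if style == 4:
        # ISO 8601, last resort
        return d.isoformat()
    raise ValueError(f"unknown style: {style}")


if __name__ == "__main__":
    for style in (1, 2, 3, 4):
        print(style, format_dutch_date(date(2018, 7, 2), style))
```

Hardcoding the month names keeps the sketch portable; a real implementation would more likely take them from the wiki's locale configuration.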

If it helps, in a *nix or posix-compliant shell environment:

  • with settings: export LANG=nl_NL.UTF-8 ; export TZ=CET
  • the current system date is printed by the command: date "+%e %B %Y"
    • which would (today, Feb 16) yield: "16 februari 2019"
    • note: %e pads single-digit days with a space; GNU date accepts %-e to suppress the padding
  • or alternatively, without (permanent) environment settings, by the command:
    • LANG=nl_NL.UTF-8 TZ=CET date "+%e %B %Y"

(for which the man page of the C function 'strftime()' should describe a proper API for applying the above formatting to a provided date, rather than the system's, as well).

I hope this helps, and thank you for your efforts,

martix.

Part of this is a bug in the date format detection routine. It tries to match formatting based on how the remainder of the page formats the timestamps, and defaults to what is defined as default format in the configuration. This is now fixed in beta12.
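
For readers following along, the matching described here (use the format the rest of the page uses, otherwise fall back to the configured default) could look roughly like the sketch below. It is not IABot's actual code: the names `classify` and `detect_page_format`, the format labels, and the regular expressions are assumptions made for this illustration.

```python
import re
from collections import Counter
from typing import Iterable, Optional

MONTHS = ("januari|februari|maart|april|mei|juni|juli|augustus|"
          "september|oktober|november|december")

# (label, pattern) pairs, checked in order. Labels are descriptive only.
# A date such as "16-12-2019" has no leading zero anywhere, so it is
# inherently ambiguous between the padded and unpadded numeric formats;
# this sketch counts it as unpadded.
PATTERNS = [
    ("d month yyyy", re.compile(rf"^[1-9]\d? (?:{MONTHS}) \d{{4}}$")),
    ("d-m-yyyy", re.compile(r"^[1-9]\d?-[1-9]\d?-\d{4}$")),
    ("dd-mm-yyyy", re.compile(r"^\d{2}-\d{2}-\d{4}$")),
    ("yyyy-mm-dd", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
]


def classify(text: str) -> Optional[str]:
    """Return the label of the first pattern that matches, or None."""
    for label, pattern in PATTERNS:
        if pattern.match(text.strip()):
            return label
    return None


def detect_page_format(dates: Iterable[str],
                       default: str = "d month yyyy") -> str:
    """Pick the most common recognized format among the page's dates.

    Falls back to the default when no date on the page is recognized.
    """
    counts = Counter()
    for text in dates:
        label = classify(text)
        if label is not None:
            counts[label] += 1
    if not counts:
        return default
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(detect_page_format(["16 februari 2019", "2 juli 2018", "2019-02-16"]))
```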

The other part is that the default is set to %Y-%m-%d. So the order of preference in the current configuration is %-e %B %Y -> %B %-e, %Y -> %Y-%m-%d.

I will be amending it to %B %-e, %Y -> %e-%-m-%Y -> %d-%m-%Y -> %Y-%m-%d, with the default being %-e %B %Y.

As a reference, IABot uses strftime, with cross-platform compatibility functions I designed so that it works on Windows, Mac, and Linux consistently.

Also, IABot overwrites a date when it believes the date does not comply with the formatting preferences of the local community. That is, if IABot can't read the date with the given formatting strings, it will overwrite it. This ties in with the bug I mentioned above.
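
The overwrite rule described above (keep a date that can be read with one of the accepted formats, rewrite it otherwise) might be sketched like this. It is a simplified illustration under stated assumptions, not the bot's real logic: `parse_date`, `resolve_archive_date`, and the accepted patterns are hypothetical names and choices for this example.

```python
import re
from datetime import date
from typing import Optional

DUTCH_MONTHS = [
    "januari", "februari", "maart", "april", "mei", "juni",
    "juli", "augustus", "september", "oktober", "november", "december",
]

# Accepted input formats, most preferred first (assumed for this sketch).
# Each pattern captures day (d), month (m) and year (y).
ACCEPTED = [
    re.compile(r"^(?P<d>\d{1,2}) (?P<m>[a-z]+) (?P<y>\d{4})$"),
    re.compile(r"^(?P<d>\d{1,2})-(?P<m>\d{1,2})-(?P<y>\d{4})$"),
    re.compile(r"^(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})$"),
]


def parse_date(text: str) -> Optional[date]:
    """Try each accepted format in turn; return None if none can read text."""
    for pattern in ACCEPTED:
        match = pattern.match(text.strip())
        if not match:
            continue
        month = match["m"]
        if month.isdigit():
            month_nr = int(month)
        elif month in DUTCH_MONTHS:
            month_nr = DUTCH_MONTHS.index(month) + 1
        else:
            continue  # not a Dutch month name
        try:
            return date(int(match["y"]), month_nr, int(match["d"]))
        except ValueError:
            return None  # e.g. 31-02-2019 is not a real date
    return None


def resolve_archive_date(existing: str, snapshot: date) -> str:
    """Keep a readable existing value; otherwise write the default format."""
    if existing and parse_date(existing) is not None:
        return existing  # readable: leave the editor's value untouched
    return f"{snapshot.day} {DUTCH_MONTHS[snapshot.month - 1]} {snapshot.year}"


if __name__ == "__main__":
    print(resolve_archive_date("16-2-2019", date(2019, 2, 16)))
    print(resolve_archive_date("", date(2018, 7, 2)))
```

With a rule of this shape, a pre-populated value such as "16 februari 2019" survives unchanged, and only empty or unreadable values are replaced.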