Archivebot doesn't recognize non-latin digits anymore
Closed, Declined (Public)

Description

After updating Pywikibot, my bot started ignoring the existing archive counter and now starts from 1 with Latin digits instead. This is clearly a regression and is causing a lot of issues on the wiki.

Example:
https://fa.wikipedia.org/w/index.php?title=%D8%A8%D8%AD%D8%AB_%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Jeeputer&diff=prev&oldid=41869565
The counter was reset from 69 to 1: instead of archiving to https://fa.wikipedia.org/wiki/%D8%A8%D8%AD%D8%AB_%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Jeeputer/%D8%A8%D8%A7%DB%8C%DA%AF%D8%A7%D9%86%DB%8C_%DB%B6%DB%B9 (Persian digits: ۶۹), the bot started archiving to 1 in Latin digits: https://fa.wikipedia.org/wiki/%D8%A8%D8%AD%D8%AB_%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Jeeputer/%D8%A8%D8%A7%DB%8C%DA%AF%D8%A7%D9%86%DB%8C_1

I remember fixing this a long time ago.

Details

Event Timeline

From reading the code, I see that localcounter has been implemented. Please revert this; otherwise we will have to update the whole wiki. There is no reason to use a non-local counter on the wiki, and if one is really needed, the change should still be reverted and the default should be the wiki's own digits for the counter, not the other way around.

This has made a massive mess on the wiki

Change #1164698 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[pywikibot/core@master] archivebot: Revert the force of latin digits

https://gerrit.wikimedia.org/r/1164698

The localized values were introduced for T71551 in rPWBC0421498 in release 3.0.20180108. With rPWBC4a6c2e0a (release 7.5.0) the to_local_digits function was fixed to always return a string, as already described in the documentation. As a result, problems T313682, T313692 and T313785 occurred. The latter was fixed in rPWBC31a31f7 (7.5.1), where the printf signed integer fields were replaced by the string conversion type. Due to T313785, localized parameters were introduced in rPWBC5178710 (7.5.2) to solve these issues; see its commit message.

NOTE: See also the compatibility matrix at T313785#8105043 and the documentation
IMPORTANT: Releases 7.5.1 and 7.5.2 are only available via the repository. They are published on PyPI with 7.6.0.

Conclusion

  • Using %(counter)d on fa-wiki always failed before 7.5.1
  • Using %(counter)s on fa-wiki works for fa-wiki, but this conversion type was never intended or documented. Latin digits have been used starting with 7.5.2
  • %(localcounter)s should be used to get localized numbers
  • The %(local... )s can be used for each field and it is up to the user to specify which field should be localized
  • I think it's up to the user who adds the template to decide to use latin or localized digits
  • reverting the "Force of latin digits" will break the current behaviour (also outside fa-wiki)
  • we could add a -localdigits option for bot owners to always determine localized digits (and to change the field then)
  • we could implement a fixes.py entry to change the fields - or use replace.py manually for it
  • ... (other ideas)
IMPORTANT: It is not recommended to use localized digits for any template variable. Default values will be used instead.
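
The difference between the conversion types listed above can be reproduced with plain Python printf-style formatting. This is only a minimal sketch: `build_params` and the `Archive` prefix are hypothetical, the digit table is a hand-written stand-in, and none of it is taken from archivebot's actual code.

```python
# Illustrative only: mimics printf-style archive patterns, not archivebot itself.
PERSIAN_DIGITS = str.maketrans('0123456789', '۰۱۲۳۴۵۶۷۸۹')


def build_params(counter: int) -> dict:
    """Return hypothetical template fields for one archive counter value."""
    return {
        'counter': counter,  # plain int, rendered with ASCII digits
        'localcounter': str(counter).translate(PERSIAN_DIGITS),  # localized str
    }


params = build_params(69)

# String conversion works for both the plain and the localized field.
ascii_title = 'Archive %(counter)s' % params
local_title = 'Archive %(localcounter)s' % params

# A signed-integer field cannot format an already-localized string.
try:
    'Archive %(localcounter)d' % params
    d_field_ok = True
except TypeError:
    d_field_ok = False
```

Here `ascii_title` keeps ASCII digits, `local_title` carries Persian digits, and the `d` conversion raises `TypeError` as soon as the value is a string.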

I understand what you're saying from a technical perspective, but I do not understand it from a usability or linguistics perspective, especially the fact that Latin digits are the default in languages whose digits are not Latin. I speak one of those languages, and non-Latin digits are not optional. What you're saying sounds to me like saying "by default all edit summaries will be in English in all wikis; if someone wants to run a bot with an edit summary in the local language, they have to add an argument like -local-summary", or like saying "Case in German is optional and interchangeable because cases don't exist in English, so 'Ich habe der Tisch' is correct and 'Ich habe des Tisch' too".

Using localized digits is not just i18n like an edit summary but a more or less individual L10N for the archive path. Always using localized digits would break these paths if

  • NON_LATIN_DIGITS are expanded
  • site.lang changes for the given site.code
  • users prefer latin digits for their local archives because they might be non-natural speakers for the given wiki
  • there are already latin digits in the current archives paths

Btw, i18n is English by default, and any other language within the fallback sequence is used if the message has a translation. Note: there is currently no fallback for the localized digits, e.g. for azb, glk, mzn.
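
The missing fallback can be pictured with a small lookup sketch. Everything here is an assumption for illustration: `DIGITS_BY_LANG` and `localize_digits` are hypothetical names, and the table holds a single hand-written entry rather than Pywikibot's real data.

```python
# Hypothetical table; the real one lives in Pywikibot and covers more languages.
DIGITS_BY_LANG = {
    'fa': '۰۱۲۳۴۵۶۷۸۹',
}


def localize_digits(text: str, lang: str) -> str:
    """Replace ASCII digits with the digits registered for lang, if any."""
    digits = DIGITS_BY_LANG.get(lang)
    if digits is None:
        # No entry and no fallback chain: the digits stay ASCII.
        return text
    return text.translate(str.maketrans('0123456789', digits))


fa_result = localize_digits('69', 'fa')
mzn_result = localize_digits('69', 'mzn')
```

With such a table, a language code without its own entry silently keeps ASCII digits, so an archive path would change if an entry were added for it later.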

@Xqt This would be a great chance to fix an old problem in the constant names and documentation of Pywikibot: 1, 2, 3 and so forth are not Latin digits, they are Arabic numerals. Latin numerals would be like X and I and D.

Technically speaking, there is no such thing as Latin digits, because Latin numerals are non-positional and not base-10 (for instance, IX or XC).

In case you are curious, what is used in Arabic and Persian (۱, ۲, ۳ and so forth) are called eastern Arabic numerals.

The action item would be to edit the transliteration module to rename NON_LATIN_DIGITS to something more appropriate such as NON_ARABIC_NUMERALS or ALTERNATE_NUMERALS, keep NON_LATIN_DIGITS for a while as a deprecated constant and show a deprecation warning for it, update all Pywikibot code that references NON_LATIN_DIGITS to reference the new constant (there are only a handful), and ping the Toolforge tools that are referencing NON_LATIN_DIGITS to also update their code (only a dozen or so changes).
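
One generic way to keep an old constant name alive with a warning is a module-level `__getattr__` hook (PEP 562). The sketch below uses a stand-in module built with `types.ModuleType`; the module name, the table contents and the choice of `ALTERNATE_NUMERALS` are assumptions, and Pywikibot may well prefer its own deprecation helpers.

```python
import types
import warnings

# Stand-in module; everything except the name NON_LATIN_DIGITS is hypothetical.
translit = types.ModuleType('translit_sketch')
translit.ALTERNATE_NUMERALS = {'fa': '۰۱۲۳۴۵۶۷۸۹'}


def _module_getattr(name: str):
    """PEP 562 hook: called only when normal attribute lookup fails."""
    if name == 'NON_LATIN_DIGITS':
        warnings.warn(
            'NON_LATIN_DIGITS is deprecated; use ALTERNATE_NUMERALS',
            DeprecationWarning,
            stacklevel=2,
        )
        return translit.ALTERNATE_NUMERALS
    raise AttributeError(
        f'module {translit.__name__!r} has no attribute {name!r}')


# In a real module this would simply be a top-level "def __getattr__(name)".
translit.__getattr__ = _module_getattr
```

Code that still reads `translit.NON_LATIN_DIGITS` keeps working and gets the same object, but each access emits a `DeprecationWarning` pointing at the new name.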

@Ladsgroup in the patch above, you also are adding the phrase "non-Latin" to the documentation and I ask you to update the patch based on the above.

Good catch. textlib.to_latin_digits should also be renamed then. We could also say ASCII digits; curiously enough, they are part of the Basic Latin Unicode block.

Xqt closed this task as Declined. (Edited Oct 17 2025, 12:04 PM)

We have already implemented the %(local...)s variants for using non-ASCII digits since release 7.3. There is no reason to reverse this after more than 3 years.
https://doc.wikimedia.org/pywikibot/master/scripts_ref/scripts.html#module-scripts.archivebot

Change #1164698 abandoned by Xqt:

[pywikibot/core@master] archivebot: Revert the force of latin digits

Reason:

Declined T398146, see also https://doc.wikimedia.org/pywikibot/master/scripts_ref/scripts.html#module-scripts.archivebot

https://gerrit.wikimedia.org/r/1164698

I'm saddened that we (speakers of those languages) explain how wikis with non-latin digits operate and it gets ignored.

It isn't ignored. You are able to use the localized fields to get non-Latin digits instead of ASCII digits, see https://doc.wikimedia.org/pywikibot/master/scripts_ref/scripts.html#module-scripts.archivebot

I disagree. The current implementation ignores how languages with non-Latin digits work. Use of Latin digits is plain wrong in wikis like Persian, but instead of switching the default, you're saying we need to update every single page to use localisoweek and similar. Every single page. And your reasoning is this:

users prefer latin digits for their local archives because they might be non-natural speakers for the given wiki

That's still wrong. That's not how our wiki works, and you're telling us it should work like that. As I said, you're ignoring our description of how non-Latin digits are used: you've implemented a feature that doesn't satisfy those requirements, based on how you perceive non-Latin digits to work, and you're ignoring native speakers.

Your implementation with rPWBC0421498 never worked as documented; see my comment at T398146#10957597 for this and other issues. The localized parameters were introduced over three years ago, and reverting subsequent changes would introduce breaking changes for existing users.

Regarding the statement:

users prefer latin digits for their local archives ...

I never made that claim. My previous comment only outlined a conditional example — deciding between latin or localized digits is entirely up to the user managing their own namespace.

The current documented approach — using %(localcounter)s and other %(local...)s fields — was designed specifically to handle non-latin digits in a consistent, backwards-compatible way. The latin-digit behavior was kept to maintain compatibility across all wikis and should not be reversed. Reverting it would break existing functionality not just on fa-wiki but on any wiki using the same system. Btw, I am planning to implement support for other localized digits in wikis such as km, my, nqo, ta, th, and others with Pywikibot 11.

If a bot owner wants to enforce localized digits, this can be done via a -localdigits option (if desired and implemented but it will fail if the parameters aren't changed to string format codes accordingly), or manually with (user-)fixes.py/replace.py. The default template variables, as documented, should continue to use latin digits to ensure correct operation across all environments.
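
The manual rewrite mentioned above could look roughly like the following regular-expression substitution, which a fixes.py entry or a replace.py run would apply to the archive pattern in each template. This is a sketch under assumptions: `FIELDS` only lists the two field names mentioned in this thread, and `localize_fields` is a hypothetical helper, not part of Pywikibot.

```python
import re

# Field names mentioned in this thread; the complete list is an assumption
# that should be checked against the archivebot documentation.
FIELDS = ('counter', 'isoweek')
FIELD_RE = re.compile(r'%\((' + '|'.join(FIELDS) + r')\)[ds]')


def localize_fields(archive_pattern: str) -> str:
    """Rewrite %(field)d and %(field)s to the %(localfield)s variant."""
    return FIELD_RE.sub(r'%(local\1)s', archive_pattern)


converted = localize_fields('Talk/Archive %(counter)d')
```

Fields that already use a `local` prefix do not match the pattern, so running the substitution twice leaves the text unchanged.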

In short: the current system respects localized digits where intended, preserves backwards compatibility, and leaves control to the user; ignoring this would break documented behavior and cause widespread regressions.