Page MenuHomePhabricator

list(dict.keys()) is not deterministic
Open, HighPublic

Description

list(dict.keys()) is not deterministic and the order can be different in Python 2.7 and Python 3.7.

For Python 3.7+ the insertion order is respected and part of the Python spec.
For Python 3.6 the insertion order is respected but might not be safe
For previous versions including 2.7 the insertion order is not respected.

https://stackoverflow.com/questions/5629023/key-order-in-python-dictionaries

In result, methods using the order of dict keys may have different results in Python 3.6+ and previous versions.

Problematic methods using the oder of a dict keys() which must be inspected:

  • i18n.translate()
  • cosmetic_changes.fixArabicLetters() (has only one entry)
  • page.WikibasePage.get()
  • site.APISite.upload()
  • casechecker.CaseChecker.PickTarget()
  • page_tests.TestLinkObject.testNamespaces()

Event Timeline

Xqt created this task.Apr 4 2019, 12:53 PM
Restricted Application added a project: Pywikibot. · View Herald TranscriptApr 4 2019, 12:53 PM
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. · View Herald Transcript
Xqt triaged this task as High priority.Apr 4 2019, 12:54 PM
Xqt updated the task description. (Show Details)
Dvorapa added a comment.EditedApr 4 2019, 1:02 PM

Hmm....

  • if list(dict.keys())[0] is broken
  • list(dict)[0] will be broken too
  • next(iter(dict)) will be broken too
  • and the only hope would be to order these dicts or deprecate Python 2 - 3.5 soon
Xqt added a comment.Apr 4 2019, 1:09 PM

Hmm....

  • and the only hope would be to order these dicts or deprecate Python 2 soon

I just looked for the token .keys())[ but you are right there might be other problematic issues too. Unfortunately the difference is between Python 3.6/3.7+ and previous releases but I also don't know whether the results of 2.7 - 3.5 are equal.

The solution is using OrderedDict in such cases for Python < 3.7.

Dvorapa added a comment.EditedApr 4 2019, 1:16 PM

Unfortunately the difference is between Python 3.6/3.7+ and previous releases but I also don't know whether the results of 2.7 - 3.5 are equal.

Perhaps they are (I tried py2.7 and 3.5 now), but the order is random for each dict:

>>> d={'en': 'hi', 'de': 'hallo', 'cs': 'ahoj', 'pl': 'czeszcz'}
>>> d
{'cs': 'ahoj', 'de': 'hallo', 'en': 'hi', 'pl': 'czeszcz'}
>>> d={'banana': 20, 'apple': 40, 'coconut': 60, 'apricot': 1}
>>> d
{'apricot': 1, 'coconut': 60, 'banana': 20, 'apple': 40}
>>> d={'banana': 20, 'apple': 40, 'coconut': 60, 'apricot': 1, 'air': 0, 'cherry': 55}
>>> d
{'coconut': 60, 'apple': 40, 'apricot': 1, 'air': 0, 'cherry': 55, 'banana': 20}

We do not know, what the first key is in py2 - 3.5. And it changes every time the dict is changed (e.g. new translation is added to the code)

For i18n.translate, I would avoid using first key as a fallback at all. We would need to move all the translation dicts to ordered because of this fallback. Is that fallback necessary? If fallback = False I would fail/return None if code was not found. If fallback = True and dict doesn't have _default or en, I would fail that dict is purely written.

Xqt added a comment.Apr 4 2019, 2:47 PM

For i18n.translate, I would avoid using first key as a fallback at all. We would need to move all the translation dicts to ordered because of this fallback. Is that fallback necessary? If fallback = False I would fail/return None if code was not found. If fallback = True and dict doesn't have _default or en, I would fail that dict is purely written.

I agree with your proposal for this case. In i18n.translate it should not be necessary to use an unsprecific value (see the doc: If none of the options gives result, we just take the one language from xdict which may not be always the same - which is weird). Normally i18n.translate should never have a fallback because this is a L10N and not a i18n method. If a fallback is made I guess this could be moved to twn then.

Sure and for this we have T110856

Change 589557 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [bugfix] Do not return a random i18n.translation() result

https://gerrit.wikimedia.org/r/589557

Change 589557 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Do not return a random i18n.translation() result

https://gerrit.wikimedia.org/r/589557

Xqt updated the task description. (Show Details)Apr 27 2020, 1:19 PM
Xqt updated the task description. (Show Details)May 26 2020, 9:09 AM

list(dict.keys()) is not deterministic

list(dict.values()), list(dict.items()), dict.popitem(), list(set), set.pop() as well. In general, any iteration over a dict or set is non-deterministic.

Unfortunately the difference is between Python 3.6/3.7+ and previous releases but I also don't know whether the results of 2.7 - 3.5 are equal.

<3.3 and 3.3-3.5 have different results due to "hash randomization" added in 3.3 (see the second gray box).