Page MenuHomePhabricator

list(dict.keys()) is not deterministic
Closed, ResolvedPublic

Description

list(dict.keys()) is not deterministic and the order can be different in Python 2.7 and Python 3.7.

For Python 3.7+ the insertion order is respected and part of the Python spec.
For Python 3.6 the insertion order is respected but might not be safe
For previous versions including 2.7 the insertion order is not respected.

https://stackoverflow.com/questions/5629023/key-order-in-python-dictionaries

In result, methods using the order of dict keys may have different results in Python 3.6+ and previous versions.

Problematic methods using the oder of a dict keys() which must be inspected:

  • i18n.translate()
  • cosmetic_changes.fixArabicLetters() (has only one entry)
  • page.WikibasePage.get()
  • site.APISite.upload()
  • casechecker.CaseChecker.PickTarget()
  • page_tests.TestLinkObject.testNamespaces()

Event Timeline

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. · View Herald Transcript
Xqt triaged this task as High priority.Apr 4 2019, 12:54 PM
Xqt updated the task description. (Show Details)

Hmm....

  • if list(dict.keys())[0] is broken
  • list(dict)[0] will be broken too
  • next(iter(dict)) will be broken too
  • and the only hope would be to order these dicts or deprecate Python 2 - 3.5 soon

Hmm....

  • and the only hope would be to order these dicts or deprecate Python 2 soon

I just looked for the token .keys())[ but you are right there might be other problematic issues too. Unfortunately the difference is between Python 3.6/3.7+ and previous releases but I also don't know whether the results of 2.7 - 3.5 are equal.

The solution is using OrderedDict in such cases for Python < 3.7.

Unfortunately the difference is between Python 3.6/3.7+ and previous releases but I also don't know whether the results of 2.7 - 3.5 are equal.

Perhaps they are (I tried py2.7 and 3.5 now), but the order is random for each dict:

>>> d={'en': 'hi', 'de': 'hallo', 'cs': 'ahoj', 'pl': 'czeszcz'}
>>> d
{'cs': 'ahoj', 'de': 'hallo', 'en': 'hi', 'pl': 'czeszcz'}
>>> d={'banana': 20, 'apple': 40, 'coconut': 60, 'apricot': 1}
>>> d
{'apricot': 1, 'coconut': 60, 'banana': 20, 'apple': 40}
>>> d={'banana': 20, 'apple': 40, 'coconut': 60, 'apricot': 1, 'air': 0, 'cherry': 55}
>>> d
{'coconut': 60, 'apple': 40, 'apricot': 1, 'air': 0, 'cherry': 55, 'banana': 20}

We do not know, what the first key is in py2 - 3.5. And it changes every time the dict is changed (e.g. new translation is added to the code)

For i18n.translate, I would avoid using first key as a fallback at all. We would need to move all the translation dicts to ordered because of this fallback. Is that fallback necessary? If fallback = False I would fail/return None if code was not found. If fallback = True and dict doesn't have _default or en, I would fail that dict is purely written.

For i18n.translate, I would avoid using first key as a fallback at all. We would need to move all the translation dicts to ordered because of this fallback. Is that fallback necessary? If fallback = False I would fail/return None if code was not found. If fallback = True and dict doesn't have _default or en, I would fail that dict is purely written.

I agree with your proposal for this case. In i18n.translate it should not be necessary to use an unsprecific value (see the doc: If none of the options gives result, we just take the one language from xdict which may not be always the same - which is weird). Normally i18n.translate should never have a fallback because this is a L10N and not a i18n method. If a fallback is made I guess this could be moved to twn then.

Change 589557 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [bugfix] Do not return a random i18n.translation() result

https://gerrit.wikimedia.org/r/589557

Change 589557 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Do not return a random i18n.translation() result

https://gerrit.wikimedia.org/r/589557

list(dict.keys()) is not deterministic

list(dict.values()), list(dict.items()), dict.popitem(), list(set), set.pop() as well. In general, any iteration over a dict or set is non-deterministic.

Unfortunately the difference is between Python 3.6/3.7+ and previous releases but I also don't know whether the results of 2.7 - 3.5 are equal.

<3.3 and 3.3-3.5 have different results due to "hash randomization" added in 3.3 (see the second gray box).

Xqt claimed this task.

I think there is nothing left to do after we moved to Python 3.6