Page MenuHomePhabricator

Usage of unicode_literals from __future__ package
Closed, ResolvedPublic

Assigned To
Authored By
XZise
Feb 15 2015, 11:56 AM
Referenced Files
None
Tokens
"Party Time" token, awarded by xSavitar."Like" token, awarded by Rubin16."Mountain of Wealth" token, awarded by Ladsgroup."Mountain of Wealth" token, awarded by Ricordisamoa.

Description

Hi, the code should use unicode_literals throughout to avoid using u'…' which is one the reasons why Python 3 versions before 3.3 aren't supported. @jayvdb said somewhere (determine), that he'd prefer to do that in one go. I suggest to atleast do it in all new code, so that changing u'…' into '…' won't break git blame for those files when it's introduced later. It is possible to use u'…' on conjunction with unicode_literals, but that could confuse newcomers into thinking that u'…' is necessary.

One primary advantage is that the usage of ur'…' has been disallowed in Python 3 so unicode_literals is a way around it, as that does allow to use r'…' with unicode chars inside.

In case we want to revert the unicode_literals patch 1e54a7d6 the following patches are solely fixing the original patch and could be reverted with that too:

  • a0a194d5ca54
  • 0c4197272821
  • 853e6b0bdce3
  • 4ca47e10ab24
  • 4426e881762b
  • 1a6d29b80c0c

Related Objects

Event Timeline

XZise raised the priority of this task from to Low.
XZise updated the task description. (Show Details)
XZise added subscribers: XZise, jayvdb.
Restricted Application added subscribers: Aklapper, Unknown Object (MLST). · View Herald TranscriptFeb 15 2015, 11:56 AM

I think it makes sense to add unicode_literals everywhere in a big bang before 2.0 RC1, and wait for the screams. This was discussed a little on https://gerrit.wikimedia.org/r/#/c/189914/2/scripts/cosmetic_changes.py

We can then remove u'...' over time, as part of regular code modifications.
I dont see that removing them particularly helps newcomers; writing a style guide for new developers is a better approach.

XZise set Security to None.

Regarding changing u'…': Obviously that doesn't need to happen independently. But (especially after unicode_literals have been imported) it's possible to change it when the line is changed anyway.

gerritbot subscribed.

Change 186962 had a related patch set uploaded (by XZise):
[IMPROV] Use unicode_literals in fixes

https://gerrit.wikimedia.org/r/186962

Patch-For-Review

Change 202373 had a related patch set uploaded (by XZise):
[IMPROV] Use unicode_literals

https://gerrit.wikimedia.org/r/202373

Change 202373 merged by jenkins-bot:
[IMPROV] Use unicode_literals

https://gerrit.wikimedia.org/r/202373

Hmm... it seems to be causing even more harm than benefits, doesn't it?

Well it discovered some flaws and afaik all patches after that fix the original patch work independently. So if it's really unusable we can still just revert that one patch.

But just for good measure I'll list all patches which are unicode_literals specific in the first post so that if we want to abandon that project we could revert everything of that (but leave the useful stuff).

@Dalba: could you explain why this is open again or better what is still to do here?

This task has never been closed as far as I can tell.

There is one subtask: T95810: config does differ between unicode and str which can be investigated independently.

The only thing that seems to be remaining here is to decide about the complete removal of the u prefixed string literals. I'm personally in favor of it, but if others disagree, this task can be closed.

The only thing that seems to be remaining here is to decide about the complete removal of the u prefixed string literals. I'm personally in favor of it, but if others disagree, this task can be closed.

I aggree now. This mixed types are harder to read imho.

Change 462270 had a related patch set uploaded (by D3r1ck01; owner: Alangi Derick):
[pywikibot/core@master] [cleanup] cleanup scripts/welcome.py

https://gerrit.wikimedia.org/r/462270

Change 462270 merged by jenkins-bot:
[pywikibot/core@master] [cleanup] cleanup scripts/welcome.py

https://gerrit.wikimedia.org/r/462270

Change 463604 had a related patch set uploaded (by D3r1ck01; owner: Alangi Derick):
[pywikibot/core@master] [cleanup] cleanup tests/[data_ingestion_tests.py-exceptions_tests.py]

https://gerrit.wikimedia.org/r/463604

Change 463604 merged by jenkins-bot:
[pywikibot/core@master] [cleanup] cleanup tests/[data_ingestion_tests.py-exceptions_tests.py]

https://gerrit.wikimedia.org/r/463604

@Xqt, do we still have instances of u'' or u"" in the code base to remove? Or has this been complete?

In T89589#4633212, @D3r1ck01 wrote:

@Xqt, do we still have instances of u'' or u"" in the code base to remove? Or has this been complete?

I guess not, seems it's completed

In T89589#4633212, @D3r1ck01 wrote:

@Xqt, do we still have instances of u'' or u"" in the code base to remove? Or has this been complete?

There are some remaining "bad quotes": https://integration.wikimedia.org/ci/job/pywikibot-core-tox-docker/4720/console

Change 463939 had a related patch set uploaded (by D3r1ck01; owner: Alangi Derick):
[pywikibot/core@master] [cleanup] cleanup some tests/ files

https://gerrit.wikimedia.org/r/463939

Change 463939 merged by jenkins-bot:
[pywikibot/core@master] [cleanup] cleanup some tests/ files

https://gerrit.wikimedia.org/r/463939

In T89589#4633483, @D3r1ck01 wrote:

@Xqt, anything left?

docs/conf.py is the last one I guess

Change 463954 had a related patch set uploaded (by D3r1ck01; owner: Alangi Derick):
[pywikibot/core@master] [cleanup] cleanup docs/conf.py

https://gerrit.wikimedia.org/r/463954

docs/conf.py is the last one I guess

Done! Since Pywikibot is to be added to codesearch, we could use it to also search though other repos in case we have some instances of this left? Maybe?

Change 463954 merged by jenkins-bot:
[pywikibot/core@master] [cleanup] cleanup docs/conf.py

https://gerrit.wikimedia.org/r/463954