Page MenuHomePhabricator

Introduce Python code formatters usage
Open, MediumPublic

Description

Context: during a discussion at Infrastructure Foundations team meeting the subject of tools automatically (re)formatting Python code came up.
This task is to track the rationale for having such tools for (new) Python code and implementation.

Pros:

  1. Authors don't have to think about code formatting
  2. Code formatting comments no longer required during code reviews
  3. Code readability is the same across authors

Cons:

  1. Local setup needed: formatter installation and editor integration at least
  2. Similar setup needed for integration with CI to fail if code isn't formatted as required
  3. If/when the code formatter output changes e.g. across upgrades we'll have to adjust for that

Open questions:

  1. Which formatter(s) to use?
  2. Require standard formatting for old and new code or only new or a mix depending on the codebase?
  3. How much local setup is needed depending on the formatter(s) and the editor(s)?
  4. Likely other questions

Event Timeline

ema triaged this task as Medium priority.Dec 12 2018, 10:48 AM
ema added a subscriber: ema.

Just to throw in, since this was discussed perhaps 2 weeks ago on the meeting, I installed Black with the width settings we normally use for Cumin and have been using it regularly through emacs (blacken plugin). It seems to do a nice job and produce expected output.

I agree that a Python formatter simplify the life of reviewers avoiding unnecessary comments over the style, but has also some draw backs as highlighted in the description.

I'd add to the Con #1 also the git pre-commit hook part, that is probably mandatory.

This article [1] seems to have a decent summary of current available tools with some examples.

On my side I've done a test on the cumin codebase with black. The results are:

  • all the ignore comments for pylint or any other validation tool were misplaced (moved to the last line when splitting) and require to manually move them to the first line [one off]
  • it doesn't pass flake8:
    • E203 whitespace before ':' (this seems a bug on their side, it's for a list slice _ARGV[index + 1 :]
    • E501 line too long (144 longest one)
  • it doesn't pass pylint (same thing for too long lines)

I personally don't like the output of black as I find it much less readable and more vertically spread than the original. We live in a world with wide but short screens so splitting into multiple lines means that less context will be readable in the same screen and the eye has to track much more indentation levels. Of course this is mostly subjective, looking forward to hear different opinions.

[1] https://medium.com/3yourmind/auto-formatters-for-python-8925065f9505

I like Black, but any formatter, as long as the barrier to entry is low, is a good idea.

The value to me is in a faster review cycle and less friction between author and reviewer. By handing control of style to a tool, it means less cognitive load at the critical point where thinking inductively about the impact to our systems and reconciling intent with the implementation is more important/valuable than where a line should (or shouldn't) split.

I don't have much to add besides saying I've been using Black (with VSCode) for all my new code here. I'd vote for it to be adopted if we had a poll but I understand the drawbacks mentioned for a larger codebase like ours. Just my 2c.

Alternatives would be autopep8 and yapf.

I like black too but from but from https://black.readthedocs.io/en/stable/installation_and_usage.html it tied to having python 3.6 installed.

Black can be installed by running pip install black. It requires Python 3.6.0+ to run

With stretch shipping with 3.4 (what do the Mac OS X versions do?) it might be a bit too restrictive to require it.

I am a bit wary of @Volans comment about vertical space usage increase. I haven't noticed that yet, but It's a point to consider indeed.

On my side I've done a test on the cumin codebase with black. The results are:

  • all the ignore comments for pylint or any other validation tool were misplaced (moved to the last line when splitting) and require to manually move them to the first line [one off]
  • it doesn't pass flake8:
    • E203 whitespace before ':' (this seems a bug on their side, it's for a list slice _ARGV[index + 1 :]

Do you have examples of these? I wonder whether these are architectural issues or (potentially transient) bugs that we need to report against upstream.

    • E501 line too long (144 longest one)
  • it doesn't pass pylint (same thing for too long lines)

I'm not sure why you're hitting this, perhaps local misconfiguration? Black's default line length is 88, but is one of the few things that is configurable (I use -l 80). But yes, we'll need to align black's, flake8's and pylint's line length expectations :) Flake8 is actually mentioned in Black's README with a rationale and various mitigations (e.g. using Bugbear).

I personally don't like the output of black as I find it much less readable and more vertically spread than the original. We live in a world with wide but short screens so splitting into multiple lines means that less context will be readable in the same screen and the eye has to track much more indentation levels.

Do you have an example of something that Black splits into multiple lines which would be (even subjectively!) more readable without doing that?

I like black too but from but from https://black.readthedocs.io/en/stable/installation_and_usage.html it tied to having python 3.6 installed.

Black can be installed by running pip install black. It requires Python 3.6.0+ to run

With stretch shipping with 3.4 (what do the Mac OS X versions do?) it might be a bit too restrictive to require it.

This was a problem for me, so what I did was build backports of Py3.6/7 to stretch. We've decided to put them in a separate component in our apt repository so that people can install them on their own systems (among other reasons).

So what you can do now is:

echo "deb http://apt.wikimedia.org/wikimedia stretch-wikimedia component/pyall" >> /etc/apt/sources.list
apt update
apt install python3.7 pipsi
pipsi install --python /usr/bin/python3.7 black
export PATH="~/.local/bin:$PATH" # and maybe put this to your .bashrc
black --help

(Instead of or in addition to pips one can use pre-commit, which supposedely handles the differing Python version and venv etc., but I haven't tried it)

On my side I've done a test on the cumin codebase with black. The results are:

  • all the ignore comments for pylint or any other validation tool were misplaced (moved to the last line when splitting) and require to manually move them to the first line [one off]
  • it doesn't pass flake8:
    • E203 whitespace before ':' (this seems a bug on their side, it's for a list slice _ARGV[index + 1 :]

Do you have examples of these? I wonder whether these are architectural issues or (potentially transient) bugs that we need to report against upstream.

Re-reading their docs it's intended behaviour and apparently is flake8 at fault here. See https://black.readthedocs.io/en/stable/the_black_code_style.html#slices

    • E501 line too long (144 longest one)
  • it doesn't pass pylint (same thing for too long lines)

I'm not sure why you're hitting this, perhaps local misconfiguration? Black's default line length is 88, but is one of the few things that is configurable (I use -l 80). But yes, we'll need to align black's, flake8's and pylint's line length expectations :) Flake8 is actually mentioned in Black's README with a rationale and various mitigations (e.g. using Bugbear).

The problem with long lines is:

Black will try to respect that. However, sometimes it won’t be able to without breaking other rules. In those rare cases, auto-formatted code will exceed your allotted limit.

Source: https://black.readthedocs.io/en/stable/the_black_code_style.html#line-length

So for example it generates this diff (with -l 120 -N -S):

-    'ls_partial_success_ratio_re':
-        r"[4-6]0\.0% \([2-3]/5\) success ratio \(< 100\.0% threshold\) for command: 'ls -la /tmp/maybe'\. Aborting.",
-    'ls_partial_success_threshold_ratio':
-        "60.0% (3/5) success ratio (>= 50.0% threshold) for command: 'ls -la /tmp/maybe'.",
+    'ls_partial_success_ratio_re': r"[4-6]0\.0% \([2-3]/5\) success ratio \(< 100\.0% threshold\) for command: 'ls -la /tmp/maybe'\. Aborting.",
+    'ls_partial_success_threshold_ratio': "60.0% (3/5) success ratio (>= 50.0% threshold) for command: 'ls -la /tmp/maybe'.",

that goes over the limit.

I personally don't like the output of black as I find it much less readable and more vertically spread than the original. We live in a world with wide but short screens so splitting into multiple lines means that less context will be readable in the same screen and the eye has to track much more indentation levels.

Do you have an example of something that Black splits into multiple lines which would be (even subjectively!) more readable without doing that?

DISCLAIMER: those are totally subjective examples.

  1. One example for the vertical verbosity (the old one is much less readable here in Phabricator as it wrap lines :( ):
-    parser.add_argument('-m', '--mode', choices=(sync_mode, async_mode),
-                        help=('Execution mode, required when there are multiple COMMANDS to be executed. In sync mode, '
-                              'execute the first command on all hosts, then proceed with the next one only if '
-                              '-p/--success-percentage is reached. In async mode, execute on each host independently '
-                              'from each other, the list of commands, aborting the execution on any given host at the '
-                              'first command that fails.'))
+    parser.add_argument(
+        '-m',
+        '--mode',
+        choices=(sync_mode, async_mode),
+        help=(
+            'Execution mode, required when there are multiple COMMANDS to be executed. In sync mode, '
+            'execute the first command on all hosts, then proceed with the next one only if '
+            '-p/--success-percentage is reached. In async mode, execute on each host independently '
+            'from each other, the list of commands, aborting the execution on any given host at the '
+            'first command that fails.'
+        ),
+    )
  1. Another vertical verbosity example
-        self.pbar_ok = tqdm(desc='PASS', total=num_hosts, leave=True, unit='hosts', dynamic_ncols=True,
-                            bar_format=colorama.Fore.GREEN + self.bar_format, file=sys.stderr)
+        self.pbar_ok = tqdm(
+            desc='PASS',
+            total=num_hosts,
+            leave=True,
+            unit='hosts',
+            dynamic_ncols=True,
+            bar_format=colorama.Fore.GREEN + self.bar_format,
+            file=sys.stderr,
+        )
  1. Exception formatting
-                raise InvalidQueryError(("Unable to parse the query '{query}' with the global grammar and no "
-                                         "default backend is set:\n{error}").format(query=query_string, error=e))
+                raise InvalidQueryError(
+                    (
+                        "Unable to parse the query '{query}' with the global grammar and no "
+                        "default backend is set:\n{error}"
+                    ).format(query=query_string, error=e)
+                )
  1. Quick logging (where it doesn't remove the now unnecessary parentheses):
-            self.logger.debug(("Provided batch_size '%d' is greater than the number of hosts '%d'"
-                               ", using '%d' as value"), batch_size, hosts_size, hosts_size)
+            self.logger.debug(
+                ("Provided batch_size '%d' is greater than the number of hosts '%d'" ", using '%d' as value"),
+                batch_size,
+                hosts_size,
+                hosts_size,
+            )
  1. Multiline if conditions:
-        if value is not None and (not isinstance(value, float) or
-                                  not (0.0 <= value <= 1.0)):  # pylint: disable=superfluous-parens
-            raise WorkerError("success_threshold must be a float beween 0 and 1, got '{value_type}': {value}".format(
-                value_type=type(value), value=value))
+        if value is not None and (
+            not isinstance(value, float) or not (0.0 <= value <= 1.0)
+        ):  # pylint: disable=superfluous-parens
+            raise WorkerError(
+                "success_threshold must be a float beween 0 and 1, got '{value_type}': {value}".format(
+                    value_type=type(value), value=value
+                )
+            )
  1. Shrinks things that because of the context might be more readable in multilines, for example dependencies, so that are also more explicit with diff when modifying them:
-    'with-openstack': [
-        'keystoneauth1>=2.4.1',
-        'python-keystoneclient>=2.3.1',
-        'python-novaclient>=3.3.1',
-    ],
-
+    'with-openstack': ['keystoneauth1>=2.4.1', 'python-keystoneclient>=2.3.1', 'python-novaclient>=3.3.1'],

I've tried experimenting how CI integration would look like for repositories that choose to enforce code formatting, e.g. with tox and for example https://pypi.org/project/isort/ + https://pypi.org/project/black/.

[testenv:formattercheck]
skipsdist = True
skip_install = True
basepython = python3.6
deps =
 black
 isort
commands =
  isort --check --line-width 88 --skip-glob '.tox' --skip-glob '.eggs'
  black --check --line-length 88 --exclude '/(\.eggs|\.git|\.hg|\.mypy_cache|\.nox|\.tox|\.venv|_build|buck-out|build|dist)/' .

Change 494259 had a related patch set uploaded (by Gilles; owner: Gilles):
[integration/config@master] Add CI for new performance/asoranking repo

https://gerrit.wikimedia.org/r/494259

Not Python-related but noting it here, I came across this shell formatter which would achieve the same but for shell https://github.com/mvdan/sh

As I've seen some efforts to use black also in other part of the org, ideally it would be nice to have a single way to set it up:

  • black configuration (line length, quotes)
  • tox environment to ensure the code is black'ed in CI
  • easy as much automatic as possible way to integrate it into the development (IDE/editor configs, git hook, etc.)

Agreed with what @Volans said, I've been working on tox-wikimedia (https://pypi.org/project/tox-wikimedia/) which will hopefully be able to contain and be the the authoritative way to set up whichever code formatter we end up using. My POC patch (since black is being used in cloud services repos) is https://gerrit.wikimedia.org/r/c/integration/tox-wikimedia/+/566407

On the topic of code formatters in general, I'm +1, but +0 on black. I'm not a fan of the "uncompromising" philosophy, because I end up compromising when trying to read code rather than when I'm writing it (and I do more of the former). One of my friends created lavender, which alleviated most of the things I don't like about black's imposed code style.

Change 554827 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] ci - flake8: update flake8 rules to be compatible with black

https://gerrit.wikimedia.org/r/554827

Change 554825 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] CI - black: update python3 files with black

https://gerrit.wikimedia.org/r/554825

Change 554826 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] CI - black: run black over python2 files

https://gerrit.wikimedia.org/r/554826

Change 553487 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] CI - taskgen: add black tests for python2 and python3 files

https://gerrit.wikimedia.org/r/553487

Change 621724 had a related patch set uploaded (by Kormat; owner: Kormat):
[operations/software/wmfmariadbpy@master] Add 'black' formatter support.

https://gerrit.wikimedia.org/r/621724

Change 621724 merged by jenkins-bot:
[operations/software/wmfmariadbpy@master] Add 'black' formatter support.

https://gerrit.wikimedia.org/r/621724

[follow up from a chat in #wikimedia-databases]
Another aspect that should be clarified is how to manage changes introduced by the code formatter itself between different versions of the formatter. Both pinning + manual update or random failures of CI don't seem optimal solutions.

FWIW, WMCS uses black with line-length set to 80 for all python and has for a little while. In non-puppet repos, we have tox check for it (any line length other than 80 generates failures). https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/Python_coding
Our team doesn't agree on text editors, so I can't say we have anything to offer on the setup part of the equation.

Just noting here that we would like not to have CI call it all broken in the future, but if we have to change it all in puppet as a special case, we can. We run checks for it with tox outside of puppet like:

[testenv:black]
description = check black formatter
basepython = {[default]basepython}
commands = black -l 80 -t py37 --check \
           maintain_kubeusers \
           maintain_kubeusers.py
deps = black

Puppet is a more widely shared space, so we generally just try to make sure the output is ok for flake8 after black hits it.

Change 554825 abandoned by Jbond:
[operations/puppet@production] CI - black: update python3 files with black

Reason:
This change is now IMO too old. to much will have changed and we should regenerate this CR from scratch if we want to move ahead wit it

https://gerrit.wikimedia.org/r/554825

Change 554826 abandoned by Jbond:
[operations/puppet@production] CI - black: run black over python2 files

Reason:
This change is now IMO too old. to much will have changed and we should regenerate this CR from scratch if we want to move ahead wit it

https://gerrit.wikimedia.org/r/554826

I went ahead and did a try using lavender on spicerack repo:

About some of the concerns:

Another aspect that should be clarified is how to manage changes introduced by the code formatter itself between different versions of the formatter. Both pinning + manual update or random failures of CI don't seem optimal solutions.

This gets handled by checking only the files that changed while developing, and only the last commit diff while on CI.

black configuration (line length, quotes)
tox environment to ensure the code is black'ed in CI
easy as much automatic as possible way to integrate it into the development (IDE/editor configs, git hook, etc.)

Using the same line length than the rest of the checkers (in the pyproject.toml standard file).
Tox environment is there and working, one for checking, one for formatting (that you can run as a pre-commit hook, but that's up to the developer). Also tried re-using the same docker image that CI uses but it seems that the docker image might need some changes (skip git init and such).
As of right now, black is pretty well supported on all major (and some minor) IDEs and editors (https://github.com/psf/black/blob/master/docs/editor_integration.md), lavender sholud be a drop-in replacement (afaik).
It's a spell to revert to black again if needed.

On the topic of code formatters in general, I'm +1, but +0 on black. I'm not a fan of the "uncompromising" philosophy, because I end up compromising when trying to read code rather than when I'm writing it (and I do more of the former). One of my friends created lavender, which alleviated most of the things I don't like about black's imposed code style.

I don't really have a strong opinion on what imposed code style to use, so I used lavender. I must say though that black has way more support and a very wide adoption, so will interact better with most tools.

I say let's try in a few repos, tweak anything if needed, and continue from there, I think that inaction is more harmful in this case (>2 years already). Unless someone has any strong arguments against this, I'll start sending patches to some of the repos (spicerack, cookbooks, cumin, as those are the ones I plan on using the most), so we can start polishing what options and how to use them.

Change 659785 had a related patch set uploaded (by David Caro; owner: David Caro):
[operations/software/spicerack@master] style: this introduces lavender as autoformatter

https://gerrit.wikimedia.org/r/659785

Okok, lavender when using single quotes also changes the docstrings, and that upsets pep257, so I'll revert to black for now as it does not conflict.

I had another pass at black and also a long chat with @dcaro about it (thanks for resurfacing this task).

TL;DR

We've decided to test black on Spicerack, tweak as needed the integration with local environments, CI and other linters/checkers in the repository.
We'll see how that goes and evaluate next steps from there.

Current plan
  1. Make the current linters/checkers configuration compatible with black, see https://github.com/psf/black/blob/master/docs/compatible_configs.md
  2. Configure black with a pyproject.toml file, see also https://github.com/psf/black#pyprojecttoml :
    • For now using as line length the same value present in the repository
    • Setting the python target version to the minimum python version supported by the project
    • Use force exclude to exclude a couple of fixtures files that have on purpose an invalid python syntax. The choice of force exclude is to avoid to hardcode the default values for the exclude parameter that has already a long list of automatically excluded files and directories.
  3. Apply black to the whole repo along with its configuration in a standalone commit that has only changes applied by black (to be merged together with the rest of the commits for the other changes required)
  4. Add a .git-blame-ignore-revs file with the SHA1 of the commit that applies black to the whole repo to allow git blame to skip it using --ignore-revs-file.
  5. Setup an optional tox environment that allows to run black on the whole repo or the files passed to the command.
    • Investigate if it's safer to use a separate virtualenv to prevent dependency issues with the projects dependencies
    • Try to have a single environment name but use whatever Python version is available on the system, should work with any of the supported python version and also in environments where multiple python versions are available.
    • Decide where to put the black dependency to prevent issues related to having different black versions between different environments (local, CI) and issues related to https://github.com/tox-dev/tox/issues/149
    • Decide if it's better to pin the black version or not. Pro: pinning ensure same output for all and no surprises, not pinning allow to follow the natural evolution of the project, that so far is making a release every 6 months or so. Con: pinning requires to remember to manually push the version for each new release, not pinning might lead to CI failing and/or unwanted changes on an unrelated commit
  6. Add the documentation to install a pre-commit hook that runs the tox black environment to black the code before committing it
  7. Add a tox environment that runs black --check to be run as part of local tox and in CI, to check both the whole project or the modified files passed as arguments
  8. Investigate if there is a way to customize the Clone with commit-msg hook command in Gerrit UI so that when cloning the repository also the pre-commit hook is installed.
  9. Test and document editor/IDE integration, in particular to make sure that black would read the same configuration and it would be at the same version of the one used by CI. See also https://github.com/psf/black#editor-integration
  10. Contact all current contributors to the project pointing them to the documentation to make their existing checkout compatible with the new setup

And then reiterate as needed based on the results and feedbacks.

Rationale
PROs
  • Having a code formatter is better than not having one
  • black is the one adopted by the PSF (Python Software Foundation) despite some unpopular choices
  • Other formatters are not widely used or allow for a very fine-grained configuration that defies the purpose of having the same style across the industry (like gofmt for Golang)
CONs
  • black is still considered beta by its developers, and as such might change the generated output. But there are discussions going on on the definition of beta, see https://github.com/psf/black/issues/1256
  • the generated output is not perceived super readable by some (me included)
  • has still some issues moving around comments because they are not part of the AST, in particular when splitting a line that has a pylint/mypy/etc ignore comment. But this happens mostly only at conversion time, should not happen much in the future as most of those ignores are added after the development phase and hence most likely after black has already formatted the code.
  • black choice of using double quotes is quite unpopular, given that Python itself still defaults to single quotes for strings that don't contain a double quote:
>>> "string"
'string'
>>> "string's"
"string's"

Thanks for progressing this just wanted to make one note

Investigate if there is a way to customize the Clone with commit-msg hook command in Gerrit UI so that when cloning the repository also the pre-commit hook is installed.

I honestly don't think this is something worth investigating. This in my opinion is a git anti pattern, clones by design are meant to be different then the source (this is not svn) and you shouldn't be able to impose a policy on it. On a more practice point what ever we do in gerrit would also need to be possible in gitlab.

A couple more things that we discussed and agreed:

  • Use isort for import autoformatting (probably disable pylint checks for them).
  • Disable docstring checks (py257) from the tests repository, and try relaxing it for the rest.

Change 662783 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/spicerack@master] documentation: add a development page

https://gerrit.wikimedia.org/r/662783

I had another pass at black and also a long chat with @dcaro about it (thanks for resurfacing this task).

TL;DR

We've decided to test black on Spicerack, tweak as needed the integration with local environments, CI and other linters/checkers in the repository.
We'll see how that goes and evaluate next steps from there.

I'm excited to see how this goes :)

...

  • Decide if it's better to pin the black version or not. Pro: pinning ensure same output for all and no surprises, not pinning allow to follow the natural evolution of the project, that so far is making a release every 6 months or so. Con: pinning requires to remember to manually push the version for each new release, not pinning might lead to CI failing and/or unwanted changes on an unrelated commit

I like pinning because you never have the annoyance of submitting a patch, it failing for the entirely unrelated reason of upstream making a new release. As for as manually updating, we could teach LibUp to update black for us.

...

  • the generated output is not perceived super readable by some (me included)

Me too, and this is my biggest worry. I think we spend way more time reading code than writing it and should optimize for the case of reading rather than writing.

Change 659785 merged by jenkins-bot:
[operations/software/spicerack@master] style: this introduces black+isort as autoformatter

https://gerrit.wikimedia.org/r/659785

Change 662783 merged by jenkins-bot:
[operations/software/spicerack@master] documentation: add a development page

https://gerrit.wikimedia.org/r/662783

Change 662904 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/spicerack@master] git: exclude black refactor from git blame

https://gerrit.wikimedia.org/r/662904

Change 662904 merged by Volans:
[operations/software/spicerack@master] git: exclude black refactor from git blame

https://gerrit.wikimedia.org/r/662904

FYI i had to apply a fix to get the following setting to work let g:black_skip_string_normalization = 1