Page MenuHomePhabricator

Add support for commit message trailers that GitLab markdown will render on individual lines
Open, Needs TriagePublic

Description

commit-message-validator requires that multiple task statements appear like so:

Bug: T1234
Bug: T4321

However, GitLab's markdown runs these lines together, so to make things work you instead have to write:

Bug: T1234 \
Bug: T4321

(With a trailing \ at the end of each line except the last.)

Can CMV be tweaked to allow these?

Details

ReferenceSource BranchDest BranchAuthorTitle
lucaswerkmeister-wmde/test!2mrmainlucaswerkmeister-wmdeUpdate README again
lucaswerkmeister-wmde/test!1prmainlucaswerkmeister-wmdeUpdate README
Customize query in GitLab

Event Timeline

@Jdforrester-WMF Do you have some examples of this happening that I can take a look at in GitLab? I did not see this issue when testing commit messages like https://gitlab.wikimedia.org/bd808/cmv-mr-test/-/commit/7b6aa1d7c70383847eafb7abd82cb7d00f7a5f8a.

Changing the regex to allow an optional trailing \ pair is relatively trivial, but this is also the sort of change that will pollute the actual git commit metadata with codeforge specific hacks that may have other consequences.

Thanks for the link. I can see in the commit history for the function-orchestrator repo that this trailing slash is a convention that has been used multiple times, but not universally. 29a60e4a50b5f902c5c6098db308715a66dae395 is one example where the git commit message is "clean":

$ git log --pretty -n 1 29a60e4a50b5f902c5c6098db308715a66dae395
commit 29a60e4a50b5f902c5c6098db308715a66dae395
Author: Cory Massaro <cmassaro@wikimedia.org>
Date:   Tue Jul 11 15:55:27 2023 +0000

    Update function-schemata sub-module to HEAD (4927eba) (BREAKING - changes some test outputs)

    New changes:
     - `46ed833` utils: Make various minor code coverage improvements
     - `4927eba` Support degenerate quoted objects in the mixed validator, normalization, and canonicalization.

    Bug: T336499
    Bug: T338315
    Bug: T340770

    repos/abstract-wiki/wikifunctions/function-orchestrator!39

I can also see that the associated MR's description visually collapses these footer lines when the feature branch commit message has been copied into the MR description:

Screen Shot 2023-11-14 at 3.58.03 PM.png (183×617 px, 36 KB)

This footer merging seems to be purely cosmetic as the 29a60e4a50b5f902c5c6098db308715a66dae395 commit message resulting from landing the MR with what I assume is a fast-forward merge with the feature branch squashed has each footer on a separate line as desired.

T351253#9332725 helps me understand what is happening, and broadly why. Now the question is how should we adjust either CMV or the conventions used in the function-orchestrator repo to get back to a place where the linter can be used?

@Jdforrester-WMF's opening of this bug seems to be a vote for adjusting the commit message's idea of what a Bug: ... footer should look like so that MR descriptions in GitLab look nicer when there are multiple Bug: ... footers. I assume this would actually need to be expanded generally to support a trailing \ forced line break for all commit message footer lines except the penultimate footer line.

An alternative would be to do nothing and let MR descriptions like repos/abstract-wiki/wikifunctions/function-orchestrator!39 look a bit weird when rendered as markdown by the GitLab UI knowing that if a squash or non-fast-forward merge is used to land the MR things will look normal in the resulting git commit message.

Something a bit in between would be to allow per-repository configuration when running the linter so that each repo could choose one form or the other. It may be difficult to make that configuration live in only one place if the repo is using CMV in both GitLab-CI and locally via pre-commit. This also opens the Pandora's box of per-repo configurable rules, but that is something that I started to prepare for in rICMVf9bfb67cf64b: Refactor into rules based system, so it's at least in the realm of possibility.

I personally lean towards ignoring MR message rendering weirdness, but I also think that both merge commits and feature branch squashing are bad ideas so repos where I'm the decider probably will never hit this problem. Where should we try to find consensus on how to move forward? https://www.mediawiki.org/wiki/Talk:Gerrit/Commit_message_guidelines doesn't seem to get much activity (I just replied to a question there that had been untouched for a year). Maybe a wikitech-l post requesting feedback from folks who really care here?

My preference would be to use \n\n instead of \\n.

Bug: T1234

Bug: T4321

This produces a paragraph break, so slightly less nice in the GitLab interface but doesn't contaminate any of the data.

Second preference would be \n, which produces the same results as \\\n but whitespace is more likely to already be ignored by other parsers. The downside here is that it may cause editors to complain about trailing whitespace, and structural whitespace tends to be a bad idea.

My opinion:

  • Allow the backslash as well as an empty line (as suggested above), but don't require either.
  • However, enforce a space before the backslash in case a backslash is used. I find this important. Otherwise people would wonder what T1234\ is.
  • I was thinking about disallowing the backslash in the last line. But I realized this is identical to the question in T222042: Add sniff to require trailing commas in multiline arrays. The loose consensus seems to be that the extra (but technically not needed) comma at the end is usually a good idea. I think we should do the same here and allow the backslash in the last line, but don't require it either.

My main argument is that it's readable either way, with or without the backslash. We don't really win anything when we block the backslash, nor do we win anything when we enforce it:

  • Sure, finding a backslash in a raw commit message might be a bit confusing. But come on. It's not like the Bug: T1234 before the backslash becomes harder to read because of the backslash. We will get used to it.
  • Sure, Bug: T1234 Bug: T4321 in one line looks a little odd. But come on. It's not like you can miss the second bug number. Especially not when it's auto-linked (which I really, really hope we can add).

Speaking of auto-linking: If it turns out we can do this, it should as well be possible to auto-add the missing linebreak. This discussion would be obsolete then.

In general I think we should prioritize over what the end result in the Git repository will be, instead of how GitLab decides to parse the message. (The MR description field is entirely separate from the commit message, it just happens to get autofilled from a commit message. GitLab, or our tools around it like gerritlab, could just as well format the commit message in a way that this problem would not exist.) So using a backslash should not be mandatory. I would prefer that I could enforce that it's not there on my personal projects, but I also totally understand if you don't want to introduce configuration to CMV where it currently does not have any.

My preference would be to use \n\n instead of \\n.

Please do not do this, this means that git interpret-trailers is no longer able to parse anything but the last one.

My preference would be to use \n\n instead of \\n.

Please do not do this, this means that git interpret-trailers is no longer able to parse anything but the last one.

Agreed, we already have this problem on Gerrit (people accidentally put a blank line between Bug: and Change-Id: and then Gerritbot never links the task).

Second preference would be \n, which produces the same results as \\\n but whitespace is more likely to already be ignored by other parsers. The downside here is that it may cause editors to complain about trailing whitespace, and structural whitespace tends to be a bad idea.

I would also suggest double space rather than space-and-backslash; ideally, GitLab would even trim the trailing whitespace automatically when it constructs the merge commit. (Git’s default commit --cleanup mode does this; I’ve been unable to test GitLab’s behavior so far – in my test repo, it just put “see merge request” into the merge commit message, rather than the full MR body.)

Oops, and I did not realize those would be linked, sorry about the noise. I’ll use a non-Bug: trailer if I test any further.

Sorry for the terrible suggestion, but how difficult would be to change the gitlab parser to parse Bug: footers differently and link them to phabricator/render nicely on gitlab? I think it would be important if that is something that could potentially happen in the future or not- My guess is that is a standard that was inherited from bugzilla times and that at some point was implemented into gerrit (?).

It would be also nice to know how it gets rendered on other software- sadly the project doesn't appear on phab diffusion/gitiles, but it renders well on GitHub: https://github.com/wikimedia/mediawiki-extensions-WikiLambda/commit/897023647e2f0ccf8090a7cb0206b77e0ea4fbad

Sorry for the terrible suggestion, but how difficult would be to change the gitlab parser to parse Bug: footers differently and link them to phabricator/render nicely on gitlab?

A big part of the point of moving from gerrit to GitLab is to use more standard, well-supported tools. Forking them, especially for minor cosmetic things, seems like a regressive step?

It would be also nice to know how it gets rendered on other software- sadly the project doesn't appear on phab diffusion/gitiles, but it renders well on GitHub: https://github.com/wikimedia/mediawiki-extensions-WikiLambda/commit/897023647e2f0ccf8090a7cb0206b77e0ea4fbad

What you've linked is the conceptually-same commit on a different project (the WikiLambda extension), which because it's on gerrit doesn't use \\n but just \, so indeed renders without any mark. If we put the GitLab-required \\n in, it would render it, I believe.

The \ continuation marker is merely a hack to fit Gitlab poor processing of git commit as markdown. Git has some expectations regarding those "headers" which are known has trailers. They can be parsed or amended using git interpret-trailers.

Given:

Summary line

Bug: T1234 \
Bug: T4321

Which can be generated by echo -ne 'Summary line\n\nBug: T1234 \\\nBug: T4321\n'. You can then parse them with --parse:

$ echo -ne 'Summary line\n\nBug: T1234 \\\nBug: T4321\n'|git interpret-trailers --parse
Bug: T1234 \
Bug: T4321

Or add a new one with --trailer <key>=<value>:

$ echo -ne 'Summary line\n\nBug: T1234 \\\nBug: T4321\n'|git interpret-trailers --trailer Bug=7777
Summary line

Bug: T1234 \
Bug: T4321
Bug: 7777

The trailing \ is imho invalid regarding how trailers are managed by git. I guess that should stay invalid.


If Gitlab misbehaves when rendering trailers, I guess we should fix it? If I understand it properly, the issue is when crafting the merge request, Gitlab reuses the commit message as the merge request body which is then processed as markdown. I guess when nit does the conversation, it could process the trailers and split them with a double new line when creating the text for the merge request?

In my experience commit messages are more frequently read via git log and friends then in the browser, so I would rather optimize for that view and also keep them compatible with git interpret-trailers. One of the benefits of opensource is of course contributing upstream. I would expect us to be willing to contribute bug reports, feature request, and even code to the primary tool which we use to manage our source code. Also git trailers themselves are pretty ubiquitous, so I would assume gitlab would be interested in improving their use with their product.

Is it feasible to allow multiple but numbers in a single trailer?

Bug: T1234, T4321

and/or

Bug: T1234 T4321

Is it feasible to allow multiple but numbers in a single trailer?

Bug: T1234, T4321

and/or

Bug: T1234 T4321

Yes, this is technically possible. Such a change would have some ripple effects beyond CMV in that other systems processing Bug: Tnnnn trailers would also need to be updated. It also would not solve all potential commit message trailer issues. Any of the other expected trailers could also cause the visual collapse under markdown rendering.

Yes, this is technically possible. Such a change would have some ripple effects beyond CMV in that other systems processing Bug: Tnnnn trailers would also need to be updated. It also would not solve all potential commit message trailer issues. Any of the other expected trailers could also cause the visual collapse under markdown rendering.

Good point!

I personally read commit messages via 'git log' a lot more than the browser and would prefer optimizing for that experience. But, as Thiemo suggests, it might be a reasonable compromise to not treat " \" as invalid but just not require them as mandatory so repos / teams can decide their own convention without needing per-repo configuration.

It seems like a bad idea to me to break universally accepted git conventions for minuscule and fairly subjective presentation improvements. It harms interoperability - right now if some tool wants to associate commits to tasks, it can do something like git log <commit> -1 --pretty='%(trailers:key=Bug,valueonly)' to get the bug number. Breaking that because someone prefers multiple pointless lines over one pointless line in a merge request description (it's not linked, and you presumably don't memorize all task IDs, so what's the value of it being there at all?) seems like a poor tradeoff.
`

My current read on comments here is that there is more support for "clean" git {log,show} output than there is for pretty markdown output.

If the trailing double space solution mentioned by @AntiCompositeNumber in T351253#9365740 actually works for the markdown view that seems like a reasonable compromise. CMV could support trailing whitespace by adding something like {0,2} (match 0, 1, or 2 space characters) or ( )? (match 2 space characters 0 or 1 times) to various trailer value rules.

If the trailing double space solution mentioned by @AntiCompositeNumber in T351253#9365740 actually works for the markdown view that seems like a reasonable compromise.

Bug: T1234

Bug: T4321

That breaks the spec, cgit determines the start of the trailers section by iterating through the lines from the end of the commit message and stop when it encounters a newline:

$ echo -ne 'Summary line\n\nBug: T1234\n\nBug: T4321'|git interpret-trailers --parse
Bug: T4321

I wanted to write an example using Phabricator markdown, but it behaves differently when it comes to newlines \n which become <br/>. Gitlab respects the "spec" at https://commonmark.org/ for which there is a live interpreter at https://spec.commonmark.org/dingus/ . Entering:

Hello first paragraph

Bug: T1234
Bug: T4321

Is rendered as:

Hello first paragraph

Bug: T1234 Bug: T4321

My guess is the process in Gitlab which converts the git commit raw text to markdown should be taught to recognize the trailer and insert an explicit newlines for rendering.

If the trailing double space solution mentioned by @AntiCompositeNumber in T351253#9365740 actually works for the markdown view that seems like a reasonable compromise.

Bug: T1234

Bug: T4321

But that’s not the trailing double space solution. Trailing double space looks like this:

Hello first paragraph

Bug: T1234  
Bug: T4321

And it seems to work fine in CommonMark:

image.png (515×1 px, 30 KB)

My guess is the process in Gitlab which converts the git commit raw text to markdown should be taught to recognize the trailer and insert an explicit newlines for rendering.

That would be nice, but whether we can make it happen is another question…

@Lucas_Werkmeister_WMDE sorry I have misquoted the part that proposed Bug: XXX space space when I really wanted to quote T351253#9365740 which is about the double newlines: Bug: XXX Bug: YYY which would cause git to stop finding trailers at the double newlines mark.

Note Gerrit does support multiple tasks being passed on a single line Bug: T1234, T3444 and it would ping both tasks. I have crafted a test change with Bug: T357182, T357183 which poked the two tasks:

Change 1002623 had a related patch set uploaded (by Hashar; author: Hashar):

[test/gerrit-ping@master] Gerrit inlined multiple trackingid

https://gerrit.wikimedia.org/r/1002623

Change 1002623 had a related patch set uploaded (by Hashar; author: Hashar):

[test/gerrit-ping@master] Gerrit inlined multiple trackingid

https://gerrit.wikimedia.org/r/1002623

That is how GitLab works as well and that was proposed above by Ahmon but does not address rendering of other trailers which would have a line per entry T351253#9375891.

So I am back to the root cause: GitLab should be taught to handle trailers differently.

bd808 renamed this task from Add support for GitLab markdown linebreak requirement to Add support for commit message trailers that GitLab markdown will render on individual lines.Tue, Feb 13, 5:10 PM