diff mbox series

[v3,3/3] SubmittingPatches: explain why we care about log messages

Message ID 20220127190259.2470753-4-gitster@pobox.com (mailing list archive)
State Accepted
Commit cdba0295b0a70cf7a8113dd74be90d11ceddd5f7
Headers show
Series None | expand

Commit Message

Junio C Hamano Jan. 27, 2022, 7:02 p.m. UTC
Extend the "describe your changes well" section to cover whom we are
trying to help by doing so in the first place.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/SubmittingPatches | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

Comments

Emily Shaffer March 4, 2022, 12:10 a.m. UTC | #1
On Thu, Jan 27, 2022 at 11:02:59AM -0800, Junio C Hamano wrote:
> 
> Extend the "describe your changes well" section to cover whom we are
> trying to help by doing so in the first place.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/SubmittingPatches | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 7225a0fb52..a6121d1d42 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -110,6 +110,35 @@ run `git diff --check` on your changes before you commit.
>  [[describe-changes]]
>  === Describe your changes well.
>  
> +The log message that explains your changes is just as important as the
> +changes themselves.  Your code may be clearly written with in-code
> +comment to sufficiently explain how it works with the surrounding
> +code, but those who need to fix or enhance your code in the future
> +will need to know _why_ your code does what it does, for a few
> +reasons:
> +
> +. Your code may be doing something differently from what you wanted it
> +  to do.  Writing down what you actually wanted to achieve will help
> +  them fix your code and make it do what it should have been doing
> +  (also, you often discover your own bugs yourself, while writing the
> +  log message to summarize the thought behind it).
> +
> +. Your code may be doing things that were only necessary for your
> +  immediate needs (e.g. "do X to directories" without implementing or
> +  even designing what is to be done on files).  Writing down why you
> +  excluded what the code does not do will help guide future developers.
> +  Writing down "we do X to directories, because directories have
> +  characteristic Y" would help them infer "oh, files also have the same
> +  characteristic Y, so perhaps doing X to them would also make sense?".
> +  Saying "we don't do the same X to files, because ..." will help them
> +  decide if the reasoning is sound (in which case they do not waste
> +  time extending your code to cover files), or reason differently (in
> +  which case, they can explain why they extend your code to cover
> +  files, too).
> +
> +The goal of your log message is to convey the _why_ behind your
> +change to help future developers.
> +

This is pretty compelling. Is it clear enough why we care about this in
the commit message, as opposed to in the patch description (cover letter
or post-"---" blurb)? Is it too obvious to explicitly mention that the
commit message is the first thing we try to make sense of during a 'git
blame' or 'git bisect'?

Either way, I think this is an improvement on the doc as it was before.

Reviewed-by: Emily Shaffer <emilyshaffer@google.com>

>  The first line of the commit message should be a short description (50
>  characters is the soft limit, see DISCUSSION in linkgit:git-commit[1]),
>  and should skip the full stop.  It is also conventional in most cases to
> -- 
> 2.35.0-177-g7d269f5170
>
Junio C Hamano March 4, 2022, 12:29 a.m. UTC | #2
Emily Shaffer <emilyshaffer@google.com> writes:

>> +The goal of your log message is to convey the _why_ behind your
>> +change to help future developers.
>> +
>
> This is pretty compelling. Is it clear enough why we care about this in
> the commit message, as opposed to in the patch description (cover letter
> or post-"---" blurb)? Is it too obvious to explicitly mention that the
> commit message is the first thing we try to make sense of during a 'git
> blame' or 'git bisect'?

Having to say "this may be better in the in-code comment rather than
the log message" to some patch recently (I do not remember), I tend
to agree that some guidance would help people decide between the two
(or writing both).

Again, patches welcome ;-)
Ævar Arnfjörð Bjarmason March 4, 2022, 9:52 a.m. UTC | #3
On Thu, Mar 03 2022, Junio C Hamano wrote:

> Emily Shaffer <emilyshaffer@google.com> writes:
>
>>> +The goal of your log message is to convey the _why_ behind your
>>> +change to help future developers.
>>> +
>>
>> This is pretty compelling. Is it clear enough why we care about this in
>> the commit message, as opposed to in the patch description (cover letter
>> or post-"---" blurb)? Is it too obvious to explicitly mention that the
>> commit message is the first thing we try to make sense of during a 'git
>> blame' or 'git bisect'?
>
> Again, patches welcome ;-)

I think for something that's a stylistic preference I'd see why Emily
would try to see how you feel about it first.

I.e. in this project it's ultimately up to you to decide on those
things, so for coding guidelines you've just updated I'd probably do the
same.

Don't take that as a complaint on the end result b.t.w., I think the
overall excellent state of the codebase in this project speaks for
itself on that front.

I just see why other contributors would be gun shy about pulling the
trigger on a patch submission on this particular front :)

> Having to say "this may be better in the in-code comment rather than
> the log message" to some patch recently (I do not remember), I tend
> to agree that some guidance would help people decide between the two
> (or writing both).

[A somewhat sneaky $subject change :)]

I think you're referring to this comment on (some admittedly tricky)
code I wrote[1].

First, given the above I'll adjust that to your preferences on a
re-submission. So don't take this as some argument on *that* specific
point. I'll add a comment in a re-roll.

But on reflection I still wouldn't put a comment on that code if it were
purely up to me. Why?

First, I think all programmers go through a phase of learning where they
feel more compelled to comment on the "how" v.s. "why" early on.

At the most extremes beginner programmers explaining how say a common
standard library function works in code that's relatively
straightforward.

But that's really not the case in [1], that code really is doing
something odd and worth explaining. So why not add a comment?

Because access to ubiquitous and *local* source control from git changed
a lot of people's habits on this front, including mine. For this code I
*would* definitely add a comment there if it was the pre-DVCS[2] days.

But I'd say that today my criteria for adding a comment is closer to:

    Is this so essential to note for the understanding of the rest of
    this code that nobody who's skimming past this line or reading this
    part *wouldn't* want this information?

    Or rather, they might not absolutely want to know, but it might be
    useful, *and* there's nothing odd about the pattern itself that
    makes you go "hrm?" enough to run "git blame/log" on it.

In this case it's clearly pretty weird that we run the exact "test-tool
regex" command twice. But I think that "hrm?" applies. To elaborate:

 A. It's setting up a self-contained prereq, so it's not essential
    to understand that implementation detail to read the rest of the
    code.

 B. Anyone who does want to see why that odd case is the way it is can
    run some version "git blame" or say:

        git log -p -L14,21:t/t7812-grep-icase-non-ascii.sh

 C. Each comment you add, even within a function or other scope dilutes
    the value of other more important comments. It trains people not to
    read them, as they're probably not that important.

 D. Comments that are "frozen in time" by adding them to the code
    itself are almost always in danger of drifting in accuracy from the
    rest of the implementation and its assumptions.

    Even in well-curated codebases like git.git they're *much more*
    likely to drift away from the "ground truth" than code is.

 E. Even if "D" isn't true, commit messages (in this case my [3]) are
    almost always at an advantage over comments in that they accompany
    a change to the pre-image.

    So e.g. that commit message doesn't need to waste time explaining
    what pattern we'd prefer not to have here instead of the post-image,
    you get that context for free.

Due to a combination of "D" and "E" I almost never read comments in
their current context, unless it's painfully obvious that they *must*
still apply to the current code. I'll usually run some variant of "git
blame" or say:

    git log -p -L:relevant_function:file

And then see how the code looked when that comment was added, and page
through what's changed since then.

1. https://lore.kernel.org/git/xmqqsfryah42.fsf@gitster.g/
2. To those who'd nitpick DVCS v.s. VCS: Yes nothing changed in theory from CVS/SVN
   etc. on this front, but in practice it did.
   
   Access to version control didn't tend to be ubiquitous, and even if it was
   accessible many people browsing or contributing to your code would probably
   do so via a downloaded tarball than trying to get CVS or whatever to work.
   So the "D" in DVCS really did change this.
3. https://lore.kernel.org/git/patch-12.15-f3cc5bc7eb9-20220302T171755Z-avarab@gmail.com/
Junio C Hamano March 4, 2022, 7:41 p.m. UTC | #4
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Mar 03 2022, Junio C Hamano wrote:
>
>> Emily Shaffer <emilyshaffer@google.com> writes:
>>
>>>> +The goal of your log message is to convey the _why_ behind your
>>>> +change to help future developers.
>>>> +
>>>
>>> This is pretty compelling. Is it clear enough why we care about this in
>>> the commit message, as opposed to in the patch description (cover letter
>>> or post-"---" blurb)? Is it too obvious to explicitly mention that the
>>> commit message is the first thing we try to make sense of during a 'git
>>> blame' or 'git bisect'?
>>
>> Again, patches welcome ;-)
>
> I think for something that's a stylistic preference I'd see why Emily
> would try to see how you feel about it first.

I do not think there is a need to write down stylistic preference.
I may show my preference during my reviews, of course, but I won't
take preference-only things as a blocker.

I also do not think things like "'We used to do X here but we do Y
because ...' does not belong to in-code comment, but to log message"
is "stylistic preference", and if people are unclear about, I agree
that we should spell it out.

An example, I can think of offhand, of what should be in comment,
whether we also write in the log, is "We do X here because that
other code expects us to", when it is tricky to figure out by
reading the code by itself without going back to "git blame".

"git blame" certainly can be used to figure out which commit touched
the line that does X (which is hard to figure out why), and the log
message can refer to "that other code expects us to", but that is an
extra operation.

Also, when we really need to figure out, it is wonderful that we can
ask "blame" to give us the commit, and can look at "that other code"
in the same commit by checking it out to the working tree,
especially when "that other code" may have drifted and the original
reasoning no longer applies (iow, what we find out from "git blame"
may become stale, and it will stay stale forever because we cannot
rewrite the history that old).

Now, it is certainly not black/white decision to say what is and
what is not tricky to figure out in the code.  We shouldn't be
commenting obvious things.  Two yardsticks I use are

 (1) if reviewers raise questions during a review, it may indicate
     that it is worth commenting.

 (2) if an earlier round of the same series had a bug around the
     area, it may indicate that the fixed code is worth commenting.

but the way I use them is more to say "I found this uncommented code
somewhat tricky---but nobody asked a question and it has stayed the
same from the initial round, so it may be clear enough for other
people, and after all I managed to figure out myself, so it probably
is OK to leave it uncommented".
Junio C Hamano March 4, 2022, 9:30 p.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

> I also do not think things like "'We used to do X here but we do Y
> because ...' does not belong to in-code comment, but to log message"

Sorry for too many nagations.  "I also do not" -> "I also do".
diff mbox series

Patch

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 7225a0fb52..a6121d1d42 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -110,6 +110,35 @@  run `git diff --check` on your changes before you commit.
 [[describe-changes]]
 === Describe your changes well.
 
+The log message that explains your changes is just as important as the
+changes themselves.  Your code may be clearly written with in-code
+comment to sufficiently explain how it works with the surrounding
+code, but those who need to fix or enhance your code in the future
+will need to know _why_ your code does what it does, for a few
+reasons:
+
+. Your code may be doing something differently from what you wanted it
+  to do.  Writing down what you actually wanted to achieve will help
+  them fix your code and make it do what it should have been doing
+  (also, you often discover your own bugs yourself, while writing the
+  log message to summarize the thought behind it).
+
+. Your code may be doing things that were only necessary for your
+  immediate needs (e.g. "do X to directories" without implementing or
+  even designing what is to be done on files).  Writing down why you
+  excluded what the code does not do will help guide future developers.
+  Writing down "we do X to directories, because directories have
+  characteristic Y" would help them infer "oh, files also have the same
+  characteristic Y, so perhaps doing X to them would also make sense?".
+  Saying "we don't do the same X to files, because ..." will help them
+  decide if the reasoning is sound (in which case they do not waste
+  time extending your code to cover files), or reason differently (in
+  which case, they can explain why they extend your code to cover
+  files, too).
+
+The goal of your log message is to convey the _why_ behind your
+change to help future developers.
+
 The first line of the commit message should be a short description (50
 characters is the soft limit, see DISCUSSION in linkgit:git-commit[1]),
 and should skip the full stop.  It is also conventional in most cases to