diff mbox series

stripspace: allow -s/-c outside git repository

Message ID 20181217165957.GA60293@google.com (mailing list archive)
State New, archived
Headers show
Series stripspace: allow -s/-c outside git repository | expand

Commit Message

Jonathan Nieder Dec. 17, 2018, 4:59 p.m. UTC
v2.11.0-rc3~3^2~1 (stripspace: respect repository config, 2016-11-21)
improved stripspace --strip-comments / --comentlines by teaching them
to read repository config, but it went a little too far: when running
stripspace outside any repository, the result is

	$ git stripspace --strip-comments <test-input
	fatal: not a git repository (or any parent up to mount point /tmp)

That makes experimenting with the stripspace command unnecessarily
fussy.  Fix it by discovering the git directory gently, as intended
all along.

Reported-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 builtin/stripspace.c  |  3 ++-
 t/t0030-stripspace.sh | 12 +++++++++---
 2 files changed, 11 insertions(+), 4 deletions(-)

Comments

Martin Ågren Dec. 18, 2018, 6:09 a.m. UTC | #1
On Mon, 17 Dec 2018 at 22:56, Jonathan Nieder <jrnieder@gmail.com> wrote:
> That makes experimenting with the stripspace command unnecessarily
> fussy.  Fix it by discovering the git directory gently, as intended
> all along.

>         if (mode == STRIP_COMMENTS || mode == COMMENT_LINES) {
> -               setup_git_directory_gently(NULL);
> +               setup_git_directory_gently(&nongit);
>                 git_config(git_default_config, NULL);
>         }

Makes me wonder if `setup_git_directory_gently()` should BUG if it
receives NULL. That would require some reshuffling in setup.c, since
`setup_git_directory()` calls out to it with NULL... Just thinking out
loud. In any case, and more importantly, this is the last user providing
NULL except for the one in `setup_git_directory()`.

> diff --git a/t/t0030-stripspace.sh b/t/t0030-stripspace.sh
> index 5ce47e8af5..0c24a0f9a3 100755
> --- a/t/t0030-stripspace.sh
> +++ b/t/t0030-stripspace.sh
> @@ -430,9 +430,15 @@ test_expect_success '-c with changed comment char' '
>  test_expect_success '-c with comment char defined in .git/config' '
>         test_config core.commentchar = &&
>         printf "= foo\n" >expect &&
> -       printf "foo" | (
> -               mkdir sub && cd sub && git stripspace -c
> -       ) >actual &&
> +       rm -fr sub &&
> +       mkdir sub &&
> +       printf "foo" | git -C sub stripspace -c >actual &&
> +       test_cmp expect actual
> +'

A small while-at-it conversion from subshell (with a funny pipe into it)
to `-C sub`. The `rm -fr` invocation is not in the original, so
shouldn't be needed. This one looks safe enough, though it makes me
wonder why you need it. `mkdir -p sub`? Probably not worth a v2.

> +
> +test_expect_success '-c outside git repository' '
> +       printf "# foo\n" >expect &&
> +       printf "foo" | nongit git stripspace -c >actual &&
>         test_cmp expect actual
>  '

Nice.

Martin
Johannes Schindelin Dec. 18, 2018, 11:58 a.m. UTC | #2
Hi Jonathan,

On Mon, 17 Dec 2018, Jonathan Nieder wrote:

> v2.11.0-rc3~3^2~1 (stripspace: respect repository config, 2016-11-21)
> improved stripspace --strip-comments / --comentlines by teaching them
> to read repository config, but it went a little too far: when running
> stripspace outside any repository, the result is
> 
> 	$ git stripspace --strip-comments <test-input
> 	fatal: not a git repository (or any parent up to mount point /tmp)
> 
> That makes experimenting with the stripspace command unnecessarily
> fussy.  Fix it by discovering the git directory gently, as intended
> all along.
> 
> Reported-by: Han-Wen Nienhuys <hanwen@google.com>
> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
> ---

ACK!

>  builtin/stripspace.c  |  3 ++-
>  t/t0030-stripspace.sh | 12 +++++++++---
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/builtin/stripspace.c b/builtin/stripspace.c
> index bdf0328869..be33eb83c1 100644
> --- a/builtin/stripspace.c
> +++ b/builtin/stripspace.c
> @@ -30,6 +30,7 @@ int cmd_stripspace(int argc, const char **argv, const char *prefix)
>  {
>  	struct strbuf buf = STRBUF_INIT;
>  	enum stripspace_mode mode = STRIP_DEFAULT;
> +	int nongit;
>  
>  	const struct option options[] = {
>  		OPT_CMDMODE('s', "strip-comments", &mode,
> @@ -46,7 +47,7 @@ int cmd_stripspace(int argc, const char **argv, const char *prefix)
>  		usage_with_options(stripspace_usage, options);
>  
>  	if (mode == STRIP_COMMENTS || mode == COMMENT_LINES) {
> -		setup_git_directory_gently(NULL);
> +		setup_git_directory_gently(&nongit);

What a fool I was to believe that _gently() was always gentle...

>  		git_config(git_default_config, NULL);
>  	}
>  
> diff --git a/t/t0030-stripspace.sh b/t/t0030-stripspace.sh
> index 5ce47e8af5..0c24a0f9a3 100755
> --- a/t/t0030-stripspace.sh
> +++ b/t/t0030-stripspace.sh
> @@ -430,9 +430,15 @@ test_expect_success '-c with changed comment char' '
>  test_expect_success '-c with comment char defined in .git/config' '
>  	test_config core.commentchar = &&
>  	printf "= foo\n" >expect &&
> -	printf "foo" | (
> -		mkdir sub && cd sub && git stripspace -c
> -	) >actual &&
> +	rm -fr sub &&
> +	mkdir sub &&
> +	printf "foo" | git -C sub stripspace -c >actual &&
> +	test_cmp expect actual
> +'

Sneaky. But I like it.

Thanks,
Dscho

> +
> +test_expect_success '-c outside git repository' '
> +	printf "# foo\n" >expect &&
> +	printf "foo" | nongit git stripspace -c >actual &&
>  	test_cmp expect actual
>  '
>  
> -- 
> 2.20.0.405.gbc1bbc6f85
> 
>
Johannes Schindelin Dec. 18, 2018, noon UTC | #3
Hi Martin,

On Tue, 18 Dec 2018, Martin Ågren wrote:

> On Mon, 17 Dec 2018 at 22:56, Jonathan Nieder <jrnieder@gmail.com> wrote:
> > That makes experimenting with the stripspace command unnecessarily
> > fussy.  Fix it by discovering the git directory gently, as intended
> > all along.
> 
> >         if (mode == STRIP_COMMENTS || mode == COMMENT_LINES) {
> > -               setup_git_directory_gently(NULL);
> > +               setup_git_directory_gently(&nongit);
> >                 git_config(git_default_config, NULL);
> >         }
> 
> Makes me wonder if `setup_git_directory_gently()` should BUG if it
> receives NULL. That would require some reshuffling in setup.c, since
> `setup_git_directory()` calls out to it with NULL... Just thinking out
> loud. In any case, and more importantly, this is the last user providing
> NULL except for the one in `setup_git_directory()`.

We could rename `setup_git_directory_gently()` to
`setup_git_directory_gently_2()`, make that `static` and then call it from
`setup_git_directory_gently()` and `setup_git_directory()`.

Ciao,
Dscho
Ævar Arnfjörð Bjarmason Dec. 19, 2018, 2:02 p.m. UTC | #4
On Mon, Dec 17 2018, Jonathan Nieder wrote:

> v2.11.0-rc3~3^2~1 (stripspace: respect repository config, 2016-11-21)

Minor nit not just on this patch, but your patches in general: I think
you're the only one using this type of template instead of the `%h
("%s", %ad)` format documented in SubmittingPatches.

I've had at least a couple of cases where I've git log --grep=<abbr sha>
and missed a commit of yours when you referred to another commit.

E.g. when composing
https://public-inbox.org/git/878t0lfwrj.fsf@evledraar.gmail.com/ I
remembered PERLLIB_EXTRA went back & forth between
working/breaking/working with your/my/your patch, so:

    git log --grep=0386dd37b1

Just found the chain up to my breaking change, but not your 7a7bfc7adc,
which refers to that commit as v1.9-rc0~88^2.

Maybe this is really a feature request. I.e. maybe we should have some
mode where --grep=<commitish> will be combined with some mode where we
try to find various forms of <commitish> in commit messages, then
normalize & match them....
Duy Nguyen Dec. 19, 2018, 5:11 p.m. UTC | #5
On Wed, Dec 19, 2018 at 6:04 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, Dec 17 2018, Jonathan Nieder wrote:
>
> > v2.11.0-rc3~3^2~1 (stripspace: respect repository config, 2016-11-21)
>
> Minor nit not just on this patch, but your patches in general: I think
> you're the only one using this type of template instead of the `%h
> ("%s", %ad)` format documented in SubmittingPatches.
>
> I've had at least a couple of cases where I've git log --grep=<abbr sha>
> and missed a commit of yours when you referred to another commit.
>
> E.g. when composing
> https://public-inbox.org/git/878t0lfwrj.fsf@evledraar.gmail.com/ I
> remembered PERLLIB_EXTRA went back & forth between
> working/breaking/working with your/my/your patch, so:
>
>     git log --grep=0386dd37b1
>
> Just found the chain up to my breaking change, but not your 7a7bfc7adc,
> which refers to that commit as v1.9-rc0~88^2.
>
> Maybe this is really a feature request. I.e. maybe we should have some
> mode where --grep=<commitish> will be combined with some mode where we
> try to find various forms of <commitish> in commit messages, then
> normalize & match them....

To follow email model, this sounds like a good trailer for related
commits, like In-Reply-To for email. We could even have special
trailer "Fixes" to indicate what commit is the problem that this
commit fixes.
SZEDER Gábor Dec. 19, 2018, 5:38 p.m. UTC | #6
On Wed, Dec 19, 2018 at 03:02:14PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 17 2018, Jonathan Nieder wrote:
> 
> > v2.11.0-rc3~3^2~1 (stripspace: respect repository config, 2016-11-21)
> 
> Minor nit not just on this patch, but your patches in general: I think
> you're the only one using this type of template instead of the `%h
> ("%s", %ad)` format documented in SubmittingPatches.

Or the '%h (%s, %ad)' format, which is more widely used in Git's
history, and those double quotes don't add any value whatsoever.
Jeff King Dec. 19, 2018, 6:22 p.m. UTC | #7
On Wed, Dec 19, 2018 at 03:02:14PM +0100, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Dec 17 2018, Jonathan Nieder wrote:
> 
> > v2.11.0-rc3~3^2~1 (stripspace: respect repository config, 2016-11-21)
> 
> Minor nit not just on this patch, but your patches in general: I think
> you're the only one using this type of template instead of the `%h
> ("%s", %ad)` format documented in SubmittingPatches.
> 
> I've had at least a couple of cases where I've git log --grep=<abbr sha>
> and missed a commit of yours when you referred to another commit.
> 
> E.g. when composing
> https://public-inbox.org/git/878t0lfwrj.fsf@evledraar.gmail.com/ I
> remembered PERLLIB_EXTRA went back & forth between
> working/breaking/working with your/my/your patch, so:
> 
>     git log --grep=0386dd37b1
> 
> Just found the chain up to my breaking change, but not your 7a7bfc7adc,
> which refers to that commit as v1.9-rc0~88^2.
> 
> Maybe this is really a feature request. I.e. maybe we should have some
> mode where --grep=<commitish> will be combined with some mode where we
> try to find various forms of <commitish> in commit messages, then
> normalize & match them....

That would help when you're using --grep, but not other things that are
trying to parse the commit message. Two instances I've noticed:

  - web interfaces like GitHub won't linkify this type of reference
    (whereas they will for something that looks like a hex object id)

  - my terminal makes it easy to select hex strings, but doesn't
    understand this git-describe output :)

These tools _could_ be taught a regex like /v(\d+.)(-rc\d+)?([~^]+d)*/.
But matching hex strings is a lot simpler, and works across many
projects.

So I agree with you that this git-describe format is less convenient for
readers, but my preferred solution is to use a different format, rather
than try to teach every reading tool to be more clever.

As far as I can tell, the main advantage of using "v2.11.0-rc3~3^2~1"
over its hex id is that it gives a better sense in time of which Git
version we're talking about.  The date in the parentheses does something
similar for wall-clock time, but it's left to the reader to map that to
a Git version if they choose.

Personally, I find the wall-clock time to be enough, since usually what
I want to know is "how ancient is this". And in the rare instance that I
care about the containing version, it's not a big deal to use "git tag
--contains".  If we really want to convey that information in the text,
I think it would be reasonable to say something like:

  In commit 1234abcd (the subject line, 2016-01-01, v2.11.0), we did
  blah blah blah

with a few simple rules:

  - only mention a single version, the oldest one that contains the
    commit[1]. If it's in v2.11.1, we can infer that it's in v2.12.0,
    etc.

  - only mention released commits; for the granularity we're talking
    about here, the distinction between v2.11.0 and v2.11.0-rc3 is not
    important

  - if it hasn't been in a released version, don't include a version at
    all.

That's probably over-engineering, and I'm perfectly fine with the
oid/subject/date format most people use. Just trying to give an option
if people really think the tag name is useful.

-Peff

[1] I usually compute the containing version with:

      $ git help has
      'has' is aliased to '!f() { git tag --contains "$@" | grep ^v | grep -v -- -rc | sort -V | head -1; }; f'

    though I suspect it could be done these days with fewer processes
    using "tag --sort".
Jonathan Nieder Dec. 19, 2018, 6:39 p.m. UTC | #8
Hi,

Jeff King wrote:

>   - web interfaces like GitHub won't linkify this type of reference
>     (whereas they will for something that looks like a hex object id)
>
>   - my terminal makes it easy to select hex strings, but doesn't
>     understand this git-describe output :)
>
> These tools _could_ be taught a regex like /v(\d+.)(-rc\d+)?([~^]+d)*/.
> But matching hex strings is a lot simpler, and works across many
> projects.

Is there some rule about how long the hex string has to be for this to
work?

[...]
>   In commit 1234abcd (the subject line, 2016-01-01, v2.11.0), we did
>   blah blah blah

The issue with this is that it is ambiguous about what the tag name is
referring to: does that mean that "git describe" and "git version"
tell me that v2.11.0 is the nearest *previous* release to that commit
or that "git name-rev" tells me that v2.11.0 is a nearby *subsequent*
release that contains it?

Of course the latter is the only answer that's useful, but in practice
the former is what people tend to do when they are trying to follow a
convention like this.  So we'd need some automatic linting to make it
useful.

I think a more promising approach is the Fixes trailer Duy mentioned,
which has been working well for the Linux kernel project.  I'll follow
up in a reply to his message.

Thanks,
Jonathan
Ævar Arnfjörð Bjarmason Dec. 19, 2018, 6:52 p.m. UTC | #9
On Wed, Dec 19 2018, Jeff King wrote:

> On Wed, Dec 19, 2018 at 03:02:14PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> On Mon, Dec 17 2018, Jonathan Nieder wrote:
>>
>> > v2.11.0-rc3~3^2~1 (stripspace: respect repository config, 2016-11-21)
>>
>> Minor nit not just on this patch, but your patches in general: I think
>> you're the only one using this type of template instead of the `%h
>> ("%s", %ad)` format documented in SubmittingPatches.
>>
>> I've had at least a couple of cases where I've git log --grep=<abbr sha>
>> and missed a commit of yours when you referred to another commit.
>>
>> E.g. when composing
>> https://public-inbox.org/git/878t0lfwrj.fsf@evledraar.gmail.com/ I
>> remembered PERLLIB_EXTRA went back & forth between
>> working/breaking/working with your/my/your patch, so:
>>
>>     git log --grep=0386dd37b1
>>
>> Just found the chain up to my breaking change, but not your 7a7bfc7adc,
>> which refers to that commit as v1.9-rc0~88^2.
>>
>> Maybe this is really a feature request. I.e. maybe we should have some
>> mode where --grep=<commitish> will be combined with some mode where we
>> try to find various forms of <commitish> in commit messages, then
>> normalize & match them....
>
> That would help when you're using --grep, but not other things that are
> trying to parse the commit message. Two instances I've noticed:
>
>   - web interfaces like GitHub won't linkify this type of reference
>     (whereas they will for something that looks like a hex object id)

I wonder if we had some canonical plumbing combination of to `git
cat-file -p` and/or a utility like git-interpret-trailers that would
take a commit message and spew out BEGIN/END/SHA-1 positions for
commitish that we found whether sites like GitHub would use it.

They'd still need to do a second pass to for any of their own markup,
e.g. the elsewhere@<commitish> syntax, or referring to PRs/MRs issues
etc....

>   - my terminal makes it easy to select hex strings, but doesn't
>     understand this git-describe output :)
>
> These tools _could_ be taught a regex like /v(\d+.)(-rc\d+)?([~^]+d)*/.
> But matching hex strings is a lot simpler, and works across many
> projects.
>
> So I agree with you that this git-describe format is less convenient for
> readers, but my preferred solution is to use a different format, rather
> than try to teach every reading tool to be more clever.
>
> As far as I can tell, the main advantage of using "v2.11.0-rc3~3^2~1"
> over its hex id is that it gives a better sense in time of which Git
> version we're talking about.  The date in the parentheses does something
> similar for wall-clock time, but it's left to the reader to map that to
> a Git version if they choose.
>
> Personally, I find the wall-clock time to be enough, since usually what
> I want to know is "how ancient is this". And in the rare instance that I
> care about the containing version, it's not a big deal to use "git tag
> --contains".  If we really want to convey that information in the text,
> I think it would be reasonable to say something like:
>
>   In commit 1234abcd (the subject line, 2016-01-01, v2.11.0), we did
>   blah blah blah
>
> with a few simple rules:
>
>   - only mention a single version, the oldest one that contains the
>     commit[1]. If it's in v2.11.1, we can infer that it's in v2.12.0,
>     etc.
>
>   - only mention released commits; for the granularity we're talking
>     about here, the distinction between v2.11.0 and v2.11.0-rc3 is not
>     important
>
>   - if it hasn't been in a released version, don't include a version at
>     all.
>
> That's probably over-engineering, and I'm perfectly fine with the
> oid/subject/date format most people use. Just trying to give an option
> if people really think the tag name is useful.
>
> -Peff
>
> [1] I usually compute the containing version with:
>
>       $ git help has
>       'has' is aliased to '!f() { git tag --contains "$@" | grep ^v | grep -v -- -rc | sort -V | head -1; }; f'
>
>     though I suspect it could be done these days with fewer processes
>     using "tag --sort".
Martin Ågren Dec. 19, 2018, 9:52 p.m. UTC | #10
On Tue, 18 Dec 2018 at 13:00, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> > Makes me wonder if `setup_git_directory_gently()` should BUG if it
> > receives NULL. That would require some reshuffling in setup.c, since
> > `setup_git_directory()` calls out to it with NULL... Just thinking out
> > loud. In any case, and more importantly, this is the last user providing
> > NULL except for the one in `setup_git_directory()`.
>
> We could rename `setup_git_directory_gently()` to
> `setup_git_directory_gently_2()`, make that `static` and then call it from
> `setup_git_directory_gently()` and `setup_git_directory()`.

Thanks for the hint. I'm currently recuperating from having been lost in
a nearby corner of setup.c, so I´ll leave this tightening as a
low-hanging fruit for "someone else."

Martin
Jonathan Nieder Dec. 19, 2018, 10:14 p.m. UTC | #11
Hi,

Duy Nguyen wrote:
> On Wed, Dec 19, 2018 at 6:04 PM Ævar Arnfjörð Bjarmason

>> E.g. when composing
>> https://public-inbox.org/git/878t0lfwrj.fsf@evledraar.gmail.com/ I
>> remembered PERLLIB_EXTRA went back & forth between
>> working/breaking/working with your/my/your patch, so:
>>
>>     git log --grep=0386dd37b1
>>
>> Just found the chain up to my breaking change, but not your 7a7bfc7adc,
>> which refers to that commit as v1.9-rc0~88^2.
[...]
> To follow email model, this sounds like a good trailer for related
> commits, like In-Reply-To for email. We could even have special
> trailer "Fixes" to indicate what commit is the problem that this
> commit fixes.

In Linux kernel land, Documentation/process/submitting-patches.rst
contains the following:

-- >8 --
If your patch fixes a bug in a specific commit, e.g. you found an issue using
``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
the SHA-1 ID, and the one line summary.  For example::

	Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()")

The following ``git config`` settings can be used to add a pretty format for
outputting the above style in the ``git log`` or ``git show`` commands::

	[core]
		abbrev = 12
	[pretty]
		fixes = Fixes: %h (\"%s\")
-- 8< --

I like it because (1) the semantics are clear, (2) it's very concrete
(e.g. "first 12 characters", (3) it goes in a trailer, where other
bits intended for machine consumption already go.

Should we adopt it?  In other words, how about something like the
following?

If it seems like a good idea, I can add a commit message.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

diff --git i/Documentation/SubmittingPatches w/Documentation/SubmittingPatches
index ec8b205145..36ce1ac5d8 100644
--- i/Documentation/SubmittingPatches
+++ w/Documentation/SubmittingPatches
@@ -367,6 +367,20 @@ If you like, you can put extra tags at the end:
 You can also create your own tag or use one that's in common usage
 such as "Thanks-to:", "Based-on-patch-by:", or "Mentored-by:".
 
+If your patch fixes a bug in a specific commit, e.g. you found an issue using
+``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
+the SHA-1 ID, and the one line summary.  For example::
+
+	Fixes: 539047c19e ("revert: introduce --abort to cancel a failed cherry-pick")
+
+The following ``git config`` settings can be used to add a pretty format for
+outputting the above style in the ``git log`` or ``git show`` commands::
+
+	[core]
+		abbrev = 12
+	[pretty]
+		fixes = Fixes: %h (\"%s\")
+
 == Subsystems with dedicated maintainers
 
 Some parts of the system have dedicated maintainers with their own
Jeff King Dec. 19, 2018, 10:48 p.m. UTC | #12
On Wed, Dec 19, 2018 at 10:39:27AM -0800, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> >   - web interfaces like GitHub won't linkify this type of reference
> >     (whereas they will for something that looks like a hex object id)
> >
> >   - my terminal makes it easy to select hex strings, but doesn't
> >     understand this git-describe output :)
> >
> > These tools _could_ be taught a regex like /v(\d+.)(-rc\d+)?([~^]+d)*/.
> > But matching hex strings is a lot simpler, and works across many
> > projects.
> 
> Is there some rule about how long the hex string has to be for this to
> work?

In both cases, it has to be 7 characters. In my experience, it doesn't
produce a lot of false positives (in the case of GitHub, I believe it
actually confirms that it's a real commit; in my terminal, it highlights
anything plausible).

> >   In commit 1234abcd (the subject line, 2016-01-01, v2.11.0), we did
> >   blah blah blah
> 
> The issue with this is that it is ambiguous about what the tag name is
> referring to: does that mean that "git describe" and "git version"
> tell me that v2.11.0 is the nearest *previous* release to that commit
> or that "git name-rev" tells me that v2.11.0 is a nearby *subsequent*
> release that contains it?

Sure, it's ambiguous if you've never seen it. But if it becomes a
convention in the project, then I don't think that's an obstacle.

I'm also not sure it really matters all that much either way. If you buy
my argument that this is just about placing the general era of the
commit in the mind of the reader, then "just before v2.11" or "just
after v2.11" are about the same.

> Of course the latter is the only answer that's useful, but in practice
> the former is what people tend to do when they are trying to follow a
> convention like this.  So we'd need some automatic linting to make it
> useful.

I thought we were just talking about an informational message in one
human's writing, that would be read and interpreted by another human
(the commit id is the thing that remains machine-readable). Automatic
linting seems a bit overboard...

> I think a more promising approach is the Fixes trailer Duy mentioned,
> which has been working well for the Linux kernel project.  I'll follow
> up in a reply to his message.

I think that's a good idea if something is in fact being fixed. But
there are many other reasons to refer to another commit in prose (or
even outside of a commit message entirely).

-Peff
Jonathan Nieder Dec. 19, 2018, 11:29 p.m. UTC | #13
Jeff King wrote:
> On Wed, Dec 19, 2018 at 10:39:27AM -0800, Jonathan Nieder wrote:

>> Is there some rule about how long the hex string has to be for this to
>> work?
>
> In both cases, it has to be 7 characters.

Thanks.

[...]
>> The issue with this is that it is ambiguous about what the tag name is
>> referring to: does that mean that "git describe" and "git version"
>> tell me that v2.11.0 is the nearest *previous* release to that commit
>> or that "git name-rev" tells me that v2.11.0 is a nearby *subsequent*
>> release that contains it?
>
> Sure, it's ambiguous if you've never seen it. But if it becomes a
> convention in the project, then I don't think that's an obstacle.

I'm speaking from experience: this is hard for newcomers to grasp.

> I'm also not sure it really matters all that much either way. If you buy
> my argument that this is just about placing the general era of the
> commit in the mind of the reader, then "just before v2.11" or "just
> after v2.11" are about the same.

If it's that unreliable, I'd rather just have the hash, to be honest.

Ideally the full 40 characters, since that would make git name-rev
--stdin work. :)

[...]
>> I think a more promising approach is the Fixes trailer Duy mentioned,
>> which has been working well for the Linux kernel project.  I'll follow
>> up in a reply to his message.
>
> I think that's a good idea if something is in fact being fixed. But
> there are many other reasons to refer to another commit in prose (or
> even outside of a commit message entirely).

Sure, but in those cases do we need the ability to query on them?

To me it seems similar to having a policy on how to reference people
in commit messages (e.g. "always include their email address"), so
that I can grep for a contributor to see how they were involved in a
patch.  If it's not structured data, then at some point I stop
worrying so much about machine parsability.

Thanks,
Jonathan
Ævar Arnfjörð Bjarmason Dec. 20, 2018, 12:18 a.m. UTC | #14
On Wed, Dec 19 2018, Jonathan Nieder wrote:

> Hi,
>
> Duy Nguyen wrote:
>> On Wed, Dec 19, 2018 at 6:04 PM Ævar Arnfjörð Bjarmason
>
>>> E.g. when composing
>>> https://public-inbox.org/git/878t0lfwrj.fsf@evledraar.gmail.com/ I
>>> remembered PERLLIB_EXTRA went back & forth between
>>> working/breaking/working with your/my/your patch, so:
>>>
>>>     git log --grep=0386dd37b1
>>>
>>> Just found the chain up to my breaking change, but not your 7a7bfc7adc,
>>> which refers to that commit as v1.9-rc0~88^2.
> [...]
>> To follow email model, this sounds like a good trailer for related
>> commits, like In-Reply-To for email. We could even have special
>> trailer "Fixes" to indicate what commit is the problem that this
>> commit fixes.
>
> In Linux kernel land, Documentation/process/submitting-patches.rst
> contains the following:
>
> -- >8 --
> If your patch fixes a bug in a specific commit, e.g. you found an issue using
> ``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
> the SHA-1 ID, and the one line summary.  For example::
>
> 	Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()")
>
> The following ``git config`` settings can be used to add a pretty format for
> outputting the above style in the ``git log`` or ``git show`` commands::
>
> 	[core]
> 		abbrev = 12
> 	[pretty]
> 		fixes = Fixes: %h (\"%s\")
> -- 8< --
>
> I like it because (1) the semantics are clear, (2) it's very concrete
> (e.g. "first 12 characters", (3) it goes in a trailer, where other
> bits intended for machine consumption already go.
>
> Should we adopt it?  In other words, how about something like the
> following?
>
> If it seems like a good idea, I can add a commit message.
>
> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
>
> diff --git i/Documentation/SubmittingPatches w/Documentation/SubmittingPatches
> index ec8b205145..36ce1ac5d8 100644
> --- i/Documentation/SubmittingPatches
> +++ w/Documentation/SubmittingPatches
> @@ -367,6 +367,20 @@ If you like, you can put extra tags at the end:
>  You can also create your own tag or use one that's in common usage
>  such as "Thanks-to:", "Based-on-patch-by:", or "Mentored-by:".
>
> +If your patch fixes a bug in a specific commit, e.g. you found an issue using
> +``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
> +the SHA-1 ID, and the one line summary.  For example::
> +
> +	Fixes: 539047c19e ("revert: introduce --abort to cancel a failed cherry-pick")
> +
> +The following ``git config`` settings can be used to add a pretty format for
> +outputting the above style in the ``git log`` or ``git show`` commands::
> +
> +	[core]
> +		abbrev = 12
> +	[pretty]
> +		fixes = Fixes: %h (\"%s\")
> +
>  == Subsystems with dedicated maintainers
>
>  Some parts of the system have dedicated maintainers with their own

The core.abbrev=12 part of this I don't think would be a good idea, and
have submitted a patch to the kernel to remove it:
https://lore.kernel.org/lkml/20181220000112.24891-1-avarab@gmail.com/

If we find ourselves wanting to tweak core.abbrev for git.git, we should
really take a step back and see if we can just fix the
find_unique_abbrev_r() behavior so neither us nor anyone else should
need to fiddle with it.

As noted on LKML I have upcoming patches to support core.abbrev relative
values, e.g. "+2" will give you really future-proof SHAs. That should be
Good Enough(TM) for most.

The only real improvement over the approximate_object_count() method I
can think of is something where in gc / index-pack (for clone) we write
out some statistics that allow us to later cheaply estimate the
long-term growth curve of the repository, and e.g. say that a short
SHA-1 should always be good for at least N years before it becomes
ambiguous.

OTOH we could also just say that if you're a repo with >= 11 characters
for abbreviation we might as well consider you in big boy territory, and
just snap it to say 16 and be done with it. We'll have problems with 32
bit ints somewhere in git overflowing before we'd roll over to "17".
Jeff King Dec. 20, 2018, 2:51 a.m. UTC | #15
On Wed, Dec 19, 2018 at 03:29:48PM -0800, Jonathan Nieder wrote:

> > I'm also not sure it really matters all that much either way. If you buy
> > my argument that this is just about placing the general era of the
> > commit in the mind of the reader, then "just before v2.11" or "just
> > after v2.11" are about the same.
> 
> If it's that unreliable, I'd rather just have the hash, to be honest.

Well, that was sort of my point. :) I think the hash is the most
interesting part, and everything else is gravy for the reader to save
them time digging into the commit.

> > I think that's a good idea if something is in fact being fixed. But
> > there are many other reasons to refer to another commit in prose (or
> > even outside of a commit message entirely).
> 
> Sure, but in those cases do we need the ability to query on them?

I'm not sure what you mean. We were talking about how to reference
commits in prose. I think a "Fixes" trailer eliminates the need to do so
(or least makes it redundant) in _some_ cases, but the other cases are
still of interest.

> To me it seems similar to having a policy on how to reference people
> in commit messages (e.g. "always include their email address"), so
> that I can grep for a contributor to see how they were involved in a
> patch.  If it's not structured data, then at some point I stop
> worrying so much about machine parsability.

Sure. All I'm really saying is "always include the hash".

-Peff
Jacob Keller Dec. 24, 2018, 12:01 a.m. UTC | #16
On Wed, Dec 19, 2018 at 2:36 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
> In Linux kernel land, Documentation/process/submitting-patches.rst
> contains the following:
>
> -- >8 --
> If your patch fixes a bug in a specific commit, e.g. you found an issue using
> ``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
> the SHA-1 ID, and the one line summary.  For example::
>
>         Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()")
>
> The following ``git config`` settings can be used to add a pretty format for
> outputting the above style in the ``git log`` or ``git show`` commands::
>
>         [core]
>                 abbrev = 12
>         [pretty]
>                 fixes = Fixes: %h (\"%s\")
> -- 8< --
>
> I like it because (1) the semantics are clear, (2) it's very concrete
> (e.g. "first 12 characters", (3) it goes in a trailer, where other
> bits intended for machine consumption already go.
>

For what it's worth, Linux's checkpatch.pl script also checks for and
enforces that commit references have this format.

I personally like having the date information, and have attempted to
get checkpatch.pl to stop complaining about the date when it's
included. (see https://patchwork.ozlabs.org/patch/821543/ for the
patch that I've had up, we've been trying to get the maintainer of
checkpatch.pl to notice and pull it in, but not very much success as
of yet).

I'd prefer to keep the format as seen on this list fairly often which
is the something like

show --pretty=%h (\"%s\", %ad) --date=short

or

show --pretty=%h (\"%s\", %cd) --date=short

I like the date since it gives a quick approximation of how long ago
the commit was made, and helps in the rare case of disambiguation.
Personally I also like the quotes since it makes it more obvious where
the subject begins and ends. They aren't strictly necessary but aid
readability to me.

Thanks,
Jake
diff mbox series

Patch

diff --git a/builtin/stripspace.c b/builtin/stripspace.c
index bdf0328869..be33eb83c1 100644
--- a/builtin/stripspace.c
+++ b/builtin/stripspace.c
@@ -30,6 +30,7 @@  int cmd_stripspace(int argc, const char **argv, const char *prefix)
 {
 	struct strbuf buf = STRBUF_INIT;
 	enum stripspace_mode mode = STRIP_DEFAULT;
+	int nongit;
 
 	const struct option options[] = {
 		OPT_CMDMODE('s', "strip-comments", &mode,
@@ -46,7 +47,7 @@  int cmd_stripspace(int argc, const char **argv, const char *prefix)
 		usage_with_options(stripspace_usage, options);
 
 	if (mode == STRIP_COMMENTS || mode == COMMENT_LINES) {
-		setup_git_directory_gently(NULL);
+		setup_git_directory_gently(&nongit);
 		git_config(git_default_config, NULL);
 	}
 
diff --git a/t/t0030-stripspace.sh b/t/t0030-stripspace.sh
index 5ce47e8af5..0c24a0f9a3 100755
--- a/t/t0030-stripspace.sh
+++ b/t/t0030-stripspace.sh
@@ -430,9 +430,15 @@  test_expect_success '-c with changed comment char' '
 test_expect_success '-c with comment char defined in .git/config' '
 	test_config core.commentchar = &&
 	printf "= foo\n" >expect &&
-	printf "foo" | (
-		mkdir sub && cd sub && git stripspace -c
-	) >actual &&
+	rm -fr sub &&
+	mkdir sub &&
+	printf "foo" | git -C sub stripspace -c >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '-c outside git repository' '
+	printf "# foo\n" >expect &&
+	printf "foo" | nongit git stripspace -c >actual &&
 	test_cmp expect actual
 '