diff mbox series

t: avoid alternation (not POSIX) in grep's BRE

Message ID 20200528083745.15273-1-carenas@gmail.com
State New, archived
Headers show
Series t: avoid alternation (not POSIX) in grep's BRE | expand

Commit Message

Carlo Arenas May 28, 2020, 8:37 a.m. UTC
Using and escaped '|' for alternations is allowed in some implementations
of grep (GNU and busybox, al least), but it is no suppored by POSIX[1]
and therefore will fail in at least macOS and the BSD.

Change syntax to ERE and use Extended regular expression with grep
explicitly.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03

Fixes: f1e3df3169 (t: increase test coverage of signature verification
       output, 2020-03-04)
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 t/t4202-log.sh           | 2 +-
 t/t6200-fmt-merge-msg.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Junio C Hamano May 28, 2020, 3:20 p.m. UTC | #1
Carlo Marcelo Arenas Belón  <carenas@gmail.com> writes:

> Using and escaped '|' for alternations is allowed in some implementations
> of grep (GNU and busybox, al least), but it is no suppored by POSIX[1]
> and therefore will fail in at least macOS and the BSD.
>
> Change syntax to ERE and use Extended regular expression with grep
> explicitly.

Thanks.

> [1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03
>
> Fixes: f1e3df3169 (t: increase test coverage of signature verification
>        output, 2020-03-04)

I do appreciate the information recorded in the log message, but not
like this.  Does everybody's tool understand the "folding" the above
two physical lines require to be able to handle it correctly?

> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>  t/t4202-log.sh           | 2 +-
>  t/t6200-fmt-merge-msg.sh | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
> index f1ea7d97f5..a0930599aa 100755
> --- a/t/t4202-log.sh
> +++ b/t/t4202-log.sh
> @@ -1692,7 +1692,7 @@ test_expect_success GPG 'log --graph --show-signature for merged tag with missin
>  	GNUPGHOME=. git log --graph --show-signature -n1 plain-nokey >actual &&
>  	grep "^|\\\  merged tag" actual &&
>  	grep "^| | gpg: Signature made" actual &&
> -	grep "^| | gpg: Can'"'"'t check signature: \(public key not found\|No public key\)" actual
> +	grep -E "^| | gpg: Can'"'"'t check signature: (public key not found|No public key)" actual
>  '
>  
>  test_expect_success GPG 'log --graph --show-signature for merged tag with bad signature' '
> diff --git a/t/t6200-fmt-merge-msg.sh b/t/t6200-fmt-merge-msg.sh
> index b15582a7a2..e4c2a6eca4 100755
> --- a/t/t6200-fmt-merge-msg.sh
> +++ b/t/t6200-fmt-merge-msg.sh
> @@ -103,7 +103,7 @@ test_expect_success GPG 'message for merging local tag signed by unknown key' '
>  	GNUPGHOME=. git fmt-merge-msg <.git/FETCH_HEAD >actual 2>&1 &&
>  	grep "^Merge tag ${apos}signed-good-tag${apos}" actual &&
>  	grep "^# gpg: Signature made" actual &&
> -	grep "^# gpg: Can${apos}t check signature: \(public key not found\|No public key\)" actual
> +	grep -E "^# gpg: Can${apos}t check signature: (public key not found|No public key)" actual
>  '
>  
>  test_expect_success 'message for merging external branch' '
Jeff King May 28, 2020, 3:43 p.m. UTC | #2
On Thu, May 28, 2020 at 08:20:04AM -0700, Junio C Hamano wrote:

> > [1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03
> >
> > Fixes: f1e3df3169 (t: increase test coverage of signature verification
> >        output, 2020-03-04)
> 
> I do appreciate the information recorded in the log message, but not
> like this.  Does everybody's tool understand the "folding" the above
> two physical lines require to be able to handle it correctly?

If you use any of the Git tools, they support an "unfold" option like:

  git log --format='%(trailers:unfold)'

that normalizes this. However, I would not be at all surprised if people
use custom readers. I think it may be a good policy to stick to the
simplest machine-readable formats for trailers. Likewise I'd suggest
using the full sha1-hex for future-proofing in this context.

-Peff
Junio C Hamano May 28, 2020, 3:51 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> ... However, I would not be at all surprised if people
> use custom readers.

Yes.  I would be surprised if nobody did so.

> I think it may be a good policy to stick to the
> simplest machine-readable formats for trailers. Likewise I'd suggest
> using the full sha1-hex for future-proofing in this context.

Yes, or just roll it into prose like we often do.  Anybody can spot
many examples from "git log --no-merges" ;-)

commit 173cb08d5b38f423b3cae409daa6d414348c3459
Author: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Date:   Wed May 20 10:08:43 2020 -0700

    bisect: avoid tailing CR characters from revision in replay
    
    6c722cbe5a (bisect: allow CRLF line endings in "git bisect replay"
    input, 2020-05-07) includes CR as a field separator, but relies on
    it not being included in the last field, which breaks at least when
    running under OpenBSD 6.7's sh.
    
    Instead of just assume the CR will get swallowed, read the rest of
    the line into an otherwise unused variable and ignore it everywhere
    except on the call for git bisect start, where it matters.
    
    Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>


commit e68a5272b1aab46514ae01745be50948ac375146
Author: Derrick Stolee <dstolee@microsoft.com>
Date:   Tue May 19 19:48:45 2020 +0000

    fsck: use ERROR_MULTI_PACK_INDEX
    
    The multi-pack-index was added to the data verified by git-fsck in
    ea5ae6c3 "fsck: verify multi-pack-index". This implementation was
    based on the implementation for verifying the commit-graph, and a
    copy-paste error kept the ERROR_COMMIT_GRAPH flag as the bit set
    when an error appears in the multi-pack-index.
    
    Add a new flag, ERROR_MULTI_PACK_INDEX, and use that instead.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King May 28, 2020, 4:52 p.m. UTC | #4
On Thu, May 28, 2020 at 08:51:21AM -0700, Junio C Hamano wrote:

> > I think it may be a good policy to stick to the
> > simplest machine-readable formats for trailers. Likewise I'd suggest
> > using the full sha1-hex for future-proofing in this context.
> 
> Yes, or just roll it into prose like we often do.  Anybody can spot
> many examples from "git log --no-merges" ;-)

Yeah, I was sort of working under the assumption that people found
utility in putting that data in a trailer (in which case I'd really
suggest doing both; a prose one with subject meant for readers, and a
machine-readable annotation).

But I haven't really found a use for "Fixes" in machine-readable format.
I don't _mind_ people doing it if they do have a use (and I'd even
consider doing it myself if I were shown that it was useful). In the
meantime, I don't know if we want to state a project preference against
it.

-Peff
Junio C Hamano May 28, 2020, 7:20 p.m. UTC | #5
Jeff King <peff@peff.net> writes:

> But I haven't really found a use for "Fixes" in machine-readable format.
> I don't _mind_ people doing it if they do have a use (and I'd even
> consider doing it myself if I were shown that it was useful). In the
> meantime, I don't know if we want to state a project preference against
> it.

I've seen "Fixes: bug number" in projects that maintain bug
databases and automatically updates the status of the named bug when
a commit with such a footer hits certain integration branches; the
utility of such a usecase would be fairly obvious.

But "Fixes: <commit>" makes me nervous.  One reason is because a
commit very often introduces multiple bugs (or no bugs at all), so
which one (or more) of the bug is corrected cannot be read from such
a footer that _only_ blames a particular commit.

	Side note: also "fixes:" footer would cast a claim made when
	a commit was created in stone---which may later turn out to
	be false.  But the issue is not unique to "Fixes: <commit>";
	"Fixes: <bugid>" suffers exactly from the same problem.

An interesting aspect of "Fixes: <commit>" is that we can use it to
easily see who is the buggiest by dividing number of buggy commit by
number of total commits per author ;-)

I'd rather not to see people adding random footers whose utility is
dubious, but for this particular one, I am not against it strongly
enough to be tempted to declare an immediate ban.
Jeff King May 28, 2020, 8:35 p.m. UTC | #6
On Thu, May 28, 2020 at 12:20:49PM -0700, Junio C Hamano wrote:

> I've seen "Fixes: bug number" in projects that maintain bug
> databases and automatically updates the status of the named bug when
> a commit with such a footer hits certain integration branches; the
> utility of such a usecase would be fairly obvious.
> 
> But "Fixes: <commit>" makes me nervous.  One reason is because a
> commit very often introduces multiple bugs (or no bugs at all), so
> which one (or more) of the bug is corrected cannot be read from such
> a footer that _only_ blames a particular commit.
> 
> 	Side note: also "fixes:" footer would cast a claim made when
> 	a commit was created in stone---which may later turn out to
> 	be false.  But the issue is not unique to "Fixes: <commit>";
> 	"Fixes: <bugid>" suffers exactly from the same problem.
> 
> An interesting aspect of "Fixes: <commit>" is that we can use it to
> easily see who is the buggiest by dividing number of buggy commit by
> number of total commits per author ;-)

Thanks for writing this out. I agree with all of it.

You could probably get interesting numbers in our project by grepping
for [0-9a-f]{7,} in commit messages to see which commits are referenced
a lot. Those aren't always bug-fixes exactly, but they are often "we did
this in commit XYZ, but that missed this case". Or maybe it would just
tell you which commits are interesting enough that we keep coming back
to them. ;)

> I'd rather not to see people adding random footers whose utility is
> dubious, but for this particular one, I am not against it strongly
> enough to be tempted to declare an immediate ban.

Sounds reasonable.

-Peff
Jeff King May 29, 2020, 3:18 a.m. UTC | #7
On Thu, May 28, 2020 at 04:35:17PM -0400, Jeff King wrote:

> You could probably get interesting numbers in our project by grepping
> for [0-9a-f]{7,} in commit messages to see which commits are referenced
> a lot. Those aren't always bug-fixes exactly, but they are often "we did
> this in commit XYZ, but that missed this case". Or maybe it would just
> tell you which commits are interesting enough that we keep coming back
> to them. ;)

Just for fun, I tried this. You can get a list of plausible references
with:

  git log --format=%B |
  perl -lne 'print $& for /[0-9a-f]{7,}/' |
  git cat-file --batch-check='%(objectname)' |
  grep -v missing

The top hits (by count) in git.git are:

  16 d1c5f2a42d (Add git-am, applymbox replacement., 2005-10-07)

     Referenced a lot during the C conversion, but not particularly
     buggy.

  14 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

     Not a commit. Experienced Gits might recognize this particular
     sha1.

  13 d95138e695 (setup: set env $GIT_WORK_TREE when work tree is set, like $GIT_DIR, 2015-06-26)

     This one _is_ interesting, as it caused a lot of fallout and was
     reverted. But really only 4 commits; it just got mentioned several
     times in each (and my line-by-line count doesn't notice that).

  12 b1ef400eec (setup_git_env: avoid blind fall-back to ".git", 2016-10-20)

     This one also caused a lot of fallout (and actually had 7
     commits mention it), which doesn't surprise me, given the topic.

  10 94da9193a6 (grep: add support for PCRE v2, 2017-06-01)

     I wouldn't have thought this could have caused a lot of bugs, but
     apparently it did. :)

So I dunno. A little fun to look through, but I didn't find it
especially enlightening.

-Peff
Torsten Bögershausen May 29, 2020, 3:39 a.m. UTC | #8
On Thu, May 28, 2020 at 01:37:45AM -0700, Carlo Marcelo Arenas Belón wrote:
> Using and escaped '|' for alternations is allowed in some implementations
        ^^^
Is this a typo ? "Using an escaped"...
And there are more typos and small things that deserve improvements.
> of grep (GNU and busybox, al least), but it is no suppored by POSIX[1]
> and therefore will fail in at least macOS and the BSD.

I don't think that macOS (or BSD) ever claimed to be be POSIX compliant.
How about something in this style:


Using an escaped '|' for alternations works only in some implementations
of grep (e.g. GNU and busybox).

It is not part of POSIX[1] and not supported by BSD, macOS
(and possibly other) non-GNU implementations.

Solution: Use ´grep -E´.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03


>
> Fixes: f1e3df3169 (t: increase test coverage of signature verification
>        output, 2020-03-04)
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>  t/t4202-log.sh           | 2 +-
>  t/t6200-fmt-merge-msg.sh | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
> index f1ea7d97f5..a0930599aa 100755
> --- a/t/t4202-log.sh
> +++ b/t/t4202-log.sh
> @@ -1692,7 +1692,7 @@ test_expect_success GPG 'log --graph --show-signature for merged tag with missin
>  	GNUPGHOME=. git log --graph --show-signature -n1 plain-nokey >actual &&
>  	grep "^|\\\  merged tag" actual &&
>  	grep "^| | gpg: Signature made" actual &&
> -	grep "^| | gpg: Can'"'"'t check signature: \(public key not found\|No public key\)" actual
> +	grep -E "^| | gpg: Can'"'"'t check signature: (public key not found|No public key)" actual
>  '
>
>  test_expect_success GPG 'log --graph --show-signature for merged tag with bad signature' '
> diff --git a/t/t6200-fmt-merge-msg.sh b/t/t6200-fmt-merge-msg.sh
> index b15582a7a2..e4c2a6eca4 100755
> --- a/t/t6200-fmt-merge-msg.sh
> +++ b/t/t6200-fmt-merge-msg.sh
> @@ -103,7 +103,7 @@ test_expect_success GPG 'message for merging local tag signed by unknown key' '
>  	GNUPGHOME=. git fmt-merge-msg <.git/FETCH_HEAD >actual 2>&1 &&
>  	grep "^Merge tag ${apos}signed-good-tag${apos}" actual &&
>  	grep "^# gpg: Signature made" actual &&
> -	grep "^# gpg: Can${apos}t check signature: \(public key not found\|No public key\)" actual
> +	grep -E "^# gpg: Can${apos}t check signature: (public key not found|No public key)" actual
>  '
>
>  test_expect_success 'message for merging external branch' '
> --
> 2.27.0.rc2.257.gaada2199e1
>
diff mbox series

Patch

diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index f1ea7d97f5..a0930599aa 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1692,7 +1692,7 @@  test_expect_success GPG 'log --graph --show-signature for merged tag with missin
 	GNUPGHOME=. git log --graph --show-signature -n1 plain-nokey >actual &&
 	grep "^|\\\  merged tag" actual &&
 	grep "^| | gpg: Signature made" actual &&
-	grep "^| | gpg: Can'"'"'t check signature: \(public key not found\|No public key\)" actual
+	grep -E "^| | gpg: Can'"'"'t check signature: (public key not found|No public key)" actual
 '
 
 test_expect_success GPG 'log --graph --show-signature for merged tag with bad signature' '
diff --git a/t/t6200-fmt-merge-msg.sh b/t/t6200-fmt-merge-msg.sh
index b15582a7a2..e4c2a6eca4 100755
--- a/t/t6200-fmt-merge-msg.sh
+++ b/t/t6200-fmt-merge-msg.sh
@@ -103,7 +103,7 @@  test_expect_success GPG 'message for merging local tag signed by unknown key' '
 	GNUPGHOME=. git fmt-merge-msg <.git/FETCH_HEAD >actual 2>&1 &&
 	grep "^Merge tag ${apos}signed-good-tag${apos}" actual &&
 	grep "^# gpg: Signature made" actual &&
-	grep "^# gpg: Can${apos}t check signature: \(public key not found\|No public key\)" actual
+	grep -E "^# gpg: Can${apos}t check signature: (public key not found|No public key)" actual
 '
 
 test_expect_success 'message for merging external branch' '