diff mbox series

[v3,4/4] ref-filter: add support for %(contents:size)

Message ID 20200707174049.21714-5-chriscool@tuxfamily.org (mailing list archive)
State New, archived
Headers show
Series Add support for %(contents:size) in ref-filter | expand

Commit Message

Christian Couder July 7, 2020, 5:40 p.m. UTC
It's useful and efficient to be able to get the size of the
contents directly without having to pipe through `wc -c`.

Also the result of the following:

`git for-each-ref --format='%(contents)' refs/heads/my-branch | wc -c`

is off by one as `git for-each-ref` appends a newline character
after the contents, which can be seen by comparing its output
with the output from `git cat-file`.

As with %(contents), %(contents:size) is silently ignored, if a
ref points to something other than a commit or a tag:

```
$ git update-ref refs/mytrees/first HEAD^{tree}
$ git for-each-ref --format='%(contents)' refs/mytrees/first

$ git for-each-ref --format='%(contents:size)' refs/mytrees/first

```

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/git-for-each-ref.txt |  3 +++
 ref-filter.c                       |  7 ++++++-
 t/t6300-for-each-ref.sh            | 19 +++++++++++++++++++
 3 files changed, 28 insertions(+), 1 deletion(-)

Comments

Junio C Hamano July 7, 2020, 7:45 p.m. UTC | #1
Christian Couder <christian.couder@gmail.com> writes:

> It's useful and efficient to be able to get the size of the
> contents directly without having to pipe through `wc -c`.
>
> Also the result of the following:
>
> `git for-each-ref --format='%(contents)' refs/heads/my-branch | wc -c`
>
> is off by one as `git for-each-ref` appends a newline character
> after the contents, which can be seen by comparing its output
> with the output from `git cat-file`.
>
> As with %(contents), %(contents:size) is silently ignored, if a
> ref points to something other than a commit or a tag:
>
> ```
> $ git update-ref refs/mytrees/first HEAD^{tree}
> $ git for-each-ref --format='%(contents)' refs/mytrees/first
>
> $ git for-each-ref --format='%(contents:size)' refs/mytrees/first
>
> ```
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  Documentation/git-for-each-ref.txt |  3 +++
>  ref-filter.c                       |  7 ++++++-
>  t/t6300-for-each-ref.sh            | 19 +++++++++++++++++++
>  3 files changed, 28 insertions(+), 1 deletion(-)

Nice.  The only questionable thing here is if we later regret for
assuming that all sizes are always measured in bytes.  If we later
find an application that wants an efficient access to "| wc -l"
(instead of "| wc -c" that motivated this patch), we'd want to be
able to say "%(contents:lines)" and at that point we may want to go
back in time and call this "%(contents:bytes)" or something.

>  test_atom head contents 'Initial
>  '
> +test_atom head contents:size '8'

These two are tied together (any change to the test script that
causes us to update the former also forces us to update the latter),
but I do not think of a way to unify the test without writing too
much boilerplate code, so let's say this is good enough at least for
now (but I may change my opinion as I read along).

>  test_atom tag contents 'Tagging at 1151968727
>  '
> +test_atom tag contents:size '22'

Likewise.

> @@ -580,6 +582,7 @@ test_atom refs/tags/subject-body contents 'the subject line
>  first body line
>  second body line
>  '
> +test_atom refs/tags/subject-body contents:size '51'

Likewise.

Of course, we _could_ update the test_atom to do something magic
only when the 'contents' atom is being asked.  We notice that $2 is
'contents', do the usual test_expect_success for 'contents', and
then measure the byte length of $3 ourselves and test
'contents:size'.  That way, all the above test updates would become
unnecessary (and the last two hunks of this patch can also go).

That approach may even allow you to hide the details of sanitize-pgp
in the updated test_atom so that the actual tests may not have to get
updated even for signed tags.

> +# We cannot use test_atom to check contents:size of signed tags due to sanitize_pgp
> +test_tag_contents_size_pgp () {
> +	ref="$1"
> +	test_expect_success $PREREQ "basic atom: $ref contents:size" "
> +		git cat-file tag $ref | tail -n +6 | wc -c >expected &&
> +		git for-each-ref --format='%(contents:size)' $ref >actual &&
> +		test_cmp expected actual
> +	"
> +}
> +
>  PREREQ=GPG
>  test_atom refs/tags/signed-empty subject ''
>  test_atom refs/tags/signed-empty contents:subject ''
> @@ -629,6 +643,7 @@ test_atom refs/tags/signed-empty body "$sig"
>  test_atom refs/tags/signed-empty contents:body ''
>  test_atom refs/tags/signed-empty contents:signature "$sig"
>  test_atom refs/tags/signed-empty contents "$sig"
> +test_tag_contents_size_pgp refs/tags/signed-empty
>  
>  test_atom refs/tags/signed-short subject 'subject line'
>  test_atom refs/tags/signed-short contents:subject 'subject line'
> @@ -637,6 +652,7 @@ test_atom refs/tags/signed-short contents:body ''
>  test_atom refs/tags/signed-short contents:signature "$sig"
>  test_atom refs/tags/signed-short contents "subject line
>  $sig"
> +test_tag_contents_size_pgp refs/tags/signed-short
>  
>  test_atom refs/tags/signed-long subject 'subject line'
>  test_atom refs/tags/signed-long contents:subject 'subject line'
> @@ -649,6 +665,7 @@ test_atom refs/tags/signed-long contents "subject line
>  
>  body contents
>  $sig"
> +test_tag_contents_size_pgp refs/tags/signed-long
>  
>  test_expect_success 'set up refs pointing to tree and blob' '
>  	git update-ref refs/mytrees/first refs/heads/master^{tree} &&
> @@ -664,6 +681,7 @@ test_atom refs/mytrees/first body ""
>  test_atom refs/mytrees/first contents:body ""
>  test_atom refs/mytrees/first contents:signature ""
>  test_atom refs/mytrees/first contents ""
> +test_atom refs/mytrees/first contents:size ""
>  
>  test_atom refs/myblobs/first subject ""
>  test_atom refs/myblobs/first contents:subject ""
> @@ -671,6 +689,7 @@ test_atom refs/myblobs/first body ""
>  test_atom refs/myblobs/first contents:body ""
>  test_atom refs/myblobs/first contents:signature ""
>  test_atom refs/myblobs/first contents ""
> +test_atom refs/myblobs/first contents:size ""
>  
>  test_expect_success 'set up multiple-sort tags' '
>  	for when in 100000 200000
Junio C Hamano July 7, 2020, 10:21 p.m. UTC | #2
Christian Couder <christian.couder@gmail.com> writes:

> +		else if (atom->u.contents.option == C_LENGTH)
> +			v->s = xstrfmt("%ld", strlen(subpos));

Please squash something like this in.  32-bit builds are failing.

 ref-filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index 8ec28f0535..73d8bfa86d 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1257,7 +1257,7 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 		else if (atom->u.contents.option == C_BODY_DEP)
 			v->s = xmemdupz(bodypos, bodylen);
 		else if (atom->u.contents.option == C_LENGTH)
-			v->s = xstrfmt("%ld", strlen(subpos));
+			v->s = xstrfmt("%"PRIuMAX, (uintmax_t)strlen(subpos));
 		else if (atom->u.contents.option == C_BODY)
 			v->s = xmemdupz(bodypos, nonsiglen);
 		else if (atom->u.contents.option == C_SIG)
Junio C Hamano July 8, 2020, 11:05 p.m. UTC | #3
Christian Couder <christian.couder@gmail.com> writes:

> +# We cannot use test_atom to check contents:size of signed tags due to sanitize_pgp
> +test_tag_contents_size_pgp () {
> +	ref="$1"
> +	test_expect_success $PREREQ "basic atom: $ref contents:size" "
> +		git cat-file tag $ref | tail -n +6 | wc -c >expected &&
> +		git for-each-ref --format='%(contents:size)' $ref >actual &&
> +		test_cmp expected actual
> +	"
> +}

Of course, this will BREAK the tests on macOS and possibly others
with "wc" that emits leading whitespaces before the number.

The tip of 'seen' has been failing ever since this topic was merged
there e.g. https://travis-ci.org/github/git/git/jobs/705986794

Thanks.
Junio C Hamano July 9, 2020, 12:14 a.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Of course, we _could_ update the test_atom to do something magic
> only when the 'contents' atom is being asked.  We notice that $2 is
> 'contents', do the usual test_expect_success for 'contents', and
> then measure the byte length of $3 ourselves and test
> 'contents:size'.  That way, all the above test updates would become
> unnecessary (and the last two hunks of this patch can also go).
>
> That approach may even allow you to hide the details of sanitize-pgp
> in the updated test_atom so that the actual tests may not have to get
> updated even for signed tags.

After seeing the "wc -c" portability issues, I am now even more
inclined to say that the above is the right direction.  The
portability worries can and should be encapsulated in a single
test_atom helper function, just as it can be used to hide the
differences between signed tags, annotated tags and commits.

Thanks.

>> +# We cannot use test_atom to check contents:size of signed tags due to sanitize_pgp
>> +test_tag_contents_size_pgp () {
>> +	ref="$1"
>> +	test_expect_success $PREREQ "basic atom: $ref contents:size" "
>> +		git cat-file tag $ref | tail -n +6 | wc -c >expected &&
>> +		git for-each-ref --format='%(contents:size)' $ref >actual &&
>> +		test_cmp expected actual
>> +	"
>> +}
Christian Couder July 9, 2020, 8:10 a.m. UTC | #5
On Thu, Jul 9, 2020 at 2:14 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Of course, we _could_ update the test_atom to do something magic
> > only when the 'contents' atom is being asked.  We notice that $2 is
> > 'contents', do the usual test_expect_success for 'contents', and
> > then measure the byte length of $3 ourselves and test
> > 'contents:size'.  That way, all the above test updates would become
> > unnecessary (and the last two hunks of this patch can also go).
> >
> > That approach may even allow you to hide the details of sanitize-pgp
> > in the updated test_atom so that the actual tests may not have to get
> > updated even for signed tags.
>
> After seeing the "wc -c" portability issues, I am now even more
> inclined to say that the above is the right direction.  The
> portability worries can and should be encapsulated in a single
> test_atom helper function, just as it can be used to hide the
> differences between signed tags, annotated tags and commits.

Yeah, I have been working on that and will send a new patch series soon.
The current test_atom() change looks like this:

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 371e45e5ad..e514d98574 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -52,6 +52,26 @@ test_atom() {
                sanitize_pgp <actual >actual.clean &&
                test_cmp expected actual.clean
        "
+       # Automatically test "contents:size" atom after testing "contents"
+       if test "$2" = "contents"
+       then
+               case "$1" in
+               refs/tags/signed-*)
+                       # We cannot use $3 as it expects sanitize_pgp to run
+                       git cat-file tag $ref | tail -n +6 | \
+                               wc -c | sed -e 's/^ *//' >expected ;;
+               refs/mytrees/*)
+                       echo >expected ;;
+               refs/myblobs/*)
+                       echo >expected ;;
+               *)
+                       printf '%s' "$3" | wc -c | sed -e 's/^ *//' >expected ;;
+               esac
+               test_expect_${4:-success} $PREREQ "basic atom: $1 $2:size" "
+                       git for-each-ref --format='%($2:size)' $ref >actual &&
+                       test_cmp expected actual
+               "
+       fi
 }

I am wondering if it's worth adding a preparatory patch to introduce
an helper function like the following in test-lib-functions.sh:

+# test_byte_count outputs the number of bytes in files or stdin
+#
+# It is like wc -c but without portability issues, as on macOS and
+# possibly other platforms leading whitespaces are emitted before the
+# number.
+
+test_byte_count () {
+       wc -c "$@" | sed -e 's/^ *//'
+}

Not sure about the name of this helper function as it works
differently than test_line_count().
Junio C Hamano July 9, 2020, 1:47 p.m. UTC | #6
Christian Couder <christian.couder@gmail.com> writes:

> Yeah, I have been working on that and will send a new patch series soon.
> The current test_atom() change looks like this:
>
> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index 371e45e5ad..e514d98574 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -52,6 +52,26 @@ test_atom() {
>                 sanitize_pgp <actual >actual.clean &&
>                 test_cmp expected actual.clean
>         "
> +       # Automatically test "contents:size" atom after testing "contents"
> +       if test "$2" = "contents"
> +       then
> +               case "$1" in
> +               refs/tags/signed-*)
> +                       # We cannot use $3 as it expects sanitize_pgp to run
> +                       git cat-file tag $ref | tail -n +6 | \
> +                               wc -c | sed -e 's/^ *//' >expected ;;
> +               refs/mytrees/*)
> +                       echo >expected ;;
> +               refs/myblobs/*)
> +                       echo >expected ;;
> +               *)
> +                       printf '%s' "$3" | wc -c | sed -e 's/^ *//' >expected ;;
> +               esac
> +               test_expect_${4:-success} $PREREQ "basic atom: $1 $2:size" "
> +                       git for-each-ref --format='%($2:size)' $ref >actual &&
> +                       test_cmp expected actual
> +               "
> +       fi
>  }
>
> I am wondering if it's worth adding a preparatory patch to introduce
> an helper function like the following in test-lib-functions.sh:
>
> +# test_byte_count outputs the number of bytes in files or stdin
> +#
> +# It is like wc -c but without portability issues, as on macOS and
> +# possibly other platforms leading whitespaces are emitted before the
> +# number.
> +
> +test_byte_count () {
> +       wc -c "$@" | sed -e 's/^ *//'
> +}
>
> Not sure about the name of this helper function as it works
> differently than test_line_count().

Yeah, if I were writing it, I'd call it "sane_wc_c" or something.

But more importantly, I think the invocation of "|sed" is overkill.
If I were writing it, I would go more like...

	if test $2 = contents
	then
		case "$1" in 
		...)
			expect=$(git cat-file ... | wc -c)
			;;
		refs/mytrees/* | refs/myblobs/*)
			expect=0
			;;
		*)
			expect=$(printf ... | wc -c)
			;;
		esac

		# leave $expect unquoted to lose possible leading whitespaces
	        echo $expect >expect
		test_expect_success "..." '
			...
			test_cmp expect actual
		'
	fi
diff mbox series

Patch

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 788258c3ad..049bc93e6a 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -236,6 +236,9 @@  The complete message (subject, body, trailers and signature) of a
 commit or tag object is `contents`. This field can also be used in the
 following ways:
 
+contents:size::
+	The size in bytes of the complete message.
+
 contents:subject::
 	The "subject" of the commit or tag message.  It's actually the
 	concatenation of all lines of the commit message up to the
diff --git a/ref-filter.c b/ref-filter.c
index 8447cb09be..8ec28f0535 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -127,7 +127,8 @@  static struct used_atom {
 			unsigned int nobracket : 1, push : 1, push_remote : 1;
 		} remote_ref;
 		struct {
-			enum { C_BARE, C_BODY, C_BODY_DEP, C_LINES, C_SIG, C_SUB, C_TRAILERS } option;
+			enum { C_BARE, C_BODY, C_BODY_DEP, C_LENGTH,
+			       C_LINES, C_SIG, C_SUB, C_TRAILERS } option;
 			struct process_trailer_options trailer_opts;
 			unsigned int nlines;
 		} contents;
@@ -338,6 +339,8 @@  static int contents_atom_parser(const struct ref_format *format, struct used_ato
 		atom->u.contents.option = C_BARE;
 	else if (!strcmp(arg, "body"))
 		atom->u.contents.option = C_BODY;
+	else if (!strcmp(arg, "size"))
+		atom->u.contents.option = C_LENGTH;
 	else if (!strcmp(arg, "signature"))
 		atom->u.contents.option = C_SIG;
 	else if (!strcmp(arg, "subject"))
@@ -1253,6 +1256,8 @@  static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 			v->s = copy_subject(subpos, sublen);
 		else if (atom->u.contents.option == C_BODY_DEP)
 			v->s = xmemdupz(bodypos, bodylen);
+		else if (atom->u.contents.option == C_LENGTH)
+			v->s = xstrfmt("%ld", strlen(subpos));
 		else if (atom->u.contents.option == C_BODY)
 			v->s = xmemdupz(bodypos, nonsiglen);
 		else if (atom->u.contents.option == C_SIG)
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 371e45e5ad..b580e27a32 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -125,6 +125,7 @@  test_atom head contents:body ''
 test_atom head contents:signature ''
 test_atom head contents 'Initial
 '
+test_atom head contents:size '8'
 test_atom head HEAD '*'
 
 test_atom tag refname refs/tags/testtag
@@ -170,6 +171,7 @@  test_atom tag contents:body ''
 test_atom tag contents:signature ''
 test_atom tag contents 'Tagging at 1151968727
 '
+test_atom tag contents:size '22'
 test_atom tag HEAD ' '
 
 test_expect_success 'Check invalid atoms names are errors' '
@@ -580,6 +582,7 @@  test_atom refs/tags/subject-body contents 'the subject line
 first body line
 second body line
 '
+test_atom refs/tags/subject-body contents:size '51'
 
 test_expect_success 'create tag with multiline subject' '
 	cat >msg <<-\EOF &&
@@ -606,6 +609,7 @@  second subject line
 first body line
 second body line
 '
+test_atom refs/tags/multiline contents:size '73'
 
 test_expect_success GPG 'create signed tags' '
 	git tag -s -m "" signed-empty &&
@@ -622,6 +626,16 @@  sig='-----BEGIN PGP SIGNATURE-----
 -----END PGP SIGNATURE-----
 '
 
+# We cannot use test_atom to check contents:size of signed tags due to sanitize_pgp
+test_tag_contents_size_pgp () {
+	ref="$1"
+	test_expect_success $PREREQ "basic atom: $ref contents:size" "
+		git cat-file tag $ref | tail -n +6 | wc -c >expected &&
+		git for-each-ref --format='%(contents:size)' $ref >actual &&
+		test_cmp expected actual
+	"
+}
+
 PREREQ=GPG
 test_atom refs/tags/signed-empty subject ''
 test_atom refs/tags/signed-empty contents:subject ''
@@ -629,6 +643,7 @@  test_atom refs/tags/signed-empty body "$sig"
 test_atom refs/tags/signed-empty contents:body ''
 test_atom refs/tags/signed-empty contents:signature "$sig"
 test_atom refs/tags/signed-empty contents "$sig"
+test_tag_contents_size_pgp refs/tags/signed-empty
 
 test_atom refs/tags/signed-short subject 'subject line'
 test_atom refs/tags/signed-short contents:subject 'subject line'
@@ -637,6 +652,7 @@  test_atom refs/tags/signed-short contents:body ''
 test_atom refs/tags/signed-short contents:signature "$sig"
 test_atom refs/tags/signed-short contents "subject line
 $sig"
+test_tag_contents_size_pgp refs/tags/signed-short
 
 test_atom refs/tags/signed-long subject 'subject line'
 test_atom refs/tags/signed-long contents:subject 'subject line'
@@ -649,6 +665,7 @@  test_atom refs/tags/signed-long contents "subject line
 
 body contents
 $sig"
+test_tag_contents_size_pgp refs/tags/signed-long
 
 test_expect_success 'set up refs pointing to tree and blob' '
 	git update-ref refs/mytrees/first refs/heads/master^{tree} &&
@@ -664,6 +681,7 @@  test_atom refs/mytrees/first body ""
 test_atom refs/mytrees/first contents:body ""
 test_atom refs/mytrees/first contents:signature ""
 test_atom refs/mytrees/first contents ""
+test_atom refs/mytrees/first contents:size ""
 
 test_atom refs/myblobs/first subject ""
 test_atom refs/myblobs/first contents:subject ""
@@ -671,6 +689,7 @@  test_atom refs/myblobs/first body ""
 test_atom refs/myblobs/first contents:body ""
 test_atom refs/myblobs/first contents:signature ""
 test_atom refs/myblobs/first contents ""
+test_atom refs/myblobs/first contents:size ""
 
 test_expect_success 'set up multiple-sort tags' '
 	for when in 100000 200000