ref-filter: add support for %(contents:size)

Message ID	20200701132308.16691-1-chriscool@tuxfamily.org (mailing list archive)
State	New, archived
Headers	Return-Path: <SRS0=EAeL=AM=vger.kernel.org=git-owner@kernel.org> From: Christian Couder <christian.couder@gmail.com> To: git@vger.kernel.org Cc: Junio C Hamano <gitster@pobox.com>, Christian Couder <chriscool@tuxfamily.org> Subject: [PATCH] ref-filter: add support for %(contents:size) Date: Wed, 1 Jul 2020 15:23:08 +0200 Message-Id: <20200701132308.16691-1-chriscool@tuxfamily.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: git-owner@vger.kernel.org Precedence: bulk
Series	ref-filter: add support for %(contents:size)

Commit Message

Christian Couder July 1, 2020, 1:23 p.m. UTC

It's useful and efficient to be able to get the size of the
contents directly without having to pipe through `wc -c`.

Also the result of the following:

`git for-each-ref --format='%(contents)' | wc -c`

is off by one as `git for-each-ref` appends a newline character
after the contents, which can be seen by comparing its output
with the output from `git cat-file`.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/git-for-each-ref.txt | 27 +++++++++++++++------------
 ref-filter.c                       |  7 ++++++-
 t/t6300-for-each-ref.sh            |  2 ++
 3 files changed, 23 insertions(+), 13 deletions(-)
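
The off-by-one described in the commit message can be illustrated without a repository. The following is a minimal C sketch, assuming a single ref whose message is passed in as a string; `contents_size()` and `piped_size()` are hypothetical names used only for this illustration and are not part of git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size that %(contents:size) is meant to report: the message bytes only. */
size_t contents_size(const char *contents)
{
	assert(contents != NULL);
	return strlen(contents);
}

/*
 * Size that `wc -c` would see for a single ref: the message plus the
 * one newline that `git for-each-ref` appends after the formatted entry.
 */
size_t piped_size(const char *contents)
{
	return contents_size(contents) + 1;
}
```

For the message `Initial` followed by a newline, `contents_size()` gives 8, which is the value the new test in t6300 expects, while `piped_size()` gives 9. With more than one ref in the output, the piped count would also add up all entries, so it is even further from a per-ref size.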

Comments

Jeff King July 1, 2020, 3:20 p.m. UTC | #1

On Wed, Jul 01, 2020 at 03:23:08PM +0200, Christian Couder wrote:

> It's useful and efficient to be able to get the size of the
> contents directly without having to pipe through `wc -c`.
> 
> Also the result of the following:
> 
> `git for-each-ref --format='%(contents)' | wc -c`
> 
> is off by one as `git for-each-ref` appends a newline character
> after the contents, which can be seen by comparing its output
> with the output from `git cat-file`.

It could also be accessed much more quickly, since we don't actually
need to load the object contents into memory to know the size.  cat-file
does this kind of optimization (by building on oid_object_info()), and
its %(objectsize) will do the minimum amount of work needed.

I was going to suggest that instead of adding %(contents:size), you just
add %(objectsize). That would match cat-file's existing option, and we
hope to unify the formatters eventually. But it already exists (and I
think is even optimized courtesy of Olga's work).

> -The complete message in a commit and tag object is `contents`.
> -Its first line is `contents:subject`, where subject is the concatenation
> -of all lines of the commit message up to the first blank line.  The next
> -line is `contents:body`, where body is all of the lines after the first
> -blank line.  The optional GPG signature is `contents:signature`.  The
> -first `N` lines of the message is obtained using `contents:lines=N`.
> -Additionally, the trailers as interpreted by linkgit:git-interpret-trailers[1]
> -are obtained as `trailers` (or by using the historical alias
> -`contents:trailers`).  Non-trailer lines from the trailer block can be omitted
> -with `trailers:only`. Whitespace-continuations can be removed from trailers so
> -that each trailer appears on a line by itself with its full content with
> -`trailers:unfold`. Both can be used together as `trailers:unfold,only`.
> +The complete message in a commit and tag object is `contents`.  Its
> +size in bytes is `contents:size`.  Its first line is
> +`contents:subject`, where subject is the concatenation of all lines of
> +the commit message up to the first blank line.  The next line is
> +`contents:body`, where body is all of the lines after the first blank
> +line.  The optional GPG signature is `contents:signature`.  The first
> +`N` lines of the message is obtained using `contents:lines=N`.
> +Additionally, the trailers as interpreted by
> +linkgit:git-interpret-trailers[1] are obtained as `trailers` (or by
> +using the historical alias `contents:trailers`).  Non-trailer lines
> +from the trailer block can be omitted with
> +`trailers:only`. Whitespace-continuations can be removed from trailers
> +so that each trailer appears on a line by itself with its full content
> +with `trailers:unfold`. Both can be used together as
> +`trailers:unfold,only`.

Definitely not a new problem, but boy is that a dense paragraph. I
suspect an unordered list might be a nicer way of presenting the list of
format specifiers.

-Peff
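
One point worth keeping in mind when comparing the two atoms: `%(objectsize)` covers the whole object, header lines included, while `%(contents:size)` as written in this patch counts only the message that follows the header block, so the two values would not be expected to match. Below is a minimal C sketch of that difference, assuming a simplified commit buffer. `message_start()` and `message_size()` are hypothetical helpers for illustration, not git functions, and git's own parsing in ref-filter.c handles more cases than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the message part of a raw commit or tag buffer:
 * everything after the first blank line that ends the header block.
 * Returns a pointer to the terminating NUL if no blank line is found.
 */
const char *message_start(const char *buf)
{
	const char *sep;

	assert(buf != NULL);
	sep = strstr(buf, "\n\n");
	return sep ? sep + 2 : buf + strlen(buf);
}

/* Number of bytes in the message part only (headers excluded). */
size_t message_size(const char *buf)
{
	return strlen(message_start(buf));
}
```

For a buffer made of `tree`, `author` and `committer` header lines, a blank line, and the message `Initial` plus newline, `message_size()` is 8 while `strlen()` of the whole buffer is larger by the length of the headers and the separating blank line.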

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 6dcd39f6f6..673ace94d1 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -232,18 +232,21 @@  Fields that have name-email-date tuple as its value (`author`,
 `committer`, and `tagger`) can be suffixed with `name`, `email`,
 and `date` to extract the named component.
 
-The complete message in a commit and tag object is `contents`.
-Its first line is `contents:subject`, where subject is the concatenation
-of all lines of the commit message up to the first blank line.  The next
-line is `contents:body`, where body is all of the lines after the first
-blank line.  The optional GPG signature is `contents:signature`.  The
-first `N` lines of the message is obtained using `contents:lines=N`.
-Additionally, the trailers as interpreted by linkgit:git-interpret-trailers[1]
-are obtained as `trailers` (or by using the historical alias
-`contents:trailers`).  Non-trailer lines from the trailer block can be omitted
-with `trailers:only`. Whitespace-continuations can be removed from trailers so
-that each trailer appears on a line by itself with its full content with
-`trailers:unfold`. Both can be used together as `trailers:unfold,only`.
+The complete message in a commit and tag object is `contents`.  Its
+size in bytes is `contents:size`.  Its first line is
+`contents:subject`, where subject is the concatenation of all lines of
+the commit message up to the first blank line.  The next line is
+`contents:body`, where body is all of the lines after the first blank
+line.  The optional GPG signature is `contents:signature`.  The first
+`N` lines of the message is obtained using `contents:lines=N`.
+Additionally, the trailers as interpreted by
+linkgit:git-interpret-trailers[1] are obtained as `trailers` (or by
+using the historical alias `contents:trailers`).  Non-trailer lines
+from the trailer block can be omitted with
+`trailers:only`. Whitespace-continuations can be removed from trailers
+so that each trailer appears on a line by itself with its full content
+with `trailers:unfold`. Both can be used together as
+`trailers:unfold,only`.
 
 For sorting purposes, fields with numeric values sort in numeric order
 (`objectsize`, `authordate`, `committerdate`, `creatordate`, `taggerdate`).
diff --git a/ref-filter.c b/ref-filter.c
index bf7b70299b..036a95d0d2 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -127,7 +127,8 @@  static struct used_atom {
 			unsigned int nobracket : 1, push : 1, push_remote : 1;
 		} remote_ref;
 		struct {
-			enum { C_BARE, C_BODY, C_BODY_DEP, C_LINES, C_SIG, C_SUB, C_TRAILERS } option;
+			enum { C_BARE, C_BODY, C_BODY_DEP, C_LENGTH,
+			       C_LINES, C_SIG, C_SUB, C_TRAILERS } option;
 			struct process_trailer_options trailer_opts;
 			unsigned int nlines;
 		} contents;
@@ -338,6 +339,8 @@  static int contents_atom_parser(const struct ref_format *format, struct used_ato
 		atom->u.contents.option = C_BARE;
 	else if (!strcmp(arg, "body"))
 		atom->u.contents.option = C_BODY;
+	else if (!strcmp(arg, "size"))
+		atom->u.contents.option = C_LENGTH;
 	else if (!strcmp(arg, "signature"))
 		atom->u.contents.option = C_SIG;
 	else if (!strcmp(arg, "subject"))
@@ -1253,6 +1256,8 @@  static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 			v->s = copy_subject(subpos, sublen);
 		else if (atom->u.contents.option == C_BODY_DEP)
 			v->s = xmemdupz(bodypos, bodylen);
+		else if (atom->u.contents.option == C_LENGTH)
+			v->s = xstrfmt("%"PRIuMAX, (uintmax_t)strlen(subpos));
 		else if (atom->u.contents.option == C_BODY)
 			v->s = xmemdupz(bodypos, nonsiglen);
 		else if (atom->u.contents.option == C_SIG)
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index da59fadc5d..4f730acd48 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -125,6 +125,7 @@  test_atom head contents:body ''
 test_atom head contents:signature ''
 test_atom head contents 'Initial
 '
+test_atom head contents:size '8'
 test_atom head HEAD '*'
 
 test_atom tag refname refs/tags/testtag
@@ -170,6 +171,7 @@  test_atom tag contents:body ''
 test_atom tag contents:signature ''
 test_atom tag contents 'Tagging at 1151968727
 '
+test_atom tag contents:size '22'
 test_atom tag HEAD ' '
 
 test_expect_success 'Check invalid atoms names are errors' '
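
The two code changes in ref-filter.c follow a parse-then-format pattern: the text after `contents:` is mapped to an enum value, and that value later selects how the atom's string is produced. Below is a standalone C sketch of that flow, assuming only the bare and `size` variants. The names `contents_option`, `OPT_*`, `parse_contents_arg()` and `format_contents()` are hypothetical, and `snprintf()` stands in for git's `xstrfmt()`. Since `strlen()` returns a `size_t`, the sketch casts to `uintmax_t` and prints with `PRIuMAX`, which is one portable way to format such a value.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the options handled by the contents atom. */
enum contents_option { OPT_BARE, OPT_SIZE, OPT_UNKNOWN };

/* Map the text after "contents:" to an option; NULL means bare %(contents). */
enum contents_option parse_contents_arg(const char *arg)
{
	if (!arg)
		return OPT_BARE;
	if (!strcmp(arg, "size"))
		return OPT_SIZE;
	return OPT_UNKNOWN;
}

/*
 * Write the atom's value for `message` into `out` (at most `outlen` bytes).
 * Returns the number of characters snprintf() would have written, or -1
 * for an unsupported option.
 */
int format_contents(enum contents_option opt, const char *message,
		    char *out, size_t outlen)
{
	assert(message != NULL && out != NULL);
	switch (opt) {
	case OPT_BARE:
		return snprintf(out, outlen, "%s", message);
	case OPT_SIZE:
		/* strlen() returns size_t; cast to uintmax_t to match PRIuMAX. */
		return snprintf(out, outlen, "%" PRIuMAX,
				(uintmax_t)strlen(message));
	default:
		return -1;
	}
}
```

With the messages used in t6300, formatting `Initial` plus newline with the `size` option yields the string `8`, and `Tagging at 1151968727` plus newline yields `22`, matching the expected values in the test hunk.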
