[2/2,GSOC] ref-filter: add %(header) atom

Message ID aa6d73f3e526f416ee1e4e332e9ca3119efba0e8.1622126603.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Series ref-filter: add %(raw) atom

Commit Message

ZheNing Hu May 27, 2021, 2:43 p.m. UTC
From: ZheNing Hu <adlternative@gmail.com>

Add a new formatting option `%(header)`, which prints the
structured header part of the raw object data.

In the storage layout of an object, blobs and trees contain only
raw data, while the raw data of commits and tags has two parts:
a header and the contents. The header of a tag contains "object OOO",
"type TTT", "tag AAA", and "tagger GGG"; the header of a commit
contains "tree RRR", "parent PPP", "author UUU", and "committer CCC".

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-for-each-ref.txt |  7 +++++
 ref-filter.c                       | 26 +++++++++++++++++
 t/t6300-for-each-ref.sh            | 45 ++++++++++++++++++++++++++++++
 3 files changed, 78 insertions(+)

Comments

Felipe Contreras May 27, 2021, 4:37 p.m. UTC | #1
ZheNing Hu via GitGitGadget wrote:

> @@ -1372,6 +1389,15 @@ static void grab_raw_data(struct atom_value *val, int deref, void *buf, unsigned
>  				    &bodypos, &bodylen, &nonsiglen,
>  				    &sigpos, &siglen);
>  
> +		if (starts_with(name, "header")) {
> +			size_t header_len = subpos - (const char *)buf - 1;
> +			if (atom->u.header.option == H_BARE) {
> +				v->s = xmemdupz(buf, header_len);
> +			} else if (atom->u.header.option == H_LENGTH)

No need for braces in the if.

> +				v->s = xstrfmt("%"PRIuMAX, (uintmax_t)header_len);
> +			continue;
> +		}
> +
>  		if (atom->u.contents.option == C_SUB)
>  			v->s = copy_subject(subpos, sublen);
>  		else if (atom->u.contents.option == C_SUB_SANITIZE) {
Junio C Hamano May 28, 2021, 3:06 a.m. UTC | #2
"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: ZheNing Hu <adlternative@gmail.com>
>
> Add a new formatting option `%(header)`, which prints the
> structured header part of the raw object data.
>
> In the storage layout of an object, blobs and trees contain only
> raw data, while the raw data of commits and tags has two parts:
> a header and the contents. The header of a tag contains "object OOO",
> "type TTT", "tag AAA", and "tagger GGG"; the header of a commit
> contains "tree RRR", "parent PPP", "author UUU", and "committer CCC".
>
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>  Documentation/git-for-each-ref.txt |  7 +++++
>  ref-filter.c                       | 26 +++++++++++++++++
>  t/t6300-for-each-ref.sh            | 45 ++++++++++++++++++++++++++++++
>  3 files changed, 78 insertions(+)

While having this may not be wrong, I am not sure who needs it.  Does
your "cat-file --batch" topic need this new atom?
Junio C Hamano May 28, 2021, 4:36 a.m. UTC | #3
"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  		struct {
>  			enum { RAW_BARE, RAW_LENGTH } option;
>  		} raw_data;
> +		struct {
> +			enum { H_BARE, H_LENGTH } option;
> +		} header;

Raw does not use R_{BARE,LENGTH} and uses raw_data member.  Header
should follow suit unless there is a compelling reason not to, no?

		struct {
			enum { HEADER_BARE, HEADER_LENGTH } option;
		} header_data;

perhaps?

> @@ -1372,6 +1389,15 @@ static void grab_raw_data(struct atom_value *val, int deref, void *buf, unsigned
>  				    &bodypos, &bodylen, &nonsiglen,
>  				    &sigpos, &siglen);
>  
> +		if (starts_with(name, "header")) {
> +			size_t header_len = subpos - (const char *)buf - 1;

Hmph, is this correct?  I would expect that the "header" part of a
commit or a tag object excludes the blank line after the header
fields.  In other words, the "header" would be separated by a blank
line from the "body", and that separating blank line is not part of
"header" or "body".

Otherwise, if there is a user of %(header), it needs to be coded to
ignore the last blank line but has to diagnose it as an error if
there is a blank line before that.
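
To illustrate the difference for a consumer, here is a minimal sketch in
C.  count_header_lines() is a hypothetical helper, not something in
ref-filter.c; it only assumes that the header is a run of
newline-terminated "key value" lines.  It counts the non-empty lines and,
separately, the empty ones, so a caller can tell whether the separating
blank line leaked into what it was handed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only (not part of git).
 *
 * Walk "len" bytes of "hdr" line by line.  Return the number of
 * non-empty lines and store the number of empty lines in *empty_lines.
 */
int count_header_lines(const char *hdr, size_t len, int *empty_lines)
{
	const char *p = hdr;
	const char *end = hdr + len;
	int fields = 0;

	*empty_lines = 0;
	while (p < end) {
		const char *eol = memchr(p, '\n', (size_t)(end - p));
		size_t linelen = eol ? (size_t)(eol - p) : (size_t)(end - p);

		if (!linelen)
			(*empty_lines)++;	/* a bare "\n" */
		else
			fields++;		/* "tree ...", "author ...", etc. */

		/* step over the line and its newline, if it had one */
		p += linelen + (eol ? 1 : 0);
	}
	return fields;
}
```

Given a commit with tree, author and committer lines, passing the length
up to and including the newline that ends the committer line reports
three fields and no empty lines; passing one byte more reports one empty
line, which is the case a strict consumer would have to special-case.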

> +			if (atom->u.header.option == H_BARE) {
> +				v->s = xmemdupz(buf, header_len);
> +			} else if (atom->u.header.option == H_LENGTH)
> +				v->s = xstrfmt("%"PRIuMAX, (uintmax_t)header_len);
> +			continue;
> +		}
> +
>  		if (atom->u.contents.option == C_SUB)
>  			v->s = copy_subject(subpos, sublen);
>  		else if (atom->u.contents.option == C_SUB_SANITIZE) {
ZheNing Hu May 28, 2021, 3:19 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> wrote on Fri, May 28, 2021 at 12:36 PM:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> >               struct {
> >                       enum { RAW_BARE, RAW_LENGTH } option;
> >               } raw_data;
> > +             struct {
> > +                     enum { H_BARE, H_LENGTH } option;
> > +             } header;
>
> Raw does not use R_{BARE,LENGTH} and uses raw_data member.  Header
> should follow suit unless there is a compelling reason not to, no?
>
>                 struct {
>                         enum { HEADER_BARE, HEADER_LENGTH } option;
>                 } header_data;
>
> perhaps?
>

OK.


> > @@ -1372,6 +1389,15 @@ static void grab_raw_data(struct atom_value *val, int deref, void *buf, unsigned
> >                                   &bodypos, &bodylen, &nonsiglen,
> >                                   &sigpos, &siglen);
> >
> > +             if (starts_with(name, "header")) {
> > +                     size_t header_len = subpos - (const char *)buf - 1;
>
> Hmph, is this correct?  I would expect that the "header" part of a
> commit or a tag object excludes the blank line after the header
> fields.  In other words, the "header" would be separated by a blank
> line from the "body", and that separating blank line is not part of
> "header" or "body".
>
> Otherwise, if there is a user of %(header), it needs to be coded to
> ignore the last blank line but has to diagnose it as an error if
> there is a blank line before that.
>

I am a bit confused. Is there a problem with what I am doing here?

> > +                     size_t header_len = subpos - (const char *)buf - 1;

The "header" part starts at "buf", and header_len has 1 subtracted from
it so that the header part does not include the blank line. Likewise, the
"contents" part starts at subpos, so it does not include the blank line either.
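
The arithmetic can be checked in isolation with a small C sketch. Both
functions are hypothetical stand-ins: subject_start() assumes exactly one
blank line between the header and the message, which is simpler than what
find_subpos() in ref-filter.c does, and header_len() only repeats the
expression from the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the "subpos" that find_subpos() reports.
 * Assumption: the header is followed by exactly one blank line.
 * Returns NULL if the buffer has no blank separator line.
 */
const char *subject_start(const char *buf)
{
	const char *sep = strstr(buf, "\n\n");

	return sep ? sep + 2 : NULL;
}

/*
 * Same expression as in the patch: everything before subpos, minus the
 * one byte of the blank separator line.
 */
size_t header_len(const char *buf, const char *subpos)
{
	return (size_t)(subpos - buf) - 1;
}
```

With len = header_len(buf, subpos), buf[len - 1] is the newline that ends
the last header field and buf[len] is the blank separator line, so the
value keeps the final newline of the header but not the separator. For a
tag laid out like the one in the t6300 test this gives 131.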

> While having this may not be wrong, I am not sure who needs it.  Does
> your "cat-file --batch" topic need this new atom?

Ok, I will remove it from this topic temporarily.

Thanks.
--
ZheNing Hu

Patch

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index f6ae751fd256..7827e48cde75 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -249,6 +249,13 @@  Note that `--format=%(raw)` should not combine with `--python`, `--shell`, `--tc
 `--perl` because if our binary raw data is passed to a variable in the host language,
 the host languages may cause escape errors.
 
+The structured header part of the raw data in a commit or a tag object is `header`;
+it is composed of "tree XXX", "parent YYY", etc. lines in commits, or of
+"object OOO", "type TTT", etc. lines in tags.
+
+header:size::
+	The header size of the object.
+
 The message in a commit or a tag object is `contents`, from which
 `contents:<part>` can be used to extract various parts out of:
 
diff --git a/ref-filter.c b/ref-filter.c
index c2abf5da7006..2f426830f562 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -141,6 +141,9 @@  static struct used_atom {
 		struct {
 			enum { RAW_BARE, RAW_LENGTH } option;
 		} raw_data;
+		struct {
+			enum { H_BARE, H_LENGTH } option;
+		} header;
 		struct {
 			cmp_status cmp_status;
 			const char *str;
@@ -385,6 +388,18 @@  static int raw_atom_parser(const struct ref_format *format, struct used_atom *at
 	return 0;
 }
 
+static int header_atom_parser(const struct ref_format *format, struct used_atom *atom,
+			      const char *arg, struct strbuf *err)
+{
+	if (!arg)
+		atom->u.header.option = H_BARE;
+	else if (!strcmp(arg, "size"))
+		atom->u.header.option = H_LENGTH;
+	else
+		return strbuf_addf_ret(err, -1, _("unrecognized %%(header) argument: %s"), arg);
+	return 0;
+}
+
 static int oid_atom_parser(const struct ref_format *format, struct used_atom *atom,
 			   const char *arg, struct strbuf *err)
 {
@@ -546,6 +561,7 @@  static struct {
 	{ "trailers", SOURCE_OBJ, FIELD_STR, trailers_atom_parser },
 	{ "contents", SOURCE_OBJ, FIELD_STR, contents_atom_parser },
 	{ "raw", SOURCE_OBJ, FIELD_STR, raw_atom_parser },
+	{ "header", SOURCE_OBJ, FIELD_STR, header_atom_parser },
 	{ "upstream", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	{ "push", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	{ "symref", SOURCE_NONE, FIELD_STR, refname_atom_parser },
@@ -1362,6 +1378,7 @@  static void grab_raw_data(struct atom_value *val, int deref, void *buf, unsigned
 		if ((obj->type != OBJ_TAG &&
 		     obj->type != OBJ_COMMIT) ||
 		    (strcmp(name, "body") &&
+		     !starts_with(name, "header") &&
 		     !starts_with(name, "subject") &&
 		     !starts_with(name, "trailers") &&
 		     !starts_with(name, "contents")))
@@ -1372,6 +1389,15 @@  static void grab_raw_data(struct atom_value *val, int deref, void *buf, unsigned
 				    &bodypos, &bodylen, &nonsiglen,
 				    &sigpos, &siglen);
 
+		if (starts_with(name, "header")) {
+			size_t header_len = subpos - (const char *)buf - 1;
+			if (atom->u.header.option == H_BARE) {
+				v->s = xmemdupz(buf, header_len);
+			} else if (atom->u.header.option == H_LENGTH)
+				v->s = xstrfmt("%"PRIuMAX, (uintmax_t)header_len);
+			continue;
+		}
+
 		if (atom->u.contents.option == C_SUB)
 			v->s = copy_subject(subpos, sublen);
 		else if (atom->u.contents.option == C_SUB_SANITIZE) {
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 07de4a84d70b..11fc8fc53649 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -232,6 +232,35 @@  test_expect_success 'basic atom: refs/tags/testtag *raw' '
 	test_cmp expected.clean actual.clean
 '
 
+test_expect_success 'basic atom: refs/tags/testtag header' '
+	cat >expected <<-EOF &&
+	object ea122842f48be4afb2d1fc6a4b96c05885ab7463
+	type commit
+	tag testtag
+	tagger C O Mitter <committer@example.com> 1151968725 +0200
+
+	EOF
+	git for-each-ref --format="%(header)" refs/tags/testtag >actual &&
+	test_cmp expected actual &&
+	echo "131" >expected &&
+	git for-each-ref --format="%(header:size)" refs/tags/testtag >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'basic atom: refs/heads/main header' '
+	cat >expected <<-EOF &&
+	tree 8039ce043250c402d62ca312e9596e42ce1c7bb0
+	author A U Thor <author@example.com> 1151968724 +0200
+	committer C O Mitter <committer@example.com> 1151968723 +0200
+
+	EOF
+	git for-each-ref --format="%(header)" refs/heads/main >actual &&
+	test_cmp expected actual &&
+	echo "162" >expected &&
+	git for-each-ref --format="%(header:size)" refs/heads/main >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'Check invalid atoms names are errors' '
 	test_must_fail git for-each-ref --format="%(INVALID)" refs/heads
 '
@@ -768,6 +797,14 @@  test_expect_success 'basic atom: refs/mytrees/first raw' '
 	test_cmp expected actual
 '
 
+test_expect_success 'basic atom: refs/mytrees/first header' '
+	echo "" >expected &&
+	git for-each-ref --format="%(header)" refs/mytrees/first >actual &&
+	test_cmp expected actual &&
+	git for-each-ref --format="%(header:size)" refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
 test_atom refs/myblobs/first subject ""
 test_atom refs/myblobs/first contents:subject ""
 test_atom refs/myblobs/first body ""
@@ -785,6 +822,14 @@  test_expect_success 'basic atom: refs/myblobs/first raw' '
 	test_cmp expected actual
 '
 
+test_expect_success 'basic atom: refs/myblobs/first header' '
+	echo "" >expected &&
+	git for-each-ref --format="%(header)" refs/myblobs/first >actual &&
+	test_cmp expected actual &&
+	git for-each-ref --format="%(header:size)" refs/myblobs/first >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'set up refs pointing to binary blob' '
 	printf "%b" "a\0b\0c" >blob1 &&
 	printf "%b" "a\0c\0b" >blob2 &&