diff mbox series

[2/8,GSOC] ref-filter: add %(raw) atom

Message ID abee6a03becb929ffb292648d1ef64e61b66d53d.1623496458.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series cat-file: reuse ref-filter logic | expand

Commit Message

ZheNing Hu June 12, 2021, 11:14 a.m. UTC
From: ZheNing Hu <adlternative@gmail.com>

Add new formatting option `%(raw)`, which will print the raw
object data without any changes. It will help further to migrate
all cat-file formatting logic from cat-file to ref-filter.

The raw data of blob, tree objects may contain '\0', but most of
the logic in `ref-filter` depends on the output of the atom being
text (specifically, no embedded NULs in it).

E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
add the data to the buffer. The raw data of a tree object is
`100644 one\0...`, only the `100644 one` will be added to the buffer,
which is incorrect.

Therefore, add a new member in `struct atom_value`: `s_size`, which
can record raw object size, it can help us add raw object data to
the buffer or compare two buffers which contain raw object data.

Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
`--tcl`, `--perl` because if our binary raw data is passed to a variable
in the host language, the host language may not support arbitrary binary
data in the variables of its string type.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Felipe Contreras <felipe.contreras@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Based-on-patch-by: Olga Telezhnaya <olyatelezhnaya@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-for-each-ref.txt |   9 ++
 ref-filter.c                       | 139 +++++++++++++++----
 t/t6300-for-each-ref.sh            | 207 +++++++++++++++++++++++++++++
 3 files changed, 328 insertions(+), 27 deletions(-)

Comments

Ævar Arnfjörð Bjarmason June 17, 2021, 7:10 a.m. UTC | #1
On Sat, Jun 12 2021, ZheNing Hu via GitGitGadget wrote:

> From: ZheNing Hu <adlternative@gmail.com>
>
> Add new formatting option `%(raw)`, which will print the raw
> object data without any changes. It will help further to migrate
> all cat-file formatting logic from cat-file to ref-filter.

Nice goal and feature to have.

> The raw data of blob, tree objects may contain '\0', but most of
> the logic in `ref-filter` depends on the output of the atom being
> text (specifically, no embedded NULs in it).
>
> E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
> add the data to the buffer. The raw data of a tree object is
> `100644 one\0...`, only the `100644 one` will be added to the buffer,
> which is incorrect.
>
> Therefore, add a new member in `struct atom_value`: `s_size`, which
> can record raw object size, it can help us add raw object data to
> the buffer or compare two buffers which contain raw object data.

Most of the functions that deal with this already use a strbuf in some
way, before we had a const char *, now there's a size_t to go along with
it, why not simply use a strbuf in the struct for the data? You'll then
get the size and \0 handling for free, and any functions to deal with
conversion can stick to the strbuf API, there seems to be a lot of back
and forth now.

> Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
> `--tcl`, `--perl` because if our binary raw data is passed to a variable
> in the host language, the host language may not support arbitrary binary
> data in the variables of its string type.

Perl at least deals with that just fine, and to the extent that it
doesn't any new problems here would have nothing to do with \0 being in
the data. Perl doesn't have a notion of "binary has \0 in it", it always
supports \0, it has a notion of "is it utf-8 or not?", so any encoding
problems wouldn't be new. I'd think that the same would be true of
Python, but I'm not sure.


> +test_expect_success 'basic atom: refs/tags/testtag *raw' '
> +	git cat-file commit refs/tags/testtag^{} >expected &&
> +	git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
> +	sanitize_pgp <expected >expected.clean &&
> +	sanitize_pgp <actual >actual.clean &&
> +	echo "" >>expected.clean &&

Just "echo" will do, ditto for the rest. Also odd to go back and forth
between populating expected.clean & actual.clean.


> +test_expect_success 'set up refs pointing to binary blob' '
> +	printf "a\0b\0c" >blob1 &&
> +	printf "a\0c\0b" >blob2 &&
> +	printf "\0a\0b\0c" >blob3 &&
> +	printf "abc" >blob4 &&
> +	printf "\0 \0 \0 " >blob5 &&
> +	printf "\0 \0a\0 " >blob6 &&
> +	printf "  " >blob7 &&
> +	>blob8 &&
> +	git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
> +	git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
> +	git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
> +	git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
> +	git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
> +	git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
> +	git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
> +	git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8

Hrm, xargs just to avoid:

    git update-ref ... $(git hash-object) ?

> +test_expect_success '%(raw) with --python must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --python
> +'
> +
> +test_expect_success '%(raw) with --tcl must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --tcl
> +'
> +
> +test_expect_success '%(raw) with --perl must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --perl
> +'
> +
> +test_expect_success '%(raw) with --shell must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --shell
> +'
> +
> +test_expect_success '%(raw) with --shell and --sort=raw must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
> +'

s/must failed/must fail/, but see question above about encoding in these
languages...
Junio C Hamano June 17, 2021, 7:34 a.m. UTC | #2
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
>> `--tcl`, `--perl` because if our binary raw data is passed to a variable
>> in the host language, the host language may not support arbitrary binary
>> data in the variables of its string type.
>
> Perl at least deals with that just fine, and to the extent that it
> doesn't any new problems here would have nothing to do with \0 being in
> the data. Perl doesn't have a notion of "binary has \0 in it", it always
> supports \0, it has a notion of "is it utf-8 or not?", so any encoding
> problems wouldn't be new. I'd think that the same would be true of
> Python, but I'm not sure.

During an earlier iteration long time ago, as we knew Perl is
capable of handling sequence of bytes including NUL, it was decided
to start more strict by rejecting binary for all languages, which
can later be loosened, to limit the scope of the initial
implementation.

>> +	git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
>> +	git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
>> +	git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
>> +	git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
>> +	git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
>> +	git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
>> +	git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
>> +	git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
>
> Hrm, xargs just to avoid:
>
>     git update-ref ... $(git hash-object) ?

That's horrible.  Thanks for noticing.

We'd want to catch segfaults from both hash-object and update-ref.
One way to do so may be

	O=$(git hash-object -w blob1) &&
	git update-ref refs/myblobs/blob1 "$O"

Thanks for a review.
ZheNing Hu June 17, 2021, 9:22 a.m. UTC | #3
Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四 下午3:16写道:
>
> > The raw data of blob, tree objects may contain '\0', but most of
> > the logic in `ref-filter` depends on the output of the atom being
> > text (specifically, no embedded NULs in it).
> >
> > E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
> > add the data to the buffer. The raw data of a tree object is
> > `100644 one\0...`, only the `100644 one` will be added to the buffer,
> > which is incorrect.
> >
> > Therefore, add a new member in `struct atom_value`: `s_size`, which
> > can record raw object size, it can help us add raw object data to
> > the buffer or compare two buffers which contain raw object data.
>
> Most of the functions that deal with this already use a strbuf in some
> way, before we had a const char *, now there's a size_t to go along with
> it, why not simply use a strbuf in the struct for the data? You'll then
> get the size and \0 handling for free, and any functions to deal with
> conversion can stick to the strbuf API, there seems to be a lot of back
> and forth now.
>

Yes, strbuf is a suitable choice when using <str,len> pair.
But if replace v->s with strbuf, the possible changes will be larger.

> > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
> > `--tcl`, `--perl` because if our binary raw data is passed to a variable
> > in the host language, the host language may not support arbitrary binary
> > data in the variables of its string type.
>
> Perl at least deals with that just fine, and to the extent that it
> doesn't any new problems here would have nothing to do with \0 being in
> the data. Perl doesn't have a notion of "binary has \0 in it", it always
> supports \0, it has a notion of "is it utf-8 or not?", so any encoding
> problems wouldn't be new. I'd think that the same would be true of
> Python, but I'm not sure.
>

Not python safe. See [1].
Regarding the perl language, I support Junio's point of view: it can be
re-supported in the future.

>
> > +test_expect_success 'basic atom: refs/tags/testtag *raw' '
> > +     git cat-file commit refs/tags/testtag^{} >expected &&
> > +     git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
> > +     sanitize_pgp <expected >expected.clean &&
> > +     sanitize_pgp <actual >actual.clean &&
> > +     echo "" >>expected.clean &&
>
> Just "echo" will do, ditto for the rest. Also odd to go back and forth
> between populating expected.clean & actual.clean.
>

Are you saying that sanitize_pgp is not needed?

>
> > +test_expect_success 'set up refs pointing to binary blob' '
> > +     printf "a\0b\0c" >blob1 &&
> > +     printf "a\0c\0b" >blob2 &&
> > +     printf "\0a\0b\0c" >blob3 &&
> > +     printf "abc" >blob4 &&
> > +     printf "\0 \0 \0 " >blob5 &&
> > +     printf "\0 \0a\0 " >blob6 &&
> > +     printf "  " >blob7 &&
> > +     >blob8 &&
> > +     git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
> > +     git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
> > +     git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
> > +     git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
> > +     git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
> > +     git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
> > +     git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
> > +     git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
>
> Hrm, xargs just to avoid:
>
>     git update-ref ... $(git hash-object) ?
>

I didn’t think about it, just for convenience.

> > +test_expect_success '%(raw) with --python must failed' '
> > +     test_must_fail git for-each-ref --format="%(raw)" --python
> > +'
> > +
> > +test_expect_success '%(raw) with --tcl must failed' '
> > +     test_must_fail git for-each-ref --format="%(raw)" --tcl
> > +'
> > +
> > +test_expect_success '%(raw) with --perl must failed' '
> > +     test_must_fail git for-each-ref --format="%(raw)" --perl
> > +'
> > +
> > +test_expect_success '%(raw) with --shell must failed' '
> > +     test_must_fail git for-each-ref --format="%(raw)" --shell
> > +'
> > +
> > +test_expect_success '%(raw) with --shell and --sort=raw must failed' '
> > +     test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
> > +'
>
> s/must failed/must fail/, but see question above about encoding in these
> languages...


[1]: https://lore.kernel.org/git/CAOLTT8QR_GRm4TYk0E_eazQ+unVQODc-3L+b4V5JUN5jtZR8uA@mail.gmail.com/

Thanks for a review.
--
ZheNing Hu
Ævar Arnfjörð Bjarmason June 17, 2021, 2:37 p.m. UTC | #4
On Thu, Jun 17 2021, ZheNing Hu wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四 下午3:16写道:
>>
>> > The raw data of blob, tree objects may contain '\0', but most of
>> > the logic in `ref-filter` depends on the output of the atom being
>> > text (specifically, no embedded NULs in it).
>> >
>> > E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
>> > add the data to the buffer. The raw data of a tree object is
>> > `100644 one\0...`, only the `100644 one` will be added to the buffer,
>> > which is incorrect.
>> >
>> > Therefore, add a new member in `struct atom_value`: `s_size`, which
>> > can record raw object size, it can help us add raw object data to
>> > the buffer or compare two buffers which contain raw object data.
>>
>> Most of the functions that deal with this already use a strbuf in some
>> way, before we had a const char *, now there's a size_t to go along with
>> it, why not simply use a strbuf in the struct for the data? You'll then
>> get the size and \0 handling for free, and any functions to deal with
>> conversion can stick to the strbuf API, there seems to be a lot of back
>> and forth now.
>>
>
> Yes, strbuf is a suitable choice when using <str,len> pair.
> But if replace v->s with strbuf, the possible changes will be larger.

I for one would like to see it done that way, those changes are usually
easy to read. Also it seems a large part of 2/8 is extra new code
because we didn't do that, e.g. getting length differently if something
is a strbuf or not, passing char*/size_t pairs to new functions etc.

>> > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
>> > `--tcl`, `--perl` because if our binary raw data is passed to a variable
>> > in the host language, the host language may not support arbitrary binary
>> > data in the variables of its string type.
>>
>> Perl at least deals with that just fine, and to the extent that it
>> doesn't any new problems here would have nothing to do with \0 being in
>> the data. Perl doesn't have a notion of "binary has \0 in it", it always
>> supports \0, it has a notion of "is it utf-8 or not?", so any encoding
>> problems wouldn't be new. I'd think that the same would be true of
>> Python, but I'm not sure.
>>
>
> Not python safe. See [1].
> Regarding the perl language, I support Junio's point of view: it can be
> re-supported in the future.

Ah, I'd missed that. Anyway, if it's easy it seems you discovered that
Perl deals with it correctly, so we could just have it support this.

>>
>> > +test_expect_success 'basic atom: refs/tags/testtag *raw' '
>> > +     git cat-file commit refs/tags/testtag^{} >expected &&
>> > +     git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
>> > +     sanitize_pgp <expected >expected.clean &&
>> > +     sanitize_pgp <actual >actual.clean &&
>> > +     echo "" >>expected.clean &&
>>
>> Just "echo" will do, ditto for the rest. Also odd to go back and forth
>> between populating expected.clean & actual.clean.
>>
>
> Are you saying that sanitize_pgp is not needed?

No that instead of:

    echo "" >x

You can do:

    echo >x

And also that going back and forth between populating different files is
confusing, i.e. this:


    echo a >x
    echo c >y
    echo b >>x

is better as:

    echo a >x
    echo b >>x
    echo c >y


>>
>> > +test_expect_success 'set up refs pointing to binary blob' '
>> > +     printf "a\0b\0c" >blob1 &&
>> > +     printf "a\0c\0b" >blob2 &&
>> > +     printf "\0a\0b\0c" >blob3 &&
>> > +     printf "abc" >blob4 &&
>> > +     printf "\0 \0 \0 " >blob5 &&
>> > +     printf "\0 \0a\0 " >blob6 &&
>> > +     printf "  " >blob7 &&
>> > +     >blob8 &&
>> > +     git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
>> > +     git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
>> > +     git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
>> > +     git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
>> > +     git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
>> > +     git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
>> > +     git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
>> > +     git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
>>
>> Hrm, xargs just to avoid:
>>
>>     git update-ref ... $(git hash-object) ?
>>
>
> I didn’t think about it, just for convenience.

*nod*, Junio had a good suggestion.

>> > +test_expect_success '%(raw) with --python must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --python
>> > +'
>> > +
>> > +test_expect_success '%(raw) with --tcl must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --tcl
>> > +'
>> > +
>> > +test_expect_success '%(raw) with --perl must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --perl
>> > +'
>> > +
>> > +test_expect_success '%(raw) with --shell must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --shell
>> > +'
>> > +
>> > +test_expect_success '%(raw) with --shell and --sort=raw must failed' '
>> > +     test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
>> > +'
>>
>> s/must failed/must fail/, but see question above about encoding in these
>> languages...
>
>
> [1]: https://lore.kernel.org/git/CAOLTT8QR_GRm4TYk0E_eazQ+unVQODc-3L+b4V5JUN5jtZR8uA@mail.gmail.com/
>
> Thanks for a review.
ZheNing Hu June 17, 2021, 4:14 p.m. UTC | #5
Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四 下午10:45写道:
>
>
> On Thu, Jun 17 2021, ZheNing Hu wrote:
>
> >
> > Yes, strbuf is a suitable choice when using <str,len> pair.
> > But if replace v->s with strbuf, the possible changes will be larger.
>
> I for one would like to see it done that way, those changes are usually
> easy to read. Also it seems a large part of 2/8 is extra new code
> because we didn't do that, e.g. getting length differently if something
> is a strbuf or not, passing char*/size_t pairs to new functions etc.
>

After some refactoring, I found that there are two problems:
1. There are a lot of codes like this in ref-filter to fill v->s:

v->s = show_ref(...)
v->s = copy_email(...)

It is very difficult to modify here: We know that show_ref()
or copy_email() will allocate a block of memory to v->s, but
if v->s is a strbuf, what should we do? In copy_email(), we
can pass the v->s to copy_email() and use strbuf_add()/strbuf_addstr()
instead of xstrdup() and xmemdupz(). But show_ref() will call
external functions like shorten_unambiguous_ref(), we don’t know
whether it will return us NULL or a dynamically allocated memory.
If continue to pass v->s to the inner function, it is not a feasible
method. Or we can use strbuf_attach() + strlen(), I'm not sure
this is a good method.

2. See:

-       for (i = 0; i < used_atom_cnt; i++) {
+       for (i = 0; i < used_atom_cnt; i++) {
                struct atom_value *v = &ref->value[i];
-               if (v->s == NULL && used_atom[i].source == SOURCE_NONE)
+               if (v->s.len == 0 && used_atom[i].source == SOURCE_NONE)
                        return strbuf_addf_ret(err, -1, _("missing
object %s for %s"),

oid_to_hex(&ref->objectname), ref->refname);
        }

In the case of using strbuf, I don’t know how to distinguish between an empty
strbuf and NULL. It can be easily distinguished by using c-style "const char*".

> >
> > Not python safe. See [1].
> > Regarding the perl language, I support Junio's point of view: it can be
> > re-supported in the future.
>
> Ah, I'd missed that. Anyway, if it's easy it seems you discovered that
> Perl deals with it correctly, so we could just have it support this.
>

Well, it's ok, support for perl will be put in a separate commit.

> >>
> >> > +test_expect_success 'basic atom: refs/tags/testtag *raw' '
> >> > +     git cat-file commit refs/tags/testtag^{} >expected &&
> >> > +     git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
> >> > +     sanitize_pgp <expected >expected.clean &&
> >> > +     sanitize_pgp <actual >actual.clean &&
> >> > +     echo "" >>expected.clean &&
> >>
> >> Just "echo" will do, ditto for the rest. Also odd to go back and forth
> >> between populating expected.clean & actual.clean.
> >>
> >
> > Are you saying that sanitize_pgp is not needed?
>
> No that instead of:
>
>     echo "" >x
>
> You can do:
>
>     echo >x
>
> And also that going back and forth between populating different files is
> confusing, i.e. this:
>
>
>     echo a >x
>     echo c >y
>     echo b >>x
>
> is better as:
>
>     echo a >x
>     echo b >>x
>     echo c >y
>
>

Thanks, I get what you meant now.

> >>
> >> > +test_expect_success 'set up refs pointing to binary blob' '
> >> > +     printf "a\0b\0c" >blob1 &&
> >> > +     printf "a\0c\0b" >blob2 &&
> >> > +     printf "\0a\0b\0c" >blob3 &&
> >> > +     printf "abc" >blob4 &&
> >> > +     printf "\0 \0 \0 " >blob5 &&
> >> > +     printf "\0 \0a\0 " >blob6 &&
> >> > +     printf "  " >blob7 &&
> >> > +     >blob8 &&
> >> > +     git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
> >> > +     git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
> >> > +     git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
> >> > +     git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
> >> > +     git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
> >> > +     git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
> >> > +     git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
> >> > +     git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
> >>
> >> Hrm, xargs just to avoid:
> >>
> >>     git update-ref ... $(git hash-object) ?
> >>
> >
> > I didn’t think about it, just for convenience.
>
> *nod*, Junio had a good suggestion.
>

ok.

Thanks.
--
ZheNing Hu
Ævar Arnfjörð Bjarmason June 18, 2021, 10:49 a.m. UTC | #6
On Fri, Jun 18 2021, ZheNing Hu wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年6月17日周四 下午10:45写道:
>>
>>
>> On Thu, Jun 17 2021, ZheNing Hu wrote:
>>
>> >
>> > Yes, strbuf is a suitable choice when using <str,len> pair.
>> > But if replace v->s with strbuf, the possible changes will be larger.
>>
>> I for one would like to see it done that way, those changes are usually
>> easy to read. Also it seems a large part of 2/8 is extra new code
>> because we didn't do that, e.g. getting length differently if something
>> is a strbuf or not, passing char*/size_t pairs to new functions etc.
>>
>
> After some refactoring, I found that there are two problems:
> 1. There are a lot of codes like this in ref-filter to fill v->s:
>
> v->s = show_ref(...)
> v->s = copy_email(...)
>
> It is very difficult to modify here: We know that show_ref()
> or copy_email() will allocate a block of memory to v->s, but
> if v->s is a strbuf, what should we do? In copy_email(), we
> can pass the v->s to copy_email() and use strbuf_add()/strbuf_addstr()
> instead of xstrdup() and xmemdupz(). But show_ref() will call
> external functions like shorten_unambiguous_ref(), we don’t know
> whether it will return us NULL or a dynamically allocated memory.
> If continue to pass v->s to the inner function, it is not a feasible
> method. Or we can use strbuf_attach() + strlen(), I'm not sure
> this is a good method.
>
> 2. See:
>
> -       for (i = 0; i < used_atom_cnt; i++) {
> +       for (i = 0; i < used_atom_cnt; i++) {
>                 struct atom_value *v = &ref->value[i];
> -               if (v->s == NULL && used_atom[i].source == SOURCE_NONE)
> +               if (v->s.len == 0 && used_atom[i].source == SOURCE_NONE)
>                         return strbuf_addf_ret(err, -1, _("missing
> object %s for %s"),
>
> oid_to_hex(&ref->objectname), ref->refname);
>         }
>
> In the case of using strbuf, I don’t know how to distinguish between an empty
> strbuf and NULL. It can be easily distinguished by using c-style "const char*".

Yes, sometimes it's just too much of a hassle, looking at
shorten_unambiguous_ref() which returns a xstrdup()'d value that could
indeed be strbuf_attach'd. I haven't tried the conversion myself,
perhaps it's too much hassle.

Just a suggestion from reading your patch in isolation.


>> >
>> > Not python safe. See [1].
>> > Regarding the perl language, I support Junio's point of view: it can be
>> > re-supported in the future.
>>
>> Ah, I'd missed that. Anyway, if it's easy it seems you discovered that
>> Perl deals with it correctly, so we could just have it support this.
>>
>
> Well, it's ok, support for perl will be put in a separate commit.
>
>> >>
>> >> > +test_expect_success 'basic atom: refs/tags/testtag *raw' '
>> >> > +     git cat-file commit refs/tags/testtag^{} >expected &&
>> >> > +     git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
>> >> > +     sanitize_pgp <expected >expected.clean &&
>> >> > +     sanitize_pgp <actual >actual.clean &&
>> >> > +     echo "" >>expected.clean &&
>> >>
>> >> Just "echo" will do, ditto for the rest. Also odd to go back and forth
>> >> between populating expected.clean & actual.clean.
>> >>
>> >
>> > Are you saying that sanitize_pgp is not needed?
>>
>> No that instead of:
>>
>>     echo "" >x
>>
>> You can do:
>>
>>     echo >x
>>
>> And also that going back and forth between populating different files is
>> confusing, i.e. this:
>>
>>
>>     echo a >x
>>     echo c >y
>>     echo b >>x
>>
>> is better as:
>>
>>     echo a >x
>>     echo b >>x
>>     echo c >y
>>
>>
>
> Thanks, I get what you meant now.
>
>> >>
>> >> > +test_expect_success 'set up refs pointing to binary blob' '
>> >> > +     printf "a\0b\0c" >blob1 &&
>> >> > +     printf "a\0c\0b" >blob2 &&
>> >> > +     printf "\0a\0b\0c" >blob3 &&
>> >> > +     printf "abc" >blob4 &&
>> >> > +     printf "\0 \0 \0 " >blob5 &&
>> >> > +     printf "\0 \0a\0 " >blob6 &&
>> >> > +     printf "  " >blob7 &&
>> >> > +     >blob8 &&
>> >> > +     git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
>> >> > +     git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
>> >> > +     git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
>> >> > +     git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
>> >> > +     git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
>> >> > +     git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
>> >> > +     git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
>> >> > +     git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
>> >>
>> >> Hrm, xargs just to avoid:
>> >>
>> >>     git update-ref ... $(git hash-object) ?
>> >>
>> >
>> > I didn’t think about it, just for convenience.
>>
>> *nod*, Junio had a good suggestion.
>>
>
> ok.
>
> Thanks.
Christian Couder June 18, 2021, 1:47 p.m. UTC | #7
On Fri, Jun 18, 2021 at 12:51 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:

> On Fri, Jun 18 2021, ZheNing Hu wrote:

> > After some refactoring, I found that there are two problems:
> > 1. There are a lot of codes like this in ref-filter to fill v->s:
> >
> > v->s = show_ref(...)
> > v->s = copy_email(...)
> >
> > It is very difficult to modify here: We know that show_ref()
> > or copy_email() will allocate a block of memory to v->s, but
> > if v->s is a strbuf, what should we do? In copy_email(), we
> > can pass the v->s to copy_email() and use strbuf_add()/strbuf_addstr()
> > instead of xstrdup() and xmemdupz(). But show_ref() will call
> > external functions like shorten_unambiguous_ref(), we don’t know
> > whether it will return us NULL or a dynamically allocated memory.
> > If continue to pass v->s to the inner function, it is not a feasible
> > method. Or we can use strbuf_attach() + strlen(), I'm not sure
> > this is a good method.

If you resend this patch, it might be a good idea to add a short
version of the above explanations into the commit message.

[...]

> > In the case of using strbuf, I don’t know how to distinguish between an empty
> > strbuf and NULL. It can be easily distinguished by using c-style "const char*".

Maybe this could also be part of the explanation.

> Yes, sometimes it's just too much of a hassle, looking at
> shorten_unambiguous_ref() which returns a xstrdup()'d value that could
> indeed be strbuf_attach'd. I haven't tried the conversion myself,
> perhaps it's too much hassle.
>
> Just a suggestion from reading your patch in isolation.

Yeah, thanks for the review anyway!
diff mbox series

Patch

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 2ae2478de706..7f1f0a1ca3b6 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -235,6 +235,15 @@  and `date` to extract the named component.  For email fields (`authoremail`,
 without angle brackets, and `:localpart` to get the part before the `@` symbol
 out of the trimmed email.
 
+The raw data in an object is `raw`.
+
+raw:size::
+	The raw data size of the object.
+
+Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
+`--perl` because the host language may not support arbitrary binary data in the
+variables of its string type.
+
 The message in a commit or a tag object is `contents`, from which
 `contents:<part>` can be used to extract various parts out of:
 
diff --git a/ref-filter.c b/ref-filter.c
index 5cee6512fbaf..7822be903071 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -144,6 +144,7 @@  enum atom_type {
 	ATOM_BODY,
 	ATOM_TRAILERS,
 	ATOM_CONTENTS,
+	ATOM_RAW,
 	ATOM_UPSTREAM,
 	ATOM_PUSH,
 	ATOM_SYMREF,
@@ -189,6 +190,9 @@  static struct used_atom {
 			struct process_trailer_options trailer_opts;
 			unsigned int nlines;
 		} contents;
+		struct {
+			enum { RAW_BARE, RAW_LENGTH } option;
+		} raw_data;
 		struct {
 			cmp_status cmp_status;
 			const char *str;
@@ -426,6 +430,18 @@  static int contents_atom_parser(const struct ref_format *format, struct used_ato
 	return 0;
 }
 
+static int raw_atom_parser(const struct ref_format *format, struct used_atom *atom,
+				const char *arg, struct strbuf *err)
+{
+	if (!arg)
+		atom->u.raw_data.option = RAW_BARE;
+	else if (!strcmp(arg, "size"))
+		atom->u.raw_data.option = RAW_LENGTH;
+	else
+		return strbuf_addf_ret(err, -1, _("unrecognized %%(raw) argument: %s"), arg);
+	return 0;
+}
+
 static int oid_atom_parser(const struct ref_format *format, struct used_atom *atom,
 			   const char *arg, struct strbuf *err)
 {
@@ -586,6 +602,7 @@  static struct {
 	[ATOM_BODY] = { "body", SOURCE_OBJ, FIELD_STR, body_atom_parser },
 	[ATOM_TRAILERS] = { "trailers", SOURCE_OBJ, FIELD_STR, trailers_atom_parser },
 	[ATOM_CONTENTS] = { "contents", SOURCE_OBJ, FIELD_STR, contents_atom_parser },
+	[ATOM_RAW] = { "raw", SOURCE_OBJ, FIELD_STR, raw_atom_parser },
 	[ATOM_UPSTREAM] = { "upstream", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	[ATOM_PUSH] = { "push", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	[ATOM_SYMREF] = { "symref", SOURCE_NONE, FIELD_STR, refname_atom_parser },
@@ -620,12 +637,15 @@  struct ref_formatting_state {
 
 struct atom_value {
 	const char *s;
+	size_t s_size;
 	int (*handler)(struct atom_value *atomv, struct ref_formatting_state *state,
 		       struct strbuf *err);
 	uintmax_t value; /* used for sorting when not FIELD_STR */
 	struct used_atom *atom;
 };
 
+#define ATOM_VALUE_S_SIZE_INIT (-1)
+
 /*
  * Used to parse format string and sort specifiers
  */
@@ -644,13 +664,6 @@  static int parse_ref_filter_atom(const struct ref_format *format,
 		return strbuf_addf_ret(err, -1, _("malformed field name: %.*s"),
 				       (int)(ep-atom), atom);
 
-	/* Do we have the atom already used elsewhere? */
-	for (i = 0; i < used_atom_cnt; i++) {
-		int len = strlen(used_atom[i].name);
-		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
-			return i;
-	}
-
 	/*
 	 * If the atom name has a colon, strip it and everything after
 	 * it off - it specifies the format for this entry, and
@@ -660,6 +673,13 @@  static int parse_ref_filter_atom(const struct ref_format *format,
 	arg = memchr(sp, ':', ep - sp);
 	atom_len = (arg ? arg : ep) - sp;
 
+	/* Do we have the atom already used elsewhere? */
+	for (i = 0; i < used_atom_cnt; i++) {
+		int len = strlen(used_atom[i].name);
+		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
+			return i;
+	}
+
 	/* Is the atom a valid one? */
 	for (i = 0; i < ARRAY_SIZE(valid_atom); i++) {
 		int len = strlen(valid_atom[i].name);
@@ -709,11 +729,14 @@  static int parse_ref_filter_atom(const struct ref_format *format,
 	return at;
 }
 
-static void quote_formatting(struct strbuf *s, const char *str, int quote_style)
+static void quote_formatting(struct strbuf *s, const char *str, size_t len, int quote_style)
 {
 	switch (quote_style) {
 	case QUOTE_NONE:
-		strbuf_addstr(s, str);
+		if (len != ATOM_VALUE_S_SIZE_INIT)
+			strbuf_add(s, str, len);
+		else
+			strbuf_addstr(s, str);
 		break;
 	case QUOTE_SHELL:
 		sq_quote_buf(s, str);
@@ -740,9 +763,12 @@  static int append_atom(struct atom_value *v, struct ref_formatting_state *state,
 	 * encountered.
 	 */
 	if (!state->stack->prev)
-		quote_formatting(&state->stack->output, v->s, state->quote_style);
+		quote_formatting(&state->stack->output, v->s, v->s_size, state->quote_style);
 	else
-		strbuf_addstr(&state->stack->output, v->s);
+		if (v->s_size != ATOM_VALUE_S_SIZE_INIT)
+			strbuf_add(&state->stack->output, v->s, v->s_size);
+		else
+			strbuf_addstr(&state->stack->output, v->s);
 	return 0;
 }
 
@@ -842,21 +868,23 @@  static int if_atom_handler(struct atom_value *atomv, struct ref_formatting_state
 	return 0;
 }
 
-static int is_empty(const char *s)
+static int is_empty(struct strbuf *buf)
 {
-	while (*s != '\0') {
-		if (!isspace(*s))
-			return 0;
-		s++;
-	}
-	return 1;
-}
+	const char *cur = buf->buf;
+	const char *end = buf->buf + buf->len;
+
+	while (cur != end && (isspace(*cur)))
+		cur++;
+
+	return cur == end;
+ }
 
 static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state,
 			     struct strbuf *err)
 {
 	struct ref_formatting_stack *cur = state->stack;
 	struct if_then_else *if_then_else = NULL;
+	size_t str_len = 0;
 
 	if (cur->at_end == if_then_else_handler)
 		if_then_else = (struct if_then_else *)cur->at_end_data;
@@ -867,18 +895,22 @@  static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_sta
 	if (if_then_else->else_atom_seen)
 		return strbuf_addf_ret(err, -1, _("format: %%(then) atom used after %%(else)"));
 	if_then_else->then_atom_seen = 1;
+	if (if_then_else->str)
+		str_len = strlen(if_then_else->str);
 	/*
 	 * If the 'equals' or 'notequals' attribute is used then
 	 * perform the required comparison. If not, only non-empty
 	 * strings satisfy the 'if' condition.
 	 */
 	if (if_then_else->cmp_status == COMPARE_EQUAL) {
-		if (!strcmp(if_then_else->str, cur->output.buf))
+		if (str_len == cur->output.len &&
+		    !memcmp(if_then_else->str, cur->output.buf, cur->output.len))
 			if_then_else->condition_satisfied = 1;
 	} else if (if_then_else->cmp_status == COMPARE_UNEQUAL) {
-		if (strcmp(if_then_else->str, cur->output.buf))
+		if (str_len != cur->output.len ||
+		    memcmp(if_then_else->str, cur->output.buf, cur->output.len))
 			if_then_else->condition_satisfied = 1;
-	} else if (cur->output.len && !is_empty(cur->output.buf))
+	} else if (cur->output.len && !is_empty(&cur->output))
 		if_then_else->condition_satisfied = 1;
 	strbuf_reset(&cur->output);
 	return 0;
@@ -924,7 +956,7 @@  static int end_atom_handler(struct atom_value *atomv, struct ref_formatting_stat
 	 * only on the topmost supporting atom.
 	 */
 	if (!current->prev->prev) {
-		quote_formatting(&s, current->output.buf, state->quote_style);
+		quote_formatting(&s, current->output.buf, current->output.len, state->quote_style);
 		strbuf_swap(&current->output, &s);
 	}
 	strbuf_release(&s);
@@ -974,6 +1006,10 @@  int verify_ref_format(struct ref_format *format)
 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
 		if (at < 0)
 			die("%s", err.buf);
+		if (format->quote_style && used_atom[at].atom_type == ATOM_RAW &&
+		    used_atom[at].u.raw_data.option == RAW_BARE)
+			die(_("--format=%.*s cannot be used with"
+			      "--python, --shell, --tcl, --perl"), (int)(ep - sp - 2), sp + 2);
 		cp = ep + 1;
 
 		if (skip_prefix(used_atom[at].name, "color:", &color))
@@ -1362,17 +1398,29 @@  static void grab_sub_body_contents(struct atom_value *val, int deref, struct exp
 	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
 	size_t sublen = 0, bodylen = 0, nonsiglen = 0, siglen = 0;
 	void *buf = data->content;
+	unsigned long buf_size = data->size;
 
 	for (i = 0; i < used_atom_cnt; i++) {
 		struct used_atom *atom = &used_atom[i];
 		const char *name = atom->name;
 		struct atom_value *v = &val[i];
+		enum atom_type atom_type = atom->atom_type;
 
 		if (!!deref != (*name == '*'))
 			continue;
 		if (deref)
 			name++;
 
+		if (atom_type == ATOM_RAW) {
+			if (atom->u.raw_data.option == RAW_BARE) {
+				v->s = xmemdupz(buf, buf_size);
+				v->s_size = buf_size;
+			} else if (atom->u.raw_data.option == RAW_LENGTH) {
+				v->s = xstrfmt("%"PRIuMAX, (uintmax_t)buf_size);
+			}
+			continue;
+		}
+
 		if ((data->type != OBJ_TAG &&
 		     data->type != OBJ_COMMIT) ||
 		    (strcmp(name, "body") &&
@@ -1460,9 +1508,11 @@  static void grab_values(struct atom_value *val, int deref, struct object *obj, s
 		break;
 	case OBJ_TREE:
 		/* grab_tree_values(val, deref, obj, buf, sz); */
+		grab_sub_body_contents(val, deref, data);
 		break;
 	case OBJ_BLOB:
 		/* grab_blob_values(val, deref, obj, buf, sz); */
+		grab_sub_body_contents(val, deref, data);
 		break;
 	default:
 		die("Eh?  Object of type %d?", obj->type);
@@ -1766,6 +1816,7 @@  static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 		const char *refname;
 		struct branch *branch = NULL;
 
+		v->s_size = ATOM_VALUE_S_SIZE_INIT;
 		v->handler = append_atom;
 		v->atom = atom;
 
@@ -2369,6 +2420,19 @@  static int compare_detached_head(struct ref_array_item *a, struct ref_array_item
 	return 0;
 }
 
+static int memcasecmp(const void *vs1, const void *vs2, size_t n)
+{
+	const char *s1 = vs1, *s2 = vs2;
+	const char *end = s1 + n;
+
+	for (; s1 < end; s1++, s2++) {
+		int diff = tolower(*s1) - tolower(*s2);
+		if (diff)
+			return diff;
+	}
+	return 0;
+}
+
 static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, struct ref_array_item *b)
 {
 	struct atom_value *va, *vb;
@@ -2389,10 +2453,30 @@  static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
 	} else if (s->sort_flags & REF_SORTING_VERSION) {
 		cmp = versioncmp(va->s, vb->s);
 	} else if (cmp_type == FIELD_STR) {
-		int (*cmp_fn)(const char *, const char *);
-		cmp_fn = s->sort_flags & REF_SORTING_ICASE
-			? strcasecmp : strcmp;
-		cmp = cmp_fn(va->s, vb->s);
+		if (va->s_size == ATOM_VALUE_S_SIZE_INIT &&
+		    vb->s_size == ATOM_VALUE_S_SIZE_INIT) {
+			int (*cmp_fn)(const char *, const char *);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? strcasecmp : strcmp;
+			cmp = cmp_fn(va->s, vb->s);
+		} else {
+			size_t a_size = va->s_size == ATOM_VALUE_S_SIZE_INIT ?
+					strlen(va->s) : va->s_size;
+			size_t b_size = vb->s_size == ATOM_VALUE_S_SIZE_INIT ?
+					strlen(vb->s) : vb->s_size;
+			int (*cmp_fn)(const void *, const void *, size_t);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? memcasecmp : memcmp;
+
+			cmp = cmp_fn(va->s, vb->s, b_size > a_size ?
+				     a_size : b_size);
+			if (!cmp) {
+				if (a_size > b_size)
+					cmp = 1;
+				else if (a_size < b_size)
+					cmp = -1;
+			}
+		}
 	} else {
 		if (va->value < vb->value)
 			cmp = -1;
@@ -2492,6 +2576,7 @@  int format_ref_array_item(struct ref_array_item *info,
 	}
 	if (format->need_color_reset_at_eol) {
 		struct atom_value resetv;
+		resetv.s_size = ATOM_VALUE_S_SIZE_INIT;
 		resetv.s = GIT_COLOR_RESET;
 		if (append_atom(&resetv, &state, error_buf)) {
 			pop_stack_element(&state.stack);
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 9e0214076b4d..e2867de791e7 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -130,6 +130,8 @@  test_atom head parent:short=10 ''
 test_atom head numparent 0
 test_atom head object ''
 test_atom head type ''
+test_atom head raw "$(git cat-file commit refs/heads/main)
+"
 test_atom head '*objectname' ''
 test_atom head '*objecttype' ''
 test_atom head author 'A U Thor <author@example.com> 1151968724 +0200'
@@ -221,6 +223,15 @@  test_atom tag contents 'Tagging at 1151968727
 '
 test_atom tag HEAD ' '
 
+test_expect_success 'basic atom: refs/tags/testtag *raw' '
+	git cat-file commit refs/tags/testtag^{} >expected &&
+	git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_expect_success 'Check invalid atoms names are errors' '
 	test_must_fail git for-each-ref --format="%(INVALID)" refs/heads
 '
@@ -686,6 +697,15 @@  test_atom refs/tags/signed-empty contents:body ''
 test_atom refs/tags/signed-empty contents:signature "$sig"
 test_atom refs/tags/signed-empty contents "$sig"
 
+test_expect_success GPG 'basic atom: refs/tags/signed-empty raw' '
+	git cat-file tag refs/tags/signed-empty >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-empty >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_atom refs/tags/signed-short subject 'subject line'
 test_atom refs/tags/signed-short subject:sanitize 'subject-line'
 test_atom refs/tags/signed-short contents:subject 'subject line'
@@ -695,6 +715,15 @@  test_atom refs/tags/signed-short contents:signature "$sig"
 test_atom refs/tags/signed-short contents "subject line
 $sig"
 
+test_expect_success GPG 'basic atom: refs/tags/signed-short raw' '
+	git cat-file tag refs/tags/signed-short >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-short >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_atom refs/tags/signed-long subject 'subject line'
 test_atom refs/tags/signed-long subject:sanitize 'subject-line'
 test_atom refs/tags/signed-long contents:subject 'subject line'
@@ -708,6 +737,15 @@  test_atom refs/tags/signed-long contents "subject line
 body contents
 $sig"
 
+test_expect_success GPG 'basic atom: refs/tags/signed-long raw' '
+	git cat-file tag refs/tags/signed-long >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-long >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_expect_success 'set up refs pointing to tree and blob' '
 	git update-ref refs/mytrees/first refs/heads/main^{tree} &&
 	git update-ref refs/myblobs/first refs/heads/main:one
@@ -720,6 +758,16 @@  test_atom refs/mytrees/first contents:body ""
 test_atom refs/mytrees/first contents:signature ""
 test_atom refs/mytrees/first contents ""
 
+test_expect_success 'basic atom: refs/mytrees/first raw' '
+	git cat-file tree refs/mytrees/first >expected &&
+	echo "" >>expected &&
+	git for-each-ref --format="%(raw)" refs/mytrees/first >actual &&
+	test_cmp expected actual &&
+	git cat-file -s refs/mytrees/first >expected &&
+	git for-each-ref --format="%(raw:size)" refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
 test_atom refs/myblobs/first subject ""
 test_atom refs/myblobs/first contents:subject ""
 test_atom refs/myblobs/first body ""
@@ -727,6 +775,165 @@  test_atom refs/myblobs/first contents:body ""
 test_atom refs/myblobs/first contents:signature ""
 test_atom refs/myblobs/first contents ""
 
+test_expect_success 'basic atom: refs/myblobs/first raw' '
+	git cat-file blob refs/myblobs/first >expected &&
+	echo "" >>expected &&
+	git for-each-ref --format="%(raw)" refs/myblobs/first >actual &&
+	test_cmp expected actual &&
+	git cat-file -s refs/myblobs/first >expected &&
+	git for-each-ref --format="%(raw:size)" refs/myblobs/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'set up refs pointing to binary blob' '
+	printf "a\0b\0c" >blob1 &&
+	printf "a\0c\0b" >blob2 &&
+	printf "\0a\0b\0c" >blob3 &&
+	printf "abc" >blob4 &&
+	printf "\0 \0 \0 " >blob5 &&
+	printf "\0 \0a\0 " >blob6 &&
+	printf "  " >blob7 &&
+	>blob8 &&
+	git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
+	git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
+	git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
+	git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
+	git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
+	git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
+	git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
+	git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
+'
+
+test_expect_success 'Verify sorts with raw' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob8
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/myblobs/blob3
+	refs/myblobs/blob7
+	refs/mytrees/first
+	refs/myblobs/first
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob4
+	refs/heads/main
+	EOF
+	git for-each-ref --format="%(refname)" --sort=raw \
+		refs/heads/main refs/myblobs/ refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'Verify sorts with raw:size' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob8
+	refs/myblobs/first
+	refs/myblobs/blob7
+	refs/heads/main
+	refs/myblobs/blob4
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob3
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/mytrees/first
+	EOF
+	git for-each-ref --format="%(refname)" --sort=raw:size \
+		refs/heads/main refs/myblobs/ refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'validate raw atom with %(if:equals)' '
+	cat >expected <<-EOF &&
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	refs/myblobs/blob4
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	EOF
+	git for-each-ref --format="%(if:equals=abc)%(raw)%(then)%(refname)%(else)not equals%(end)" \
+		refs/myblobs/ refs/heads/ >actual &&
+	test_cmp expected actual
+'
+test_expect_success 'validate raw atom with %(if:notequals)' '
+	cat >expected <<-EOF &&
+	refs/heads/ambiguous
+	refs/heads/main
+	refs/heads/newtag
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob3
+	equals
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/myblobs/blob7
+	refs/myblobs/blob8
+	refs/myblobs/first
+	EOF
+	git for-each-ref --format="%(if:notequals=abc)%(raw)%(then)%(refname)%(else)equals%(end)" \
+		refs/myblobs/ refs/heads/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'empty raw refs with %(if)' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob1 not empty
+	refs/myblobs/blob2 not empty
+	refs/myblobs/blob3 not empty
+	refs/myblobs/blob4 not empty
+	refs/myblobs/blob5 not empty
+	refs/myblobs/blob6 not empty
+	refs/myblobs/blob7 empty
+	refs/myblobs/blob8 empty
+	refs/myblobs/first not empty
+	EOF
+	git for-each-ref --format="%(refname) %(if)%(raw)%(then)not empty%(else)empty%(end)" \
+		refs/myblobs/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success '%(raw) with --python must failed' '
+	test_must_fail git for-each-ref --format="%(raw)" --python
+'
+
+test_expect_success '%(raw) with --tcl must failed' '
+	test_must_fail git for-each-ref --format="%(raw)" --tcl
+'
+
+test_expect_success '%(raw) with --perl must failed' '
+	test_must_fail git for-each-ref --format="%(raw)" --perl
+'
+
+test_expect_success '%(raw) with --shell must failed' '
+	test_must_fail git for-each-ref --format="%(raw)" --shell
+'
+
+test_expect_success '%(raw) with --shell and --sort=raw must failed' '
+	test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
+'
+
+test_expect_success '%(raw:size) with --shell' '
+	git for-each-ref --format="%(raw:size)" | while read line
+	do
+		echo "'\''$line'\''" >>expect
+	done &&
+	git for-each-ref --format="%(raw:size)" --shell >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'for-each-ref --format compare with cat-file --batch' '
+	git rev-parse refs/mytrees/first | git cat-file --batch >expected &&
+	git for-each-ref --format="%(objectname) %(objecttype) %(objectsize)
+%(raw)" refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'set up multiple-sort tags' '
 	for when in 100000 200000
 	do