diff mbox series

[3/9] ref-filter: strip signature when parsing tag trailers

Message ID 20240909231445.GC921834@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 99448c3d7824bb1d1092ce4f9b981c32d1951d3a
Headers show
Series ref-filter %(trailer) fixes | expand

Commit Message

Jeff King Sept. 9, 2024, 11:14 p.m. UTC
To expand the "%(trailers)" placeholder, we have to feed the commit or
tag body to the trailer API. But that API doesn't know anything about
signatures, and will be confused by a signed tag like this:

  the subject

  the body

  Some-trailer: foo
  -----BEGIN PGP SIGNATURE-----
  ...etc...

because it will start looking for trailers after the signature, and get
stopped walking backwards by the very non-trailer signature lines. So it
thinks there are no trailers.

This problem has existed since %(trailers) was added to the ref-filter
code, but back then trailers on tags weren't something we really
considered (commits don't have the same problem because their signatures
are embedded in the header). But since 066cef7707 (builtin/tag: add
--trailer option, 2024-05-05), we'd generate an object like the above
for "git tag -s --trailer 'Some-trailer: foo' my-tag".

The implementation here is pretty simple: we just make a NUL-terminated
copy of the non-signature part of the tag (which we've already parsed)
and pass it to the trailer API. There are some alternatives I rejected,
at least for now:

  - the trailer code already understands skipping past some cruft at the
    end of a commit, such as patch dividers. see find_end_of_log_message().
    We could teach it to do the same for signatures. But since this is
    the only context where we'd want that feature, and since we've already
    parsed the object into subject/body/signature here, it seemed easier
    to just pass in the truncated message.

  - it would be nice if we could just pass in a pointer/len pair to the
    trailer API (rather than a NUL-terminated string) to avoid the extra
    copy. I think this is possible, since as noted above, the trailer
    code already has to deal with ignoring some cruft at the end of the
    input. But after an initial attempt at this, it got pretty messy, as
    we have to touch a lot of intermediate functions that are also
    called in other contexts.

    So I went for the simple and stupid thing, at least for now. I don't
    think the extra copy overhead will be all that bad. The previous
    patch noted that an extra copy seemed to cause about 1-2% slowdown
    for something simple like "%(subject)". But here we are only
    triggering it for "%(trailers)" (and only when there is a
    signature), and the trailer code is a bit allocation-heavy already.
    I couldn't measure any difference formatting "%(trailers)" on
    linux.git before and after (even though there are not even any
    trailers to find).

Reported-by: Brooke Kuhlmann <brooke@alchemists.io>
Signed-off-by: Jeff King <peff@peff.net>
---
 ref-filter.c            | 10 +++++++++-
 t/t6300-for-each-ref.sh | 18 ++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

Comments

Patrick Steinhardt Sept. 10, 2024, 6:08 a.m. UTC | #1
On Mon, Sep 09, 2024 at 07:14:45PM -0400, Jeff King wrote:
> The implementation here is pretty simple: we just make a NUL-terminated
> copy of the non-signature part of the tag (which we've already parsed)
> and pass it to the trailer API. There are some alternatives I rejected,
> at least for now:
> 
>   - the trailer code already understands skipping past some cruft at the
>     end of a commit, such as patch dividers. see find_end_of_log_message().

s/./,

>     We could teach it to do the same for signatures. But since this is
>     the only context where we'd want that feature, and since we've already
>     parsed the object into subject/body/signature here, it seemed easier
>     to just pass in the truncated message.
> 
>   - it would be nice if we could just pass in a pointer/len pair to the

s/it/It

> diff --git a/ref-filter.c b/ref-filter.c
> index 0f5513ba7e..e39f505a81 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2008,9 +2008,17 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, struct exp
>  			v->s = strbuf_detach(&s, NULL);
>  		} else if (atom->u.contents.option == C_TRAILERS) {
>  			struct strbuf s = STRBUF_INIT;
> +			const char *msg;
> +			char *to_free = NULL;
> +
> +			if (siglen)
> +				msg = to_free = xmemdupz(subpos, sigpos - subpos);
> +			else
> +				msg = subpos;
>  
>  			/* Format the trailer info according to the trailer_opts given */
> -			format_trailers_from_commit(&atom->u.contents.trailer_opts, subpos, &s);
> +			format_trailers_from_commit(&atom->u.contents.trailer_opts, msg, &s);
> +			free(to_free);
>  
>  			v->s = strbuf_detach(&s, NULL);
>  		} else if (atom->u.contents.option == C_BARE)

I've been surprised that we use `subpos` as the starting point here,
which includes the whole message including its subject. I would have
thought that it was sufficient to only pass the message body as input,
which saves allocating some bytes. At least `trailer_info_get()` does
not seem to care about the subject at all.

In any case this would be a micro-optimization anyway, it just left me
scratching my head for a second or two.

Patrick
Jeff King Sept. 10, 2024, 6:28 a.m. UTC | #2
On Tue, Sep 10, 2024 at 08:08:55AM +0200, Patrick Steinhardt wrote:

> > diff --git a/ref-filter.c b/ref-filter.c
> > index 0f5513ba7e..e39f505a81 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -2008,9 +2008,17 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, struct exp
> >  			v->s = strbuf_detach(&s, NULL);
> >  		} else if (atom->u.contents.option == C_TRAILERS) {
> >  			struct strbuf s = STRBUF_INIT;
> > +			const char *msg;
> > +			char *to_free = NULL;
> > +
> > +			if (siglen)
> > +				msg = to_free = xmemdupz(subpos, sigpos - subpos);
> > +			else
> > +				msg = subpos;
> >  
> >  			/* Format the trailer info according to the trailer_opts given */
> > -			format_trailers_from_commit(&atom->u.contents.trailer_opts, subpos, &s);
> > +			format_trailers_from_commit(&atom->u.contents.trailer_opts, msg, &s);
> > +			free(to_free);
> >  
> >  			v->s = strbuf_detach(&s, NULL);
> >  		} else if (atom->u.contents.option == C_BARE)
> 
> I've been surprised that we use `subpos` as the starting point here,
> which includes the whole message including its subject. I would have
> thought that it was sufficient to only pass the message body as input,
> which saves allocating some bytes. At least `trailer_info_get()` does
> not seem to care about the subject at all.
> 
> In any case this would be a micro-optimization anyway, it just left me
> scratching my head for a second or two.

Yeah, I suspect it would be fine to start at "bodypos" instead (and then
you could just use "nonsiglen" directly as the length). But there may be
corner cases for instances where there's no subject/body at all (though
having _just_ trailers feels weird).

At any rate, I was just following the existing code, which passed in the
whole contents starting from subpos, to avoid any unexpected changes.

-Peff
diff mbox series

Patch

diff --git a/ref-filter.c b/ref-filter.c
index 0f5513ba7e..e39f505a81 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2008,9 +2008,17 @@  static void grab_sub_body_contents(struct atom_value *val, int deref, struct exp
 			v->s = strbuf_detach(&s, NULL);
 		} else if (atom->u.contents.option == C_TRAILERS) {
 			struct strbuf s = STRBUF_INIT;
+			const char *msg;
+			char *to_free = NULL;
+
+			if (siglen)
+				msg = to_free = xmemdupz(subpos, sigpos - subpos);
+			else
+				msg = subpos;
 
 			/* Format the trailer info according to the trailer_opts given */
-			format_trailers_from_commit(&atom->u.contents.trailer_opts, subpos, &s);
+			format_trailers_from_commit(&atom->u.contents.trailer_opts, msg, &s);
+			free(to_free);
 
 			v->s = strbuf_detach(&s, NULL);
 		} else if (atom->u.contents.option == C_BARE)
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 7c208e20a6..b830b542c4 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1835,6 +1835,24 @@  sig_crlf="$(printf "%s" "$sig" | append_cr; echo dummy)"
 sig_crlf=${sig_crlf%dummy}
 test_atom refs/tags/fake-sig-crlf contents:signature "$sig_crlf"
 
+test_expect_success 'set up tag with signature and trailers' '
+	git tag -F - fake-sig-trailer <<-\EOF
+	this is the subject
+
+	this is the body
+
+	My-Trailer: foo
+	-----BEGIN PGP SIGNATURE-----
+
+	not a real signature, but we just care about the
+	subject/body/trailer parsing.
+	-----END PGP SIGNATURE-----
+	EOF
+'
+
+# use "separator=" here to suppress the terminating newline
+test_atom refs/tags/fake-sig-trailer trailers:separator= 'My-Trailer: foo'
+
 test_expect_success 'git for-each-ref --stdin: empty' '
 	>in &&
 	git for-each-ref --format="%(refname)" --stdin <in >actual &&