[10/9] ref-filter: fix leak with unterminated %(if) atoms

Message ID	4faf815b780218769520561ecf3abca384a2ee6c.1725951400.git.ps@pks.im (mailing list archive)
State	Accepted
Commit	04d9744f839dc90f27f08f94cc26f8bb33b3adfa
Series	ref-filter %(trailer) fixes
	[0/9] ref-filter %(trailer) fixes
	[1/9] t6300: drop newline from wrapped test title
	[2/9] ref-filter: avoid extra copies of payload/signature
	[3/9] ref-filter: strip signature when parsing tag trailers
	[4/9] ref-filter: drop useless cast in trailers_atom_parser()
	[5/9] ref-filter: store ref_trailer_buf data per-atom
	[6/9] ref-filter: fix leak of %(trailers) "argbuf"
	[7/9] ref-filter: fix leak with %(describe) arguments
	[8/9] ref-filter: fix leak when formatting %(push:remoteref)
	[9/9] ref-filter: add ref_format_clear() function
	[10/9] ref-filter: fix leak with unterminated %(if) atoms

Patrick Steinhardt Sept. 10, 2024, 6:57 a.m. UTC

When parsing `%(if)` atoms we expect a few other atoms to exist to
complete it, like `%(then)` and `%(end)`. Whether or not we have seen
these other atoms is tracked in an allocated `if_then_else` structure,
which gets free'd by the `if_then_else_handler()` once we have parsed
the complete conditional expression.

This results in a memory leak when the `%(if)` atom is not terminated
correctly and thus incomplete. We never end up executing its handler and
thus don't end up freeing the structure.

Plug this memory leak by introducing a new `at_end_data_free` callback
function. If set, we'll execute it in `pop_stack_element()` and pass it
the `at_end_data` variable with the intent to free its state. Wire it up
for the `%(if)` atom accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 ref-filter.c                   | 8 +++++---
 t/t6302-for-each-ref-filter.sh | 1 +
 2 files changed, 6 insertions(+), 3 deletions(-)

Jeff King Sept. 10, 2024, 7:12 a.m. UTC | #1

On Tue, Sep 10, 2024 at 08:57:15AM +0200, Patrick Steinhardt wrote:

> When parsing `%(if)` atoms we expect a few other atoms to exist to
> complete it, like `%(then)` and `%(end)`. Whether or not we have seen
> these other atoms is tracked in an allocated `if_then_else` structure,
> which gets free'd by the `if_then_else_handler()` once we have parsed
> the complete conditional expression.
> 
> This results in a memory leak when the `%(if)` atom is not terminated
> correctly and thus incomplete. We never end up executing its handler and
> thus don't end up freeing the structure.
> 
> Plug this memory leak by introducing a new `at_end_data_free` callback
> function. If set, we'll execute it in `pop_stack_element()` and pass it
> the `at_end_data` variable with the intent to free its state. Wire it up
> for the `%(if)` atom accordingly.

Ah, thanks for explaining. The patch makes much more sense now. :)

In particular, this:

> @@ -1169,6 +1170,8 @@ static void pop_stack_element(struct ref_formatting_stack **stack)
>  	if (prev)
>  		strbuf_addbuf(&prev->output, &current->output);
>  	strbuf_release(&current->output);
> +	if (current->at_end_data_free)
> +		current->at_end_data_free(current->at_end_data);
>  	free(current);
>  	*stack = prev;
>  }

which frees on pop, replaces the manual:

> @@ -1228,15 +1231,13 @@ static void if_then_else_handler(struct ref_formatting_stack **stack)
>  	}
>  
>  	*stack = cur;
> -	free(if_then_else);
>  }

free that was happening in the success case.

I think putting this on top of my series makes sense.

-Peff
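
A minimal, self-contained model of the ownership change discussed above,
assuming a much-simplified stack element. The names `stack_elem`,
`push_elem()`, `pop_elem()`, `toy_if_handler()` and `frees_for_format()`
are hypothetical stand-ins, not git's actual definitions. The sketch only
illustrates the idea that the per-atom state is released when the element
is popped, so it is freed exactly once whether or not the handler ever ran.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a formatting stack element (illustrative only). */
struct stack_elem {
	struct stack_elem *prev;
	void (*at_end)(struct stack_elem **stack);
	void *at_end_data;
	void (*at_end_data_free)(void *data);
};

static int free_calls;

/* Free callback that also counts how often it was invoked. */
static void counting_free(void *data)
{
	free_calls++;
	free(data);
}

static void push_elem(struct stack_elem **stack)
{
	struct stack_elem *e = calloc(1, sizeof(*e));
	if (!e)
		abort();
	e->prev = *stack;
	*stack = e;
}

/* Popping releases the per-atom state through the optional callback. */
static void pop_elem(struct stack_elem **stack)
{
	struct stack_elem *cur = *stack;
	struct stack_elem *prev = cur->prev;

	if (cur->at_end_data_free)
		cur->at_end_data_free(cur->at_end_data);
	free(cur);
	*stack = prev;
}

/* Stand-in for the conditional's handler: it does not free at_end_data. */
static void toy_if_handler(struct stack_elem **stack)
{
	(void)stack;
}

/* Returns how many times the state was freed for one formatting run. */
static int frees_for_format(int terminated)
{
	struct stack_elem *stack = NULL;

	free_calls = 0;
	push_elem(&stack);                 /* base element */
	push_elem(&stack);                 /* element for the conditional */
	stack->at_end = toy_if_handler;
	stack->at_end_data = malloc(16);   /* models the tracked state */
	if (!stack->at_end_data)
		abort();
	stack->at_end_data_free = counting_free;

	if (terminated)
		stack->at_end(&stack);     /* only runs for a complete conditional */
	pop_elem(&stack);                  /* the state is freed here in both cases */
	pop_elem(&stack);
	return free_calls;
}
```

Calling `frees_for_format(1)` models a terminated conditional and
`frees_for_format(0)` an unterminated one. In this model both paths
release the state through the same pop, which is the property the patch
relies on.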

Junio C Hamano Sept. 10, 2024, 4:48 p.m. UTC | #2

Patrick Steinhardt <ps@pks.im> writes:

> When parsing `%(if)` atoms we expect a few other atoms to exist to
> complete it, like `%(then)` and `%(end)`. Whether or not we have seen
> these other atoms is tracked in an allocated `if_then_else` structure,
> which gets free'd by the `if_then_else_handler()` once we have parsed
> the complete conditional expression.
>
> This results in a memory leak when the `%(if)` atom is not terminated
> correctly and thus incomplete. We never end up executing its handler and
> thus don't end up freeing the structure.
>
> Plug this memory leak by introducing a new `at_end_data_free` callback
> function. If set, we'll execute it in `pop_stack_element()` and pass it
> the `at_end_data` variable with the intent to free its state. Wire it up
> for the `%(if)` atom accordingly.

Sounds good.  We diagnose unclosed "%(if)", report mismatch, and
die() soon, so plugging this may be more about "let's silence leak
checker so that it can be more effective to help us find real leaks
that matter", not "this is leaking proportionally to the size of the
user data, and must be plugged".

I see this code snippet (not touched by your patch):

	if (state.stack->prev) {
		pop_stack_element(&state.stack);
		return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing"));
	}

and wonder how this handles the case where state.stack->prev->prev
is also not NULL.  Shouldn't it be looping while .prev is not NULL?

e.g.

diff --git c/ref-filter.c w/ref-filter.c
index b06e18a569..d2040f5047 100644
--- c/ref-filter.c
+++ w/ref-filter.c
@@ -3471,7 +3471,8 @@ int format_ref_array_item(struct ref_array_item *info,
 		}
 	}
 	if (state.stack->prev) {
-		pop_stack_element(&state.stack);
+		while (state.stack->prev)
+			pop_stack_element(&state.stack);
 		return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing"));
 	}
 	strbuf_addbuf(final_buf, &state.stack->output);
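
The difference between a single pop and the loop suggested above can be
sketched with a toy stack. This assumes that each unterminated
conditional leaves one extra element on the stack. All names here
(`stack_elem`, `push_elem()`, `pop_elem()`, `cleanup_single_pop()`,
`cleanup_drain()`, `leftover_after()`) are hypothetical and only model
the shape of the error path, not git's real code.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a formatting stack element (illustrative only). */
struct stack_elem {
	struct stack_elem *prev;
};

static int live_elems;

static void push_elem(struct stack_elem **stack)
{
	struct stack_elem *e = calloc(1, sizeof(*e));
	if (!e)
		abort();
	e->prev = *stack;
	*stack = e;
	live_elems++;
}

static void pop_elem(struct stack_elem **stack)
{
	struct stack_elem *cur = *stack;

	*stack = cur->prev;
	free(cur);
	live_elems--;
}

/* Models the existing error path: pop exactly one element. */
static void cleanup_single_pop(struct stack_elem **stack)
{
	pop_elem(stack);
}

/* Models the suggested loop: pop until only the base element remains. */
static void cleanup_drain(struct stack_elem **stack)
{
	while ((*stack)->prev)
		pop_elem(stack);
}

/*
 * Pushes a base element plus one element per unterminated conditional,
 * runs the chosen cleanup, and reports how many elements above the base
 * are still allocated afterwards.
 */
static int leftover_after(int nested_ifs, int drain)
{
	struct stack_elem *stack = NULL;
	int leftover, i;

	live_elems = 0;
	push_elem(&stack);
	for (i = 0; i < nested_ifs; i++)
		push_elem(&stack);

	if (drain)
		cleanup_drain(&stack);
	else
		cleanup_single_pop(&stack);
	leftover = live_elems - 1;

	/* Release everything so the demo itself does not leak. */
	while (stack)
		pop_elem(&stack);
	return leftover;
}
```

With three nested elements, `leftover_after(3, 0)` reports two elements
left behind by the single pop, while `leftover_after(3, 1)` reports none.
With only one nested element the two strategies are equivalent, which may
explain why a single unterminated conditional does not show the problem.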

Patrick Steinhardt Sept. 12, 2024, 10:22 a.m. UTC | #3

On Tue, Sep 10, 2024 at 09:48:32AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > When parsing `%(if)` atoms we expect a few other atoms to exist to
> > complete it, like `%(then)` and `%(end)`. Whether or not we have seen
> > these other atoms is tracked in an allocated `if_then_else` structure,
> > which gets free'd by the `if_then_else_handler()` once we have parsed
> > the complete conditional expression.
> >
> > This results in a memory leak when the `%(if)` atom is not terminated
> > correctly and thus incomplete. We never end up executing its handler and
> > thus don't end up freeing the structure.
> >
> > Plug this memory leak by introducing a new `at_end_data_free` callback
> > function. If set, we'll execute it in `pop_stack_element()` and pass it
> > the `at_end_data` variable with the intent to free its state. Wire it up
> > for the `%(if)` atom accordingly.
> 
> Sounds good.  We diagnose unclosed "%(if)", report mismatch, and
> die() soon, so plugging this may be more about "let's silence leak
> checker so that it can be more effective to help us find real leaks
> that matter", not "this is leaking proportionally to the size of the
> user data, and must be plugged".
> 
> I see this code snippet (not touched by your patch):
> 
> 	if (state.stack->prev) {
> 		pop_stack_element(&state.stack);
> 		return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing"));
> 	}
> 
> and wonder how this handles the case where state.stack->prev->prev
> is also not NULL.  Shouldn't it be looping while .prev is not NULL?
> 
> e.g.
> 
> diff --git c/ref-filter.c w/ref-filter.c
> index b06e18a569..d2040f5047 100644
> --- c/ref-filter.c
> +++ w/ref-filter.c
> @@ -3471,7 +3471,8 @@ int format_ref_array_item(struct ref_array_item *info,
>  		}
>  	}
>  	if (state.stack->prev) {
> -		pop_stack_element(&state.stack);
> +		while (state.stack->prev)
> +			pop_stack_element(&state.stack);
>  		return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing"));
>  	}
>  	strbuf_addbuf(final_buf, &state.stack->output);

Hm. It certainly feels like we should do that. I couldn't construct a
test case that fails with the leak sanitizer though. If it's a leak I'm
sure I'll eventually hit it when I continue down the road headed towards
leak-free-ness.

Patrick

Jeff King Sept. 12, 2024, 11:18 a.m. UTC | #4

On Thu, Sep 12, 2024 at 12:22:16PM +0200, Patrick Steinhardt wrote:

> > diff --git c/ref-filter.c w/ref-filter.c
> > index b06e18a569..d2040f5047 100644
> > --- c/ref-filter.c
> > +++ w/ref-filter.c
> > @@ -3471,7 +3471,8 @@ int format_ref_array_item(struct ref_array_item *info,
> >  		}
> >  	}
> >  	if (state.stack->prev) {
> > -		pop_stack_element(&state.stack);
> > +		while (state.stack->prev)
> > +			pop_stack_element(&state.stack);
> >  		return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing"));
> >  	}
> >  	strbuf_addbuf(final_buf, &state.stack->output);
> 
> Hm. It certainly feels like we should do that. I couldn't construct a
> test case that fails with the leak sanitizer though. If it's a leak I'm
> sure I'll eventually hit it when I continue down the road headed towards
> leak-free-ness.

Hmm. I think just:

  ./git for-each-ref --format='%(if)%(then)%(if)%(then)%(if)%(then)'

should trigger it, and running it in the debugger I can see that we exit
the function with multiple entries.

Valgrind claims the memory is still reachable, but I don't see how. The
"state" variable is accessible only inside that function. The only thing
we do after returning is die(). I wonder if it is a false negative
because the stack is left undisturbed (especially because the compiler
knows that die() does not return).

At any rate, I think the same would apply to the earlier error returns:

diff --git a/ref-filter.c b/ref-filter.c
index b06e18a569..a339f0ab0f 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -3454,7 +3454,8 @@ int format_ref_array_item(struct ref_array_item *info,
 		pos = parse_ref_filter_atom(format, sp + 2, ep, error_buf);
 		if (pos < 0 || get_ref_atom_value(info, pos, &atomv, error_buf) ||
 		    atomv->handler(atomv, &state, error_buf)) {
-			pop_stack_element(&state.stack);
+			while (state.stack->prev)
+				pop_stack_element(&state.stack);
 			return -1;
 		}
 	}
@@ -3466,7 +3467,8 @@ int format_ref_array_item(struct ref_array_item *info,
 		struct atom_value resetv = ATOM_VALUE_INIT;
 		resetv.s = GIT_COLOR_RESET;
 		if (append_atom(&resetv, &state, error_buf)) {
-			pop_stack_element(&state.stack);
+			while (state.stack->prev)
+				pop_stack_element(&state.stack);
 			return -1;
 		}
 	}

I wasn't sure why the non-error code path wouldn't need the same, but it
looks like there's some popping that happens in the various callbacks?
I'm not very familiar with this code, and it's hard to follow the flow
through the function pointers.

All that said, I am content to leave it for now. Even if it's a real
leak, it's one that happens once per program right before exiting with
an error. Most of the value in cleaning up trivial leaks like that is
to reduce the noise from analyzers so that we can find the much more
important leaks that scale with the input. If the analyzers aren't
complaining and we think it's trivial, it may not be worth spending a
lot of time on.

-Peff

Patrick Steinhardt Sept. 12, 2024, 11:32 a.m. UTC | #5

On Thu, Sep 12, 2024 at 07:18:58AM -0400, Jeff King wrote:
> All that said, I am content to leave it for now. Even if it's a real
> leak, it's one that happens once per program right before exiting with
> an error. Most of the value in cleaning up trivial leaks like that is
> to reduce the noise from analyzers so that we can find the much more
> important leaks that scale with the input. If the analyzers aren't
> complaining and we think it's trivial, it may not be worth spending a
> lot of time on.

Agreed. Thanks for digging!

Patrick

Junio C Hamano Sept. 12, 2024, 8:24 p.m. UTC | #6

Jeff King <peff@peff.net> writes:

> On Thu, Sep 12, 2024 at 12:22:16PM +0200, Patrick Steinhardt wrote:
>
>> > diff --git c/ref-filter.c w/ref-filter.c
>> > index b06e18a569..d2040f5047 100644
>> > --- c/ref-filter.c
>> > +++ w/ref-filter.c
>> > @@ -3471,7 +3471,8 @@ int format_ref_array_item(struct ref_array_item *info,
>> >  		}
>> >  	}
>> >  	if (state.stack->prev) {
>> > -		pop_stack_element(&state.stack);
>> > +		while (state.stack->prev)
>> > +			pop_stack_element(&state.stack);
>> >  		return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing"));
>> >  	}
>> >  	strbuf_addbuf(final_buf, &state.stack->output);
>> 
>> Hm. It certainly feels like we should do that. I couldn't construct a
>> test case that fails with the leak sanitizer though. If it's a leak I'm
>> sure I'll eventually hit it when I continue down the road headed towards
>> leak-free-ness.
>
> Hmm. I think just:
>
>   ./git for-each-ref --format='%(if)%(then)%(if)%(then)%(if)%(then)'
>
> should trigger it, and running it in the debugger I can see that we exit
> the function with multiple entries.
>
> Valgrind claims the memory is still reachable, but I don't see how. The
> "state" variable is accessible only inside that function. The only thing
> we do after returning is die(). I wonder if it is a false negative
> because the stack is left undisturbed (especially because the compiler
> knows that die() does not return).

Yup, the reason why I didn't add any test was because the leak
checker failed to notice the apparent leak.

> At any rate, I think the same would apply to the earlier error returns:
> ...
> All that said, I am content to leave it for now. Even if it's a real
> leak, it's one that happens once per program right before exiting with
> an error. Most of the value in cleaning up trivial leaks like that is
> to reduce the noise from analyzers so that we can find the much more
> important leaks that scale with the input. If the analyzers aren't
> complaining and we think it's trivial, it may not be worth spending a
> lot of time on.

That is good to me, too.
