diff mbox series

[10/9] ref-filter: fix leak with unterminated %(if) atoms

Message ID 4faf815b780218769520561ecf3abca384a2ee6c.1725951400.git.ps@pks.im (mailing list archive)
State Accepted
Commit 04d9744f839dc90f27f08f94cc26f8bb33b3adfa
Headers show
Series ref-filter %(trailer) fixes | expand

Commit Message

Patrick Steinhardt Sept. 10, 2024, 6:57 a.m. UTC
When parsing `%(if)` atoms we expect a few other atoms to exist to
complete it, like `%(then)` and `%(end)`. Whether or not we have seen
these other atoms is tracked in an allocated `if_then_else` structure,
which gets free'd by the `if_then_else_handler()` once we have parsed
the complete conditional expression.

This results in a memory leak when the `%(if)` atom is not terminated
correctly and thus incomplete. We never end up executing its handler and
thus don't end up freeing the structure.

Plug this memory leak by introducing a new `at_end_data_free` callback
function. If set, we'll execute it in `pop_stack_element()` and pass it
the `at_end_data` variable with the intent to free its state. Wire it up
for the `%(if)` atom accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 ref-filter.c                   | 8 +++++---
 t/t6302-for-each-ref-filter.sh | 1 +
 2 files changed, 6 insertions(+), 3 deletions(-)

Comments

Jeff King Sept. 10, 2024, 7:12 a.m. UTC | #1
On Tue, Sep 10, 2024 at 08:57:15AM +0200, Patrick Steinhardt wrote:

> When parsing `%(if)` atoms we expect a few other atoms to exist to
> complete it, like `%(then)` and `%(end)`. Whether or not we have seen
> these other atoms is tracked in an allocated `if_then_else` structure,
> which gets free'd by the `if_then_else_handler()` once we have parsed
> the complete conditional expression.
> 
> This results in a memory leak when the `%(if)` atom is not terminated
> correctly and thus incomplete. We never end up executing its handler and
> thus don't end up freeing the structure.
> 
> Plug this memory leak by introducing a new `at_end_data_free` callback
> function. If set, we'll execute it in `pop_stack_element()` and pass it
> the `at_end_data` variable with the intent to free its state. Wire it up
> for the `%(if)` atom accordingly.

Ah, thanks for explaining. The patch makes much more sense now. :)

In particular, this:

> @@ -1169,6 +1170,8 @@ static void pop_stack_element(struct ref_formatting_stack **stack)
>  	if (prev)
>  		strbuf_addbuf(&prev->output, &current->output);
>  	strbuf_release(&current->output);
> +	if (current->at_end_data_free)
> +		current->at_end_data_free(current->at_end_data);
>  	free(current);
>  	*stack = prev;
>  }

which frees on pop, replaces the manual:

> @@ -1228,15 +1231,13 @@ static void if_then_else_handler(struct ref_formatting_stack **stack)
>  	}
>  
>  	*stack = cur;
> -	free(if_then_else);
>  }

free that was happening in the success case.

I think putting this on top of my series makes sense.

-Peff
Junio C Hamano Sept. 10, 2024, 4:48 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> When parsing `%(if)` atoms we expect a few other atoms to exist to
> complete it, like `%(then)` and `%(end)`. Whether or not we have seen
> these other atoms is tracked in an allocated `if_then_else` structure,
> which gets free'd by the `if_then_else_handler()` once we have parsed
> the complete conditional expression.
>
> This results in a memory leak when the `%(if)` atom is not terminated
> correctly and thus incomplete. We never end up executing its handler and
> thus don't end up freeing the structure.
>
> Plug this memory leak by introducing a new `at_end_data_free` callback
> function. If set, we'll execute it in `pop_stack_element()` and pass it
> the `at_end_data` variable with the intent to free its state. Wire it up
> for the `%(if)` atom accordingly.

Sounds good.  We diagnose unclosed "%(if)", report mismatch, and
die() soon, so plugging this may more about "let's silence leak
checker so that it can be more effective to help us find real leaks
that matter", not "this is leaking proportionally to the size of the
user data, and must be plugged".

I see this code snippet (not touched by your patch):

	if (state.stack->prev) {
		pop_stack_element(&state.stack);
		return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing"));
	}

and wonder how this handles the case where state.stack->prev->prev
is also not NULL.  Shouldn't it be looping while .prev is not NULL?

e.g.

diff --git c/ref-filter.c w/ref-filter.c
index b06e18a569..d2040f5047 100644
--- c/ref-filter.c
+++ w/ref-filter.c
@@ -3471,7 +3471,8 @@ int format_ref_array_item(struct ref_array_item *info,
 		}
 	}
 	if (state.stack->prev) {
-		pop_stack_element(&state.stack);
+		while (state.stack->prev)
+			pop_stack_element(&state.stack);
 		return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing"));
 	}
 	strbuf_addbuf(final_buf, &state.stack->output);
Patrick Steinhardt Sept. 12, 2024, 10:22 a.m. UTC | #3
On Tue, Sep 10, 2024 at 09:48:32AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > When parsing `%(if)` atoms we expect a few other atoms to exist to
> > complete it, like `%(then)` and `%(end)`. Whether or not we have seen
> > these other atoms is tracked in an allocated `if_then_else` structure,
> > which gets free'd by the `if_then_else_handler()` once we have parsed
> > the complete conditional expression.
> >
> > This results in a memory leak when the `%(if)` atom is not terminated
> > correctly and thus incomplete. We never end up executing its handler and
> > thus don't end up freeing the structure.
> >
> > Plug this memory leak by introducing a new `at_end_data_free` callback
> > function. If set, we'll execute it in `pop_stack_element()` and pass it
> > the `at_end_data` variable with the intent to free its state. Wire it up
> > for the `%(if)` atom accordingly.
> 
> Sounds good.  We diagnose unclosed "%(if)", report mismatch, and
> die() soon, so plugging this may more about "let's silence leak
> checker so that it can be more effective to help us find real leaks
> that matter", not "this is leaking proportionally to the size of the
> user data, and must be plugged".
> 
> I see this code snippet (not touched by your patch):
> 
> 	if (state.stack->prev) {
> 		pop_stack_element(&state.stack);
> 		return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing"));
> 	}
> 
> and wonder how this handles the case where state.stack->prev->prev
> is also not NULL.  Shouldn't it be looping while .prev is not NULL?
> 
> e.g.
> 
> diff --git c/ref-filter.c w/ref-filter.c
> index b06e18a569..d2040f5047 100644
> --- c/ref-filter.c
> +++ w/ref-filter.c
> @@ -3471,7 +3471,8 @@ int format_ref_array_item(struct ref_array_item *info,
>  		}
>  	}
>  	if (state.stack->prev) {
> -		pop_stack_element(&state.stack);
> +		while (state.stack->prev)
> +			pop_stack_element(&state.stack);
>  		return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing"));
>  	}
>  	strbuf_addbuf(final_buf, &state.stack->output);

Hm. It certainly feels like we should do that. I couldn't construct a
test case that fails with the leak sanitizer though. If it's a leak I'm
sure I'll eventually hit it when I continue down the road headed towards
leak-free-ness.

Patrick
Jeff King Sept. 12, 2024, 11:18 a.m. UTC | #4
On Thu, Sep 12, 2024 at 12:22:16PM +0200, Patrick Steinhardt wrote:

> > diff --git c/ref-filter.c w/ref-filter.c
> > index b06e18a569..d2040f5047 100644
> > --- c/ref-filter.c
> > +++ w/ref-filter.c
> > @@ -3471,7 +3471,8 @@ int format_ref_array_item(struct ref_array_item *info,
> >  		}
> >  	}
> >  	if (state.stack->prev) {
> > -		pop_stack_element(&state.stack);
> > +		while (state.stack->prev)
> > +			pop_stack_element(&state.stack);
> >  		return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing"));
> >  	}
> >  	strbuf_addbuf(final_buf, &state.stack->output);
> 
> Hm. It certainly feels like we should do that. I couldn't construct a
> test case that fails with the leak sanitizer though. If it's a leak I'm
> sure I'll eventually hit it when I continue down the road headed towards
> leak-free-ness.

Hmm. I think just:

  ./git for-each-ref --format='%(if)%(then)%(if)%(then)%(if)%(then)'

should trigger it, and running it in the debugger I can see that we exit
the function with multiple entries.

Valgrind claims the memory is still reachable, but I don't see how. The
"state" variable is accessible only inside that function. The only thing
we do after returning is die(). I wonder if it is a false negative
because the stack is left undisturbed (especially because the compiler
knows that die() does not return).

At any rate, I think the same would apply to the earlier error returns:

diff --git a/ref-filter.c b/ref-filter.c
index b06e18a569..a339f0ab0f 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -3454,7 +3454,8 @@ int format_ref_array_item(struct ref_array_item *info,
 		pos = parse_ref_filter_atom(format, sp + 2, ep, error_buf);
 		if (pos < 0 || get_ref_atom_value(info, pos, &atomv, error_buf) ||
 		    atomv->handler(atomv, &state, error_buf)) {
-			pop_stack_element(&state.stack);
+			while (state.stack->prev)
+				pop_stack_element(&state.stack);
 			return -1;
 		}
 	}
@@ -3466,7 +3467,8 @@ int format_ref_array_item(struct ref_array_item *info,
 		struct atom_value resetv = ATOM_VALUE_INIT;
 		resetv.s = GIT_COLOR_RESET;
 		if (append_atom(&resetv, &state, error_buf)) {
-			pop_stack_element(&state.stack);
+			while (state.stack->prev)
+				pop_stack_element(&state.stack);
 			return -1;
 		}
 	}

I wasn't sure why the non-error code path wouldn't need the same, but it
looks like there's some popping that happens in the various callbacks?
I'm not very familiar with this code, and it's hard to follow the flow
through the function pointers.

All that said, I am content to leave it for now. Even if it's a real
leak, it's one that happens once per program right before exiting with
an error. Most of the value in cleaning up trivial leaks like that are
to reduce the noise from analyzers so that we can find the much more
important leaks that scale with the input. If the analyzers aren't
complaining and we think it's trivial, it may not be worth spending a
lot of time on.

-Peff
Patrick Steinhardt Sept. 12, 2024, 11:32 a.m. UTC | #5
On Thu, Sep 12, 2024 at 07:18:58AM -0400, Jeff King wrote:
> All that said, I am content to leave it for now. Even if it's a real
> leak, it's one that happens once per program right before exiting with
> an error. Most of the value in cleaning up trivial leaks like that are
> to reduce the noise from analyzers so that we can find the much more
> important leaks that scale with the input. If the analyzers aren't
> complaining and we think it's trivial, it may not be worth spending a
> lot of time on.

Agreed. Thanks for digging!

Patrick
Junio C Hamano Sept. 12, 2024, 8:24 p.m. UTC | #6
Jeff King <peff@peff.net> writes:

> On Thu, Sep 12, 2024 at 12:22:16PM +0200, Patrick Steinhardt wrote:
>
>> > diff --git c/ref-filter.c w/ref-filter.c
>> > index b06e18a569..d2040f5047 100644
>> > --- c/ref-filter.c
>> > +++ w/ref-filter.c
>> > @@ -3471,7 +3471,8 @@ int format_ref_array_item(struct ref_array_item *info,
>> >  		}
>> >  	}
>> >  	if (state.stack->prev) {
>> > -		pop_stack_element(&state.stack);
>> > +		while (state.stack->prev)
>> > +			pop_stack_element(&state.stack);
>> >  		return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing"));
>> >  	}
>> >  	strbuf_addbuf(final_buf, &state.stack->output);
>> 
>> Hm. It certainly feels like we should do that. I couldn't construct a
>> test case that fails with the leak sanitizer though. If it's a leak I'm
>> sure I'll eventually hit it when I continue down the road headed towards
>> leak-free-ness.
>
> Hmm. I think just:
>
>   ./git for-each-ref --format='%(if)%(then)%(if)%(then)%(if)%(then)'
>
> should trigger it, and running it in the debugger I can see that we exit
> the function with multiple entries.
>
> Valgrind claims the memory is still reachable, but I don't see how. The
> "state" variable is accessible only inside that function. The only thing
> we do after returning is die(). I wonder if it is a false negative
> because the stack is left undisturbed (especially because the compiler
> knows that die() does not return).

Yup, the reason why I didn't add any test was because the leak
checker failed to notice the apparent leak.

> At any rate, I think the same would apply to the earlier error returns:
> ...
> All that said, I am content to leave it for now. Even if it's a real
> leak, it's one that happens once per program right before exiting with
> an error. Most of the value in cleaning up trivial leaks like that are
> to reduce the noise from analyzers so that we can find the much more
> important leaks that scale with the input. If the analyzers aren't
> complaining and we think it's trivial, it may not be worth spending a
> lot of time on.

That is good to me, too.
diff mbox series

Patch

diff --git a/ref-filter.c b/ref-filter.c
index ce1bcfad857..b06e18a569a 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1001,6 +1001,7 @@  struct ref_formatting_stack {
 	struct ref_formatting_stack *prev;
 	struct strbuf output;
 	void (*at_end)(struct ref_formatting_stack **stack);
+	void (*at_end_data_free)(void *data);
 	void *at_end_data;
 };
 
@@ -1169,6 +1170,8 @@  static void pop_stack_element(struct ref_formatting_stack **stack)
 	if (prev)
 		strbuf_addbuf(&prev->output, &current->output);
 	strbuf_release(&current->output);
+	if (current->at_end_data_free)
+		current->at_end_data_free(current->at_end_data);
 	free(current);
 	*stack = prev;
 }
@@ -1228,15 +1231,13 @@  static void if_then_else_handler(struct ref_formatting_stack **stack)
 	}
 
 	*stack = cur;
-	free(if_then_else);
 }
 
 static int if_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state,
 			   struct strbuf *err UNUSED)
 {
 	struct ref_formatting_stack *new_stack;
-	struct if_then_else *if_then_else = xcalloc(1,
-						    sizeof(struct if_then_else));
+	struct if_then_else *if_then_else = xcalloc(1, sizeof(*if_then_else));
 
 	if_then_else->str = atomv->atom->u.if_then_else.str;
 	if_then_else->cmp_status = atomv->atom->u.if_then_else.cmp_status;
@@ -1245,6 +1246,7 @@  static int if_atom_handler(struct atom_value *atomv, struct ref_formatting_state
 	new_stack = state->stack;
 	new_stack->at_end = if_then_else_handler;
 	new_stack->at_end_data = if_then_else;
+	new_stack->at_end_data_free = free;
 	return 0;
 }
 
diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
index 163c378cfd1..7f44d3c3f22 100755
--- a/t/t6302-for-each-ref-filter.sh
+++ b/t/t6302-for-each-ref-filter.sh
@@ -2,6 +2,7 @@ 
 
 test_description='test for-each-refs usage of ref-filter APIs'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-gpg.sh