Message ID | 4faf815b780218769520561ecf3abca384a2ee6c.1725951400.git.ps@pks.im (mailing list archive) |
---|---|
State | Accepted |
Commit | 04d9744f839dc90f27f08f94cc26f8bb33b3adfa |
Headers | show |
Series | ref-filter %(trailer) fixes | expand |
On Tue, Sep 10, 2024 at 08:57:15AM +0200, Patrick Steinhardt wrote: > When parsing `%(if)` atoms we expect a few other atoms to exist to > complete it, like `%(then)` and `%(end)`. Whether or not we have seen > these other atoms is tracked in an allocated `if_then_else` structure, > which gets free'd by the `if_then_else_handler()` once we have parsed > the complete conditional expression. > > This results in a memory leak when the `%(if)` atom is not terminated > correctly and thus incomplete. We never end up executing its handler and > thus don't end up freeing the structure. > > Plug this memory leak by introducing a new `at_end_data_free` callback > function. If set, we'll execute it in `pop_stack_element()` and pass it > the `at_end_data` variable with the intent to free its state. Wire it up > for the `%(if)` atom accordingly. Ah, thanks for explaining. The patch makes much more sense now. :) In particular, this: > @@ -1169,6 +1170,8 @@ static void pop_stack_element(struct ref_formatting_stack **stack) > if (prev) > strbuf_addbuf(&prev->output, ¤t->output); > strbuf_release(¤t->output); > + if (current->at_end_data_free) > + current->at_end_data_free(current->at_end_data); > free(current); > *stack = prev; > } which frees on pop, replaces the manual: > @@ -1228,15 +1231,13 @@ static void if_then_else_handler(struct ref_formatting_stack **stack) > } > > *stack = cur; > - free(if_then_else); > } free that was happening in the success case. I think putting this on top of my series makes sense. -Peff
Patrick Steinhardt <ps@pks.im> writes: > When parsing `%(if)` atoms we expect a few other atoms to exist to > complete it, like `%(then)` and `%(end)`. Whether or not we have seen > these other atoms is tracked in an allocated `if_then_else` structure, > which gets free'd by the `if_then_else_handler()` once we have parsed > the complete conditional expression. > > This results in a memory leak when the `%(if)` atom is not terminated > correctly and thus incomplete. We never end up executing its handler and > thus don't end up freeing the structure. > > Plug this memory leak by introducing a new `at_end_data_free` callback > function. If set, we'll execute it in `pop_stack_element()` and pass it > the `at_end_data` variable with the intent to free its state. Wire it up > for the `%(if)` atom accordingly. Sounds good. We diagnose unclosed "%(if)", report mismatch, and die() soon, so plugging this may more about "let's silence leak checker so that it can be more effective to help us find real leaks that matter", not "this is leaking proportionally to the size of the user data, and must be plugged". I see this code snippet (not touched by your patch): if (state.stack->prev) { pop_stack_element(&state.stack); return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing")); } and wonder how this handles the case where state.stack->prev->prev is also not NULL. Shouldn't it be looping while .prev is not NULL? e.g. diff --git c/ref-filter.c w/ref-filter.c index b06e18a569..d2040f5047 100644 --- c/ref-filter.c +++ w/ref-filter.c @@ -3471,7 +3471,8 @@ int format_ref_array_item(struct ref_array_item *info, } } if (state.stack->prev) { - pop_stack_element(&state.stack); + while (state.stack->prev) + pop_stack_element(&state.stack); return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing")); } strbuf_addbuf(final_buf, &state.stack->output);
On Tue, Sep 10, 2024 at 09:48:32AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > When parsing `%(if)` atoms we expect a few other atoms to exist to > > complete it, like `%(then)` and `%(end)`. Whether or not we have seen > > these other atoms is tracked in an allocated `if_then_else` structure, > > which gets free'd by the `if_then_else_handler()` once we have parsed > > the complete conditional expression. > > > > This results in a memory leak when the `%(if)` atom is not terminated > > correctly and thus incomplete. We never end up executing its handler and > > thus don't end up freeing the structure. > > > > Plug this memory leak by introducing a new `at_end_data_free` callback > > function. If set, we'll execute it in `pop_stack_element()` and pass it > > the `at_end_data` variable with the intent to free its state. Wire it up > > for the `%(if)` atom accordingly. > > Sounds good. We diagnose unclosed "%(if)", report mismatch, and > die() soon, so plugging this may more about "let's silence leak > checker so that it can be more effective to help us find real leaks > that matter", not "this is leaking proportionally to the size of the > user data, and must be plugged". > > I see this code snippet (not touched by your patch): > > if (state.stack->prev) { > pop_stack_element(&state.stack); > return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing")); > } > > and wonder how this handles the case where state.stack->prev->prev > is also not NULL. Shouldn't it be looping while .prev is not NULL? > > e.g. > > diff --git c/ref-filter.c w/ref-filter.c > index b06e18a569..d2040f5047 100644 > --- c/ref-filter.c > +++ w/ref-filter.c > @@ -3471,7 +3471,8 @@ int format_ref_array_item(struct ref_array_item *info, > } > } > if (state.stack->prev) { > - pop_stack_element(&state.stack); > + while (state.stack->prev) > + pop_stack_element(&state.stack); > return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing")); > } > strbuf_addbuf(final_buf, &state.stack->output); Hm. It certainly feels like we should do that. I couldn't construct a test case that fails with the leak sanitizer though. If it's a leak I'm sure I'll eventually hit it when I continue down the road headed towards leak-free-ness. Patrick
On Thu, Sep 12, 2024 at 12:22:16PM +0200, Patrick Steinhardt wrote: > > diff --git c/ref-filter.c w/ref-filter.c > > index b06e18a569..d2040f5047 100644 > > --- c/ref-filter.c > > +++ w/ref-filter.c > > @@ -3471,7 +3471,8 @@ int format_ref_array_item(struct ref_array_item *info, > > } > > } > > if (state.stack->prev) { > > - pop_stack_element(&state.stack); > > + while (state.stack->prev) > > + pop_stack_element(&state.stack); > > return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing")); > > } > > strbuf_addbuf(final_buf, &state.stack->output); > > Hm. It certainly feels like we should do that. I couldn't construct a > test case that fails with the leak sanitizer though. If it's a leak I'm > sure I'll eventually hit it when I continue down the road headed towards > leak-free-ness. Hmm. I think just: ./git for-each-ref --format='%(if)%(then)%(if)%(then)%(if)%(then)' should trigger it, and running it in the debugger I can see that we exit the function with multiple entries. Valgrind claims the memory is still reachable, but I don't see how. The "state" variable is accessible only inside that function. The only thing we do after returning is die(). I wonder if it is a false negative because the stack is left undisturbed (especially because the compiler knows that die() does not return). At any rate, I think the same would apply to the earlier error returns: diff --git a/ref-filter.c b/ref-filter.c index b06e18a569..a339f0ab0f 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -3454,7 +3454,8 @@ int format_ref_array_item(struct ref_array_item *info, pos = parse_ref_filter_atom(format, sp + 2, ep, error_buf); if (pos < 0 || get_ref_atom_value(info, pos, &atomv, error_buf) || atomv->handler(atomv, &state, error_buf)) { - pop_stack_element(&state.stack); + while (state.stack->prev) + pop_stack_element(&state.stack); return -1; } } @@ -3466,7 +3467,8 @@ int format_ref_array_item(struct ref_array_item *info, struct atom_value resetv = ATOM_VALUE_INIT; resetv.s = GIT_COLOR_RESET; if (append_atom(&resetv, &state, error_buf)) { - pop_stack_element(&state.stack); + while (state.stack->prev) + pop_stack_element(&state.stack); return -1; } } I wasn't sure why the non-error code path wouldn't need the same, but it looks like there's some popping that happens in the various callbacks? I'm not very familiar with this code, and it's hard to follow the flow through the function pointers. All that said, I am content to leave it for now. Even if it's a real leak, it's one that happens once per program right before exiting with an error. Most of the value in cleaning up trivial leaks like that are to reduce the noise from analyzers so that we can find the much more important leaks that scale with the input. If the analyzers aren't complaining and we think it's trivial, it may not be worth spending a lot of time on. -Peff
On Thu, Sep 12, 2024 at 07:18:58AM -0400, Jeff King wrote: > All that said, I am content to leave it for now. Even if it's a real > leak, it's one that happens once per program right before exiting with > an error. Most of the value in cleaning up trivial leaks like that are > to reduce the noise from analyzers so that we can find the much more > important leaks that scale with the input. If the analyzers aren't > complaining and we think it's trivial, it may not be worth spending a > lot of time on. Agreed. Thanks for digging! Patrick
Jeff King <peff@peff.net> writes: > On Thu, Sep 12, 2024 at 12:22:16PM +0200, Patrick Steinhardt wrote: > >> > diff --git c/ref-filter.c w/ref-filter.c >> > index b06e18a569..d2040f5047 100644 >> > --- c/ref-filter.c >> > +++ w/ref-filter.c >> > @@ -3471,7 +3471,8 @@ int format_ref_array_item(struct ref_array_item *info, >> > } >> > } >> > if (state.stack->prev) { >> > - pop_stack_element(&state.stack); >> > + while (state.stack->prev) >> > + pop_stack_element(&state.stack); >> > return strbuf_addf_ret(error_buf, -1, _("format: %%(end) atom missing")); >> > } >> > strbuf_addbuf(final_buf, &state.stack->output); >> >> Hm. It certainly feels like we should do that. I couldn't construct a >> test case that fails with the leak sanitizer though. If it's a leak I'm >> sure I'll eventually hit it when I continue down the road headed towards >> leak-free-ness. > > Hmm. I think just: > > ./git for-each-ref --format='%(if)%(then)%(if)%(then)%(if)%(then)' > > should trigger it, and running it in the debugger I can see that we exit > the function with multiple entries. > > Valgrind claims the memory is still reachable, but I don't see how. The > "state" variable is accessible only inside that function. The only thing > we do after returning is die(). I wonder if it is a false negative > because the stack is left undisturbed (especially because the compiler > knows that die() does not return). Yup, the reason why I didn't add any test was because the leak checker failed to notice the apparent leak. > At any rate, I think the same would apply to the earlier error returns: > ... > All that said, I am content to leave it for now. Even if it's a real > leak, it's one that happens once per program right before exiting with > an error. Most of the value in cleaning up trivial leaks like that are > to reduce the noise from analyzers so that we can find the much more > important leaks that scale with the input. If the analyzers aren't > complaining and we think it's trivial, it may not be worth spending a > lot of time on. That is good to me, too.
diff --git a/ref-filter.c b/ref-filter.c index ce1bcfad857..b06e18a569a 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -1001,6 +1001,7 @@ struct ref_formatting_stack { struct ref_formatting_stack *prev; struct strbuf output; void (*at_end)(struct ref_formatting_stack **stack); + void (*at_end_data_free)(void *data); void *at_end_data; }; @@ -1169,6 +1170,8 @@ static void pop_stack_element(struct ref_formatting_stack **stack) if (prev) strbuf_addbuf(&prev->output, ¤t->output); strbuf_release(¤t->output); + if (current->at_end_data_free) + current->at_end_data_free(current->at_end_data); free(current); *stack = prev; } @@ -1228,15 +1231,13 @@ static void if_then_else_handler(struct ref_formatting_stack **stack) } *stack = cur; - free(if_then_else); } static int if_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state, struct strbuf *err UNUSED) { struct ref_formatting_stack *new_stack; - struct if_then_else *if_then_else = xcalloc(1, - sizeof(struct if_then_else)); + struct if_then_else *if_then_else = xcalloc(1, sizeof(*if_then_else)); if_then_else->str = atomv->atom->u.if_then_else.str; if_then_else->cmp_status = atomv->atom->u.if_then_else.cmp_status; @@ -1245,6 +1246,7 @@ static int if_atom_handler(struct atom_value *atomv, struct ref_formatting_state new_stack = state->stack; new_stack->at_end = if_then_else_handler; new_stack->at_end_data = if_then_else; + new_stack->at_end_data_free = free; return 0; } diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh index 163c378cfd1..7f44d3c3f22 100755 --- a/t/t6302-for-each-ref-filter.sh +++ b/t/t6302-for-each-ref-filter.sh @@ -2,6 +2,7 @@ test_description='test for-each-refs usage of ref-filter APIs' +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh . "$TEST_DIRECTORY"/lib-gpg.sh
When parsing `%(if)` atoms we expect a few other atoms to exist to complete it, like `%(then)` and `%(end)`. Whether or not we have seen these other atoms is tracked in an allocated `if_then_else` structure, which gets free'd by the `if_then_else_handler()` once we have parsed the complete conditional expression. This results in a memory leak when the `%(if)` atom is not terminated correctly and thus incomplete. We never end up executing its handler and thus don't end up freeing the structure. Plug this memory leak by introducing a new `at_end_data_free` callback function. If set, we'll execute it in `pop_stack_element()` and pass it the `at_end_data` variable with the intent to free its state. Wire it up for the `%(if)` atom accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- ref-filter.c | 8 +++++--- t/t6302-for-each-ref-filter.sh | 1 + 2 files changed, 6 insertions(+), 3 deletions(-)