diff mbox series

revision: fix missing null for freed memory

Message ID 20250208061702.88469-1-forivall@gmail.com (mailing list archive)
State New
Headers show
Series revision: fix missing null for freed memory | expand

Commit Message

Emily Klassen Feb. 8, 2025, 6:17 a.m. UTC
"git log --graph --no-graph" missed cleaning up the output_prefix and
output_prefix_data pointers. This resulted in a segfault when using "--patch",
"--name-status" or "--name-only", as the output_prefix_data continued to be in
use after free()

Signed-off-by: Emily M Klassen <forivall@gmail.com>
---
I previously reported this a few hours ago, and ended up digging in and figuring
it out. I'll make sure to bottom reply in the follow ups to this patch.

 revision.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Junio C Hamano Feb. 8, 2025, 9:53 p.m. UTC | #1
Emily M Klassen <forivall@gmail.com> writes:

> "git log --graph --no-graph" missed cleaning up the output_prefix and
> output_prefix_data pointers. This resulted in a segfault when using "--patch",
> "--name-status" or "--name-only", as the output_prefix_data continued to be in
> use after free()
>
> Signed-off-by: Emily M Klassen <forivall@gmail.com>
> ---
> I previously reported this a few hours ago, and ended up digging in and figuring
> it out. I'll make sure to bottom reply in the follow ups to this patch.
>
>  revision.c | 2 ++
>  1 file changed, 2 insertions(+)

Reading the symptom in the proposed log message (which is very
clearly written, by the way), it seems that this is reproducible?

Can we have a test to make sure that the fix would not be broken
later?

> diff --git a/revision.c b/revision.c
> index 474fa1e767..84cb028e11 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -2615,6 +2615,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
>  		graph_clear(revs->graph);
>  		revs->graph = graph_init(revs);
>  	} else if (!strcmp(arg, "--no-graph")) {
> +		revs->diffopt.output_prefix = NULL;
> +		revs->diffopt.output_prefix_data = NULL;
>  		graph_clear(revs->graph);
>  		revs->graph = NULL;
>  	} else if (!strcmp(arg, "--encode-email-headers")) {

Interesting.

In response to "--graph" (the code we can see in the context before
this part), we clear the revs->graph and then call graph_init(revs)
for ourselves, and we do not have to futz with diffopt at all, and
it works OK because output_prefix_data and output_prefix would be
overwritten by the graph_init() to the value we want to use anyway.

But of course, after "--no-graph", nobody clears these two members
for us, so we'd need to clear them here.

It might make the API less error-prone if the "clear" function
cleared the .graph and diffopt->output_prefix{,_data} together
but among three existing callers of graph_clear(), only this caller
needs to clear these two members, so it probably would not matter.

So in short, this seems to be a good fix for the immediate issue,
and it is unlikely that we'd need any follow-up work.
Junio C Hamano Feb. 10, 2025, 4:02 p.m. UTC | #2
Emily M Klassen <forivall@gmail.com> writes:

> Subject: Re: [PATCH] revision: fix missing null for freed memory
>
> "git log --graph --no-graph" missed cleaning up the output_prefix and
> output_prefix_data pointers. This resulted in a segfault when using "--patch",
> "--name-status" or "--name-only", as the output_prefix_data continued to be in
> use after free()

Rereading the title, I cannot make sense out of "fix missing null"
and guess what it wants to say.  Is "null" here used as a verb to
mean "to assign a NULL to a variable that points at ..."?

    revision: clear graph callback upon "--no-graph"

    "git log --graph --no-graph" first populates the .output_prefix
    member of diffopt, which is a callback function, to compute
    "--graph" header, and then discards the data the callback needs
    to compute the graph header but forgets to clear .output_prefix
    pointer in response to "--no-graph".  At runtime, we end up
    calling the function that we should not.

    Clear the member to stop making callback, and for a better
    hyginene, also clear the pointer pointing at a freed memory.

or something?

Other than that, as I said earlier, the patch looks good.

Thanks.

> Signed-off-by: Emily M Klassen <forivall@gmail.com>
> ---
> I previously reported this a few hours ago, and ended up digging in and figuring
> it out. I'll make sure to bottom reply in the follow ups to this patch.
>
>  revision.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/revision.c b/revision.c
> index 474fa1e767..84cb028e11 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -2615,6 +2615,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
>  		graph_clear(revs->graph);
>  		revs->graph = graph_init(revs);
>  	} else if (!strcmp(arg, "--no-graph")) {
> +		revs->diffopt.output_prefix = NULL;
> +		revs->diffopt.output_prefix_data = NULL;
>  		graph_clear(revs->graph);
>  		revs->graph = NULL;
>  	} else if (!strcmp(arg, "--encode-email-headers")) {
Emily Klassen Feb. 10, 2025, 8:56 p.m. UTC | #3
On Mon, Feb 10, 2025 at 8:02 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Emily M Klassen <forivall@gmail.com> writes:
>
> > Subject: Re: [PATCH] revision: fix missing null for freed memory
> >
> > "git log --graph --no-graph" missed cleaning up the output_prefix and
> > output_prefix_data pointers. This resulted in a segfault when using "--patch",
> > "--name-status" or "--name-only", as the output_prefix_data continued to be in
> > use after free()
>
> Rereading the title, I cannot make sense out of "fix missing null"
> and guess what it wants to say.  Is "null" here used as a verb to
> mean "to assign a NULL to a variable that points at ..."?

Yeah, this was meant to say something like "fix missing null assignment after
freeing graph data", and I didn't really have the energy to think of a better
summary at the time.

>
>     revision: clear graph callback upon "--no-graph"
>
>     "git log --graph --no-graph" first populates the .output_prefix
>     member of diffopt, which is a callback function, to compute
>     "--graph" header, and then discards the data the callback needs
>     to compute the graph header but forgets to clear .output_prefix
>     pointer in response to "--no-graph".  At runtime, we end up
>     calling the function that we should not.
>
>     Clear the member to stop making callback, and for a better
>     hyginene, also clear the pointer pointing at a freed memory.
>
> or something?

Yup, this works well. A small bit of rephrasing for readability:

    revision: clear graph prefix callback upon "--no-graph"

    "git log --graph --no-graph" misses some cleanup: handling
    "--graph", it assigns the .output_prefix member of diffopt, which
    is a callback function to compute the graph prefix when displaying
    a diff. Then, when handling "--no-graph" it discards the data the
    callback needs to compute the graph header but forgets to clear
    .output_prefix pointer.  At runtime, we  call the function when we
    should not. It also passes a stale pointer to the data, which leads
    to a segfault when the callback is used for "--patch",
    "--name-status" or "--name-only".

    Clear the member to stop the callback from being called, and for
    hygiene, also clear the pointer pointing at a freed memory.

>
> Other than that, as I said earlier, the patch looks good.
>
> Thanks.

Awesome. I'll also add a test before re-submitting, as mentioned in
your other message.

Thanks for the feedback!

>
> > Signed-off-by: Emily M Klassen <forivall@gmail.com>
> > ---
> > I previously reported this a few hours ago, and ended up digging in and figuring
> > it out. I'll make sure to bottom reply in the follow ups to this patch.
> >
> >  revision.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/revision.c b/revision.c
> > index 474fa1e767..84cb028e11 100644
> > --- a/revision.c
> > +++ b/revision.c
> > @@ -2615,6 +2615,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
> >               graph_clear(revs->graph);
> >               revs->graph = graph_init(revs);
> >       } else if (!strcmp(arg, "--no-graph")) {
> > +             revs->diffopt.output_prefix = NULL;
> > +             revs->diffopt.output_prefix_data = NULL;
> >               graph_clear(revs->graph);
> >               revs->graph = NULL;
> >       } else if (!strcmp(arg, "--encode-email-headers")) {
Patrick Steinhardt Feb. 11, 2025, 7:55 a.m. UTC | #4
On Fri, Feb 07, 2025 at 10:17:02PM -0800, Emily M Klassen wrote:
> "git log --graph --no-graph" missed cleaning up the output_prefix and
> output_prefix_data pointers. This resulted in a segfault when using "--patch",
> "--name-status" or "--name-only", as the output_prefix_data continued to be in
> use after free()
> 
> Signed-off-by: Emily M Klassen <forivall@gmail.com>
> ---
> I previously reported this a few hours ago, and ended up digging in and figuring
> it out. I'll make sure to bottom reply in the follow ups to this patch.

Do we know when this bug was introduced? Is it a recent regression or a
long-standing issue? Might be nice to point out in the commit message if
we do know.

Patrick
Ben Knoble Feb. 11, 2025, 7:31 p.m. UTC | #5
On Tue, Feb 11, 2025 at 2:56 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Fri, Feb 07, 2025 at 10:17:02PM -0800, Emily M Klassen wrote:
> > "git log --graph --no-graph" missed cleaning up the output_prefix and
> > output_prefix_data pointers. This resulted in a segfault when using "--patch",
> > "--name-status" or "--name-only", as the output_prefix_data continued to be in
> > use after free()
> >
> > Signed-off-by: Emily M Klassen <forivall@gmail.com>
> > ---
> > I previously reported this a few hours ago, and ended up digging in and figuring
> > it out. I'll make sure to bottom reply in the follow ups to this patch.
>
> Do we know when this bug was introduced? Is it a recent regression or a
> long-standing issue? Might be nice to point out in the commit message if
> we do know.
>
> Patrick

Out of morbid curiosity, I've started bisecting this, but it's proven
slightly more subtle than I anticipated. If nobody beats me to it,
I'll post my results when I have them.
Ben Knoble Feb. 11, 2025, 8:22 p.m. UTC | #6
On Tue, Feb 11, 2025 at 2:31 PM D. Ben Knoble <ben.knoble@gmail.com> wrote:
>
> On Tue, Feb 11, 2025 at 2:56 AM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > On Fri, Feb 07, 2025 at 10:17:02PM -0800, Emily M Klassen wrote:
> > > "git log --graph --no-graph" missed cleaning up the output_prefix and
> > > output_prefix_data pointers. This resulted in a segfault when using "--patch",
> > > "--name-status" or "--name-only", as the output_prefix_data continued to be in
> > > use after free()
> > >
> > > Signed-off-by: Emily M Klassen <forivall@gmail.com>
> > > ---
> > > I previously reported this a few hours ago, and ended up digging in and figuring
> > > it out. I'll make sure to bottom reply in the follow ups to this patch.
> >
> > Do we know when this bug was introduced? Is it a recent regression or a
> > long-standing issue? Might be nice to point out in the commit message if
> > we do know.
> >
> > Patrick
>
> Out of morbid curiosity, I've started bisecting this, but it's proven
> slightly more subtle than I anticipated. If nobody beats me to it,
> I'll post my results when I have them.
>
> --
> D. Ben Knoble

2.{30,35}.0 fails to recognize --no-graph, so I checked "git log --grep no-graph
origin/master" with "git describe --contains" and decided that 2.36.0 was first
release recognizing --no-graph, but it didn't build for me (possibly an issue on
my end). I got 2.37.0 built, and it was "good," so that's where I started.

Here's my "bisect run" script.

    #! /bin/sh -x
    make || exit 125
    # segfault has exit >128
    ./bin-wrappers/git --no-pager log -2 --graph --no-graph --patch
--cc || exit 1

The --cc is important, since this repro logs from where the bisect is! Without
it, if the head commits are both merges (likely), the repro will accidentally
mark the commit as good when looking further for a commit with a patch will
fail. Omitting -2 might work, too, but that makes "git log" take longer.

With --first-parent on bisect, we find 3eb4cc451e (Merge branch
'jk/output-prefix-cleanup', 2024-10-10), which looks like a reasonable
candidate. Restarting between that commit and it's first parent, we get
19752d9c91 (diff: return line_prefix directly when possible, 2024-10-03). That
commit actually looks relatively innocuous, and I haven't tracked down how it
really impacts the problem or fix.

Perhaps the topic merge is more helpful to folks in assessing where the bug came
from. Otherwise, it may be that --cc is not enough and we should
bisect with --diff-merges=1 or something else guaranteed to generate a
diff and trip the bug.

To my shame, I didn't save the log from the --first-parent bisect from 2.37.0 to
388218fac7 (The ninth batch, 2025-02-10), but I did save the smaller one.

    # bad: [3eb4cc451ed97123ff76e183a5be8a7dc164d1f6] Merge branch
'jk/output-prefix-cleanup'
    # good: [31bc4454de66c22bc8570fd3af52a99843ac69b0] Merge branch
'ps/leakfixes-part-8'
    git bisect start '@' '@^'
    # good: [436728fe9d75d05fa2439f867ca2039012b86e69] diff: return
const char from output_prefix callback
    git bisect good 436728fe9d75d05fa2439f867ca2039012b86e69
    # bad: [1164e270b5af80516625b628945ec7365d992055] diff: store
graph prefix buf in git_graph struct
    git bisect bad 1164e270b5af80516625b628945ec7365d992055
    # bad: [19752d9c912478b9eef0bd83c2cf6da98974f536] diff: return
line_prefix directly when possible
    git bisect bad 19752d9c912478b9eef0bd83c2cf6da98974f536
    # first bad commit: [19752d9c912478b9eef0bd83c2cf6da98974f536]
diff: return line_prefix directly when possible
Jeff King Feb. 11, 2025, 9:29 p.m. UTC | #7
On Tue, Feb 11, 2025 at 03:22:28PM -0500, D. Ben Knoble wrote:

> 2.{30,35}.0 fails to recognize --no-graph, so I checked "git log --grep no-graph
> origin/master" with "git describe --contains" and decided that 2.36.0 was first
> release recognizing --no-graph, but it didn't build for me (possibly an issue on
> my end). I got 2.37.0 built, and it was "good," so that's where I started.
> 
> Here's my "bisect run" script.
> 
>     #! /bin/sh -x
>     make || exit 125
>     # segfault has exit >128
>     ./bin-wrappers/git --no-pager log -2 --graph --no-graph --patch
> --cc || exit 1

I don't think this is quite enough. The problem is a use-after-free, so
the behavior is undefined. Depending on whether that heap block is
reused, it might work just fine, or output garbage data, or segfault.

I'd have _thought_ it would usually just segfault, but it almost always
just output garbage for me. Building with:

  make SANITIZE=address,undefined

is a good way to get reliable results for this kind of memory error.
Doing that shows that v2.37.0 is actually bad. And bisecting shows that
it has been broken since 087c745833 (log: add a --no-graph option,
2022-02-11), which is not too surprising.

> The --cc is important, since this repro logs from where the bisect is! Without
> it, if the head commits are both merges (likely), the repro will accidentally
> mark the commit as good when looking further for a commit with a patch will
> fail. Omitting -2 might work, too, but that makes "git log" take longer.

I've also run into non-determinism when bisecting like this, because my
test command depends on the value of HEAD. The best solution here is to
just feed a stable tip to git-log. I bisected on:

  git log --graph --no-graph --patch origin >/dev/null

(I didn't need "-2" because good commits failed with "unrecognized
argument" and bad ones were killed by ASan immediately ;) ).

-Peff
Junio C Hamano Feb. 11, 2025, 11:09 p.m. UTC | #8
Jeff King <peff@peff.net> writes:

> Doing that shows that v2.37.0 is actually bad. And bisecting shows that
> it has been broken since 087c745833 (log: add a --no-graph option,
> 2022-02-11), which is not too surprising.

Yeah, broken from its beginning is usually how these kinds of bugs
turn out to be.

> I've also run into non-determinism when bisecting like this, because my
> test command depends on the value of HEAD. The best solution here is to
> just feed a stable tip to git-log. I bisected on:
>
>   git log --graph --no-graph --patch origin >/dev/null
>
> (I didn't need "-2" because good commits failed with "unrecognized
> argument" and bad ones were killed by ASan immediately ;) ).

Thanks for sharing a good tip.
Patrick Steinhardt Feb. 12, 2025, 5:30 a.m. UTC | #9
On Tue, Feb 11, 2025 at 04:29:09PM -0500, Jeff King wrote:
> On Tue, Feb 11, 2025 at 03:22:28PM -0500, D. Ben Knoble wrote:
> 
> > 2.{30,35}.0 fails to recognize --no-graph, so I checked "git log --grep no-graph
> > origin/master" with "git describe --contains" and decided that 2.36.0 was first
> > release recognizing --no-graph, but it didn't build for me (possibly an issue on
> > my end). I got 2.37.0 built, and it was "good," so that's where I started.
> > 
> > Here's my "bisect run" script.
> > 
> >     #! /bin/sh -x
> >     make || exit 125
> >     # segfault has exit >128
> >     ./bin-wrappers/git --no-pager log -2 --graph --no-graph --patch
> > --cc || exit 1
> 
> I don't think this is quite enough. The problem is a use-after-free, so
> the behavior is undefined. Depending on whether that heap block is
> reused, it might work just fine, or output garbage data, or segfault.
> 
> I'd have _thought_ it would usually just segfault, but it almost always
> just output garbage for me. Building with:
> 
>   make SANITIZE=address,undefined
> 
> is a good way to get reliable results for this kind of memory error.
> Doing that shows that v2.37.0 is actually bad. And bisecting shows that
> it has been broken since 087c745833 (log: add a --no-graph option,
> 2022-02-11), which is not too surprising.

Thanks all for bisecting :)

Patrick
Junio C Hamano Feb. 13, 2025, 12:42 a.m. UTC | #10
Emily Klassen <forivall@gmail.com> writes:

> Awesome. I'll also add a test before re-submitting, as mentioned in
> your other message.

Thanks.
Ben Knoble Feb. 13, 2025, 9:07 p.m. UTC | #11
> 
> Le 11 févr. 2025 à 16:29, Jeff King <peff@peff.net> a écrit :
> 
> On Tue, Feb 11, 2025 at 03:22:28PM -0500, D. Ben Knoble wrote:
> 
>> 2.{30,35}.0 fails to recognize --no-graph, so I checked "git log --grep no-graph
>> origin/master" with "git describe --contains" and decided that 2.36.0 was first
>> release recognizing --no-graph, but it didn't build for me (possibly an issue on
>> my end). I got 2.37.0 built, and it was "good," so that's where I started.
>> 
>> Here's my "bisect run" script.
>> 
>>    #! /bin/sh -x
>>    make || exit 125
>>    # segfault has exit >128
>>    ./bin-wrappers/git --no-pager log -2 --graph --no-graph --patch
>> --cc || exit 1
> 
> I don't think this is quite enough. The problem is a use-after-free, so
> the behavior is undefined. Depending on whether that heap block is
> reused, it might work just fine, or output garbage data, or segfault.
> 
> I'd have _thought_ it would usually just segfault, but it almost always
> just output garbage for me. Building with:
> 
>  make SANITIZE=address,undefined
> 
> is a good way to get reliable results for this kind of memory error.
> Doing that shows that v2.37.0 is actually bad. And bisecting shows that
> it has been broken since 087c745833 (log: add a --no-graph option,
> 2022-02-11), which is not too surprising.

Ah, fun, that’s more like what I was expecting. And thanks for the advice!

> 
>> The --cc is important, since this repro logs from where the bisect is! Without
>> it, if the head commits are both merges (likely), the repro will accidentally
>> mark the commit as good when looking further for a commit with a patch will
>> fail. Omitting -2 might work, too, but that makes "git log" take longer.
> 
> I've also run into non-determinism when bisecting like this, because my
> test command depends on the value of HEAD. The best solution here is to
> just feed a stable tip to git-log. I bisected on:
> 
>  git log --graph --no-graph --patch origin >/dev/null
> 
> (I didn't need "-2" because good commits failed with "unrecognized
> argument" and bad ones were killed by ASan immediately ;) ).
> 
> -Peff
diff mbox series

Patch

diff --git a/revision.c b/revision.c
index 474fa1e767..84cb028e11 100644
--- a/revision.c
+++ b/revision.c
@@ -2615,6 +2615,8 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		graph_clear(revs->graph);
 		revs->graph = graph_init(revs);
 	} else if (!strcmp(arg, "--no-graph")) {
+		revs->diffopt.output_prefix = NULL;
+		revs->diffopt.output_prefix_data = NULL;
 		graph_clear(revs->graph);
 		revs->graph = NULL;
 	} else if (!strcmp(arg, "--encode-email-headers")) {