diff mbox series

[v2] rerere: fix crash during clear

Message ID 20240218194603.1210895-1-marcel@roethke.info (mailing list archive)
State New
Headers show
Series [v2] rerere: fix crash during clear | expand

Commit Message

Marcel Röthke Feb. 18, 2024, 7:46 p.m. UTC
When rerere_clear is called, for instance when aborting a rebase, and
the current conflict does not have a pre or postimage recorded git
crashes with a SEGFAULT in has_rerere_resolution when accessing the
status member of struct rerere_dir. This happens because scan_rerere_dir
only allocates the status field in struct rerere_dir when a post or
preimage was found. In some cases a segfault may happen even if a post
or preimage was recorded if it was not for the variant of interest and
the number of the variant that is present is lower than the variant of
interest.

Solve this by making sure the status field is large enough to
accommodate for the variant of interest so it can be accessed without
checking if it is large enough.

An alternative solution would be to always check before accessing the
status field, but I think the chosen solution aligns better with the
assumptions made elsewhere in the code.

Signed-off-by: Marcel Röthke <marcel@roethke.info>
---
Range-diff against v1:
1:  93f982d170 ! 1:  68178298fe rerere: fix crash in during clear
    @@ Metadata
     Author: Marcel Röthke <marcel@roethke.info>
     
      ## Commit message ##
    -    rerere: fix crash in during clear
    +    rerere: fix crash during clear
     
         When rerere_clear is called, for instance when aborting a rebase, and
         the current conflict does not have a pre or postimage recorded git
    @@ Commit message
         the number of the variant that is present is lower than the variant of
         interest.
     
    -    This patch solves this by making sure the status field is large enough
    -    to accommodate for the variant of interest so it can be accesses without
    +    Solve this by making sure the status field is large enough to
    +    accommodate for the variant of interest so it can be accessed without
         checking if it is large enough.
     
         An alternative solution would be to always check before accessing the
    @@ rerere.c: static void read_rr(struct repository *r, struct string_list *rr)
      		buf.buf[hexsz] = '\0';
      		id = new_rerere_id_hex(buf.buf);
      		id->variant = variant;
    -+		/* make sure id->collection->status has enough space
    -+		 * for the variant we are interested in */
    ++		/*
    ++		 * make sure id->collection->status has enough space
    ++		 * for the variant we are interested in
    ++		 */
     +		fit_variant(id->collection, variant);
      		string_list_insert(rr, path)->util = id;
      	}

 rerere.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Junio C Hamano Feb. 20, 2024, 1:22 a.m. UTC | #1
Marcel Röthke <marcel@roethke.info> writes:

> When rerere_clear is called, for instance when aborting a rebase, and
> the current conflict does not have a pre or postimage recorded git
> crashes with a SEGFAULT in has_rerere_resolution when accessing the
> status member of struct rerere_dir.

I had to read this twice before realizing the reason why I found it
hard to grok was because of a missing comma between "recorded" and
"git".

> This happens because scan_rerere_dir
> only allocates the status field in struct rerere_dir when a post or
> preimage was found.

But that is not really the root cause, no?  Readers following the
above text are probably wondering why the preimage was not recorded,
when a conflict resulted in stopping a mergy-command and invoking
rerere machinery, before rerere_clear() got called.  Is that
something that usually happen?  How?  Do we have a reproduction
sequence of such a state that we can make it into a new test in
t4200 where we already have tests for "git rerere clear" and its
friends?

> In some cases a segfault may happen even if a post
> or preimage was recorded if it was not for the variant of interest and
> the number of the variant that is present is lower than the variant of
> interest.

Ditto.  What sequence of events would lead to such a state?

The answer *can* be ".git/rr-cache being a normal directory, the
user can poke around, removing files randomly, which can create such
a problematic situation", and the reproduction test *can* also be to
simulate such an end-user action, but I am asking primarily because
I want to make sure that we are *not* losing or failing to create
necessary preimage files, causing this problem to users who do not
muck with files in .git/rr-cache themselves.

Thanks.
Marcel Röthke Feb. 24, 2024, 11:25 a.m. UTC | #2
On 2024-02-19 17:22:43, Junio C Hamano wrote:
> Marcel Röthke <marcel@roethke.info> writes:
>
> > When rerere_clear is called, for instance when aborting a rebase, and
> > the current conflict does not have a pre or postimage recorded git
> > crashes with a SEGFAULT in has_rerere_resolution when accessing the
> > status member of struct rerere_dir.
>
> I had to read this twice before realizing the reason why I found it
> hard to grok was because of a missing comma between "recorded" and
> "git".

fixed

> > This happens because scan_rerere_dir
> > only allocates the status field in struct rerere_dir when a post or
> > preimage was found.
>
> But that is not really the root cause, no?  Readers following the
> above text are probably wondering why the preimage was not recorded,
> when a conflict resulted in stopping a mergy-command and invoking
> rerere machinery, before rerere_clear() got called.  Is that
> something that usually happen?  How?  Do we have a reproduction
> sequence of such a state that we can make it into a new test in
> t4200 where we already have tests for "git rerere clear" and its
> friends?

I'm unfortunately not sure how it happened. I do have the initial state
of the repository and I think I know the commands that were executed,
but I could not reproduce it.

I will look into adding a test case though.
Junio C Hamano March 24, 2024, 9:51 p.m. UTC | #3
Marcel Röthke <marcel@roethke.info> writes:

> On 2024-02-19 17:22:43, Junio C Hamano wrote:
>> Marcel Röthke <marcel@roethke.info> writes:
>>
>> > When rerere_clear is called, for instance when aborting a rebase, and
>> > the current conflict does not have a pre or postimage recorded git
>> > crashes with a SEGFAULT in has_rerere_resolution when accessing the
>> > status member of struct rerere_dir.
>>
>> I had to read this twice before realizing the reason why I found it
>> hard to grok was because of a missing comma between "recorded" and
>> "git".
>
> fixed
> ...
> I'm unfortunately not sure how it happened. I do have the initial state
> of the repository and I think I know the commands that were executed,
> but I could not reproduce it.
>
> I will look into adding a test case though.

It has been about a month.  Any new development on this topic?

Thanks.
Marcel Röthke March 25, 2024, 7:30 p.m. UTC | #4
On 2024-03-24 14:51:17, Junio C Hamano wrote:
> Marcel Röthke <marcel@roethke.info> writes:
>
> > On 2024-02-19 17:22:43, Junio C Hamano wrote:
> >> Marcel Röthke <marcel@roethke.info> writes:
> >>
> >> > When rerere_clear is called, for instance when aborting a rebase, and
> >> > the current conflict does not have a pre or postimage recorded git
> >> > crashes with a SEGFAULT in has_rerere_resolution when accessing the
> >> > status member of struct rerere_dir.
> >>
> >> I had to read this twice before realizing the reason why I found it
> >> hard to grok was because of a missing comma between "recorded" and
> >> "git".
> >
> > fixed
> > ...
> > I'm unfortunately not sure how it happened. I do have the initial state
> > of the repository and I think I know the commands that were executed,
> > but I could not reproduce it.
> >
> > I will look into adding a test case though.
>
> It has been about a month.  Any new development on this topic?
>
> Thanks.

No, I have been otherwise occupied. But I plan to get back to this soon. Probably this week, next week at the latest.
Marcel Röthke April 7, 2024, 8:12 p.m. UTC | #5
After reproducing this without manually corrupting the database, I
noticed that the summary line is no longer accurate. Because it fixes
two SEGFAULTs and one of them does not happen during clear. Can I change this
in v3 or should I start a new thread?
Junio C Hamano April 8, 2024, 3:18 p.m. UTC | #6
Marcel Röthke <marcel@roethke.info> writes:

> After reproducing this without manually corrupting the database, I
> noticed that the summary line is no longer accurate. Because it fixes
> two SEGFAULTs and one of them does not happen during clear. Can I change this
> in v3 or should I start a new thread?

Updating the title in the course of evolution of a topic/patch is
perfectly normal.  Please do so in v3 as continuation of the
previous round.

Thanks.
diff mbox series

Patch

diff --git a/rerere.c b/rerere.c
index ca7e77ba68..4683d6cbb1 100644
--- a/rerere.c
+++ b/rerere.c
@@ -219,6 +219,11 @@  static void read_rr(struct repository *r, struct string_list *rr)
 		buf.buf[hexsz] = '\0';
 		id = new_rerere_id_hex(buf.buf);
 		id->variant = variant;
+		/*
+		 * make sure id->collection->status has enough space
+		 * for the variant we are interested in
+		 */
+		fit_variant(id->collection, variant);
 		string_list_insert(rr, path)->util = id;
 	}
 	strbuf_release(&buf);