diff mbox series

Segfaults in git rebase --continue and git rerere

Message ID YTlgyHnD3fFWwLu3@camp.crustytoothpaste.net (mailing list archive)
State New, archived
Headers show
Series Segfaults in git rebase --continue and git rerere | expand

Commit Message

brian m. carlson Sept. 9, 2021, 1:18 a.m. UTC
I'm having a bit of a problem with segfaults using
2.33.0.252.g9b09ab0cd71.  I was in the process of running "git rebase
--continue" with that version to resolve some conflicts in a project I'm
working on.  At that point, it segfaulted, and I got this (apologies for
the French):

  error: impossible d'analyser la section en conflit dans 'scutiger-lfs/src/bin/git-lfs-transfer.rs'
  error: impossible d'analyser la section en conflit dans 'scutiger-lfs/src/bin/git-lfs-transfer.rs'
  Pré-image enregistrée pour 'scutiger-lfs/src/bin/git-lfs-transfer.rs'
  [HEAD détachée 5134185] scutiger-lfs: move BatchItem, Mode, and Oid to library
   2 files changed, 74 insertions(+), 9 deletions(-)
  Fusion automatique de scutiger-lfs/src/processor.rs
  Fusion automatique de scutiger-lfs/src/bin/git-lfs-transfer.rs
  CONFLIT (contenu) : Conflit de fusion dans scutiger-lfs/src/bin/git-lfs-transfer.rs
  error: impossible d'appliquer 2cd1615... scutiger-lfs: move download and verify actions into backend
  Resolve all conflicts manually, mark them as resolved with
  "git add/rm <conflicted_files>", then run "git rebase --continue".
  You can instead skip this commit: run "git rebase --skip".
  To abort and get back to the state before "git rebase", run "git rebase --abort".
  error: impossible d'analyser la section en conflit dans 'scutiger-lfs/src/bin/git-lfs-transfer.rs'
  zsh: segmentation fault  git rebase --continue

I can provide a tarball of the broken repo upon request.  It's 108 MB,
so you'll need to provide some place for me to drop it.

At that point, I needed to remove .git/MERGE_RR.lock (which is empty),
and I ran "git rebase --abort" (because I realized my resolution was
bad).  Upon which, I received a second segfault (traceback from tip of
next):

  #0  0x00000000005b6e6b in has_rerere_resolution (id=0x229b3a0) at rerere.c:162
  162             return ((id->collection->status[variant] & both) == both);
  
  #0  0x00000000005b6e6b in has_rerere_resolution (id=0x229b3a0) at rerere.c:162
          both = 3
          variant = 0
  #1  rerere_clear (r=0x722880 <the_repo>, merge_rr=merge_rr@entry=0x7ffe8c703810) at rerere.c:1239
          id = 0x229b3a0
          i = 0

It appears to be because id->collection->status here is NULL.  It's
unclear to me why that's the case, but it does appear to be linked to my
.git/MERGE_RR file, which looks like this (xxd -g1):

  00000000: 30 34 39 62 31 34 32 39 34 65 64 30 63 30 65 66  049b14294ed0c0ef
  00000010: 62 65 39 32 66 34 66 64 33 31 31 63 37 63 34 30  be92f4fd311c7c40
  00000020: 39 30 39 34 65 63 64 65 09 73 63 75 74 69 67 65  9094ecde.scutige
  00000030: 72 2d 6c 66 73 2f 73 72 63 2f 62 69 6e 2f 67 69  r-lfs/src/bin/gi
  00000040: 74 2d 6c 66 73 2d 74 72 61 6e 73 66 65 72 2e 72  t-lfs-transfer.r
  00000050: 73 00                                            s.

The corresponding directory under .git/rr-cache is empty.  This
specifically seems to be the problem, and I've included a snippet of a
test below that causes the same segfault.  My guess is that the original
segfault left the MERGE_RR file present but the files in the rr-cache
directory absent.

Since rerere isn't a strong suit of mine, I'm not sending a patch, but I
am including a failing test[0] which indicates the problem (and to which
anyone is welcome to add my sign-off) in hopes that someone more
familiar with this subsystem can figure out what's going on.

----- %< -----
----- %< -----

As an aside, I generally find Git's codebase to be of exceptionally good
quality for a C program, and so seeing two segfaults back to back led me
to downgrade my recently upgraded version of glibc to see if somehow
that was the problem.  Unfortunately, that was not the case, and we just
have two separate bugs here.

[0] The test uses perl because the MERGE_RR file contains a NUL byte and
therefore cannot be used with standard POSIX utilities.

Comments

Ævar Arnfjörð Bjarmason Sept. 9, 2021, 1:48 a.m. UTC | #1
On Thu, Sep 09 2021, brian m. carlson wrote:

> [[PGP Signed Part:Undecided]]
> I'm having a bit of a problem with segfaults using
> 2.33.0.252.g9b09ab0cd71.  I was in the process of running "git rebase
> --continue" with that version to resolve some conflicts in a project I'm
> working on.  At that point, it segfaulted, and I got this (apologies for
> the French):
>
>   error: impossible d'analyser la section en conflit dans 'scutiger-lfs/src/bin/git-lfs-transfer.rs'
>   error: impossible d'analyser la section en conflit dans 'scutiger-lfs/src/bin/git-lfs-transfer.rs'
>   Pré-image enregistrée pour 'scutiger-lfs/src/bin/git-lfs-transfer.rs'
>   [HEAD détachée 5134185] scutiger-lfs: move BatchItem, Mode, and Oid to library
>    2 files changed, 74 insertions(+), 9 deletions(-)
>   Fusion automatique de scutiger-lfs/src/processor.rs
>   Fusion automatique de scutiger-lfs/src/bin/git-lfs-transfer.rs
>   CONFLIT (contenu) : Conflit de fusion dans scutiger-lfs/src/bin/git-lfs-transfer.rs
>   error: impossible d'appliquer 2cd1615... scutiger-lfs: move download and verify actions into backend
>   Resolve all conflicts manually, mark them as resolved with
>   "git add/rm <conflicted_files>", then run "git rebase --continue".
>   You can instead skip this commit: run "git rebase --skip".
>   To abort and get back to the state before "git rebase", run "git rebase --abort".
>   error: impossible d'analyser la section en conflit dans 'scutiger-lfs/src/bin/git-lfs-transfer.rs'
>   zsh: segmentation fault  git rebase --continue
>
> I can provide a tarball of the broken repo upon request.  It's 108 MB,
> so you'll need to provide some place for me to drop it.
>
> At that point, I needed to remove .git/MERGE_RR.lock (which is empty),
> and I ran "git rebase --abort" (because I realized my resolution was
> bad).  Upon which, I received a second segfault (traceback from tip of
> next):
>
>   #0  0x00000000005b6e6b in has_rerere_resolution (id=0x229b3a0) at rerere.c:162
>   162             return ((id->collection->status[variant] & both) == both);
>   
>   #0  0x00000000005b6e6b in has_rerere_resolution (id=0x229b3a0) at rerere.c:162
>           both = 3
>           variant = 0
>   #1  rerere_clear (r=0x722880 <the_repo>, merge_rr=merge_rr@entry=0x7ffe8c703810) at rerere.c:1239
>           id = 0x229b3a0
>           i = 0
>
> It appears to be because id->collection->status here is NULL.  It's
> unclear to me why that's the case, but it does appear to be linked to my
> .git/MERGE_RR file, which looks like this (xxd -g1):
>
>   00000000: 30 34 39 62 31 34 32 39 34 65 64 30 63 30 65 66  049b14294ed0c0ef
>   00000010: 62 65 39 32 66 34 66 64 33 31 31 63 37 63 34 30  be92f4fd311c7c40
>   00000020: 39 30 39 34 65 63 64 65 09 73 63 75 74 69 67 65  9094ecde.scutige
>   00000030: 72 2d 6c 66 73 2f 73 72 63 2f 62 69 6e 2f 67 69  r-lfs/src/bin/gi
>   00000040: 74 2d 6c 66 73 2d 74 72 61 6e 73 66 65 72 2e 72  t-lfs-transfer.r
>   00000050: 73 00                                            s.
>
> The corresponding directory under .git/rr-cache is empty.  This
> specifically seems to be the problem, and I've included a snippet of a
> test below that causes the same segfault.  My guess is that the original
> segfault left the MERGE_RR file present but the files in the rr-cache
> directory absent.
>
> Since rerere isn't a strong suit of mine, I'm not sending a patch, but I
> am including a failing test[0] which indicates the problem (and to which
> anyone is welcome to add my sign-off) in hopes that someone more
> familiar with this subsystem can figure out what's going on.

I haven't taken time to familiarize myself with it, but I have one
segfault fix for it already[1] which needs a test, I tried and this is
completely unrelated.

The "fix" for this segfault you're reporting could be, this makes your
test pass, along with the rest of the test suite:

diff --git a/rerere.c b/rerere.c
index d83d58df4fb..cbad6a89ebc 100644
--- a/rerere.c
+++ b/rerere.c
@@ -157,6 +157,9 @@ static int has_rerere_resolution(const struct rerere_id *id)
 	const int both = RR_HAS_POSTIMAGE|RR_HAS_PREIMAGE;
 	int variant = id->variant;
 
+	if (variant == id->collection->status_nr)
+		return 1;
+
 	if (variant < 0)
 		return 0;
 	return ((id->collection->status[variant] & both) == both);

But I have no idea if that's sensible. I.e. as you found the immediate
cause here is that our "id" is being looked up in
"id->collection->status" where that's NULL, there's similar checks
elsewhere for "id->collection->status_nr", but if that's correct here I
don't know, nor what the deeper logic error is of how the two went out
of sync.

1. https://lore.kernel.org/git/87v96p4w3f.fsf@evledraar.gmail.com/

> ----- %< -----
> diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
> index 9f8c76dffb..c44dfe248a 100755
> --- a/t/t4200-rerere.sh
> +++ b/t/t4200-rerere.sh
> @@ -623,6 +623,7 @@ test_expect_success 'rerere with inner conflict markers' '
>  	git commit -q -m "will solve conflicts later" &&
>  	test_must_fail git merge A &&
>  
> +	cp .git/MERGE_RR merge_rr &&
>  	echo "resolved" >test &&
>  	git add test &&
>  	git commit -q -m "solved conflict" &&
> @@ -645,6 +646,13 @@ test_expect_success 'rerere with inner conflict markers' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'rerere clear does not segfault with bad data' '
> +	res_id=$($PERL_PATH -nF"\t" -e "print \$F[0]" merge_rr) &&
> +	cp merge_rr .git/MERGE_RR &&
> +	rm -f .git/rr-cache/$res_id/* &&
> +	git rerere clear
> +'
> +
>  test_expect_success 'setup simple stage 1 handling' '
>  	test_create_repo stage_1_handling &&
>  	(
> ----- %< -----
>
> As an aside, I generally find Git's codebase to be of exceptionally good
> quality for a C program, and so seeing two segfaults back to back led me
> to downgrade my recently upgraded version of glibc to see if somehow
> that was the problem.  Unfortunately, that was not the case, and we just
> have two separate bugs here.
>
> [0] The test uses perl because the MERGE_RR file contains a NUL byte and
> therefore cannot be used with standard POSIX utilities.

(Just a side-note, I think the use of perl here is fine)

I haven't tried, but I think you can probably do that res_id assignment
with "git grep" and -o, see t7816-grep-binary-pattern.sh for how we can
support greps with \0 in them with -f.
diff mbox series

Patch

diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index 9f8c76dffb..c44dfe248a 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -623,6 +623,7 @@  test_expect_success 'rerere with inner conflict markers' '
 	git commit -q -m "will solve conflicts later" &&
 	test_must_fail git merge A &&
 
+	cp .git/MERGE_RR merge_rr &&
 	echo "resolved" >test &&
 	git add test &&
 	git commit -q -m "solved conflict" &&
@@ -645,6 +646,13 @@  test_expect_success 'rerere with inner conflict markers' '
 	test_cmp expect actual
 '
 
+test_expect_success 'rerere clear does not segfault with bad data' '
+	res_id=$($PERL_PATH -nF"\t" -e "print \$F[0]" merge_rr) &&
+	cp merge_rr .git/MERGE_RR &&
+	rm -f .git/rr-cache/$res_id/* &&
+	git rerere clear
+'
+
 test_expect_success 'setup simple stage 1 handling' '
 	test_create_repo stage_1_handling &&
 	(