diff mbox series

[3/4] repack: use tempfiles for signal cleanup

Message ID Y1MTJz3wy5xDEPEH@coredump.intra.peff.net (mailing list archive)
State Superseded
Headers show
Series repack tempfile-cleanup signal deadlock | expand

Commit Message

Jeff King Oct. 21, 2022, 9:46 p.m. UTC
When git-repack exits due to a signal, it tries to clean up by calling
its remove_temporary_files() function, which walks through the packs dir
looking for ".tmp-$$-pack-*" files to delete (where "$$" is the pid of
the current process).

The biggest problem here is that remove_temporary_files() is not safe to
call in a signal handler. It uses opendir(), which isn't on the POSIX
async-signal-safe list. The details will be platform-specific, but a
likely issue is that it needs to allocate memory; if we receive a signal
while inside malloc(), etc, we'll conflict on the allocator lock and
deadlock with ourselves.

We can fix this by just cleaning up the files directly, without walking
the directory. We already know the complete list of .tmp-* files that
were generated, because we recorded them via populate_pack_exts(). When
we find files there, we can use register_tempfile() to record the
filenames. If we receive a signal, then the tempfile API will clean them
up for us, and it's async-safe and pretty battle-tested.

Note that this is slightly racier than the existing scheme. We don't
record the filenames until pack-objects tells us the hash over stdout.
So during the period between it generating the file and reporting the
hash, we'd fail to clean up. However, that period is very small. During
most of the pack generation process pack-objects is using its own
internal tempfiles. It's only at the very end that it moves them into
the names git-repack expects, and then it immediately reports the name
to us. Given that cleanup like this is best effort (after all, we may
get SIGKILL), this level of race is acceptable.

When we register the tempfiles, we'll record them locally and use the
result to call rename_tempfile(), rather than renaming by hand.  This
isn't strictly necessary, as once we've renamed the files they're gone,
and the tempfile API's cleanup unlink() would simply become a pointless
noop. But managing the lifetimes of the tempfile objects is the cleanest
thing to do, and the tempfile pointers naturally fill the same role as
the old booleans.

This patch also fixes another small problem. We only hook signals, and
don't set up an atexit handler. So if we see an error that causes us to
die(), we'll leave the .tmp-* files in place. But since the tempfile API
handles this for us, this is now fixed for free. The new test covers
this by stimulating a failure of pack-objects when generating a cruft
pack. Before this patch, the .tmp-* file for the main pack would have
been left, but now we correctly clean it up.

Reported-by: Jan Pokorný <poki@fnusa.cz>
Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/repack.c  | 17 ++++-------------
 t/t7700-repack.sh |  8 ++++++++
 2 files changed, 12 insertions(+), 13 deletions(-)

Comments

Junio C Hamano Oct. 21, 2022, 10:30 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> This patch also fixes another small problem. We only hook signals, and
> don't set up an atexit handler. So if we see an error that causes us to
> die(), we'll leave the .tmp-* files in place. But since the tempfile API
> handles this for us, this is now fixed for free. The new test covers
> this by stimulating a failure of pack-objects when generating a cruft
> pack. Before this patch, the .tmp-* file for the main pack would have
> been left, but now we correctly clean it up.
>
> Reported-by: Jan Pokorný <poki@fnusa.cz>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/repack.c  | 17 ++++-------------
>  t/t7700-repack.sh |  8 ++++++++
>  2 files changed, 12 insertions(+), 13 deletions(-)

Nice.  

>  struct generated_pack_data {
> -	char exts[ARRAY_SIZE(exts)];
> +	struct tempfile *tempfiles[ARRAY_SIZE(exts)];
> ...
> -		data->exts[i] = 1;
> +		data->tempfiles[i] = register_tempfile(path.buf);

OK.

>  	}
>  
>  	strbuf_release(&path);
> @@ -867,8 +860,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  		split_pack_geometry(geometry, geometric_factor);
>  	}
>  
> -	sigchain_push_common(remove_pack_on_signal);
> -
>  	prepare_pack_objects(&cmd, &po_args);
>  
>  	show_progress = !po_args.quiet && isatty(2);
> @@ -1020,14 +1011,14 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  			fname_old = mkpathdup("%s-%s%s",
>  					packtmp, item->string, exts[ext].name);
>  
> -			if (data->exts[ext]) {
> +			if (data->tempfiles[ext]) {
>  				struct stat statbuffer;
>  				if (!stat(fname_old, &statbuffer)) {
>  					statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
>  					chmod(fname_old, statbuffer.st_mode);
>  				}
>  
> -				if (rename(fname_old, fname))
> +				if (rename_tempfile(&data->tempfiles[ext], fname))
>  					die_errno(_("renaming '%s' failed"), fname_old);

It now got a bit confusing that we have 'fname', 'fname_old', and
the tempfile.  The path.buf used as the argument to register_tempfile()
matches what is used to compute fname_old above.  I wonder if tempfile
API does not give us that name so that we can stop using fname_old here?
Jeff King Oct. 21, 2022, 11:24 p.m. UTC | #2
On Fri, Oct 21, 2022 at 03:30:29PM -0700, Junio C Hamano wrote:

> > @@ -867,8 +860,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> >  		split_pack_geometry(geometry, geometric_factor);
> >  	}
> >  
> > -	sigchain_push_common(remove_pack_on_signal);
> > -
> >  	prepare_pack_objects(&cmd, &po_args);
> >  
> >  	show_progress = !po_args.quiet && isatty(2);
> > @@ -1020,14 +1011,14 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> >  			fname_old = mkpathdup("%s-%s%s",
> >  					packtmp, item->string, exts[ext].name);
> >  
> > -			if (data->exts[ext]) {
> > +			if (data->tempfiles[ext]) {
> >  				struct stat statbuffer;
> >  				if (!stat(fname_old, &statbuffer)) {
> >  					statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
> >  					chmod(fname_old, statbuffer.st_mode);
> >  				}
> >  
> > -				if (rename(fname_old, fname))
> > +				if (rename_tempfile(&data->tempfiles[ext], fname))
> >  					die_errno(_("renaming '%s' failed"), fname_old);
> 
> It now got a bit confusing that we have 'fname', 'fname_old', and
> the tempfile.  The path.buf used as the argument to register_tempfile()
> matches what is used to compute fname_old above.  I wonder if tempfile
> API does not give us that name so that we can stop using fname_old here?

It does, and we probably should use get_tempfile_path() in the error
message here. But sadly we can't get rid of fname_old entirely, as it's
used below this for the second block in the if-else chain:

  if (data->tempfiles[ext]) {
     ...do the rename ...
  } else if (!exts[ext].optional)
	  die(_("missing required file: %s"), fname_old);
  else if (unlink(fname) < 0 && errno != ENOENT)
	  die_errno(_("could not unlink: %s"), fname);

OTOH, it would probably be equally readable (or perhaps even better) for
that second block to say:

  die("pack-objects did not generate a '%s' file for pack %s",
      exts[ext].name, item->string);

And then we could drop fname_old entirely. Which is nice, because it
gets rid of the implicit assumption that the tempfile matches what is in
fname_old (which is always true, but since they are generated by
individual lines far apart from each other, it's possible for that to
change).

-Peff
Taylor Blau Oct. 21, 2022, 11:45 p.m. UTC | #3
On Fri, Oct 21, 2022 at 07:24:46PM -0400, Jeff King wrote:
> On Fri, Oct 21, 2022 at 03:30:29PM -0700, Junio C Hamano wrote:
>
> > > @@ -867,8 +860,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> > >  		split_pack_geometry(geometry, geometric_factor);
> > >  	}
> > >
> > > -	sigchain_push_common(remove_pack_on_signal);
> > > -
> > >  	prepare_pack_objects(&cmd, &po_args);
> > >
> > >  	show_progress = !po_args.quiet && isatty(2);
> > > @@ -1020,14 +1011,14 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> > >  			fname_old = mkpathdup("%s-%s%s",
> > >  					packtmp, item->string, exts[ext].name);
> > >
> > > -			if (data->exts[ext]) {
> > > +			if (data->tempfiles[ext]) {
> > >  				struct stat statbuffer;
> > >  				if (!stat(fname_old, &statbuffer)) {
> > >  					statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
> > >  					chmod(fname_old, statbuffer.st_mode);
> > >  				}
> > >
> > > -				if (rename(fname_old, fname))
> > > +				if (rename_tempfile(&data->tempfiles[ext], fname))
> > >  					die_errno(_("renaming '%s' failed"), fname_old);
> >
> > It now got a bit confusing that we have 'fname', 'fname_old', and
> > the tempfile.  The path.buf used as the argument to register_tempfile()
> > matches what is used to compute fname_old above.  I wonder if tempfile
> > API does not give us that name so that we can stop using fname_old here?
>
> It does, and we probably should use get_tempfile_path() in the error
> message here. But sadly we can't get rid of fname_old entirely, as it's
> used below this for the second block in the if-else chain:
>
>   if (data->tempfiles[ext]) {
>      ...do the rename ...
>   } else if (!exts[ext].optional)
> 	  die(_("missing required file: %s"), fname_old);
>   else if (unlink(fname) < 0 && errno != ENOENT)
> 	  die_errno(_("could not unlink: %s"), fname);
>
> OTOH, it would probably be equally readable (or perhaps even better) for
> that second block to say:
>
>   die("pack-objects did not generate a '%s' file for pack %s",
>       exts[ext].name, item->string);
>
> And then we could drop fname_old entirely. Which is nice, because it
> gets rid of the implicit assumption that the tempfile matches what is in
> fname_old (which is always true, but since they are generated by
> individual lines far apart from each other, it's possible for that to
> change).

TBH, I've always found fname_old to be a confusing name. It's not really
"old", in fact we just had pack-objects write that file ;-). It really
does pertain to the tempfile, and I think using get_tempfile_path() when
we have a tempfile to rename is sensible.

I think that your proposed error message is good, too, and doubly so
since it lets us get rid of fname_old entirely. Yay :-).

> -Peff

Thanks,
Taylor
Jeff King Oct. 22, 2022, 12:11 a.m. UTC | #4
On Fri, Oct 21, 2022 at 07:24:46PM -0400, Jeff King wrote:

> > > @@ -1020,14 +1011,14 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> > >  			fname_old = mkpathdup("%s-%s%s",
> > >  					packtmp, item->string, exts[ext].name);
> > >  
> > > -			if (data->exts[ext]) {
> > > +			if (data->tempfiles[ext]) {
> > >  				struct stat statbuffer;
> > >  				if (!stat(fname_old, &statbuffer)) {
> > >  					statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
> > >  					chmod(fname_old, statbuffer.st_mode);
> > >  				}
> > >  
> > > -				if (rename(fname_old, fname))
> > > +				if (rename_tempfile(&data->tempfiles[ext], fname))
> > >  					die_errno(_("renaming '%s' failed"), fname_old);
> > 
> > It now got a bit confusing that we have 'fname', 'fname_old', and
> > the tempfile.  The path.buf used as the argument to register_tempfile()
> > matches what is used to compute fname_old above.  I wonder if tempfile
> > API does not give us that name so that we can stop using fname_old here?
> 
> It does, and we probably should use get_tempfile_path() in the error
> message here.

Gah, this is not quite true. If the rename fails, we clean up the
tempfile struct entirely, and that invalidates the pointer. I think it
is OK to just report "fname" in this case, though, which is what most
callers do. Arguably the tempfile API should leave the tempfile in place
on a failed rename, letting the callers decide themselves how to handle
things. In most cases they'll just exit anyway, which will clean up the
tempfile.

I also didn't notice that we do some mode-twiddling on fname_old. But I
think if the code becomes (inside this conditional block, once we stop
using it in the other half):

  const char *fname_old = get_tempfile_path(data->tempfiles[ext]);

then those lines don't even need to change.

-Peff
Jeff King Oct. 22, 2022, 12:12 a.m. UTC | #5
On Fri, Oct 21, 2022 at 07:45:28PM -0400, Taylor Blau wrote:

> TBH, I've always found fname_old to be a confusing name. It's not really
> "old", in fact we just had pack-objects write that file ;-). It really
> does pertain to the tempfile, and I think using get_tempfile_path() when
> we have a tempfile to rename is sensible.

I don't love it either, but I've kept it in what I'm preparing, just
because we need _some_ variable to avoid writing:

  get_tempfile_path(data->tempfiles[ext])

over and over. And using the same one keeps the diff minimal. If it's
too terrible we can rename it on top. :)

> I think that your proposed error message is good, too, and doubly so
> since it lets us get rid of fname_old entirely. Yay :-).

Thanks. I'm preparing something along those lines.

-Peff
diff mbox series

Patch

diff --git a/builtin/repack.c b/builtin/repack.c
index b5bd9e5fed..15b6f24626 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -122,13 +122,6 @@  static void remove_temporary_files(void)
 	strbuf_release(&buf);
 }
 
-static void remove_pack_on_signal(int signo)
-{
-	remove_temporary_files();
-	sigchain_pop(signo);
-	raise(signo);
-}
-
 /*
  * Adds all packs hex strings to either fname_nonkept_list or
  * fname_kept_list based on whether each pack has a corresponding
@@ -248,7 +241,7 @@  static struct {
 };
 
 struct generated_pack_data {
-	char exts[ARRAY_SIZE(exts)];
+	struct tempfile *tempfiles[ARRAY_SIZE(exts)];
 };
 
 static struct generated_pack_data *populate_pack_exts(const char *name)
@@ -265,7 +258,7 @@  static struct generated_pack_data *populate_pack_exts(const char *name)
 		if (stat(path.buf, &statbuf))
 			continue;
 
-		data->exts[i] = 1;
+		data->tempfiles[i] = register_tempfile(path.buf);
 	}
 
 	strbuf_release(&path);
@@ -867,8 +860,6 @@  int cmd_repack(int argc, const char **argv, const char *prefix)
 		split_pack_geometry(geometry, geometric_factor);
 	}
 
-	sigchain_push_common(remove_pack_on_signal);
-
 	prepare_pack_objects(&cmd, &po_args);
 
 	show_progress = !po_args.quiet && isatty(2);
@@ -1020,14 +1011,14 @@  int cmd_repack(int argc, const char **argv, const char *prefix)
 			fname_old = mkpathdup("%s-%s%s",
 					packtmp, item->string, exts[ext].name);
 
-			if (data->exts[ext]) {
+			if (data->tempfiles[ext]) {
 				struct stat statbuffer;
 				if (!stat(fname_old, &statbuffer)) {
 					statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
 					chmod(fname_old, statbuffer.st_mode);
 				}
 
-				if (rename(fname_old, fname))
+				if (rename_tempfile(&data->tempfiles[ext], fname))
 					die_errno(_("renaming '%s' failed"), fname_old);
 			} else if (!exts[ext].optional)
 				die(_("missing required file: %s"), fname_old);
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index ca45c4cd2c..592016f64a 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -432,6 +432,14 @@  test_expect_success TTY '--quiet disables progress' '
 	test_must_be_empty stderr
 '
 
+test_expect_success 'clean up .tmp-* packs on error' '
+	test_must_fail git \
+		-c repack.cruftwindow=bogus \
+		repack -ad --cruft &&
+	find $objdir/pack -name '.tmp-*' >tmpfiles &&
+	test_must_be_empty tmpfiles
+'
+
 test_expect_success 'setup for update-server-info' '
 	git init update-server-info &&
 	test_commit -C update-server-info message