[1/1] gc/repack: release packs when needed

Message ID 7eee3d107927b30bd3e1ec422e833111627252ce.1544911438.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived

Commit Message

John Passaro via GitGitGadget Dec. 15, 2018, 10:04 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

On Windows, files cannot be removed or renamed while a process still
holds open handles to them. To remedy that, we introduced the
close_all_packs() function.

Earlier, we made sure that the packs are released just before `git gc`
is spawned, in case gc wants to remove packs that are no longer needed.

But this developer forgot that gc itself also needs to let go of packs,
e.g. when consolidating all packs via the --aggressive option.

Likewise, `git repack -d` wants to delete obsolete packs and therefore
needs to close all pack handles, too.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/gc.c     | 4 +++-
 builtin/repack.c | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)
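The constraint described in the commit message (release the handle, then delete) can be sketched outside of git as follows. This is a minimal illustration in plain C, not git code: `write_then_remove()` and the scratch file name are hypothetical, and the standard `fclose()`/`remove()` calls merely stand in for what `close_all_packs()` and the pack deletion do inside git.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of git): write a scratch file, release
 * the handle, and only then delete the file.
 * Returns 0 on success, -1 on failure.
 */
int write_then_remove(const char *path)
{
	FILE *fp;

	assert(path != NULL);

	fp = fopen(path, "w");
	if (!fp)
		return -1;
	fputs("scratch data\n", fp);

	/*
	 * Release the handle first. On Windows, deleting a file that this
	 * process still holds open typically fails; on POSIX systems the
	 * deletion would succeed either way, so the ordering matters for
	 * portability rather than for correctness on every platform.
	 */
	if (fclose(fp) != 0)
		return -1;

	return remove(path) == 0 ? 0 : -1;
}
```

Calling `write_then_remove("scratch.tmp")` should return 0 and leave no file behind; swapping the `fclose()` and `remove()` calls is the kind of ordering that the patch is meant to avoid on Windows.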

Comments

Junio C Hamano Jan. 10, 2019, 9:01 p.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> On Windows, files cannot be removed or renamed while a process still
> holds open handles to them. To remedy that, we introduced the
> close_all_packs() function.
>
> Earlier, we made sure that the packs are released just before `git gc`
> is spawned, in case gc wants to remove packs that are no longer needed.
>
> But this developer forgot that gc itself also needs to let go of packs,
> e.g. when consolidating all packs via the --aggressive option.
>
> Likewise, `git repack -d` wants to delete obsolete packs and therefore
> needs to close all pack handles, too.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  builtin/gc.c     | 4 +++-
>  builtin/repack.c | 2 ++
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 871a56f1c5..df90fd7f51 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -659,8 +659,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>  
>  	report_garbage = report_pack_garbage;
>  	reprepare_packed_git(the_repository);
> -	if (pack_garbage.nr > 0)
> +	if (pack_garbage.nr > 0) {
> +		close_all_packs(the_repository->objects);
>  		clean_pack_garbage();
> +	}

Closing before removing does make sense, but wouldn't we want to
move reprepare_packed_git() after clean_pack_garbage() while at it?
After all, the logical sequence is that we used the current set of
packs to figure out which ones are garbage, and now we are about
to discard them.  We close the packs in the current set (i.e. the fix
made in this patch), then discard the garbage packs.  It would make
sense to start using the new set (i.e. "reprepare") after all that is
done, no?  Especially given that the next step (write-commit-graph)
still wants to read quite a lot of data from what is now the latest
set of packfiles...

>  	if (gc_write_commit_graph)
>  		write_commit_graph_reachable(get_object_directory(), 0,
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 45583683ee..f9319defe4 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -419,6 +419,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	if (!names.nr && !po_args.quiet)
>  		printf("Nothing new to pack.\n");
>  
> +	close_all_packs(the_repository->objects);
> +

On the other hand, this one is added at the ideal location, I think.

Thanks.

>  	/*
>  	 * Ok we have prepared all new packfiles.
>  	 * First see if there are packs of the same name and if so
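The ordering proposed in the review above (close, clean, then reprepare, then write the commit graph) can be sketched with stand-in functions. Everything below is a toy under stated assumptions: the `stub_*` functions only record that they ran and are not git's functions, `gc_tail_suggested_order()` is a hypothetical name, and the sketch takes the garbage count as a parameter, so it does not model how `pack_garbage` gets populated in `builtin/gc.c`.

```c
#include <assert.h>
#include <string.h>

/* Call log so the order of the steps can be inspected afterwards. */
static char call_log[128];

static void log_step(const char *name)
{
	/* Guard against overflowing the fixed-size log buffer. */
	assert(strlen(call_log) + strlen(name) + 2 < sizeof(call_log));
	if (call_log[0])
		strcat(call_log, ",");
	strcat(call_log, name);
}

/* Stubs: they only record that they ran; they are NOT git's functions. */
void stub_close_all_packs(void)      { log_step("close"); }
void stub_clean_pack_garbage(void)   { log_step("clean"); }
void stub_reprepare_packed_git(void) { log_step("reprepare"); }
void stub_write_commit_graph(void)   { log_step("commit-graph"); }

/*
 * Hypothetical tail of a gc run using the ordering suggested in the
 * review: release and discard the old packs first, then start using
 * the new set, then run the step that reads from it.
 */
const char *gc_tail_suggested_order(int garbage_nr)
{
	call_log[0] = '\0';
	if (garbage_nr > 0) {
		stub_close_all_packs();
		stub_clean_pack_garbage();
	}
	stub_reprepare_packed_git();
	stub_write_commit_graph();
	return call_log;
}
```

With a non-zero `garbage_nr` the returned log reads `close,clean,reprepare,commit-graph`; with zero it reads `reprepare,commit-graph`. The patch as posted keeps the reprepare call ahead of the `if` block instead.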

Jeff King Jan. 11, 2019, 4:10 p.m. UTC | #2
On Thu, Jan 10, 2019 at 01:01:36PM -0800, Junio C Hamano wrote:

> > diff --git a/builtin/gc.c b/builtin/gc.c
> > index 871a56f1c5..df90fd7f51 100644
> > --- a/builtin/gc.c
> > +++ b/builtin/gc.c
> > @@ -659,8 +659,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
> >  
> >  	report_garbage = report_pack_garbage;
> >  	reprepare_packed_git(the_repository);
> > -	if (pack_garbage.nr > 0)
> > +	if (pack_garbage.nr > 0) {
> > +		close_all_packs(the_repository->objects);
> >  		clean_pack_garbage();
> > +	}
> 
> Closing before removing does make sense, but wouldn't we want to
> move reprepare_packed_git() after clean_pack_garbage() while at it?
> After all, the logical sequence is that we used the current set of
> packs to figure out which ones are garbage, and now we are about
> to discard them.  We close the packs in the current set (i.e. the fix
> made in this patch), then discard the garbage packs.  It would make
> sense to start using the new set (i.e. "reprepare") after all that is
> done, no?  Especially given that the next step (write-commit-graph)
> still wants to read quite a lot of data from what is now the latest
> set of packfiles...

I agree that your suggested ordering makes more sense, but I don't think
it matters in practice with the current code. reprepare_packed_git()
never throws away old pack entries (and if they're mmap'd, we might even
continue to use them). So the end result is the same either way.

-Peff
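The point that a rescan never drops entries it already knows about can be pictured with a toy model. This is not git's data structure or code: `struct toy_pack_list`, `toy_has_pack()` and `toy_reprepare()` are hypothetical names, and the model assumes only what the message above states, namely that old pack entries are never thrown away.

```c
#include <assert.h>
#include <string.h>

#define MAX_PACKS 16

/* Toy stand-in for an in-memory pack list; not git's data structure. */
struct toy_pack_list {
	const char *names[MAX_PACKS];
	int nr;
};

/* Return 1 if a pack with this name is already in the list. */
int toy_has_pack(const struct toy_pack_list *list, const char *name)
{
	for (int i = 0; i < list->nr; i++)
		if (!strcmp(list->names[i], name))
			return 1;
	return 0;
}

/*
 * Toy "reprepare": rescan what is on disk and add anything not yet
 * known. Entries already in the list are kept, even if the matching
 * file is no longer on disk.
 */
void toy_reprepare(struct toy_pack_list *list,
		   const char **on_disk, int on_disk_nr)
{
	for (int i = 0; i < on_disk_nr; i++) {
		if (toy_has_pack(list, on_disk[i]))
			continue;
		assert(list->nr < MAX_PACKS);
		list->names[list->nr++] = on_disk[i];
	}
}
```

In this model, a list holding `pack-a` and `pack-garbage` that is rescanned against a directory containing `pack-a` and `pack-b` ends up with all three names, which is why rescanning before or after the cleanup gives the same list.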

Junio C Hamano Jan. 11, 2019, 5:24 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> On Thu, Jan 10, 2019 at 01:01:36PM -0800, Junio C Hamano wrote:
>
>> > diff --git a/builtin/gc.c b/builtin/gc.c
>> > index 871a56f1c5..df90fd7f51 100644
>> > --- a/builtin/gc.c
>> > +++ b/builtin/gc.c
>> > @@ -659,8 +659,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>> >  
>> >  	report_garbage = report_pack_garbage;
>> >  	reprepare_packed_git(the_repository);
>> > -	if (pack_garbage.nr > 0)
>> > +	if (pack_garbage.nr > 0) {
>> > +		close_all_packs(the_repository->objects);
>> >  		clean_pack_garbage();
>> > +	}
>> 
>> Closing before removing does make sense, but wouldn't we want to
>> move reprepare_packed_git() after clean_pack_garbage() while at it?
>> After all, the logical sequence is that we used the current set of
>> packs to figure out which ones are garbage, and now we are about
>> to discard them.  We close the packs in the current set (i.e. the fix
>> made in this patch), then discard the garbage packs.  It would make
>> sense to start using the new set (i.e. "reprepare") after all that is
>> done, no?  Especially given that the next step (write-commit-graph)
>> still wants to read quite a lot of data from what is now the latest
>> set of packfiles...
>
> I agree that your suggested ordering makes more sense, but I don't think
> it matters in practice with the current code. reprepare_packed_git()
> never throws away old pack entries (and if they're mmap'd, we might even
> continue to use them). So the end result is the same either way.

Yeah, it would not make a difference to the machine.  I was trying to
be more helpful to human readers.

In any case, this patch from Dec 15 last year is where my backlog
sweeping is at right now X-<.

Patch

diff --git a/builtin/gc.c b/builtin/gc.c
index 871a56f1c5..df90fd7f51 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -659,8 +659,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	report_garbage = report_pack_garbage;
 	reprepare_packed_git(the_repository);
-	if (pack_garbage.nr > 0)
+	if (pack_garbage.nr > 0) {
+		close_all_packs(the_repository->objects);
 		clean_pack_garbage();
+	}
 
 	if (gc_write_commit_graph)
 		write_commit_graph_reachable(get_object_directory(), 0,
diff --git a/builtin/repack.c b/builtin/repack.c
index 45583683ee..f9319defe4 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -419,6 +419,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (!names.nr && !po_args.quiet)
 		printf("Nothing new to pack.\n");
 
+	close_all_packs(the_repository->objects);
+
 	/*
 	 * Ok we have prepared all new packfiles.
 	 * First see if there are packs of the same name and if so