prune: quiet ENOENT on missing directories

Message ID 20221119201213.2398081-1-e@80x24.org (mailing list archive)
State Accepted
Commit 69747653523afa3322e0f8dd6a5a7d30184694c3

Commit Message

Eric Wong Nov. 19, 2022, 8:12 p.m. UTC
$GIT_DIR/objects/pack may be removed to save inodes in shared
repositories.  Quiet down prune in cases where either
$GIT_DIR/objects or $GIT_DIR/objects/pack is non-existent,
but emit the system error in other cases to help users diagnose
permissions problems or resource constraints.

Signed-off-by: Eric Wong <e@80x24.org>
---
 builtin/prune.c  | 4 +++-
 t/t5304-prune.sh | 8 ++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

Comments

Junio C Hamano Nov. 21, 2022, 6:02 a.m. UTC | #1
Eric Wong <e@80x24.org> writes:

> $GIT_DIR/objects/pack may be removed to save inodes in shared
> repositories.  Quiet down prune in cases where either
> $GIT_DIR/objects or $GIT_DIR/objects/pack is non-existent,

Wouldn't setup.c::is_git_directory() say "nope, you do not have a
repository there" if you are missing $GIT_DIR/objects?  So I suspect
that the only case this matters in practice is a missing pack/
subdirectory.

I agree that silently ignoring missing objects/pack/ is perfectly
fine, whether or not we auto-vivify it when we actually create a pack.

> but emit the system error in other cases to help users diagnose
> permissions problems or resource constraints.

OK.

> @@ -127,7 +127,9 @@ static void remove_temporary_files(const char *path)
>  
>  	dir = opendir(path);
>  	if (!dir) {
> -		fprintf(stderr, "Unable to open directory %s\n", path);
> +		if (errno != ENOENT)
> +			fprintf(stderr, "Unable to open directory %s: %s\n",
> +				path, strerror(errno));
>  		return;
>  	}

This is called twice, with $GIT_OBJECT_DIRECTORY and its pack
subdirectory, as it does not recurse.  
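
A minimal, self-contained sketch of the check in the hunk above, for readers who want to try it outside git. The helper name `opendir_quiet_enoent()` is hypothetical and not part of git; saving and restoring errno around fprintf() is an addition of this sketch, not something the patch does.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): open a directory and report
 * failures on stderr, except when the directory simply does not exist.
 * Returns NULL on failure, with errno left as opendir() set it.
 */
DIR *opendir_quiet_enoent(const char *path)
{
	DIR *dir;

	assert(path != NULL);
	dir = opendir(path);
	if (!dir) {
		int saved_errno = errno; /* fprintf() may clobber errno */

		/* A missing directory is expected; stay silent. */
		if (saved_errno != ENOENT)
			fprintf(stderr, "unable to open directory %s: %s\n",
				path, strerror(saved_errno));
		errno = saved_errno;
	}
	return dir;
}
```

A caller would invoke it once for the object directory and once for its pack/ subdirectory, and simply return when it gets NULL.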

This is a tangent, but I have to wonder how effective the first call
would be, though.  When writing a loose object file, we compute its
object name first in-core and determine the final filename, create a
temporary file in the same directory as the final file, write into
it and then finally rename the temporary to the final name.  The
fan-out $GIT_OBJECT_DIRECTORY/??/ directories may have temporary
files left when such a process crashed, but do we create cruft "git
prune" should remove in $GIT_OBJECT_DIRECTORY/ itself?
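
The write-then-rename sequence described above can be sketched as follows. This is an illustration only and not git's object-writing code: the function name `write_via_tempfile()` and the `tmp_obj_XXXXXX` template are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical illustration (not git's code): write "buf" to
 * "final_path" by way of a temporary file created in "dir", which
 * must be the directory containing final_path so that rename() stays
 * on one filesystem.  Returns 0 on success, -1 on error.
 */
int write_via_tempfile(const char *dir, const char *final_path,
		       const void *buf, size_t len)
{
	char tmp[4096];
	int fd, n;

	assert(dir && final_path && buf);
	n = snprintf(tmp, sizeof(tmp), "%s/tmp_obj_XXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;
	fd = mkstemp(tmp);
	if (fd < 0)
		return -1;
	/* A crash between here and rename() leaves tmp behind as cruft. */
	if (write(fd, buf, len) != (ssize_t)len) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	if (close(fd) < 0 || rename(tmp, final_path) < 0) {
		unlink(tmp);
		return -1;
	}
	return 0;
}
```

Because the temporary file lives next to the final file, any leftover from a crashed writer ends up in that same directory, which is the point of the question above.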

> diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
> index 8ae314af58..d65a5f94b4 100755
> --- a/t/t5304-prune.sh
> +++ b/t/t5304-prune.sh
> @@ -29,6 +29,14 @@ test_expect_success setup '
>  	git gc
>  '
>  
> +test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
> +	git clone -q --shared --template= --bare . bare.git &&
> +	rmdir bare.git/objects/pack &&
> +	git --git-dir=bare.git prune --no-progress 2>prune.err &&
> +	test_must_be_empty prune.err &&
> +	rm -r bare.git prune.err
> +'
> +
>  test_expect_success 'prune stale packs' '
>  	orig_pack=$(echo .git/objects/pack/*.pack) &&
>  	>.git/objects/tmp_1.pack &&
Eric Wong Nov. 21, 2022, 10:44 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > $GIT_DIR/objects/pack may be removed to save inodes in shared
> > repositories.  Quiet down prune in cases where either
> > $GIT_DIR/objects or $GIT_DIR/objects/pack is non-existent,
> 
> Wouldn't setup.c::is_git_directory() say "nope, you do not have a
> repository there" if you are missing $GIT_DIR/objects?  So I suspect
> that the only case this matters in practice is a missing pack/
> subdirectory.

Right.  Removing $GIT_DIR/objects isn't currently OK, but maybe
someday it could be...  Supporting missing pack/ is the primary
reason for this change, but making a small step towards allowing
objects/-free $GIT_DIR doesn't seem harmful.

> I agree that silently ignoring missing objects/pack/ is perfectly
> fine, whether or not we auto-vivify it when we actually create a pack.
> 
> > but emit the system error in other cases to help users diagnose
> > permissions problems or resource constraints.
> 
> OK.
> 
> > @@ -127,7 +127,9 @@ static void remove_temporary_files(const char *path)
> >  
> >  	dir = opendir(path);
> >  	if (!dir) {
> > -		fprintf(stderr, "Unable to open directory %s\n", path);
> > +		if (errno != ENOENT)
> > +			fprintf(stderr, "Unable to open directory %s: %s\n",
> > +				path, strerror(errno));
> >  		return;
> >  	}
> 
> This is called twice, with $GIT_OBJECT_DIRECTORY and its pack
> subdirectory, as it does not recurse.  

Right.

> This is a tangent, but I have to wonder how effective the first call
> would be, though.  When writing a loose object file, we compute its
> object name first in-core and determine the final filename, create a
> temporary file in the same directory as the final file, write into
> it and then finally rename the temporary to the final name.  The
> fan-out $GIT_OBJECT_DIRECTORY/??/ directories may have temporary
> files left when such a process crashed, but do we create cruft "git
> prune" should remove in $GIT_OBJECT_DIRECTORY/ itself?

Good question, perhaps this could be a followup:

diff --git a/builtin/prune.c b/builtin/prune.c
index 2719220108..041c45ecbe 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -188,7 +188,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 				      prune_cruft, prune_subdir, &revs);
 
 	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
-	remove_temporary_files(get_object_directory());
 	s = mkpathdup("%s/pack", get_object_directory());
 	remove_temporary_files(s);
 	free(s);

OTOH, perhaps there are some 3rd-party tools (e.g. backup tools)
that leave stuff in top-level objects/ and we'd risk breaking
a rare setup via ENOSPC.
Ævar Arnfjörð Bjarmason Nov. 21, 2022, 11:16 a.m. UTC | #3
On Sat, Nov 19 2022, Eric Wong wrote:

> $GIT_DIR/objects/pack may be removed to save inodes in shared
> repositories.  Quiet down prune in cases where either
> $GIT_DIR/objects or $GIT_DIR/objects/pack is non-existent,
> but emit the system error in other cases to help users diagnose
> permissions problems or resource constraints.
>
> Signed-off-by: Eric Wong <e@80x24.org>
> ---
>  builtin/prune.c  | 4 +++-
>  t/t5304-prune.sh | 8 ++++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/prune.c b/builtin/prune.c
> index df376b2ed1..2719220108 100644
> --- a/builtin/prune.c
> +++ b/builtin/prune.c
> @@ -127,7 +127,9 @@ static void remove_temporary_files(const char *path)
>  
>  	dir = opendir(path);
>  	if (!dir) {
> -		fprintf(stderr, "Unable to open directory %s\n", path);
> +		if (errno != ENOENT)
> +			fprintf(stderr, "Unable to open directory %s: %s\n",
> +				path, strerror(errno));

We sometimes use fprintf() instead of "error" or "warning" for output
compatibility with an older version, or because it's written in an old
style.

But as you're changing this anyway let's not re-invent error_errno() or
warning_errno(), but just use those.

We could also s/^Unable/unable/ in the message while at it, per
CodingGuidelines.
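
To illustrate what an errno-aware reporting helper saves the caller from writing by hand, here is a small stand-in. It is NOT git's implementation of warning_errno(); the name `format_warning_errno()` and the exact "warning: ...: ..." layout are assumptions for this sketch, and it writes into a buffer rather than stderr so the result can be inspected.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in, NOT git's implementation: build a message of
 * the form "warning: <caller's message>: <strerror(errno)>".
 * Returns what snprintf() returns for the final string.
 */
int format_warning_errno(char *out, size_t outlen, const char *fmt, ...)
{
	int saved_errno = errno; /* capture before any call can change it */
	char msg[1024];
	va_list ap;

	assert(out && fmt);
	va_start(ap, fmt);
	vsnprintf(msg, sizeof(msg), fmt, ap);
	va_end(ap);
	return snprintf(out, outlen, "warning: %s: %s",
			msg, strerror(saved_errno));
}
```

With a helper of this shape the call site passes only the format string and the path, and the strerror() text is appended for it.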

>  		return;
>  	}
>  	while ((de = readdir(dir)) != NULL)
> diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
> index 8ae314af58..d65a5f94b4 100755
> --- a/t/t5304-prune.sh
> +++ b/t/t5304-prune.sh
> @@ -29,6 +29,14 @@ test_expect_success setup '
>  	git gc
>  '
>  
> +test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
> +	git clone -q --shared --template= --bare . bare.git &&
> +	rmdir bare.git/objects/pack &&
> +	git --git-dir=bare.git prune --no-progress 2>prune.err &&
> +	test_must_be_empty prune.err &&
> +	rm -r bare.git prune.err
> +'
> +
>  test_expect_success 'prune stale packs' '
>  	orig_pack=$(echo .git/objects/pack/*.pack) &&
>  	>.git/objects/tmp_1.pack &&

This seems like a good isolated change, but FWIW I think what we really
should be doing here is using the "report_garbage" facility added in
543c5caa6c9 (count-objects: report garbage files in pack directory too,
2013-02-15) and 478f34d2b6e (gc: remove garbage .idx files from pack
dir, 2015-11-03) for "pack".

I.e. we have already iterated over "pack" and found all the files
therein, and packfile.c already calls error_errno() etc. Re-opendir()-ing
"pack" and walking it again doesn't make much sense, or does it?

Then the:

	remove_temporary_files(get_object_directory());

also seems odd: just a few lines above we passed "prune_cruft" to
"for_each_loose_file_in_objdir()", haven't we already walked the loose
object dir & removed temporary cruft there?
Junio C Hamano Nov. 21, 2022, 1:08 p.m. UTC | #4
Eric Wong <e@80x24.org> writes:

> Good question, perhaps this could be a followup:
>
> diff --git a/builtin/prune.c b/builtin/prune.c
> index 2719220108..041c45ecbe 100644
> --- a/builtin/prune.c
> +++ b/builtin/prune.c
> @@ -188,7 +188,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
>  				      prune_cruft, prune_subdir, &revs);
>  
>  	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
> -	remove_temporary_files(get_object_directory());
>  	s = mkpathdup("%s/pack", get_object_directory());
>  	remove_temporary_files(s);
>  	free(s);

I actually was hinting at making the remove_temporary_files()
recurse, so that you do not need the separate invocation in pack/
subdirectory.

Or make 256 calls, one for each fan-out subdirectory, in which
case the ENOENT silencing you did would really matter and shine.
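
A rough sketch of the per-fan-out variant, under stated assumptions: `count_fanout_dirs()` is a hypothetical name, not git code, and it only counts the directories it can open instead of removing anything. It shows why silencing ENOENT matters when most of the 256 directories may not exist.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch, not git code: visit each of the 256 fan-out
 * subdirectories ("00" through "ff") under objdir and count how many
 * can be opened.  Missing ones (ENOENT) are skipped silently; any
 * other opendir() failure is reported on stderr.
 */
int count_fanout_dirs(const char *objdir)
{
	int i, found = 0;

	assert(objdir != NULL);
	for (i = 0; i < 256; i++) {
		char path[4096];
		DIR *dir;
		int n = snprintf(path, sizeof(path), "%s/%02x", objdir, i);

		if (n < 0 || (size_t)n >= sizeof(path))
			continue; /* path too long for this sketch */
		dir = opendir(path);
		if (!dir) {
			if (errno != ENOENT)
				fprintf(stderr, "unable to open directory %s: %s\n",
					path, strerror(errno));
			continue;
		}
		/* A real implementation would remove tmp_* entries here. */
		found++;
		closedir(dir);
	}
	return found;
}
```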
Junio C Hamano Nov. 21, 2022, 11:09 p.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

>>  	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
>> -	remove_temporary_files(get_object_directory());
>>  	s = mkpathdup("%s/pack", get_object_directory());
>>  	remove_temporary_files(s);
>>  	free(s);
>
> I actually was hinting at making the remove_temporary_files()
> recurse, so that you do not need the separate invocation in pack/
> subdirectory.
>
> Or make 256 calls, one for each fan-out subdirectory, in which
> case the ENOENT silencing you did would really matter and shine.

But of course, neither is any part of this topic.  They are possible
follow-on works.

Thanks and sorry for making a confusing statement that could be
mistaken as "let's do this too", which wasn't what I meant.

Patch

diff --git a/builtin/prune.c b/builtin/prune.c
index df376b2ed1..2719220108 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -127,7 +127,9 @@  static void remove_temporary_files(const char *path)
 
 	dir = opendir(path);
 	if (!dir) {
-		fprintf(stderr, "Unable to open directory %s\n", path);
+		if (errno != ENOENT)
+			fprintf(stderr, "Unable to open directory %s: %s\n",
+				path, strerror(errno));
 		return;
 	}
 	while ((de = readdir(dir)) != NULL)
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 8ae314af58..d65a5f94b4 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -29,6 +29,14 @@  test_expect_success setup '
 	git gc
 '
 
+test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
+	git clone -q --shared --template= --bare . bare.git &&
+	rmdir bare.git/objects/pack &&
+	git --git-dir=bare.git prune --no-progress 2>prune.err &&
+	test_must_be_empty prune.err &&
+	rm -r bare.git prune.err
+'
+
 test_expect_success 'prune stale packs' '
 	orig_pack=$(echo .git/objects/pack/*.pack) &&
 	>.git/objects/tmp_1.pack &&