diff mbox series

refs: cleanup directories when deleting packed ref

Message ID YJVQpaDwkQH/aCee@mini.wfchandler.org (mailing list archive)
State New, archived
Headers show
Series refs: cleanup directories when deleting packed ref | expand

Commit Message

Will Chandler May 7, 2021, 2:37 p.m. UTC
When deleting a packed ref, a lockfile is made in the directory that
would contain the loose copy of that ref, creating any directories in
the ref's path that do not exist. When the transaction completes, the
lockfile is deleted, but any empty parent directories made when creating
the lockfile are left in place.  These empty directories are not removed
by 'pack-refs' or other housekeeping tasks and will accumulate over
time.

When deleting a loose ref, we remove all empty parent directories at the
end of the transaction.

This commit applies the parent directory cleanup logic used when
deleting loose refs to packed refs as well.

Signed-off-by: Will Chandler <wfc@wfchandler.org>
---
 refs/files-backend.c  | 12 ++++++------
 t/t1400-update-ref.sh |  9 +++++++++
 2 files changed, 15 insertions(+), 6 deletions(-)

Comments

Jeff King May 7, 2021, 9:56 p.m. UTC | #1
On Fri, May 07, 2021 at 10:37:25AM -0400, Will Chandler wrote:

> When deleting a packed ref, a lockfile is made in the directory that
> would contain the loose copy of that ref, creating any directories in
> the ref's path that do not exist. When the transaction completes, the
> lockfile is deleted, but any empty parent directories made when creating
> the lockfile are left in place.  These empty directories are not removed
> by 'pack-refs' or other housekeeping tasks and will accumulate over
> time.
> 
> When deleting a loose ref, we remove all empty parent directories at the
> end of the transaction.
> 
> This commit applies the parent directory cleanup logic used when
> deleting loose refs to packed refs as well.

Hmm. I can certainly believe that such a bug exists, but...

> +test_expect_success 'directory not created deleting packed ref' '
> +	git branch d1/d2/r1 HEAD &&
> +	git pack-refs --all &&
> +	test_path_is_missing .git/refs/heads/d1/d2 &&
> +	git branch -d d1/d2/r1 &&
> +	test_path_is_missing .git/refs/heads/d1/d2 &&
> +	test_path_is_missing .git/refs/heads/d1
> +'

...this test passes even without your patch applied. I wonder if there's
something else required to trigger the problem.

-Peff
Jeff King May 7, 2021, 10:02 p.m. UTC | #2
On Fri, May 07, 2021 at 05:56:47PM -0400, Jeff King wrote:

> > +test_expect_success 'directory not created deleting packed ref' '
> > +	git branch d1/d2/r1 HEAD &&
> > +	git pack-refs --all &&
> > +	test_path_is_missing .git/refs/heads/d1/d2 &&
> > +	git branch -d d1/d2/r1 &&
> > +	test_path_is_missing .git/refs/heads/d1/d2 &&
> > +	test_path_is_missing .git/refs/heads/d1
> > +'
> 
> ...this test passes even without your patch applied. I wonder if there's
> something else required to trigger the problem.

If I replace "git branch -d" with "git update-ref -d", then the problem
does trigger (and your patch does indeed clear it up). I wonder what the
difference is.

-Peff
Jeff King May 7, 2021, 10:57 p.m. UTC | #3
On Fri, May 07, 2021 at 06:02:17PM -0400, Jeff King wrote:

> On Fri, May 07, 2021 at 05:56:47PM -0400, Jeff King wrote:
> 
> > > +test_expect_success 'directory not created deleting packed ref' '
> > > +	git branch d1/d2/r1 HEAD &&
> > > +	git pack-refs --all &&
> > > +	test_path_is_missing .git/refs/heads/d1/d2 &&
> > > +	git branch -d d1/d2/r1 &&
> > > +	test_path_is_missing .git/refs/heads/d1/d2 &&
> > > +	test_path_is_missing .git/refs/heads/d1
> > > +'
> > 
> > ...this test passes even without your patch applied. I wonder if there's
> > something else required to trigger the problem.
> 
> If I replace "git branch -d" with "git update-ref -d", then the problem
> does trigger (and your patch does indeed clear it up). I wonder what the
> difference is.

I think this comes down to the interfaces. In update-ref, we call
delete_ref(), which creates a transaction to delete the single ref. It
realizes the ref is packed and there is no loose file to delete.

Whereas in git-branch, call the plural delete_refs(), which handles the
packed and loose stores separately. It first deletes everything from the
packed ref store in one go, and then the loose store. And it's actually
the deletion from the loose store which gets weird. The ref isn't
_anywhere_ at this point. So when we try to read it, we don't set the
REF_ISPACKED flag. And thus when it comes time to clean up the loose
ref, we say "not packed, so I guess it's worth calling unlink()". Of
course that syscall fails, but our unlink_or_msg() wrapper turns ENOENT
into success (which is sensible; we want it to be gone, and it is).

And so we think we've deleted a loose ref, and thus call
try_remove_empty_parents(), which cleans up the extra directory.

So I'd argue that it's actually delete_refs() which is called from
git-branch that is a little confused, or possibly even buggy. But I
don't think it's hurting anything, and working around it would probably
be awkward and/or inefficient.

Getting back to your patch, though, you are definitely fixing a problem
with update-ref (which correctly realizes there is no loose ref to clean
up, but forgets that we had to make a lockfile). And the solution you
have looks correct. I think you just need to update the test to exercise
it with "update-ref -d".

-Peff
Will Chandler May 8, 2021, 4:27 a.m. UTC | #4
On Fri, May 07, 2021 at 06:57:39PM -0400, Jeff King wrote:
> Getting back to your patch, though, you are definitely fixing a problem
> with update-ref (which correctly realizes there is no loose ref to clean
> up, but forgets that we had to make a lockfile). And the solution you
> have looks correct. I think you just need to update the test to exercise
> it with "update-ref -d".

Thanks for the thorough analysis! Apologies for the confusion, I really
appreciate your patience.

I'll get the test fixed and note that this is specific to 'update-ref'.
Will Chandler May 8, 2021, 5 a.m. UTC | #5
Subject: [PATCH v2] refs: cleanup directories when deleting packed ref

When deleting a packed ref via 'update-ref -d', a lockfile is made in
the directory that would contain the loose copy of that ref, creating
any directories in the ref's path that do not exist. When the
transaction completes, the lockfile is deleted, but any empty parent
directories made when creating the lockfile are left in place.  These
empty directories are not removed by 'pack-refs' or other housekeeping
tasks and will accumulate over time.

When deleting a loose ref, we remove all empty parent directories at the
end of the transaction.

This commit applies the parent directory cleanup logic used when
deleting loose refs to packed refs as well.

Signed-off-by: Will Chandler <wfc@wfchandler.org>
---
 refs/files-backend.c  | 12 ++++++------
 t/t1400-update-ref.sh |  9 +++++++++
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 119972ee16..49e6ee069a 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -45,10 +45,10 @@
 #define REF_UPDATE_VIA_HEAD (1 << 8)
 
 /*
- * Used as a flag in ref_update::flags when the loose reference has
- * been deleted.
+ * Used as a flag in ref_update::flags when a reference has been
+ * deleted and the ref's parent directories may need cleanup.
  */
-#define REF_DELETED_LOOSE (1 << 9)
+#define REF_DELETED_RMDIR (1 << 9)
 
 struct ref_lock {
 	char *ref_name;
@@ -2852,6 +2852,7 @@ static int files_transaction_finish(struct ref_store *ref_store,
 
 		if (update->flags & REF_DELETING &&
 		    !(update->flags & REF_LOG_ONLY)) {
+			update->flags |= REF_DELETED_RMDIR;
 			if (!(update->type & REF_ISPACKED) ||
 			    update->type & REF_ISSYMREF) {
 				/* It is a loose reference. */
@@ -2861,7 +2862,6 @@ static int files_transaction_finish(struct ref_store *ref_store,
 					ret = TRANSACTION_GENERIC_ERROR;
 					goto cleanup;
 				}
-				update->flags |= REF_DELETED_LOOSE;
 			}
 		}
 	}
@@ -2874,9 +2874,9 @@ static int files_transaction_finish(struct ref_store *ref_store,
 	for (i = 0; i < transaction->nr; i++) {
 		struct ref_update *update = transaction->updates[i];
 
-		if (update->flags & REF_DELETED_LOOSE) {
+		if (update->flags & REF_DELETED_RMDIR) {
 			/*
-			 * The loose reference was deleted. Delete any
+			 * The reference was deleted. Delete any
 			 * empty parent directories. (Note that this
 			 * can only work because we have already
 			 * removed the lockfile.)
diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index e31f65f381..4506cd435b 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -1598,4 +1598,13 @@ test_expect_success 'transaction cannot restart ongoing transaction' '
 	test_must_fail git show-ref --verify refs/heads/restart
 '
 
+test_expect_success 'directory not created deleting packed ref' '
+	git branch d1/d2/r1 HEAD &&
+	git pack-refs --all &&
+	test_path_is_missing .git/refs/heads/d1/d2 &&
+	git update-ref -d refs/heads/d1/d2/r1 &&
+	test_path_is_missing .git/refs/heads/d1/d2 &&
+	test_path_is_missing .git/refs/heads/d1
+'
+
 test_done
Bagas Sanjaya May 8, 2021, 5:21 a.m. UTC | #6
On 08/05/21 12.00, Will Chandler wrote:
> Subject: [PATCH v2] refs: cleanup directories when deleting packed ref
> 
> When deleting a packed ref via 'update-ref -d', a lockfile is made in
> the directory that would contain the loose copy of that ref, creating
> any directories in the ref's path that do not exist. When the
> transaction completes, the lockfile is deleted, but any empty parent
> directories made when creating the lockfile are left in place.  These
> empty directories are not removed by 'pack-refs' or other housekeeping
> tasks and will accumulate over time.
> 
> When deleting a loose ref, we remove all empty parent directories at the
> end of the transaction.
> 
> This commit applies the parent directory cleanup logic used when
> deleting loose refs to packed refs as well.
> 
> Signed-off-by: Will Chandler <wfc@wfchandler.org>
> ---
>   refs/files-backend.c  | 12 ++++++------
>   t/t1400-update-ref.sh |  9 +++++++++
>   2 files changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 119972ee16..49e6ee069a 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -45,10 +45,10 @@
>   #define REF_UPDATE_VIA_HEAD (1 << 8)
>   
>   /*
> - * Used as a flag in ref_update::flags when the loose reference has
> - * been deleted.
> + * Used as a flag in ref_update::flags when a reference has been
> + * deleted and the ref's parent directories may need cleanup.
>    */
> -#define REF_DELETED_LOOSE (1 << 9)
> +#define REF_DELETED_RMDIR (1 << 9)
>   
>   struct ref_lock {
>   	char *ref_name;
> @@ -2852,6 +2852,7 @@ static int files_transaction_finish(struct ref_store *ref_store,
>   
>   		if (update->flags & REF_DELETING &&
>   		    !(update->flags & REF_LOG_ONLY)) {
> +			update->flags |= REF_DELETED_RMDIR;
>   			if (!(update->type & REF_ISPACKED) ||
>   			    update->type & REF_ISSYMREF) {
>   				/* It is a loose reference. */
> @@ -2861,7 +2862,6 @@ static int files_transaction_finish(struct ref_store *ref_store,
>   					ret = TRANSACTION_GENERIC_ERROR;
>   					goto cleanup;
>   				}
> -				update->flags |= REF_DELETED_LOOSE;
>   			}
>   		}
>   	}
> @@ -2874,9 +2874,9 @@ static int files_transaction_finish(struct ref_store *ref_store,
>   	for (i = 0; i < transaction->nr; i++) {
>   		struct ref_update *update = transaction->updates[i];
>   
> -		if (update->flags & REF_DELETED_LOOSE) {
> +		if (update->flags & REF_DELETED_RMDIR) {
>   			/*
> -			 * The loose reference was deleted. Delete any
> +			 * The reference was deleted. Delete any
>   			 * empty parent directories. (Note that this
>   			 * can only work because we have already
>   			 * removed the lockfile.)
> diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
> index e31f65f381..4506cd435b 100755
> --- a/t/t1400-update-ref.sh
> +++ b/t/t1400-update-ref.sh
> @@ -1598,4 +1598,13 @@ test_expect_success 'transaction cannot restart ongoing transaction' '
>   	test_must_fail git show-ref --verify refs/heads/restart
>   '
>   
> +test_expect_success 'directory not created deleting packed ref' '
> +	git branch d1/d2/r1 HEAD &&
> +	git pack-refs --all &&
> +	test_path_is_missing .git/refs/heads/d1/d2 &&
> +	git update-ref -d refs/heads/d1/d2/r1 &&
> +	test_path_is_missing .git/refs/heads/d1/d2 &&
> +	test_path_is_missing .git/refs/heads/d1
> +'
> +
>   test_done
>

I ask to you: why did you send v2 patch as reply to v1?

Supposed that I interested to apply only this v2, instead of v1.
With this situation, I downloaded mbox for v1, which contains v2
patch as reply to v1. And git-am would instead apply v1 instead.

So why not send this v2 as separate message-id?
Junio C Hamano May 8, 2021, 6:24 a.m. UTC | #7
Bagas Sanjaya <bagasdotme@gmail.com> writes:

> I ask to you: why did you send v2 patch as reply to v1?

Because it is a natural progression of the ongoing discussion, I
would presume?

> Supposed that I interested to apply only this v2, instead of v1.
> With this situation, I downloaded mbox for v1, which contains v2
> patch as reply to v1. And git-am would instead apply v1 instead.

If you are interested in v2, fetch individual pieces of e-mail for
v2 and not v1 (in this case, a single message that contains v2
only).  If you feed v1 to "am", of course it would happily apply
it.  Or if you grab the whole discussion in a mbox, of couse you'd
need to pick the message you would want to use (e.g. "mailx -f mbox"
and choose individual messages and then tell it to "s"ave).

> So why not send this v2 as separate message-id?

I sense you are somehow upset or frustrated, but I do not see a need
to behave so rudely to your fellow community member(s).

By the way, kostantin, "b4 am" seems to mishandle this thread.  As I
understand it, it is supposed to grab the latest and greatest
iteration given any message in the discussion thread, right?  In
this thread, we discuss a single-patch "series" but with two
iterations.  The original submission is followed by some discussion
messages, that is followed by a v2 patch, and then a few more
non-patch messaages.

Between these two invocations

$ b4 am YJVQpaDwkQH/aCee@mini.wfchandler.org
$ b4 am YJYa+7yUyt2YD16r@mini.wfchandler.org

where the first one is the the original, and the second one is the
improved v2 patch, both requests end up fetching the original one.

One thing curious about the behaviour (see the transcript at the
end) is that it gives this "NOTE" about "trailers", which hints me
that this vintage of "b4" may not understand the in-body header like
"git am" and "git mailinfo" do?  The v2 patch begins with a Subject:
in-body header to retitle the patch.

FWIW, I am using prepackaged /usr/bin/b4 at a work machine, which
identifies itself as

$ b4 --version
0.6.2

Thanks.


---------------------- >8 ----------------------

$ b4 am YJVQpaDwkQH/aCee@mini.wfchandler.org
Looking up https://lore.kernel.org/r/YJVQpaDwkQH%2FaCee%40mini.wfchandler.org
Grabbing thread from lore.kernel.org/git
Analyzing 7 messages in the thread
---
Writing ./20210507_wfc_refs_cleanup_directories_when_deleting_packed_ref.mbx
  ✓ [PATCH] refs: cleanup directories when deleting packed ref
  ---
  ✓ Attestation-by: DKIM/wfchandler.org (From: wfc@wfchandler.org)
---
Total patches: 1
---
NOTE: some trailers ignored due to from/email mismatches:
    ! Trailer: Subject: [PATCH v2] refs: cleanup directories when deleting packed ref
     Msg From: Will Chandler <wfc@wfchandler.org>
NOTE: Rerun with -S to apply them anyway
---
 Link: https://lore.kernel.org/r/YJVQpaDwkQH/aCee@mini.wfchandler.org
 Base: not found (applies clean to current tree)
       git am ./20210507_wfc_refs_cleanup_directories_when_deleting_packed_ref.mbx
Will Chandler May 9, 2021, 6:45 p.m. UTC | #8
Thank you, Junio. Would it be helpful if I sent a separate '[PATCH v2]'
email?

Apologies for not following the correct procedure for the revised patch.
I was relying on 'SubmittingPatches', but I've since found the detailed
instructions on submitting revisions in 'MyFirstContribution.txt'.
Junio C Hamano May 10, 2021, 1:15 a.m. UTC | #9
Will Chandler <wfc@wfchandler.org> writes:

> Thank you, Junio. Would it be helpful if I sent a separate '[PATCH v2]'
> email?
>
> Apologies for not following the correct procedure for the revised patch.
> I was relying on 'SubmittingPatches', but I've since found the detailed
> instructions on submitting revisions in 'MyFirstContribution.txt'.

I am not sure if there is anything to apologize for on your part.
The procedures we use have grown over time, and there certainly
would be documentation gaps.

I think the best current practice is

1. In a thread that originates at a non-patch message, or in a
   review discussion thread for a patch, it is welcome to use "How
   about doing it this way?" patches as an illustration to explain
   your idea in a more concrete way than just in prose.  But it is
   unwelcome to leave the patch buried in the discussion, without
   making it easier to find it (see 3.)

2. In such a thread, "By the way, I thought of this unrelated
   tangent, and here is a patch to demonstrate the idea" is not
   entirely unwelcome, but keep it brief and make sure you get out
   of the thread quickly to avoid distracting the main discussion.

3. In either case, it makes it easier to find the final submission
   of the patch if it is not buried deep in the discussion.

   a. It is OK to start a new thread (without in-reply-to), with
      "here is a polished version of the patch and/or the idea I
      floated in <message-id>" under the three-dash line (for a
      single patch) or in the cover letter (for a series).

   b. An updated iteration of a multi-patch series sometimes is made
      as a direct response to the cover letter of the previous
      iteration (iow, the cover letter for vN+1 has the message id
      of the cover letter for vN on its in-reply-to header).

I think the "b4" tool prefers 3b. over 3a., and it may be also easy
on human readers who read the list in threaded mail/news reader.

Thanks.
Jeff King May 11, 2021, 1:35 a.m. UTC | #10
On Sat, May 08, 2021 at 01:00:43AM -0400, Will Chandler wrote:

> Subject: [PATCH v2] refs: cleanup directories when deleting packed ref
> 
> When deleting a packed ref via 'update-ref -d', a lockfile is made in
> the directory that would contain the loose copy of that ref, creating
> any directories in the ref's path that do not exist. When the
> transaction completes, the lockfile is deleted, but any empty parent
> directories made when creating the lockfile are left in place.  These
> empty directories are not removed by 'pack-refs' or other housekeeping
> tasks and will accumulate over time.
> 
> When deleting a loose ref, we remove all empty parent directories at the
> end of the transaction.
> 
> This commit applies the parent directory cleanup logic used when
> deleting loose refs to packed refs as well.

This thread got off on a tangent about threads, but to get back to the
main idea: this v2 patch looks good to me. Thank you very much for
finding and fixing it.

Regarding threads, I agree with everything Junio already said. My only
suggestion would be to make sure the "Subject" above becomes the actual
email subject, rather than an in-body header (that may be what confused
the b4 tool, but more importantly, threaded readers like mutt will show
the changed subject in the thread index, making it very clear at a
glance that there is a v2 and not just more discussion).

-Peff
Junio C Hamano May 11, 2021, 4:58 a.m. UTC | #11
Jeff King <peff@peff.net> writes:

> This thread got off on a tangent about threads, but to get back to the
> main idea: this v2 patch looks good to me. Thank you very much for
> finding and fixing it.

Thanks.  But now we have a harder time to find that v2 patch X-<.
diff mbox series

Patch

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 119972ee16..49e6ee069a 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -45,10 +45,10 @@ 
 #define REF_UPDATE_VIA_HEAD (1 << 8)
 
 /*
- * Used as a flag in ref_update::flags when the loose reference has
- * been deleted.
+ * Used as a flag in ref_update::flags when a reference has been
+ * deleted and the ref's parent directories may need cleanup.
  */
-#define REF_DELETED_LOOSE (1 << 9)
+#define REF_DELETED_RMDIR (1 << 9)
 
 struct ref_lock {
 	char *ref_name;
@@ -2852,6 +2852,7 @@  static int files_transaction_finish(struct ref_store *ref_store,
 
 		if (update->flags & REF_DELETING &&
 		    !(update->flags & REF_LOG_ONLY)) {
+			update->flags |= REF_DELETED_RMDIR;
 			if (!(update->type & REF_ISPACKED) ||
 			    update->type & REF_ISSYMREF) {
 				/* It is a loose reference. */
@@ -2861,7 +2862,6 @@  static int files_transaction_finish(struct ref_store *ref_store,
 					ret = TRANSACTION_GENERIC_ERROR;
 					goto cleanup;
 				}
-				update->flags |= REF_DELETED_LOOSE;
 			}
 		}
 	}
@@ -2874,9 +2874,9 @@  static int files_transaction_finish(struct ref_store *ref_store,
 	for (i = 0; i < transaction->nr; i++) {
 		struct ref_update *update = transaction->updates[i];
 
-		if (update->flags & REF_DELETED_LOOSE) {
+		if (update->flags & REF_DELETED_RMDIR) {
 			/*
-			 * The loose reference was deleted. Delete any
+			 * The reference was deleted. Delete any
 			 * empty parent directories. (Note that this
 			 * can only work because we have already
 			 * removed the lockfile.)
diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index e31f65f381..adae9ef91f 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -1598,4 +1598,13 @@  test_expect_success 'transaction cannot restart ongoing transaction' '
 	test_must_fail git show-ref --verify refs/heads/restart
 '
 
+test_expect_success 'directory not created deleting packed ref' '
+	git branch d1/d2/r1 HEAD &&
+	git pack-refs --all &&
+	test_path_is_missing .git/refs/heads/d1/d2 &&
+	git branch -d d1/d2/r1 &&
+	test_path_is_missing .git/refs/heads/d1/d2 &&
+	test_path_is_missing .git/refs/heads/d1
+'
+
 test_done