diff mbox series

[2/2] Make "git branch -d" prune missing worktrees automatically.

Message ID 20191017162826.1064257-2-pjones@redhat.com (mailing list archive)
State New, archived
Headers show
Series [1/2] Make die_if_checked_out() ignore missing worktree checkouts. | expand

Commit Message

Peter Jones Oct. 17, 2019, 4:28 p.m. UTC
Currently, if you do:

$ git branch zonk origin/master
$ git worktree add zonk zonk
$ rm -rf zonk
$ git branch -d zonk

You get the following error:

$ git branch -d zonk
error: Cannot delete branch 'zonk' checked out at '/home/pjones/devel/kernel.org/git/zonk'

It isn't meaningfully checked out, the repo's data is just stale and no
longer reflects reality.

This makes it so that if nothing is present where a worktree is
supposedly checked out, deleting the branch will automatically prune it.

Signed-off-by: Peter Jones <pjones@redhat.com>
---
 builtin/branch.c   |  2 +-
 builtin/worktree.c | 14 ++++++++++++++
 worktree.h         |  6 ++++++
 3 files changed, 21 insertions(+), 1 deletion(-)

Comments

Eric Sunshine Oct. 17, 2019, 5:28 p.m. UTC | #1
On Thu, Oct 17, 2019 at 12:28 PM Peter Jones <pjones@redhat.com> wrote:
> Currently, if you do:
>
> $ git branch zonk origin/master
> $ git worktree add zonk zonk
> $ rm -rf zonk
> $ git branch -d zonk
>
> You get the following error:
>
> $ git branch -d zonk
> error: Cannot delete branch 'zonk' checked out at '/home/pjones/devel/kernel.org/git/zonk'
>
> It isn't meaningfully checked out, the repo's data is just stale and no
> longer reflects reality.

Echoing SEZDER's comment on patch 1/2, this behavior is an intentional
design choice and safety feature of the worktree implementation since
worktrees may exist on removable media or remote filesystems which
might not always be mounted; hence, the presence of commands "git
worktree prune" and "git worktree remove".

A couple comment regarding this patch...

> Signed-off-by: Peter Jones <pjones@redhat.com>
> ---
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> @@ -133,6 +133,20 @@ static int prune_worktree(const char *id, struct strbuf *reason)
> +int prune_worktree_if_missing(const struct worktree *wt)
> +{
> +       struct strbuf reason = STRBUF_INIT;
> +
> +       if (access(wt->path, F_OK) >= 0 ||
> +           (errno != ENOENT && errno == ENOTDIR)) {
> +               errno = EEXIST;
> +               return -1;
> +       }
> +
> +       strbuf_addf(&reason, _("Removing worktrees/%s: worktree directory is not present"), wt->id);
> +       return prune_worktree(wt->id, &reason);
> +}

"git worktree" tries to clean up after itself as much as possible. For
instance, it is careful to remove the .git/worktrees directory when
the last worktree itself is removed (or pruned). So, the caller of
this function would also want to call delete_worktrees_dir_if_empty()
to follow suit.

> diff --git a/worktree.h b/worktree.h
> @@ -132,4 +132,10 @@ void strbuf_worktree_ref(const struct worktree *wt,
> +/*
> + * Prune a worktree if it is no longer present at the checked out location.
> + * Returns < 0 if the checkout is there or if pruning fails.
> + */
> +int prune_worktree_if_missing(const struct worktree *wt);

It's rather ugly that this function is declared in top-level
worktree.h whereas the actual implementation is in builtin/worktree.c.
I'd expect to see a preparatory patch which moves prune_worktree()
(and probably delete_worktrees_dir_if_empty()) to top-level
worktree.c.

These minor implementation comments aside, before considering this
patch series, it would be nice to see a compelling argument as to why
this change of behavior, which undercuts a deliberate design decision,
is really desirable.
Peter Jones Oct. 18, 2019, 7:43 p.m. UTC | #2
On Thu, Oct 17, 2019 at 06:44:26PM +0200, SZEDER Gábor wrote:
> >  	if (!wt || (ignore_current_worktree && wt->is_current))
> >  		return;
> > +	if (access(wt->path, F_OK) < 0 &&
> > +	    (errno == ENOENT || errno == ENOTDIR))
> > +		return;
> 
> I think this check is insuffient: even if the directory of the working
> tree is not present, the working tree might still exist, and should
> not be ignored (or deleted/pruned in the second patch).
> 
> See the description of 'git worktree lock' for details.

Ah, thanks for that, I had not realized "lock" was relevant here as I
have never used it.  That explains some of what seemed to me like a very
strange usage model.

On Thu, Oct 17, 2019 at 01:28:09PM -0400, Eric Sunshine wrote:
> Echoing SEZDER's comment on patch 1/2, this behavior is an intentional
> design choice and safety feature of the worktree implementation since
> worktrees may exist on removable media or remote filesystems which
> might not always be mounted; hence, the presence of commands "git
> worktree prune" and "git worktree remove".

Okay, I see that use case now - I hadn't realized there was an
intentional design decision here, and honestly that's anything but clear
from the *code*.  It's surprising, for example, that my patches didn't
break a single test case.

> A couple comment regarding this patch...

Sure...

> > diff --git a/builtin/worktree.c b/builtin/worktree.c
> > @@ -133,6 +133,20 @@ static int prune_worktree(const char *id, struct strbuf *reason)
> > +int prune_worktree_if_missing(const struct worktree *wt)
> > +{
> > +       struct strbuf reason = STRBUF_INIT;
> > +
> > +       if (access(wt->path, F_OK) >= 0 ||
> > +           (errno != ENOENT && errno == ENOTDIR)) {
> > +               errno = EEXIST;
> > +               return -1;
> > +       }
> > +
> > +       strbuf_addf(&reason, _("Removing worktrees/%s: worktree directory is not present"), wt->id);
> > +       return prune_worktree(wt->id, &reason);
> > +}
> 
> "git worktree" tries to clean up after itself as much as possible. For
> instance, it is careful to remove the .git/worktrees directory when
> the last worktree itself is removed (or pruned). So, the caller of
> this function would also want to call delete_worktrees_dir_if_empty()
> to follow suit.

Okay, will fix.

> > diff --git a/worktree.h b/worktree.h
> > @@ -132,4 +132,10 @@ void strbuf_worktree_ref(const struct worktree *wt,
> > +/*
> > + * Prune a worktree if it is no longer present at the checked out location.
> > + * Returns < 0 if the checkout is there or if pruning fails.
> > + */
> > +int prune_worktree_if_missing(const struct worktree *wt);
> 
> It's rather ugly that this function is declared in top-level
> worktree.h whereas the actual implementation is in builtin/worktree.c.

I don't disagree, but I didn't want to move stuff into an exposed API if
I didn't have to, and that seemed like an appropriate enough header.  I
can do it the other way though, no problem.

> I'd expect to see a preparatory patch which moves prune_worktree()
> (and probably delete_worktrees_dir_if_empty()) to top-level
> worktree.c.

Sure thing.

> These minor implementation comments aside, before considering this
> patch series, it would be nice to see a compelling argument as to why
> this change of behavior, which undercuts a deliberate design decision,
> is really desirable.

Okay, so just for clarity, when you say there's a deliberate design
decision, which behavior here are you talking about?  If you mean making
"lock" work, I don't have any issue with that.  If you mean not cleaning
up when we do other commands, then I don't see why that's a concern -
after all, that's exactly what "lock" is for.

Assuming it is the "lock" behavior we're talking about, I don't think I
actually have any intention of breaking this design decision, just
making my workflow (without "lock") nag at me less for what seem like
pretty trivial issues.

I can easily accommodate "git worktree lock".  What bugs me though, is
that using worktrees basically means I have to replace fairly regular
filesystem activities with worktree commands, and it doesn't seem to be
*necessary* in any way.  And I'm going to forget.  A lot.

To me, there doesn't seem to be any reason these need to behave any different:

$ git worktree add foo foo
$ rm -rf foo
vs
$ git worktree add foo foo
$ git worktree remove foo

And in fact the only difference right now, aside from some very
minuscule storage requirements that haven't gotten cleaned up, is the
first one leaves an artifact that tells it to give me errors later until
I run "git worktree prune" myself.  

I'll send another revision of this patchset as a reply to this mail,
which should clear up some of our differences.
Eric Sunshine Nov. 8, 2019, 10:14 a.m. UTC | #3
[cc:+duy]

On Fri, Oct 18, 2019 at 3:43 PM Peter Jones <pjones@redhat.com> wrote:
> On Thu, Oct 17, 2019 at 01:28:09PM -0400, Eric Sunshine wrote:
> > Echoing SEZDER's comment on patch 1/2, this behavior is an intentional
> > design choice and safety feature of the worktree implementation since
> > worktrees may exist on removable media or remote filesystems which
> > might not always be mounted; hence, the presence of commands "git
> > worktree prune" and "git worktree remove".
>
> Okay, I see that use case now - I hadn't realized there was an
> intentional design decision here, and honestly that's anything but clear
> from the *code*.

It can indeed sometimes be difficult to get a high-level functional
overview by examining code in isolation. In this case, at least,
git-worktree documentation tries to be clear about the "why" and "how"
of the pruning behavior (which is not to say that the documentation --
or the code -- can't be improved to communicate this better).

> It's surprising, for example, that my patches didn't break a single
> test case.

Tests suites are never perfect, and an attempt to prune a dangling
worktree by deleting a branch likely never occurred to the
git-worktree implementer(s).

> > These minor implementation comments aside, before considering this
> > patch series, it would be nice to see a compelling argument as to why
> > this change of behavior, which undercuts a deliberate design decision,
> > is really desirable.
>
> Okay, so just for clarity, when you say there's a deliberate design
> decision, which behavior here are you talking about? If you mean making
> "lock" work, I don't have any issue with that. If you mean not cleaning
> up when we do other commands, then I don't see why that's a concern -
> after all, that's exactly what "lock" is for.

To clarify, I'm talking about Duy's deliberate design decision to
model git-worktree auto-pruning after Git's own garbage-collection
behavior. That model includes, not only explicit locking, but a grace
period before dangling worktree administrative files can be pruned
automatically (see the gc.worktreePruneExpire configuration).

The point of git-worktree's grace period (just like git-gc's grace
period) is to avoid deleting potentially precious information
permanently. For instance, the worktree-local "index" file might have
some changes staged but not yet committed. Under the existing model,
those staged changes are immune from being accidentally deleted
permanently until after the grace period expires or until they are
thrown away deliberately (say, via "git worktree prune --expire=now").

> Assuming it is the "lock" behavior we're talking about, I don't think I
> actually have any intention of breaking this design decision, just
> making my workflow (without "lock") nag at me less for what seem like
> pretty trivial issues.

The ability to lock a worktree is an extra safety measure built atop
the grace period mechanism to provide a way to completely override
auto-pruning; it is not meant as an alternate or replacement safety
mechanism to the grace period, but instead augments it. So, a behavior
change which respects only one of those safety mechanisms but not the
other is likely flawed.

And, importantly, people may already be relying upon this behavior of
having an automatic grace period -- without having to place a worktree
lock manually -- so changing behavior arbitrarily could break existing
workflows and result in data loss.

> I can easily accommodate "git worktree lock". What bugs me though, is
> that using worktrees basically means I have to replace fairly regular
> filesystem activities with worktree commands, and it doesn't seem to be
> *necessary* in any way. And I'm going to forget. A lot.
>
> To me, there doesn't seem to be any reason these need to behave any different:
>
> $ git worktree add foo foo
> $ rm -rf foo
> vs
> $ git worktree add foo foo
> $ git worktree remove foo
>
> And in fact the only difference right now, aside from some very
> minuscule storage requirements that haven't gotten cleaned up, is the
> first one leaves an artifact that tells it to give me errors later until
> I run "git worktree prune" myself.

I understand the pain point, but I also understand Duy's motivation
for being very careful about pruning worktree administrative files
automatically (so as to avoid data loss, such as changes already
staged to a worktree-local "index" file). While the proposed change
may address the pain point, it nevertheless creates the possibility of
accidental loss which Duy was careful to avoid when designing worktree
mechanics. Although annoying, the current behavior gives you the
opportunity to avoid that accidental loss by forcing you to take
deliberate action to remove the worktree administrative files.

Perhaps there is some way to address the pain point without breaking
the fundamental promise made by git-worktree about being careful with
worktree metadata[*], but the changes proposed by this patch series
seem insufficient (even if the patch is reworked to respect worktree
locking). I've cc:'d Duy in case he wants to chime in.

[*] For instance, perhaps before auto-pruning, it could check whether
the index is recording staged changes or conflict information, and
only allow auto-pruning if the index is clean. *But* there may be
other ways for information to be lost permanently (beyond a dirty
"index") which don't occur to me at present, so this has to be
considered carefully.
Phillip Wood Nov. 8, 2019, 2:56 p.m. UTC | #4
On 08/11/2019 10:14, Eric Sunshine wrote:
> [cc:+duy]
> 
> On Fri, Oct 18, 2019 at 3:43 PM Peter Jones <pjones@redhat.com> wrote:
>> On Thu, Oct 17, 2019 at 01:28:09PM -0400, Eric Sunshine wrote:
>>> Echoing SEZDER's comment on patch 1/2, this behavior is an intentional
>>> design choice and safety feature of the worktree implementation since
>>> worktrees may exist on removable media or remote filesystems which
>>> might not always be mounted; hence, the presence of commands "git
>>> worktree prune" and "git worktree remove".
>>
>> Okay, I see that use case now - I hadn't realized there was an
>> intentional design decision here, and honestly that's anything but clear
>> from the *code*.
> 
> It can indeed sometimes be difficult to get a high-level functional
> overview by examining code in isolation. In this case, at least,
> git-worktree documentation tries to be clear about the "why" and "how"
> of the pruning behavior (which is not to say that the documentation --
> or the code -- can't be improved to communicate this better).
> 
>> It's surprising, for example, that my patches didn't break a single
>> test case.
> 
> Tests suites are never perfect, and an attempt to prune a dangling
> worktree by deleting a branch likely never occurred to the
> git-worktree implementer(s).
> 
>>> These minor implementation comments aside, before considering this
>>> patch series, it would be nice to see a compelling argument as to why
>>> this change of behavior, which undercuts a deliberate design decision,
>>> is really desirable.
>>
>> Okay, so just for clarity, when you say there's a deliberate design
>> decision, which behavior here are you talking about? If you mean making
>> "lock" work, I don't have any issue with that. If you mean not cleaning
>> up when we do other commands, then I don't see why that's a concern -
>> after all, that's exactly what "lock" is for.
> 
> To clarify, I'm talking about Duy's deliberate design decision to
> model git-worktree auto-pruning after Git's own garbage-collection
> behavior. That model includes, not only explicit locking, but a grace
> period before dangling worktree administrative files can be pruned
> automatically (see the gc.worktreePruneExpire configuration).
> 
> The point of git-worktree's grace period (just like git-gc's grace
> period) is to avoid deleting potentially precious information
> permanently. For instance, the worktree-local "index" file might have
> some changes staged but not yet committed. Under the existing model,
> those staged changes are immune from being accidentally deleted
> permanently until after the grace period expires or until they are
> thrown away deliberately (say, via "git worktree prune --expire=now").
> 
>> Assuming it is the "lock" behavior we're talking about, I don't think I
>> actually have any intention of breaking this design decision, just
>> making my workflow (without "lock") nag at me less for what seem like
>> pretty trivial issues.
> 
> The ability to lock a worktree is an extra safety measure built atop
> the grace period mechanism to provide a way to completely override
> auto-pruning; it is not meant as an alternate or replacement safety
> mechanism to the grace period, but instead augments it. So, a behavior
> change which respects only one of those safety mechanisms but not the
> other is likely flawed.
> 
> And, importantly, people may already be relying upon this behavior of
> having an automatic grace period -- without having to place a worktree
> lock manually -- so changing behavior arbitrarily could break existing
> workflows and result in data loss.
> 
>> I can easily accommodate "git worktree lock". What bugs me though, is
>> that using worktrees basically means I have to replace fairly regular
>> filesystem activities with worktree commands, and it doesn't seem to be
>> *necessary* in any way. And I'm going to forget. A lot.
>>
>> To me, there doesn't seem to be any reason these need to behave any different:
>>
>> $ git worktree add foo foo
>> $ rm -rf foo
>> vs
>> $ git worktree add foo foo
>> $ git worktree remove foo
>>
>> And in fact the only difference right now, aside from some very
>> minuscule storage requirements that haven't gotten cleaned up, is the
>> first one leaves an artifact that tells it to give me errors later until
>> I run "git worktree prune" myself.
> 
> I understand the pain point, but I also understand Duy's motivation
> for being very careful about pruning worktree administrative files
> automatically (so as to avoid data loss, such as changes already
> staged to a worktree-local "index" file). While the proposed change
> may address the pain point, it nevertheless creates the possibility of
> accidental loss which Duy was careful to avoid when designing worktree
> mechanics. Although annoying, the current behavior gives you the
> opportunity to avoid that accidental loss by forcing you to take
> deliberate action to remove the worktree administrative files.
> 
> Perhaps there is some way to address the pain point without breaking
> the fundamental promise made by git-worktree about being careful with
> worktree metadata[*], but the changes proposed by this patch series
> seem insufficient (even if the patch is reworked to respect worktree
> locking). I've cc:'d Duy in case he wants to chime in.

I agree that we want to preserve the safe guards in the worktree design. 
I wonder if detaching the HEAD of the missing worktree would solve the 
problem without losing data. In the case where something wants to 
checkout the same branch as the missing worktree then I think that is a 
good solution. I think it should be OK for branch deletion as well.

Best Wishes

Phillip

> [*] For instance, perhaps before auto-pruning, it could check whether
> the index is recording staged changes or conflict information, and
> only allow auto-pruning if the index is clean. *But* there may be
> other ways for information to be lost permanently (beyond a dirty
> "index") which don't occur to me at present, so this has to be
> considered carefully.
>
Eric Sunshine Nov. 9, 2019, 11:34 a.m. UTC | #5
On Fri, Nov 8, 2019 at 9:56 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
> On 08/11/2019 10:14, Eric Sunshine wrote:
> > Perhaps there is some way to address the pain point without breaking
> > the fundamental promise made by git-worktree about being careful with
> > worktree metadata[*], but the changes proposed by this patch series
> > seem insufficient (even if the patch is reworked to respect worktree
> > locking). I've cc:'d Duy in case he wants to chime in.
>
> I agree that we want to preserve the safe guards in the worktree design.
> I wonder if detaching the HEAD of the missing worktree would solve the
> problem without losing data. In the case where something wants to
> checkout the same branch as the missing worktree then I think that is a
> good solution. I think it should be OK for branch deletion as well.

I would feel very uncomfortable making "automatic HEAD detachment"
(decapitation?) the default behavior. Although doing so may (in some
fashion) safeguard precious information in .git/worktrees/<id>, it
potentially brings its own difficulties. For instance, if someone
takes an action which automatically detaches HEAD of a missing
worktree which had some branch checked out (and possibly some changes
staged in the worktree-specific "index"), and then builds more commits
on that branch, then that worktree gets into a state akin to rebased
upstream (for which git-rebase documentation devotes an entire
section[1], "Recovering From Upstream Rebase"). While a power-user may
be able to recover from such a state, allowing the general Git user to
get into such a situation by default seems contraindicated.

I'm not even convinced that hiding the suggested "auto-detach"
behavior behind a configuration variable so power-users can enable it
is entirely a good idea either since, while it may eliminate some
pain, it also potentially allows abandoned worktree entries to
accumulate.

[1]: https://git-scm.com/docs/git-rebase#_recovering_from_upstream_rebase
diff mbox series

Patch

diff --git a/builtin/branch.c b/builtin/branch.c
index 2ef214632f0..d611f8183b4 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -236,7 +236,7 @@  static int delete_branches(int argc, const char **argv, int force, int kinds,
 		if (kinds == FILTER_REFS_BRANCHES) {
 			const struct worktree *wt =
 				find_shared_symref("HEAD", name);
-			if (wt) {
+			if (wt && prune_worktree_if_missing(wt) < 0) {
 				error(_("Cannot delete branch '%s' "
 					"checked out at '%s'"),
 				      bname.buf, wt->path);
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 4de44f579af..b3ad915c3c3 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -133,6 +133,20 @@  static int prune_worktree(const char *id, struct strbuf *reason)
 	return 0;
 }
 
+int prune_worktree_if_missing(const struct worktree *wt)
+{
+	struct strbuf reason = STRBUF_INIT;
+
+	if (access(wt->path, F_OK) >= 0 ||
+	    (errno != ENOENT && errno == ENOTDIR)) {
+		errno = EEXIST;
+		return -1;
+	}
+
+	strbuf_addf(&reason, _("Removing worktrees/%s: worktree directory is not present"), wt->id);
+	return prune_worktree(wt->id, &reason);
+}
+
 static void prune_worktrees(void)
 {
 	struct strbuf reason = STRBUF_INIT;
diff --git a/worktree.h b/worktree.h
index caecc7a281c..75762c25752 100644
--- a/worktree.h
+++ b/worktree.h
@@ -132,4 +132,10 @@  void strbuf_worktree_ref(const struct worktree *wt,
 const char *worktree_ref(const struct worktree *wt,
 			 const char *refname);
 
+/*
+ * Prune a worktree if it is no longer present at the checked out location.
+ * Returns < 0 if the checkout is there or if pruning fails.
+ */
+int prune_worktree_if_missing(const struct worktree *wt);
+
 #endif