diff mbox series

[RFC] status: avoid reporting worktrees as "Untracked files"

Message ID 20231104000209.916189-1-eantoranz@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] status: avoid reporting worktrees as "Untracked files" | expand

Commit Message

Edmundo Carmona Antoranz Nov. 4, 2023, 12:02 a.m. UTC
Given that worktrees are tracked in their own special fashion separately,
it makes sense to _not_ report them as "untracked". Also, when seeing the
directory of a worktree listed as Untracked, it might be tempting to try
to do operations (like 'git add') on them from the parent worktree which,
at the moment, will silently do nothing.

With this patch, we check items against the list of worktrees to add
them into the untracked items list effectively hiding them.

END OF PATCH

Here are a few questions more inline with the "RFC" part of the patch.

About UI
- Would it make more sense to separate them from Untracked files instead
  of hiding them (perhaps add a --worktrees option to display them)?
- Follow-up if the previous answer is 'yes': List a worktree only if it
  is not clean?

About code:
- If keeping the idea/patch, Would it make more sense (performance-wise) to
  fist check an item in the list of worktrees before checking it in the
  index? In other words, reverse the conditions to add an item to the
  untracked list?
---
 wt-status.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Eric Sunshine Nov. 4, 2023, 6:15 a.m. UTC | #1
On Fri, Nov 3, 2023 at 8:03 PM Edmundo Carmona Antoranz
<eantoranz@gmail.com> wrote:
> Given that worktrees are tracked in their own special fashion separately,
> it makes sense to _not_ report them as "untracked". Also, when seeing the
> directory of a worktree listed as Untracked, it might be tempting to try
> to do operations (like 'git add') on them from the parent worktree which,
> at the moment, will silently do nothing.
>
> With this patch, we check items against the list of worktrees to add
> them into the untracked items list effectively hiding them.
>
> END OF PATCH
>
> Here are a few questions more inline with the "RFC" part of the patch.
>
> About UI
> - Would it make more sense to separate them from Untracked files instead
>   of hiding them (perhaps add a --worktrees option to display them)?
> - Follow-up if the previous answer is 'yes': List a worktree only if it
>   is not clean?
>
> About code:
> - If keeping the idea/patch, Would it make more sense (performance-wise) to
>   fist check an item in the list of worktrees before checking it in the
>   index? In other words, reverse the conditions to add an item to the
>   untracked list?

I have slightly mixed feelings about this idea since I'm sympathetic
to the motivation, however, my knee-jerk reaction is that these really
_are_ untracked considering that Git is a "content tracker" and
worktrees are not project content. Git already has general mechanisms
such as .git/info/exclude and .gitignore for suppressing certain
untracked items, so introducing special-purpose code to suppress
worktrees from being considered untracked may be a case of adding
complexity for little gain.

Moreover, although your personal workflow may be to create worktrees
within your main directory:

    git worktree add new-feature

other people use a workflow in which worktrees are created at other
locations, such as making them siblings:

    git worktree add ../new-feature

For the former case, it's easy enough to mention worktrees in
.git/info/exclude, especially if you use a standard naming convention
for your worktrees, in which case a single wildcard pattern may allow
you to set it once and forget about it. For the latter workflow, the
extra "is untracked" checking is simply wasteful.

Having said all that, I think that someone may have recently floated
the idea on the mailing list about suppressing dirty submodules from
showing up as "dirty" in git-status. Although the underlying concepts
and mechanisms are quite distinct (especially since a submodule _is_
content), perhaps there is some sort of analogy between worktrees and
dirty submodules which invalidates my knee-jerk reaction. Also, I'm
just one person responding without having put all that much thought
into it. Others may feel differently.

Regarding the patch itself...

> diff --git a/wt-status.c b/wt-status.c
> @@ -795,9 +796,12 @@ static void wt_status_collect_untracked(struct wt_status *s)
> +       worktrees = get_worktrees();
> +
>         for (i = 0; i < dir.nr; i++) {
>                 struct dir_entry *ent = dir.entries[i];
> -               if (index_name_is_other(istate, ent->name, ent->len))
> +               if (index_name_is_other(istate, ent->name, ent->len) &&
> +                   !find_worktree_by_path(worktrees, ent->name))
>                         string_list_insert(&s->untracked, ent->name);
>         }

This first-stab implementation unfortunately has worse than quadratic
complexity, perhaps even cubic complexity since
find_worktree_by_path() performs a linear scan through the worktree
list. So, for each path in `dir`, it's performing a
character-by-character string comparison with each path in
`worktrees`. Worse, find_worktree_by_path() calls strbuf_realpath()
which hits the filesystem for each path in `worktrees` each time it's
called.

So, a real (non-RFC) implementation would probably need to perform a
preparatory step of creating a hash-table/set in which the keys are
the realpath'd elements from `worktrees`, and then simply consult the
hash-table/set for each path in `dir`.

> @@ -809,6 +813,9 @@ static void wt_status_collect_untracked(struct wt_status *s)
> +       if (worktrees)
> +               free_worktrees(worktrees);

Nit: At this point, we _know_ that `worktrees` is non-NULL, so
free_worktrees() can be called unconditionally.
Junio C Hamano Nov. 4, 2023, 6:58 a.m. UTC | #2
Edmundo Carmona Antoranz <eantoranz@gmail.com> writes:

> Given that worktrees are tracked in their own special fashion separately,
> it makes sense to _not_ report them as "untracked".

My gut feeling is that a much better solution to the unstated
problem you are solving is to make sure that "git worktree add" will
complain and not allow adding a subdirectory of any existing working
tree of a repository as a new worktree.

What problem are you trying to solve?  "git add foo" where "foo" is
actually a different worktree of the repository would add it as a
submodule that causes confusion?  If that is the case, I think the
right solution is not to get into such a state, i.e. not create a
worktree of the repository inside a different worktree in the first
place.
Edmundo Carmona Antoranz Nov. 11, 2023, 9:22 a.m. UTC | #3
Hey, guys! Thanks Junio and Eric for sharing your thoughts.

And I candidly thought this was going to be an "easy sell".... :-D

On Sat, Nov 4, 2023 at 7:58 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> What problem are you trying to solve?  "git add foo" where "foo" is
> actually a different worktree of the repository would add it as a
> submodule that causes confusion?  If that is the case, I think the
> right solution is not to get into such a state, i.e. not create a
> worktree of the repository inside a different worktree in the first
> place.
>

I am not against the idea of creating worktrees outside of the
repository... however, I like them to be _inside_ the repository. Am I
the only one? IDK. I might be! It feels completely natural, if you ask
me.... but that's just my opinion, I acknowledge that.

While I was running a couple of quick tests to add more information
about git behaviour with "my use case" I think I found something to
work on so more RFCs might be on the way in the next few days or
weeks.

About adding an error message when 'git add' will skip doing something
because it is working on a different worktree, I think it makes
sense.... will probably work on that too.

Thanks again! BR!
Eric Sunshine Nov. 12, 2023, 5:13 p.m. UTC | #4
On Sat, Nov 11, 2023 at 4:22 AM Edmundo Carmona Antoranz
<eantoranz@gmail.com> wrote:
> On Sat, Nov 4, 2023 at 7:58 AM Junio C Hamano <gitster@pobox.com> wrote:
> > What problem are you trying to solve?  "git add foo" where "foo" is
> > actually a different worktree of the repository would add it as a
> > submodule that causes confusion?  If that is the case, I think the
> > right solution is not to get into such a state, i.e. not create a
> > worktree of the repository inside a different worktree in the first
> > place.
> >
> Hey, guys! Thanks Junio and Eric for sharing your thoughts.
>
> I am not against the idea of creating worktrees outside of the
> repository... however, I like them to be _inside_ the repository. Am I
> the only one? IDK. I might be! It feels completely natural, if you ask
> me.... but that's just my opinion, I acknowledge that.

I doubt you're the only one, but, based upon, list emails over the
years, it seems that both in-main-tree and outside-main-tree (often
sibling) worktrees are common. More recently, we've also heard from
people who don't even have a main-worktree; instead, they hang their
multiple worktrees off of a bare repository (which is an
explicitly-supported use-case); i.e.:

    git clone --bare https://.../foobar.git
    git -C foobar.git worktree add worktree1
    git -C foobar.git worktree add worktree2
    ...
Junio C Hamano Nov. 12, 2023, 11:52 p.m. UTC | #5
Eric Sunshine <sunshine@sunshineco.com> writes:

> I doubt you're the only one, but, based upon, list emails over the
> years, it seems that both in-main-tree and outside-main-tree (often
> sibling) worktrees are common. More recently, we've also heard from
> people who don't even have a main-worktree; instead, they hang their
> multiple worktrees off of a bare repository (which is an
> explicitly-supported use-case); i.e.:
>
>     git clone --bare https://.../foobar.git
>     git -C foobar.git worktree add worktree1
>     git -C foobar.git worktree add worktree2
>     ...

I am not sure why you brought in that layout in this discussion,
because it places worktree1 and worktree2 next to each other, just
like placing worktree1 and worktree2 next to the non-bare repository.

    git clone https://.../foobar.git foobar
    git -C foobar worktree add worktree1
    git -C foobar worktree add worktree2

The layout to create worktrees attached to a bare repository and add
them next to each other, and the same starting from a non-bare
repository, share an important trait.  They do not have an untracked
and untrackable "cruft" in their working tree, unlike the crazy
layout that places worktrees of the repository inside the working
tree of the primary worktree as untracked subdirectories.

Really, what is the advantage of doing so?  It is not like the build
recipe recorded in the primary worktree can work recursively on
different branches that are checked out---worktree names and paths
at which they are checked out are totally local matter, and the
upstream project that supplies the build recipe would not know or
care.

Even worse, when the project wants to add a new subdirectory or a
file, the name chosen for the subdirectory may happen to collide
with the name of an untracked subdirectory you happened to have used
(again, because the worktree names and locations are totally local
matter, the upstream project are unaware of them and cannot avoid
such name clashes even if they cared).  You can imagine the
confusion that happens to your next "git pull".

Compared to such an insanity, attaching worktrees to a bare
repository, so that all worktrees are equals and there is no
"primary" worktree that you cannot remove, behave just as normal as
a set of worktrees attached to a non-bare repository and sit outside
the primary worktree, often as immediate siblings.
diff mbox series

Patch

diff --git a/wt-status.c b/wt-status.c
index 9f45bf6949..5fd1e6007a 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -775,6 +775,7 @@  static void wt_status_collect_untracked(struct wt_status *s)
 	struct dir_struct dir = DIR_INIT;
 	uint64_t t_begin = getnanotime();
 	struct index_state *istate = s->repo->index;
+	struct worktree **worktrees;
 
 	if (!s->show_untracked_files)
 		return;
@@ -795,9 +796,12 @@  static void wt_status_collect_untracked(struct wt_status *s)
 
 	fill_directory(&dir, istate, &s->pathspec);
 
+	worktrees = get_worktrees();
+
 	for (i = 0; i < dir.nr; i++) {
 		struct dir_entry *ent = dir.entries[i];
-		if (index_name_is_other(istate, ent->name, ent->len))
+		if (index_name_is_other(istate, ent->name, ent->len) &&
+		    !find_worktree_by_path(worktrees, ent->name))
 			string_list_insert(&s->untracked, ent->name);
 	}
 
@@ -809,6 +813,9 @@  static void wt_status_collect_untracked(struct wt_status *s)
 
 	dir_clear(&dir);
 
+	if (worktrees)
+		free_worktrees(worktrees);
+
 	if (advice_enabled(ADVICE_STATUS_U_OPTION))
 		s->untracked_in_ms = (getnanotime() - t_begin) / 1000000;
 }