diff mbox series

[v2,4/7] worktree: prune duplicate entries referencing same worktree path

Message ID 20200610063049.74666-5-sunshine@sunshineco.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/7] worktree: factor out repeated string literal | expand

Commit Message

Eric Sunshine June 10, 2020, 6:30 a.m. UTC
A fundamental restriction of linked working trees is that there must
only ever be a single worktree associated with a particular path, thus
"git worktree add" explicitly disallows creation of a new worktree at
the same location as an existing registered worktree. Nevertheless,
users can still "shoot themselves in the foot" by mucking with
administrative files in .git/worktree/<id>/. Worse, "git worktree move"
is careless[1] and allows a worktree to be moved atop a registered but
missing worktree (which can happen, for instance, if the worktree is on
removable media). For instance:

    $ git clone foo.git
    $ cd foo
    $ git worktree add ../bar
    $ git worktree add ../baz
    $ rm -rf ../bar
    $ git worktree move ../baz ../bar
    $ git worktree list
    .../foo beefd00f [master]
    .../bar beefd00f [bar]
    .../bar beefd00f [baz]

Help users recover from this form of corruption by teaching "git
worktree prune" to detect when multiple worktrees are associated with
the same path.

[1]: A subsequent commit will fix "git worktree move" validation to be
     more strict.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 builtin/worktree.c        | 49 ++++++++++++++++++++++++++++++++++-----
 t/t2401-worktree-prune.sh | 12 ++++++++++
 2 files changed, 55 insertions(+), 6 deletions(-)

Comments

Shourya Shukla June 10, 2020, 11:50 a.m. UTC | #1
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index 706ac68751..65492752a7 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -67,7 +67,12 @@ static void delete_worktrees_dir_if_empty(void)
>  	rmdir(git_path("worktrees")); /* ignore failed removal */
>  }
>  
> -static int should_prune_worktree(const char *id, struct strbuf *reason)
> +/*
> + * Return true if worktree entry should be pruned, along with the reason for
> + * pruning. Otherwise, return false and the worktree's path, or NULL if it
> + * cannot be determined. Caller is responsible for freeing returned path.
> + */
> +static int should_prune_worktree(const char *id, struct strbuf *reason, char **wtpath)
>  {
>  	struct stat st;
>  	char *path;
> @@ -75,6 +80,7 @@ static int should_prune_worktree(const char *id, struct strbuf *reason)
>  	size_t len;
>  	ssize_t read_result;
>  
> +	*wtpath = NULL;
>  	if (!is_directory(git_path("worktrees/%s", id))) {
>  		strbuf_addstr(reason, _("not a valid directory"));
>  		return 1;
> @@ -120,16 +126,17 @@ static int should_prune_worktree(const char *id, struct strbuf *reason)
>  	}
>  	path[len] = '\0';
>  	if (!file_exists(path)) {
> -		free(path);
>  		if (stat(git_path("worktrees/%s/index", id), &st) ||
>  		    st.st_mtime <= expire) {
>  			strbuf_addstr(reason, _("gitdir file points to non-existent location"));
> +			free(path);
>  			return 1;
>  		} else {
> +			*wtpath = path;
>  			return 0;
>  		}
>  	}
> -	free(path);
> +	*wtpath = path;
>  	return 0;
>  }

What exactly is the role of 'wtpath' in here? Maybe this is explained in
the later patches. 

> @@ -141,22 +148,52 @@ static void prune_worktree(const char *id, const char *reason)
>  		delete_git_dir(id);
>  }
>  
> +static int prune_cmp(const void *a, const void *b)
> +{
> +	const struct string_list_item *x = a;
> +	const struct string_list_item *y = b;
> +	int c;
> +
> +	if ((c = fspathcmp(x->string, y->string)))
> +	    return c;
> +	/* paths same; sort by .git/worktrees/<id> */
> +	return strcmp(x->util, y->util);
> +}
> 

Can we rename the function arguments better? 'a' and 'b' sound very
naive to me. Maybe change these to something more like: 'firstwt' and
'secondwt'? Yeah even this sounds kiddish but you get the idea right? Or
like 'wt' and 'wt_dup'?

> diff --git a/t/t2401-worktree-prune.sh b/t/t2401-worktree-prune.sh
> index b7d6d5d45a..fd3916fee0 100755
> --- a/t/t2401-worktree-prune.sh
> +++ b/t/t2401-worktree-prune.sh
> @@ -92,4 +92,16 @@ test_expect_success 'not prune proper checkouts' '
>  	test -d .git/worktrees/nop
>
>  
> +test_expect_success 'prune duplicate (linked/linked)' '
> +	test_when_finished rm -fr .git/worktrees w1 w2 &&

Nit: maybe make it 'rm -rf' as that's the popular way of doing it?

Otherwise this looks good to me. But again, we need comments from others
too.

Nicely done! :)
Eric Sunshine June 10, 2020, 3:21 p.m. UTC | #2
On Wed, Jun 10, 2020 at 7:50 AM Shourya Shukla
<shouryashukla.oo@gmail.com> wrote:
> > +static int should_prune_worktree(const char *id, struct strbuf *reason, char **wtpath)
>
> What exactly is the role of 'wtpath' in here? Maybe this is explained in
> the later patches.

'wtpath' holds the location of the worktree. It's used in this patch
by prune_worktrees() to collect a list of paths which haven't been
marked for pruning. Once it has the full list, it passes it to
prune_dups() for pruning duplicate entries.

> > +static int prune_cmp(const void *a, const void *b)
>
> Can we rename the function arguments better? 'a' and 'b' sound very
> naive to me. Maybe change these to something more like: 'firstwt' and
> 'secondwt'? Yeah even this sounds kiddish but you get the idea right? Or
> like 'wt' and 'wt_dup'?

As with any language, C has its idioms, as does Git itself. Sticking
to idioms makes it easier for others to understand the code
at-a-glance. Short variable names, such as "i" and "j" for indexes,
"n" for counters, "s" and "t" for strings, are very common among
experienced programmers. Likewise, "a" and "b" are well-understood as
the arguments of a "comparison" function. There are many such examples
in the Git source code itself. Here are just a few:

    cmp_uint32(const void *a_, const void *b_)
    maildir_filename_cmp(const char *a, const char *b)
    tipcmp(const void *a_, const void *b_)
    cmp_by_tag_and_age(const void *a_, const void *b_)
    cmp_remaining_objects(const void *a, const void *b)
    version_cmp(const char *a, const char *b)
    diffnamecmp(const void *a_, const void *b_)
    spanhash_cmp(const void *a_, const void *b_)
    void_hashcmp(const void *a, const void *b)
    pathspec_item_cmp(const void *a_, const void *b_)

> > +test_expect_success 'prune duplicate (linked/linked)' '
> > +   test_when_finished rm -fr .git/worktrees w1 w2 &&
>
> Nit: maybe make it 'rm -rf' as that's the popular way of doing it?

It's true that "-rf" has wider usage in Git tests than "-fr", though
the latter is heavily used, as well. Ordinarily, matching the style
already present in a particular script would be a good way to choose
between the two, however, the worktree-related test scripts which this
series touches already have a mix of the two. This is the first use of
combined "-r" and "-f" in this particular script, so I could go with
the more common "-rf", however, it's not worth re-rolling just for
that.

Thanks for the review.
Junio C Hamano June 10, 2020, 5:34 p.m. UTC | #3
Eric Sunshine <sunshine@sunshineco.com> writes:

> On Wed, Jun 10, 2020 at 7:50 AM Shourya Shukla
> <shouryashukla.oo@gmail.com> wrote:
>> > +static int should_prune_worktree(const char *id, struct strbuf *reason, char **wtpath)
>>
>> What exactly is the role of 'wtpath' in here? Maybe this is explained in
>> the later patches.
>
> 'wtpath' holds the location of the worktree. It's used in this patch
> by prune_worktrees() to collect a list of paths which haven't been
> marked for pruning. Once it has the full list, it passes it to
> prune_dups() for pruning duplicate entries.

"wt" being a fairly common abbreviation of "work(ing) tree" in the
codebase may escape new readers of the code.  The comment before the
function can explicitly mention the variable name in the description
to help them, I would think.  For example ...

> +/*
> + * Return true if worktree entry should be pruned, along with the reason for
> + * pruning. Otherwise, return false and the worktree's path, or NULL if it

... the first sentence makes it clear that the function returns two
things (i.e. "true"---presumably is returned as its return value, and
"the reason"---the readers are supposed to guess it is stuffed in
the strbuf), and the second sentence also says the function returns
two things (i.e. "false", and "the worktree's path"---it is not
immediately obvious where NULL goes, though).

> + * cannot be determined. Caller is responsible for freeing returned path.
> + */

	Determine if the worktree entry specified by its "id" should
	be pruned.

	When returning 'true', the caller-supplied strbuf "reason"
	is filled with the reason why it should be pruned.  "wtpath"
	is left intact.

	When returning 'false', the string variable pointed by
	"wtpath" receives the absolute path of the worktree; or NULL
	if the location of the worktree cannot be determined.
	"reason" is left intact.

perhaps?  I didn't check what you do to *wtpath when you return
true, so "left intact" may not be what you are doing, and I am *not*
suggesting to leave it intact in that case---I am only suggesting
that we should describe what happens.

>> > +static int prune_cmp(const void *a, const void *b)
>>
>> Can we rename the function arguments better? 'a' and 'b' sound very
>> naive to me. Maybe change these to something more like: 'firstwt' and
>> 'secondwt'? Yeah even this sounds kiddish but you get the idea right? Or
>> like 'wt' and 'wt_dup'?
>
> As with any language, C has its idioms, as does Git itself. Sticking
> to idioms makes it easier for others to understand the code
> at-a-glance. Short variable names, such as "i" and "j" for indexes,
> "n" for counters, "s" and "t" for strings, are very common among
> experienced programmers. Likewise, "a" and "b" are well-understood as
> the arguments of a "comparison" function. There are many such examples
> in the Git source code itself. Here are just a few:
>
>     cmp_uint32(const void *a_, const void *b_)
>     maildir_filename_cmp(const char *a, const char *b)
>     tipcmp(const void *a_, const void *b_)
>     cmp_by_tag_and_age(const void *a_, const void *b_)
>     cmp_remaining_objects(const void *a, const void *b)
>     version_cmp(const char *a, const char *b)
>     diffnamecmp(const void *a_, const void *b_)
>     spanhash_cmp(const void *a_, const void *b_)
>     void_hashcmp(const void *a, const void *b)
>     pathspec_item_cmp(const void *a_, const void *b_)

While that is true, what happens in the funcion is a bit unusual.

> +static int prune_cmp(const void *a, const void *b)
> +{
> +	const struct string_list_item *x = a;
> +	const struct string_list_item *y = b;

Usually we do

	static int foo_cmp(const void *a_, const void *b_)
	{
		const true_type *a = a_;
		const true_type *b = b_;

		/* use a and b for comparison */

without involving 'x' and 'y'.

>> > +test_expect_success 'prune duplicate (linked/linked)' '
>> > +   test_when_finished rm -fr .git/worktrees w1 w2 &&
>>
>> Nit: maybe make it 'rm -rf' as that's the popular way of doing it?
>
> It's true that "-rf" has wider usage in Git tests than "-fr", though
> the latter is heavily used, as well.

I myself write "rm -rf" more often when I am casually programming
out of inertia, but when consciously making a decision on how to
spell the combination, I tend to go alphabetical, because there is
no logical reason to write 'r' first, other than the fact that "-rf"
visually looks prettier than "-fr".  

If recursiveness of the removal is more important than forcedness,
then favouring "-rf" over "-fr" would be justifiable, but I do not
think that is the case here.

And as you said, it is not something that justifies a reroll (or
updating existing uses of "-rf") alone.
diff mbox series

Patch

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 706ac68751..65492752a7 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -67,7 +67,12 @@  static void delete_worktrees_dir_if_empty(void)
 	rmdir(git_path("worktrees")); /* ignore failed removal */
 }
 
-static int should_prune_worktree(const char *id, struct strbuf *reason)
+/*
+ * Return true if worktree entry should be pruned, along with the reason for
+ * pruning. Otherwise, return false and the worktree's path, or NULL if it
+ * cannot be determined. Caller is responsible for freeing returned path.
+ */
+static int should_prune_worktree(const char *id, struct strbuf *reason, char **wtpath)
 {
 	struct stat st;
 	char *path;
@@ -75,6 +80,7 @@  static int should_prune_worktree(const char *id, struct strbuf *reason)
 	size_t len;
 	ssize_t read_result;
 
+	*wtpath = NULL;
 	if (!is_directory(git_path("worktrees/%s", id))) {
 		strbuf_addstr(reason, _("not a valid directory"));
 		return 1;
@@ -120,16 +126,17 @@  static int should_prune_worktree(const char *id, struct strbuf *reason)
 	}
 	path[len] = '\0';
 	if (!file_exists(path)) {
-		free(path);
 		if (stat(git_path("worktrees/%s/index", id), &st) ||
 		    st.st_mtime <= expire) {
 			strbuf_addstr(reason, _("gitdir file points to non-existent location"));
+			free(path);
 			return 1;
 		} else {
+			*wtpath = path;
 			return 0;
 		}
 	}
-	free(path);
+	*wtpath = path;
 	return 0;
 }
 
@@ -141,22 +148,52 @@  static void prune_worktree(const char *id, const char *reason)
 		delete_git_dir(id);
 }
 
+static int prune_cmp(const void *a, const void *b)
+{
+	const struct string_list_item *x = a;
+	const struct string_list_item *y = b;
+	int c;
+
+	if ((c = fspathcmp(x->string, y->string)))
+	    return c;
+	/* paths same; sort by .git/worktrees/<id> */
+	return strcmp(x->util, y->util);
+}
+
+static void prune_dups(struct string_list *l)
+{
+	int i;
+
+	QSORT(l->items, l->nr, prune_cmp);
+	for (i = 1; i < l->nr; i++) {
+		if (!fspathcmp(l->items[i].string, l->items[i - 1].string))
+			prune_worktree(l->items[i].util, "duplicate entry");
+	}
+}
+
 static void prune_worktrees(void)
 {
 	struct strbuf reason = STRBUF_INIT;
+	struct string_list kept = STRING_LIST_INIT_NODUP;
 	DIR *dir = opendir(git_path("worktrees"));
 	struct dirent *d;
 	if (!dir)
 		return;
 	while ((d = readdir(dir)) != NULL) {
+		char *path;
 		if (is_dot_or_dotdot(d->d_name))
 			continue;
 		strbuf_reset(&reason);
-		if (!should_prune_worktree(d->d_name, &reason))
-			continue;
-		prune_worktree(d->d_name, reason.buf);
+		if (should_prune_worktree(d->d_name, &reason, &path))
+			prune_worktree(d->d_name, reason.buf);
+		else if (path)
+			string_list_append(&kept, path)->util = xstrdup(d->d_name);
 	}
 	closedir(dir);
+
+	prune_dups(&kept);
+	string_list_clear(&kept, 1);
+
 	if (!show_only)
 		delete_worktrees_dir_if_empty();
 	strbuf_release(&reason);
diff --git a/t/t2401-worktree-prune.sh b/t/t2401-worktree-prune.sh
index b7d6d5d45a..fd3916fee0 100755
--- a/t/t2401-worktree-prune.sh
+++ b/t/t2401-worktree-prune.sh
@@ -92,4 +92,16 @@  test_expect_success 'not prune proper checkouts' '
 	test -d .git/worktrees/nop
 '
 
+test_expect_success 'prune duplicate (linked/linked)' '
+	test_when_finished rm -fr .git/worktrees w1 w2 &&
+	git worktree add --detach w1 &&
+	git worktree add --detach w2 &&
+	sed "s/w2/w1/" .git/worktrees/w2/gitdir >.git/worktrees/w2/gitdir.new &&
+	mv .git/worktrees/w2/gitdir.new .git/worktrees/w2/gitdir &&
+	git worktree prune --verbose >actual &&
+	test_i18ngrep "duplicate entry" actual &&
+	test -d .git/worktrees/w1 &&
+	! test -d .git/worktrees/w2
+'
+
 test_done