diff mbox series

[v2,1/1] worktree add: sanitize worktree names

Message ID 20190221121943.19778-2-pclouds@gmail.com (mailing list archive)
State New, archived
Headers show
Series worktree add: sanitize worktree names | expand

Commit Message

Duy Nguyen Feb. 21, 2019, 12:19 p.m. UTC
Worktree names are based on $(basename $GIT_WORK_TREE). They aren't
significant until 3a3b9d8cde (refs: new ref types to make per-worktree
refs visible to all worktrees - 2018-10-21), where worktree name could
be part of a refname and must follow refname rules.

Update 'worktree add' code to remove special characters to follow
these rules. The code could replace chars with '-' more than
necessary, but it keeps the code simple. In the future the user will
be able to specify the worktree name by themselves if they're not
happy with this dumb character substitution.

Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/worktree.c      | 51 ++++++++++++++++++++++++++++++++++++++++-
 t/t2025-worktree-add.sh |  7 ++++++
 2 files changed, 57 insertions(+), 1 deletion(-)

Comments

Jeff King Feb. 21, 2019, 1:22 p.m. UTC | #1
On Thu, Feb 21, 2019 at 07:19:43PM +0700, Nguyễn Thái Ngọc Duy wrote:

> +/*
> + * worktree name is part of refname and has to pass
> + * check_refname_component(). Remove unallowed characters to make it
> + * valid.
> + */
> +static void sanitize_worktree_name(struct strbuf *name)
> +{
> +	char *orig_name = xstrdup(name->buf);
> +	int i;
> +
> +	/*
> +	 * All special chars replaced with dashes. See
> +	 * check_refname_component() for reference.
> +	 * Note that .lock is also turned to -lock, removing its
> +	 * special status.
> +	 */
> +	for (i = 0; i < name->len; i++) {
> +		if (strchr(":?[]\\~ \t@{}*/.", name->buf[i]))
> +			name->buf[i] = '-';
> +	}

This is reject-known-bad, but I think there are still some other
characters that are not allowed in refnames (e.g., ASCII control
characters). Which would lead to us hitting the BUG() below.

It might make sense to provide access to refname_disposition() and use
it here. Alternatively, I think if we did an allow-known-good, it might
be OK to have a slightly more restrictive scheme (say, alnum plus
dashes, plus high-bit chars).

> +	/* remove consecutive dashes, leading or trailing dashes */
> +	for (i = 0; i < name->len; i++) {
> +		while (name->buf[i] == '-' &&
> +		       (i == 0 ||
> +			i == name->len - 1 ||
> +			(i < name->len - 1 && name->buf[i + 1] == '-')))
> +			strbuf_remove(name, i, 1);
> +	}

I think this is correct, though it is possibly to be quadratic in the
string length due to the O(n) remove. I think this kind of sanitizing is
more readable if done between two strings rather than in-place, like:

  for (i = 0; i < name->len; i++) {
	if (is_allowed(name->buf[i])) {
		strbuf_addch(&dest, name->buf[i]);
		last_was_dash = 0;
	} else if (!last_was_dash && dest->len)
		strbuf_addch(&dest, '-');
		last_was_dash = 1;
	}
  }
  /* still must handle removal from end of stray "-" and ".lock" */
  strbuf_swap(name, &dest);
  strbuf_release(&dest);

but that may just be personal preference. I'm OK with it if you prefer
the in-place way.

-Peff
Ramsay Jones Feb. 21, 2019, 5:41 p.m. UTC | #2
On 21/02/2019 12:19, Nguyễn Thái Ngọc Duy wrote:
> Worktree names are based on $(basename $GIT_WORK_TREE). They aren't
> significant until 3a3b9d8cde (refs: new ref types to make per-worktree
> refs visible to all worktrees - 2018-10-21), where worktree name could
> be part of a refname and must follow refname rules.
> 
> Update 'worktree add' code to remove special characters to follow
> these rules. The code could replace chars with '-' more than
> necessary, but it keeps the code simple. In the future the user will
> be able to specify the worktree name by themselves if they're not
> happy with this dumb character substitution.
> 
> Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  builtin/worktree.c      | 51 ++++++++++++++++++++++++++++++++++++++++-
>  t/t2025-worktree-add.sh |  7 ++++++
>  2 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index 3f9907fcc9..53e41db229 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -262,6 +262,50 @@ static void validate_worktree_add(const char *path, const struct add_opts *opts)
>  	free_worktrees(worktrees);
>  }
>  
> +/*
> + * worktree name is part of refname and has to pass
> + * check_refname_component(). Remove unallowed characters to make it
> + * valid.
> + */
> +static void sanitize_worktree_name(struct strbuf *name)
> +{
> +	char *orig_name = xstrdup(name->buf);
> +	int i;
> +
> +	/*
> +	 * All special chars replaced with dashes. See
> +	 * check_refname_component() for reference.
> +	 * Note that .lock is also turned to -lock, removing its
> +	 * special status.
> +	 */
> +	for (i = 0; i < name->len; i++) {
> +		if (strchr(":?[]\\~ \t@{}*/.", name->buf[i]))
> +			name->buf[i] = '-';
> +	}
> +
> +	/* remove consecutive dashes, leading or trailing dashes */

Why? So, '[fred]' will be 'sanitized' to 'fred' (rather than '-fred-'),
which would increase the chance of a 'collision' with the 'fred'
worktree (not very likely, but still). Is that useful? How about
'x86_64-*-gnu' which now becomes 'x86_64-gnu'?
 
> +	for (i = 0; i < name->len; i++) {
> +		while (name->buf[i] == '-' &&
> +		       (i == 0 ||
> +			i == name->len - 1 ||
> +			(i < name->len - 1 && name->buf[i + 1] == '-')))
> +			strbuf_remove(name, i, 1);
> +	}
> +
> +	/*
> +	 * a worktree name of only special chars would be reduced to
> +	 * an empty string
> +	 */> +	if (name->len == 0)
> +		strbuf_addstr(name, "worktree");

If you didn't 'collapse' the name above, you could check for
an empty name at the top and wouldn't need this (presumably
an empty name would not be valid).

ATB,
Ramsay Jones
Duy Nguyen Feb. 22, 2019, 9:21 a.m. UTC | #3
On Fri, Feb 22, 2019 at 12:42 AM Ramsay Jones
<ramsay@ramsayjones.plus.com> wrote:
> > +static void sanitize_worktree_name(struct strbuf *name)
> > +{
> > +     char *orig_name = xstrdup(name->buf);
> > +     int i;
> > +
> > +     /*
> > +      * All special chars replaced with dashes. See
> > +      * check_refname_component() for reference.
> > +      * Note that .lock is also turned to -lock, removing its
> > +      * special status.
> > +      */
> > +     for (i = 0; i < name->len; i++) {
> > +             if (strchr(":?[]\\~ \t@{}*/.", name->buf[i]))
> > +                     name->buf[i] = '-';
> > +     }
> > +
> > +     /* remove consecutive dashes, leading or trailing dashes */
>
> Why? So, '[fred]' will be 'sanitized' to 'fred' (rather than '-fred-'),
> which would increase the chance of a 'collision' with the 'fred'
> worktree (not very likely, but still). Is that useful? How about
> 'x86_64-*-gnu' which now becomes 'x86_64-gnu'?

It is useful when you want to specify HEAD of [fred] for example.
Writing worktrees/fred/HEAD is a bit better than
worktrees/-fred-/HEAD. I haven't done it yet, but these names will be
shown in "git worktree list" too and lots of dashes does not improve
readability. Collision is not a problem because if fred is taken, the
final name would be fred1 or fred<some other number>.

If you're really bothered with this, you will be able to specify the
name you want (you can't, yet). You still have to pass the valid
refname check, but you have a lot more flexibility.

So this code only needs to work mostly ok for the common case and I
could go either way, clean up consecutive dashes or not. I suppose
simpler code would be the tie breaker.
diff mbox series

Patch

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 3f9907fcc9..53e41db229 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -262,6 +262,50 @@  static void validate_worktree_add(const char *path, const struct add_opts *opts)
 	free_worktrees(worktrees);
 }
 
+/*
+ * worktree name is part of refname and has to pass
+ * check_refname_component(). Remove unallowed characters to make it
+ * valid.
+ */
+static void sanitize_worktree_name(struct strbuf *name)
+{
+	char *orig_name = xstrdup(name->buf);
+	int i;
+
+	/*
+	 * All special chars replaced with dashes. See
+	 * check_refname_component() for reference.
+	 * Note that .lock is also turned to -lock, removing its
+	 * special status.
+	 */
+	for (i = 0; i < name->len; i++) {
+		if (strchr(":?[]\\~ \t@{}*/.", name->buf[i]))
+			name->buf[i] = '-';
+	}
+
+	/* remove consecutive dashes, leading or trailing dashes */
+	for (i = 0; i < name->len; i++) {
+		while (name->buf[i] == '-' &&
+		       (i == 0 ||
+			i == name->len - 1 ||
+			(i < name->len - 1 && name->buf[i + 1] == '-')))
+			strbuf_remove(name, i, 1);
+	}
+
+	/*
+	 * a worktree name of only special chars would be reduced to
+	 * an empty string
+	 */
+	if (name->len == 0)
+		strbuf_addstr(name, "worktree");
+
+	if (check_refname_format(name->buf, REFNAME_ALLOW_ONELEVEL))
+		BUG("worktree name '%s' (from '%s') is not a valid refname",
+		    name->buf, orig_name);
+
+	free(orig_name);
+}
+
 static int add_worktree(const char *path, const char *refname,
 			const struct add_opts *opts)
 {
@@ -275,6 +319,7 @@  static int add_worktree(const char *path, const char *refname,
 	struct strbuf symref = STRBUF_INIT;
 	struct commit *commit = NULL;
 	int is_branch = 0;
+	struct strbuf sb_name = STRBUF_INIT;
 
 	validate_worktree_add(path, opts);
 
@@ -290,7 +335,10 @@  static int add_worktree(const char *path, const char *refname,
 		die(_("invalid reference: %s"), refname);
 
 	name = worktree_basename(path, &len);
-	git_path_buf(&sb_repo, "worktrees/%.*s", (int)(path + len - name), name);
+	strbuf_add(&sb_name, name, path + len - name);
+	sanitize_worktree_name(&sb_name);
+	name = sb_name.buf;
+	git_path_buf(&sb_repo, "worktrees/%s", name);
 	len = sb_repo.len;
 	if (safe_create_leading_directories_const(sb_repo.buf))
 		die_errno(_("could not create leading directories of '%s'"),
@@ -415,6 +463,7 @@  static int add_worktree(const char *path, const char *refname,
 	strbuf_release(&symref);
 	strbuf_release(&sb_repo);
 	strbuf_release(&sb_git);
+	strbuf_release(&sb_name);
 	return ret;
 }
 
diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
index 286bba35d8..71aa6ab9c1 100755
--- a/t/t2025-worktree-add.sh
+++ b/t/t2025-worktree-add.sh
@@ -570,4 +570,11 @@  test_expect_success '"add" an existing locked but missing worktree' '
 	git worktree add --force --force --detach gnoo
 '
 
+test_expect_success 'sanitize generated worktree name' '
+	git worktree add --detach ".  weird*..?.lock.lock" &&
+	test -d .git/worktrees/weird-lock-lock &&
+	git worktree add --detach .... &&
+	test -d .git/worktrees/worktree
+'
+
 test_done