Message ID | 20190221121943.19778-2-pclouds@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | worktree add: sanitize worktree names | expand |
On Thu, Feb 21, 2019 at 07:19:43PM +0700, Nguyễn Thái Ngọc Duy wrote: > +/* > + * worktree name is part of refname and has to pass > + * check_refname_component(). Remove unallowed characters to make it > + * valid. > + */ > +static void sanitize_worktree_name(struct strbuf *name) > +{ > + char *orig_name = xstrdup(name->buf); > + int i; > + > + /* > + * All special chars replaced with dashes. See > + * check_refname_component() for reference. > + * Note that .lock is also turned to -lock, removing its > + * special status. > + */ > + for (i = 0; i < name->len; i++) { > + if (strchr(":?[]\\~ \t@{}*/.", name->buf[i])) > + name->buf[i] = '-'; > + } This is reject-known-bad, but I think there are still some other characters that are not allowed in refnames (e.g., ASCII control characters). Which would lead to us hitting the BUG() below. It might make sense to provide access to refname_disposition() and use it here. Alternatively, I think if we did an allow-known-good, it might be OK to have a slightly more restrictive scheme (say, alnum plus dashes, plus high-bit chars). > + /* remove consecutive dashes, leading or trailing dashes */ > + for (i = 0; i < name->len; i++) { > + while (name->buf[i] == '-' && > + (i == 0 || > + i == name->len - 1 || > + (i < name->len - 1 && name->buf[i + 1] == '-'))) > + strbuf_remove(name, i, 1); > + } I think this is correct, though it is possibly to be quadratic in the string length due to the O(n) remove. I think this kind of sanitizing is more readable if done between two strings rather than in-place, like: for (i = 0; i < name->len; i++) { if (is_allowed(name->buf[i])) { strbuf_addch(&dest, name->buf[i]); last_was_dash = 0; } else if (!last_was_dash && dest->len) strbuf_addch(&dest, '-'); last_was_dash = 1; } } /* still must handle removal from end of stray "-" and ".lock" */ strbuf_swap(name, &dest); strbuf_release(&dest); but that may just be personal preference. I'm OK with it if you prefer the in-place way. -Peff
On 21/02/2019 12:19, Nguyễn Thái Ngọc Duy wrote: > Worktree names are based on $(basename $GIT_WORK_TREE). They aren't > significant until 3a3b9d8cde (refs: new ref types to make per-worktree > refs visible to all worktrees - 2018-10-21), where worktree name could > be part of a refname and must follow refname rules. > > Update 'worktree add' code to remove special characters to follow > these rules. The code could replace chars with '-' more than > necessary, but it keeps the code simple. In the future the user will > be able to specify the worktree name by themselves if they're not > happy with this dumb character substitution. > > Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru> > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > --- > builtin/worktree.c | 51 ++++++++++++++++++++++++++++++++++++++++- > t/t2025-worktree-add.sh | 7 ++++++ > 2 files changed, 57 insertions(+), 1 deletion(-) > > diff --git a/builtin/worktree.c b/builtin/worktree.c > index 3f9907fcc9..53e41db229 100644 > --- a/builtin/worktree.c > +++ b/builtin/worktree.c > @@ -262,6 +262,50 @@ static void validate_worktree_add(const char *path, const struct add_opts *opts) > free_worktrees(worktrees); > } > > +/* > + * worktree name is part of refname and has to pass > + * check_refname_component(). Remove unallowed characters to make it > + * valid. > + */ > +static void sanitize_worktree_name(struct strbuf *name) > +{ > + char *orig_name = xstrdup(name->buf); > + int i; > + > + /* > + * All special chars replaced with dashes. See > + * check_refname_component() for reference. > + * Note that .lock is also turned to -lock, removing its > + * special status. > + */ > + for (i = 0; i < name->len; i++) { > + if (strchr(":?[]\\~ \t@{}*/.", name->buf[i])) > + name->buf[i] = '-'; > + } > + > + /* remove consecutive dashes, leading or trailing dashes */ Why? So, '[fred]' will be 'sanitized' to 'fred' (rather than '-fred-'), which would increase the chance of a 'collision' with the 'fred' worktree (not very likely, but still). Is that useful? How about 'x86_64-*-gnu' which now becomes 'x86_64-gnu'? > + for (i = 0; i < name->len; i++) { > + while (name->buf[i] == '-' && > + (i == 0 || > + i == name->len - 1 || > + (i < name->len - 1 && name->buf[i + 1] == '-'))) > + strbuf_remove(name, i, 1); > + } > + > + /* > + * a worktree name of only special chars would be reduced to > + * an empty string > + */> + if (name->len == 0) > + strbuf_addstr(name, "worktree"); If you didn't 'collapse' the name above, you could check for an empty name at the top and wouldn't need this (presumably an empty name would not be valid). ATB, Ramsay Jones
On Fri, Feb 22, 2019 at 12:42 AM Ramsay Jones <ramsay@ramsayjones.plus.com> wrote: > > +static void sanitize_worktree_name(struct strbuf *name) > > +{ > > + char *orig_name = xstrdup(name->buf); > > + int i; > > + > > + /* > > + * All special chars replaced with dashes. See > > + * check_refname_component() for reference. > > + * Note that .lock is also turned to -lock, removing its > > + * special status. > > + */ > > + for (i = 0; i < name->len; i++) { > > + if (strchr(":?[]\\~ \t@{}*/.", name->buf[i])) > > + name->buf[i] = '-'; > > + } > > + > > + /* remove consecutive dashes, leading or trailing dashes */ > > Why? So, '[fred]' will be 'sanitized' to 'fred' (rather than '-fred-'), > which would increase the chance of a 'collision' with the 'fred' > worktree (not very likely, but still). Is that useful? How about > 'x86_64-*-gnu' which now becomes 'x86_64-gnu'? It is useful when you want to specify HEAD of [fred] for example. Writing worktrees/fred/HEAD is a bit better than worktrees/-fred-/HEAD. I haven't done it yet, but these names will be shown in "git worktree list" too and lots of dashes does not improve readability. Collision is not a problem because if fred is taken, the final name would be fred1 or fred<some other number>. If you're really bothered with this, you will be able to specify the name you want (you can't, yet). You still have to pass the valid refname check, but you have a lot more flexibility. So this code only needs to work mostly ok for the common case and I could go either way, clean up consecutive dashes or not. I suppose simpler code would be the tie breaker.
diff --git a/builtin/worktree.c b/builtin/worktree.c index 3f9907fcc9..53e41db229 100644 --- a/builtin/worktree.c +++ b/builtin/worktree.c @@ -262,6 +262,50 @@ static void validate_worktree_add(const char *path, const struct add_opts *opts) free_worktrees(worktrees); } +/* + * worktree name is part of refname and has to pass + * check_refname_component(). Remove unallowed characters to make it + * valid. + */ +static void sanitize_worktree_name(struct strbuf *name) +{ + char *orig_name = xstrdup(name->buf); + int i; + + /* + * All special chars replaced with dashes. See + * check_refname_component() for reference. + * Note that .lock is also turned to -lock, removing its + * special status. + */ + for (i = 0; i < name->len; i++) { + if (strchr(":?[]\\~ \t@{}*/.", name->buf[i])) + name->buf[i] = '-'; + } + + /* remove consecutive dashes, leading or trailing dashes */ + for (i = 0; i < name->len; i++) { + while (name->buf[i] == '-' && + (i == 0 || + i == name->len - 1 || + (i < name->len - 1 && name->buf[i + 1] == '-'))) + strbuf_remove(name, i, 1); + } + + /* + * a worktree name of only special chars would be reduced to + * an empty string + */ + if (name->len == 0) + strbuf_addstr(name, "worktree"); + + if (check_refname_format(name->buf, REFNAME_ALLOW_ONELEVEL)) + BUG("worktree name '%s' (from '%s') is not a valid refname", + name->buf, orig_name); + + free(orig_name); +} + static int add_worktree(const char *path, const char *refname, const struct add_opts *opts) { @@ -275,6 +319,7 @@ static int add_worktree(const char *path, const char *refname, struct strbuf symref = STRBUF_INIT; struct commit *commit = NULL; int is_branch = 0; + struct strbuf sb_name = STRBUF_INIT; validate_worktree_add(path, opts); @@ -290,7 +335,10 @@ static int add_worktree(const char *path, const char *refname, die(_("invalid reference: %s"), refname); name = worktree_basename(path, &len); - git_path_buf(&sb_repo, "worktrees/%.*s", (int)(path + len - name), name); + strbuf_add(&sb_name, name, path + len - name); + sanitize_worktree_name(&sb_name); + name = sb_name.buf; + git_path_buf(&sb_repo, "worktrees/%s", name); len = sb_repo.len; if (safe_create_leading_directories_const(sb_repo.buf)) die_errno(_("could not create leading directories of '%s'"), @@ -415,6 +463,7 @@ static int add_worktree(const char *path, const char *refname, strbuf_release(&symref); strbuf_release(&sb_repo); strbuf_release(&sb_git); + strbuf_release(&sb_name); return ret; } diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh index 286bba35d8..71aa6ab9c1 100755 --- a/t/t2025-worktree-add.sh +++ b/t/t2025-worktree-add.sh @@ -570,4 +570,11 @@ test_expect_success '"add" an existing locked but missing worktree' ' git worktree add --force --force --detach gnoo ' +test_expect_success 'sanitize generated worktree name' ' + git worktree add --detach ". weird*..?.lock.lock" && + test -d .git/worktrees/weird-lock-lock && + git worktree add --detach .... && + test -d .git/worktrees/worktree +' + test_done
Worktree names are based on $(basename $GIT_WORK_TREE). They aren't significant until 3a3b9d8cde (refs: new ref types to make per-worktree refs visible to all worktrees - 2018-10-21), where worktree name could be part of a refname and must follow refname rules. Update 'worktree add' code to remove special characters to follow these rules. The code could replace chars with '-' more than necessary, but it keeps the code simple. In the future the user will be able to specify the worktree name by themselves if they're not happy with this dumb character substitution. Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- builtin/worktree.c | 51 ++++++++++++++++++++++++++++++++++++++++- t/t2025-worktree-add.sh | 7 ++++++ 2 files changed, 57 insertions(+), 1 deletion(-)