Message ID | 20210805074054.29916-5-raykar.ath@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | submodule: convert the rest of 'add' to C | expand |
Atharva Raykar <raykar.ath@gmail.com> writes: > These functions can be useful to other parts of Git. Let's move them to > dir.c, while renaming them to be make their functionality more explicit. Hmph, guess_dir_name_from_git_url() is not all that more clarifying than the original, at least to me. For a file-scope static helper, it probably was good enough with a short name with no function doc, but we should describe what it does in comments in dir.h and come up with a suitable name, taking input from that description. > Signed-off-by: Atharva Raykar <raykar.ath@gmail.com> > Mentored-by: Christian Couder <christian.couder@gmail.com> > Mentored-by: Shourya Shukla <shouryashukla.oo@gmail.com> > --- > builtin/clone.c | 118 +----------------------------------------------- > dir.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++ > dir.h | 3 ++ > 3 files changed, 119 insertions(+), 116 deletions(-) Again "show --color-moved" helps to see that these two helper functions are moved across files without any change other than the names, which is good. > @@ -1041,8 +927,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix) > if (argc == 2) > dir = xstrdup(argv[1]); > else > - dir = guess_dir_name(repo_name, is_bundle, option_bare); > - strip_trailing_slashes(dir); > + dir = guess_dir_name_from_git_url(repo_name, is_bundle, option_bare); > + strip_dir_trailing_slashes(dir); So, what does this new public helper function guess? The name of the function says it guesses a directory name, but it is not just any directory name, but a directory with some specific meaning. Here repo_name has the URL the user gave "git clone", and even though there are some code before this part that computed on the variable, it hasn't been modified. And "from_git_url" is a good way to indicate that that is one of the input the function uses to guess the name of "the directory with some unknown specific meaning". I think this codepath wants the new directory to create as the result of "git clone" operation in "dir". So, even though I still do not have a good answer to the earlier "this is not just any directory but with some specific meaning---what is it?" question, adjectives that are appropriate for the "directory" to answer it may be along the lines of "new", "resulting", "cloned", etc. > diff --git a/dir.h b/dir.h > index b3e1a54a97..76441dde2d 100644 > --- a/dir.h > +++ b/dir.h > @@ -453,6 +453,9 @@ static inline int is_dot_or_dotdot(const char *name) > > int is_empty_dir(const char *dir); > We would want docstring here for the function (and possibly rename the function to clarify what kind of "dir" we are talking about). > +char *guess_dir_name_from_git_url(const char *repo, int is_bundle, int is_bare); > +void strip_dir_trailing_slashes(char *dir); > + > void setup_standard_excludes(struct dir_struct *dir); > > char *get_sparse_checkout_filename(void); Thanks.
Junio C Hamano <gitster@pobox.com> writes: > Atharva Raykar <raykar.ath@gmail.com> writes: > >> These functions can be useful to other parts of Git. Let's move them to >> dir.c, while renaming them to be make their functionality more explicit. > > Hmph, guess_dir_name_from_git_url() is not all that more clarifying > than the original, at least to me. For a file-scope static helper, > it probably was good enough with a short name with no function doc, > but we should describe what it does in comments in dir.h and come up > with a suitable name, taking input from that description. > > [...] > >> @@ -1041,8 +927,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix) >> if (argc == 2) >> dir = xstrdup(argv[1]); >> else >> - dir = guess_dir_name(repo_name, is_bundle, option_bare); >> - strip_trailing_slashes(dir); >> + dir = guess_dir_name_from_git_url(repo_name, is_bundle, option_bare); >> + strip_dir_trailing_slashes(dir); > > So, what does this new public helper function guess? The name of > the function says it guesses a directory name, but it is not just > any directory name, but a directory with some specific meaning. > > Here repo_name has the URL the user gave "git clone", and even > though there are some code before this part that computed on the > variable, it hasn't been modified. And "from_git_url" is a good way > to indicate that that is one of the input the function uses to guess > the name of "the directory with some unknown specific meaning". > > I think this codepath wants the new directory to create as the > result of "git clone" operation in "dir". So, even though I still > do not have a good answer to the earlier "this is not just any > directory but with some specific meaning---what is it?" question, > adjectives that are appropriate for the "directory" to answer it > may be along the lines of "new", "resulting", "cloned", etc. Naming things is hard... Maybe the right phrase would be 'target directory'? We are creating a target directory name by looking at the "humanish" part of the Git URL. I think the intention of all callers of this function is to get a "default" directory name which will be used as the target of some operation in the absence of the user providing one. So maybe the name could be: 'guess_target_dir_from_git_url()' This would make sense for any operation now or in the future that wants to reuse this functionality. >> diff --git a/dir.h b/dir.h >> index b3e1a54a97..76441dde2d 100644 >> --- a/dir.h >> +++ b/dir.h >> @@ -453,6 +453,9 @@ static inline int is_dot_or_dotdot(const char *name) >> >> int is_empty_dir(const char *dir); >> > > We would want docstring here for the function (and possibly rename > the function to clarify what kind of "dir" we are talking about). Okay, I will add it. >> +char *guess_dir_name_from_git_url(const char *repo, int is_bundle, int is_bare); >> +void strip_dir_trailing_slashes(char *dir); >> + >> void setup_standard_excludes(struct dir_struct *dir); >> >> char *get_sparse_checkout_filename(void); > > Thanks.
Atharva Raykar <raykar.ath@gmail.com> writes: > Naming things is hard... Absolutely. > Maybe the right phrase would be 'target directory'? We are creating a > target directory name by looking at the "humanish" part of the Git URL. > > I think the intention of all callers of this function is to get a > "default" directory name which will be used as the target of some > operation in the absence of the user providing one. > > So maybe the name could be: 'guess_target_dir_from_git_url()' I have no immediate objection to the name. Just to see how people (including you) may react to a name from a completely different line of thinking, let me throw this, though. How does git_url_basename() sound? Instead of saying what we'd use it for (i.e. as the name for the directory getting created), we say what we compute. We take a URL-looking thing that is used by Git, and we compute something like basename() but that is tailored for Git (e.g. unlike "basename a/b/c.git" that yields "c.git", we give "c" for "a/b/c.git". Likewise "<scheme>://a/b/c/.git" won't yield ".git", we compute "c"). Having said that, I think guess_target_dir_from_git_url() is clear enough. > This would make sense for any operation now or in the future that wants > to reuse this functionality. That is mostly for you to decide. I can help you sanity check the proposed name(s) with existing callers, but you'd be a better judge for callers you'll be adding ;-)
Junio C Hamano <gitster@pobox.com> writes: > Atharva Raykar <raykar.ath@gmail.com> writes: > >> Naming things is hard... > > Absolutely. > >> Maybe the right phrase would be 'target directory'? We are creating a >> target directory name by looking at the "humanish" part of the Git URL. >> >> I think the intention of all callers of this function is to get a >> "default" directory name which will be used as the target of some >> operation in the absence of the user providing one. >> >> So maybe the name could be: 'guess_target_dir_from_git_url()' > > I have no immediate objection to the name. > > Just to see how people (including you) may react to a name from a > completely different line of thinking, let me throw this, though. > > How does git_url_basename() sound? > > Instead of saying what we'd use it for (i.e. as the name for the > directory getting created), we say what we compute. We take a > URL-looking thing that is used by Git, and we compute something like > basename() but that is tailored for Git (e.g. unlike "basename > a/b/c.git" that yields "c.git", we give "c" for "a/b/c.git". > Likewise "<scheme>://a/b/c/.git" won't yield ".git", we compute > "c"). > > Having said that, I think guess_target_dir_from_git_url() is clear > enough. Even though the name I suggested is clear enough, I liked your suggestion a lot more. Not only is it more succinct, but it also opens up the future possibility that the basename of a Git URL might be used for purposes other than finding a directory name. >> This would make sense for any operation now or in the future that wants >> to reuse this functionality. > > That is mostly for you to decide. I can help you sanity check the > proposed name(s) with existing callers, but you'd be a better judge > for callers you'll be adding ;-)
diff --git a/builtin/clone.c b/builtin/clone.c index 66fe66679c..b9b59a838f 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -217,120 +217,6 @@ static char *get_repo_path(const char *repo, int *is_bundle) return canon; } -static char *guess_dir_name(const char *repo, int is_bundle, int is_bare) -{ - const char *end = repo + strlen(repo), *start, *ptr; - size_t len; - char *dir; - - /* - * Skip scheme. - */ - start = strstr(repo, "://"); - if (start == NULL) - start = repo; - else - start += 3; - - /* - * Skip authentication data. The stripping does happen - * greedily, such that we strip up to the last '@' inside - * the host part. - */ - for (ptr = start; ptr < end && !is_dir_sep(*ptr); ptr++) { - if (*ptr == '@') - start = ptr + 1; - } - - /* - * Strip trailing spaces, slashes and /.git - */ - while (start < end && (is_dir_sep(end[-1]) || isspace(end[-1]))) - end--; - if (end - start > 5 && is_dir_sep(end[-5]) && - !strncmp(end - 4, ".git", 4)) { - end -= 5; - while (start < end && is_dir_sep(end[-1])) - end--; - } - - /* - * Strip trailing port number if we've got only a - * hostname (that is, there is no dir separator but a - * colon). This check is required such that we do not - * strip URI's like '/foo/bar:2222.git', which should - * result in a dir '2222' being guessed due to backwards - * compatibility. - */ - if (memchr(start, '/', end - start) == NULL - && memchr(start, ':', end - start) != NULL) { - ptr = end; - while (start < ptr && isdigit(ptr[-1]) && ptr[-1] != ':') - ptr--; - if (start < ptr && ptr[-1] == ':') - end = ptr - 1; - } - - /* - * Find last component. To remain backwards compatible we - * also regard colons as path separators, such that - * cloning a repository 'foo:bar.git' would result in a - * directory 'bar' being guessed. - */ - ptr = end; - while (start < ptr && !is_dir_sep(ptr[-1]) && ptr[-1] != ':') - ptr--; - start = ptr; - - /* - * Strip .{bundle,git}. - */ - len = end - start; - strip_suffix_mem(start, &len, is_bundle ? ".bundle" : ".git"); - - if (!len || (len == 1 && *start == '/')) - die(_("No directory name could be guessed.\n" - "Please specify a directory on the command line")); - - if (is_bare) - dir = xstrfmt("%.*s.git", (int)len, start); - else - dir = xstrndup(start, len); - /* - * Replace sequences of 'control' characters and whitespace - * with one ascii space, remove leading and trailing spaces. - */ - if (*dir) { - char *out = dir; - int prev_space = 1 /* strip leading whitespace */; - for (end = dir; *end; ++end) { - char ch = *end; - if ((unsigned char)ch < '\x20') - ch = '\x20'; - if (isspace(ch)) { - if (prev_space) - continue; - prev_space = 1; - } else - prev_space = 0; - *out++ = ch; - } - *out = '\0'; - if (out > dir && prev_space) - out[-1] = '\0'; - } - return dir; -} - -static void strip_trailing_slashes(char *dir) -{ - char *end = dir + strlen(dir); - - while (dir < end - 1 && is_dir_sep(end[-1])) - end--; - *end = '\0'; -} - static int add_one_reference(struct string_list_item *item, void *cb_data) { struct strbuf err = STRBUF_INIT; @@ -1041,8 +927,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix) if (argc == 2) dir = xstrdup(argv[1]); else - dir = guess_dir_name(repo_name, is_bundle, option_bare); - strip_trailing_slashes(dir); + dir = guess_dir_name_from_git_url(repo_name, is_bundle, option_bare); + strip_dir_trailing_slashes(dir); dest_exists = path_exists(dir); if (dest_exists && !is_empty_dir(dir)) diff --git a/dir.c b/dir.c index 03c4d21267..84b47c4dbc 100644 --- a/dir.c +++ b/dir.c @@ -2970,6 +2970,120 @@ int is_empty_dir(const char *path) return ret; } +char *guess_dir_name_from_git_url(const char *repo, int is_bundle, int is_bare) +{ + const char *end = repo + strlen(repo), *start, *ptr; + size_t len; + char *dir; + + /* + * Skip scheme. + */ + start = strstr(repo, "://"); + if (start == NULL) + start = repo; + else + start += 3; + + /* + * Skip authentication data. The stripping does happen + * greedily, such that we strip up to the last '@' inside + * the host part. + */ + for (ptr = start; ptr < end && !is_dir_sep(*ptr); ptr++) { + if (*ptr == '@') + start = ptr + 1; + } + + /* + * Strip trailing spaces, slashes and /.git + */ + while (start < end && (is_dir_sep(end[-1]) || isspace(end[-1]))) + end--; + if (end - start > 5 && is_dir_sep(end[-5]) && + !strncmp(end - 4, ".git", 4)) { + end -= 5; + while (start < end && is_dir_sep(end[-1])) + end--; + } + + /* + * Strip trailing port number if we've got only a + * hostname (that is, there is no dir separator but a + * colon). This check is required such that we do not + * strip URI's like '/foo/bar:2222.git', which should + * result in a dir '2222' being guessed due to backwards + * compatibility. + */ + if (memchr(start, '/', end - start) == NULL + && memchr(start, ':', end - start) != NULL) { + ptr = end; + while (start < ptr && isdigit(ptr[-1]) && ptr[-1] != ':') + ptr--; + if (start < ptr && ptr[-1] == ':') + end = ptr - 1; + } + + /* + * Find last component. To remain backwards compatible we + * also regard colons as path separators, such that + * cloning a repository 'foo:bar.git' would result in a + * directory 'bar' being guessed. + */ + ptr = end; + while (start < ptr && !is_dir_sep(ptr[-1]) && ptr[-1] != ':') + ptr--; + start = ptr; + + /* + * Strip .{bundle,git}. + */ + len = end - start; + strip_suffix_mem(start, &len, is_bundle ? ".bundle" : ".git"); + + if (!len || (len == 1 && *start == '/')) + die(_("No directory name could be guessed.\n" + "Please specify a directory on the command line")); + + if (is_bare) + dir = xstrfmt("%.*s.git", (int)len, start); + else + dir = xstrndup(start, len); + /* + * Replace sequences of 'control' characters and whitespace + * with one ascii space, remove leading and trailing spaces. + */ + if (*dir) { + char *out = dir; + int prev_space = 1 /* strip leading whitespace */; + for (end = dir; *end; ++end) { + char ch = *end; + if ((unsigned char)ch < '\x20') + ch = '\x20'; + if (isspace(ch)) { + if (prev_space) + continue; + prev_space = 1; + } else + prev_space = 0; + *out++ = ch; + } + *out = '\0'; + if (out > dir && prev_space) + out[-1] = '\0'; + } + return dir; +} + +void strip_dir_trailing_slashes(char *dir) +{ + char *end = dir + strlen(dir); + + while (dir < end - 1 && is_dir_sep(end[-1])) + end--; + *end = '\0'; +} + static int remove_dir_recurse(struct strbuf *path, int flag, int *kept_up) { DIR *dir; diff --git a/dir.h b/dir.h index b3e1a54a97..76441dde2d 100644 --- a/dir.h +++ b/dir.h @@ -453,6 +453,9 @@ static inline int is_dot_or_dotdot(const char *name) int is_empty_dir(const char *dir); +char *guess_dir_name_from_git_url(const char *repo, int is_bundle, int is_bare); +void strip_dir_trailing_slashes(char *dir); + void setup_standard_excludes(struct dir_struct *dir); char *get_sparse_checkout_filename(void);
These functions can be useful to other parts of Git. Let's move them to dir.c, while renaming them to be make their functionality more explicit. Signed-off-by: Atharva Raykar <raykar.ath@gmail.com> Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Shourya Shukla <shouryashukla.oo@gmail.com> --- builtin/clone.c | 118 +----------------------------------------------- dir.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++ dir.h | 3 ++ 3 files changed, 119 insertions(+), 116 deletions(-)