diff mbox series

[GSoC,v2,4/9] dir: libify and export helper functions from clone.c

Message ID 20210805074054.29916-5-raykar.ath@gmail.com (mailing list archive)
State Superseded
Headers show
Series submodule: convert the rest of 'add' to C | expand

Commit Message

Atharva Raykar Aug. 5, 2021, 7:40 a.m. UTC
These functions can be useful to other parts of Git. Let's move them to
dir.c, while renaming them to be make their functionality more explicit.

Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <shouryashukla.oo@gmail.com>
---
 builtin/clone.c | 118 +-----------------------------------------------
 dir.c           | 114 ++++++++++++++++++++++++++++++++++++++++++++++
 dir.h           |   3 ++
 3 files changed, 119 insertions(+), 116 deletions(-)

Comments

Junio C Hamano Aug. 5, 2021, 8:37 p.m. UTC | #1
Atharva Raykar <raykar.ath@gmail.com> writes:

> These functions can be useful to other parts of Git. Let's move them to
> dir.c, while renaming them to be make their functionality more explicit.

Hmph, guess_dir_name_from_git_url() is not all that more clarifying
than the original, at least to me.  For a file-scope static helper,
it probably was good enough with a short name with no function doc,
but we should describe what it does in comments in dir.h and come up
with a suitable name, taking input from that description.

> Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
> Mentored-by: Christian Couder <christian.couder@gmail.com>
> Mentored-by: Shourya Shukla <shouryashukla.oo@gmail.com>
> ---
>  builtin/clone.c | 118 +-----------------------------------------------
>  dir.c           | 114 ++++++++++++++++++++++++++++++++++++++++++++++
>  dir.h           |   3 ++
>  3 files changed, 119 insertions(+), 116 deletions(-)

Again "show --color-moved" helps to see that these two helper
functions are moved across files without any change other than the
names, which is good.

> @@ -1041,8 +927,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	if (argc == 2)
>  		dir = xstrdup(argv[1]);
>  	else
> -		dir = guess_dir_name(repo_name, is_bundle, option_bare);
> -	strip_trailing_slashes(dir);
> +		dir = guess_dir_name_from_git_url(repo_name, is_bundle, option_bare);
> +	strip_dir_trailing_slashes(dir);

So, what does this new public helper function guess?  The name of
the function says it guesses a directory name, but it is not just
any directory name, but a directory with some specific meaning.

Here repo_name has the URL the user gave "git clone", and even
though there are some code before this part that computed on the
variable, it hasn't been modified.  And "from_git_url" is a good way
to indicate that that is one of the input the function uses to guess
the name of "the directory with some unknown specific meaning".

I think this codepath wants the new directory to create as the
result of "git clone" operation in "dir".  So, even though I still
do not have a good answer to the earlier "this is not just any
directory but with some specific meaning---what is it?" question,
adjectives that are appropriate for the "directory" to answer it
may be along the lines of "new", "resulting", "cloned", etc.

> diff --git a/dir.h b/dir.h
> index b3e1a54a97..76441dde2d 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -453,6 +453,9 @@ static inline int is_dot_or_dotdot(const char *name)
>  
>  int is_empty_dir(const char *dir);
>  

We would want docstring here for the function (and possibly rename
the function to clarify what kind of "dir" we are talking about).


> +char *guess_dir_name_from_git_url(const char *repo, int is_bundle, int is_bare);
> +void strip_dir_trailing_slashes(char *dir);
> +
>  void setup_standard_excludes(struct dir_struct *dir);
>  
>  char *get_sparse_checkout_filename(void);

Thanks.
Atharva Raykar Aug. 6, 2021, 11:12 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Atharva Raykar <raykar.ath@gmail.com> writes:
>
>> These functions can be useful to other parts of Git. Let's move them to
>> dir.c, while renaming them to be make their functionality more explicit.
>
> Hmph, guess_dir_name_from_git_url() is not all that more clarifying
> than the original, at least to me.  For a file-scope static helper,
> it probably was good enough with a short name with no function doc,
> but we should describe what it does in comments in dir.h and come up
> with a suitable name, taking input from that description.
>
> [...]
>
>> @@ -1041,8 +927,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>>  	if (argc == 2)
>>  		dir = xstrdup(argv[1]);
>>  	else
>> -		dir = guess_dir_name(repo_name, is_bundle, option_bare);
>> -	strip_trailing_slashes(dir);
>> +		dir = guess_dir_name_from_git_url(repo_name, is_bundle, option_bare);
>> +	strip_dir_trailing_slashes(dir);
>
> So, what does this new public helper function guess?  The name of
> the function says it guesses a directory name, but it is not just
> any directory name, but a directory with some specific meaning.
>
> Here repo_name has the URL the user gave "git clone", and even
> though there are some code before this part that computed on the
> variable, it hasn't been modified.  And "from_git_url" is a good way
> to indicate that that is one of the input the function uses to guess
> the name of "the directory with some unknown specific meaning".
>
> I think this codepath wants the new directory to create as the
> result of "git clone" operation in "dir".  So, even though I still
> do not have a good answer to the earlier "this is not just any
> directory but with some specific meaning---what is it?" question,
> adjectives that are appropriate for the "directory" to answer it
> may be along the lines of "new", "resulting", "cloned", etc.

Naming things is hard...

Maybe the right phrase would be 'target directory'? We are creating a
target directory name by looking at the "humanish" part of the Git URL.

I think the intention of all callers of this function is to get a
"default" directory name which will be used as the target of some
operation in the absence of the user providing one.

So maybe the name could be: 'guess_target_dir_from_git_url()'

This would make sense for any operation now or in the future that wants
to reuse this functionality.

>> diff --git a/dir.h b/dir.h
>> index b3e1a54a97..76441dde2d 100644
>> --- a/dir.h
>> +++ b/dir.h
>> @@ -453,6 +453,9 @@ static inline int is_dot_or_dotdot(const char *name)
>>
>>  int is_empty_dir(const char *dir);
>>
>
> We would want docstring here for the function (and possibly rename
> the function to clarify what kind of "dir" we are talking about).

Okay, I will add it.

>> +char *guess_dir_name_from_git_url(const char *repo, int is_bundle, int is_bare);
>> +void strip_dir_trailing_slashes(char *dir);
>> +
>>  void setup_standard_excludes(struct dir_struct *dir);
>>
>>  char *get_sparse_checkout_filename(void);
>
> Thanks.
Junio C Hamano Aug. 6, 2021, 4:36 p.m. UTC | #3
Atharva Raykar <raykar.ath@gmail.com> writes:

> Naming things is hard...

Absolutely.

> Maybe the right phrase would be 'target directory'? We are creating a
> target directory name by looking at the "humanish" part of the Git URL.
>
> I think the intention of all callers of this function is to get a
> "default" directory name which will be used as the target of some
> operation in the absence of the user providing one.
>
> So maybe the name could be: 'guess_target_dir_from_git_url()'

I have no immediate objection to the name.  

Just to see how people (including you) may react to a name from a
completely different line of thinking, let me throw this, though.

How does git_url_basename() sound?  

Instead of saying what we'd use it for (i.e. as the name for the
directory getting created), we say what we compute.  We take a
URL-looking thing that is used by Git, and we compute something like
basename() but that is tailored for Git (e.g. unlike "basename
a/b/c.git" that yields "c.git", we give "c" for "a/b/c.git".
Likewise "<scheme>://a/b/c/.git" won't yield ".git", we compute
"c").

Having said that, I think guess_target_dir_from_git_url() is clear
enough.

> This would make sense for any operation now or in the future that wants
> to reuse this functionality.

That is mostly for you to decide.   I can help you sanity check the
proposed name(s) with existing callers, but you'd be a better judge
for callers you'll be adding ;-)
Atharva Raykar Aug. 7, 2021, 7:15 a.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Atharva Raykar <raykar.ath@gmail.com> writes:
>
>> Naming things is hard...
>
> Absolutely.
>
>> Maybe the right phrase would be 'target directory'? We are creating a
>> target directory name by looking at the "humanish" part of the Git URL.
>>
>> I think the intention of all callers of this function is to get a
>> "default" directory name which will be used as the target of some
>> operation in the absence of the user providing one.
>>
>> So maybe the name could be: 'guess_target_dir_from_git_url()'
>
> I have no immediate objection to the name.
>
> Just to see how people (including you) may react to a name from a
> completely different line of thinking, let me throw this, though.
>
> How does git_url_basename() sound?
>
> Instead of saying what we'd use it for (i.e. as the name for the
> directory getting created), we say what we compute.  We take a
> URL-looking thing that is used by Git, and we compute something like
> basename() but that is tailored for Git (e.g. unlike "basename
> a/b/c.git" that yields "c.git", we give "c" for "a/b/c.git".
> Likewise "<scheme>://a/b/c/.git" won't yield ".git", we compute
> "c").
>
> Having said that, I think guess_target_dir_from_git_url() is clear
> enough.

Even though the name I suggested is clear enough, I liked your
suggestion a lot more. Not only is it more succinct, but it also opens
up the future possibility that the basename of a Git URL might be used
for purposes other than finding a directory name.

>> This would make sense for any operation now or in the future that wants
>> to reuse this functionality.
>
> That is mostly for you to decide.   I can help you sanity check the
> proposed name(s) with existing callers, but you'd be a better judge
> for callers you'll be adding ;-)
diff mbox series

Patch

diff --git a/builtin/clone.c b/builtin/clone.c
index 66fe66679c..b9b59a838f 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -217,120 +217,6 @@  static char *get_repo_path(const char *repo, int *is_bundle)
 	return canon;
 }
 
-static char *guess_dir_name(const char *repo, int is_bundle, int is_bare)
-{
-	const char *end = repo + strlen(repo), *start, *ptr;
-	size_t len;
-	char *dir;
-
-	/*
-	 * Skip scheme.
-	 */
-	start = strstr(repo, "://");
-	if (start == NULL)
-		start = repo;
-	else
-		start += 3;
-
-	/*
-	 * Skip authentication data. The stripping does happen
-	 * greedily, such that we strip up to the last '@' inside
-	 * the host part.
-	 */
-	for (ptr = start; ptr < end && !is_dir_sep(*ptr); ptr++) {
-		if (*ptr == '@')
-			start = ptr + 1;
-	}
-
-	/*
-	 * Strip trailing spaces, slashes and /.git
-	 */
-	while (start < end && (is_dir_sep(end[-1]) || isspace(end[-1])))
-		end--;
-	if (end - start > 5 && is_dir_sep(end[-5]) &&
-	    !strncmp(end - 4, ".git", 4)) {
-		end -= 5;
-		while (start < end && is_dir_sep(end[-1]))
-			end--;
-	}
-
-	/*
-	 * Strip trailing port number if we've got only a
-	 * hostname (that is, there is no dir separator but a
-	 * colon). This check is required such that we do not
-	 * strip URI's like '/foo/bar:2222.git', which should
-	 * result in a dir '2222' being guessed due to backwards
-	 * compatibility.
-	 */
-	if (memchr(start, '/', end - start) == NULL
-	    && memchr(start, ':', end - start) != NULL) {
-		ptr = end;
-		while (start < ptr && isdigit(ptr[-1]) && ptr[-1] != ':')
-			ptr--;
-		if (start < ptr && ptr[-1] == ':')
-			end = ptr - 1;
-	}
-
-	/*
-	 * Find last component. To remain backwards compatible we
-	 * also regard colons as path separators, such that
-	 * cloning a repository 'foo:bar.git' would result in a
-	 * directory 'bar' being guessed.
-	 */
-	ptr = end;
-	while (start < ptr && !is_dir_sep(ptr[-1]) && ptr[-1] != ':')
-		ptr--;
-	start = ptr;
-
-	/*
-	 * Strip .{bundle,git}.
-	 */
-	len = end - start;
-	strip_suffix_mem(start, &len, is_bundle ? ".bundle" : ".git");
-
-	if (!len || (len == 1 && *start == '/'))
-		die(_("No directory name could be guessed.\n"
-		      "Please specify a directory on the command line"));
-
-	if (is_bare)
-		dir = xstrfmt("%.*s.git", (int)len, start);
-	else
-		dir = xstrndup(start, len);
-	/*
-	 * Replace sequences of 'control' characters and whitespace
-	 * with one ascii space, remove leading and trailing spaces.
-	 */
-	if (*dir) {
-		char *out = dir;
-		int prev_space = 1 /* strip leading whitespace */;
-		for (end = dir; *end; ++end) {
-			char ch = *end;
-			if ((unsigned char)ch < '\x20')
-				ch = '\x20';
-			if (isspace(ch)) {
-				if (prev_space)
-					continue;
-				prev_space = 1;
-			} else
-				prev_space = 0;
-			*out++ = ch;
-		}
-		*out = '\0';
-		if (out > dir && prev_space)
-			out[-1] = '\0';
-	}
-	return dir;
-}
-
-static void strip_trailing_slashes(char *dir)
-{
-	char *end = dir + strlen(dir);
-
-	while (dir < end - 1 && is_dir_sep(end[-1]))
-		end--;
-	*end = '\0';
-}
-
 static int add_one_reference(struct string_list_item *item, void *cb_data)
 {
 	struct strbuf err = STRBUF_INIT;
@@ -1041,8 +927,8 @@  int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (argc == 2)
 		dir = xstrdup(argv[1]);
 	else
-		dir = guess_dir_name(repo_name, is_bundle, option_bare);
-	strip_trailing_slashes(dir);
+		dir = guess_dir_name_from_git_url(repo_name, is_bundle, option_bare);
+	strip_dir_trailing_slashes(dir);
 
 	dest_exists = path_exists(dir);
 	if (dest_exists && !is_empty_dir(dir))
diff --git a/dir.c b/dir.c
index 03c4d21267..84b47c4dbc 100644
--- a/dir.c
+++ b/dir.c
@@ -2970,6 +2970,120 @@  int is_empty_dir(const char *path)
 	return ret;
 }
 
+char *guess_dir_name_from_git_url(const char *repo, int is_bundle, int is_bare)
+{
+	const char *end = repo + strlen(repo), *start, *ptr;
+	size_t len;
+	char *dir;
+
+	/*
+	 * Skip scheme.
+	 */
+	start = strstr(repo, "://");
+	if (start == NULL)
+		start = repo;
+	else
+		start += 3;
+
+	/*
+	 * Skip authentication data. The stripping does happen
+	 * greedily, such that we strip up to the last '@' inside
+	 * the host part.
+	 */
+	for (ptr = start; ptr < end && !is_dir_sep(*ptr); ptr++) {
+		if (*ptr == '@')
+			start = ptr + 1;
+	}
+
+	/*
+	 * Strip trailing spaces, slashes and /.git
+	 */
+	while (start < end && (is_dir_sep(end[-1]) || isspace(end[-1])))
+		end--;
+	if (end - start > 5 && is_dir_sep(end[-5]) &&
+	    !strncmp(end - 4, ".git", 4)) {
+		end -= 5;
+		while (start < end && is_dir_sep(end[-1]))
+			end--;
+	}
+
+	/*
+	 * Strip trailing port number if we've got only a
+	 * hostname (that is, there is no dir separator but a
+	 * colon). This check is required such that we do not
+	 * strip URI's like '/foo/bar:2222.git', which should
+	 * result in a dir '2222' being guessed due to backwards
+	 * compatibility.
+	 */
+	if (memchr(start, '/', end - start) == NULL
+	    && memchr(start, ':', end - start) != NULL) {
+		ptr = end;
+		while (start < ptr && isdigit(ptr[-1]) && ptr[-1] != ':')
+			ptr--;
+		if (start < ptr && ptr[-1] == ':')
+			end = ptr - 1;
+	}
+
+	/*
+	 * Find last component. To remain backwards compatible we
+	 * also regard colons as path separators, such that
+	 * cloning a repository 'foo:bar.git' would result in a
+	 * directory 'bar' being guessed.
+	 */
+	ptr = end;
+	while (start < ptr && !is_dir_sep(ptr[-1]) && ptr[-1] != ':')
+		ptr--;
+	start = ptr;
+
+	/*
+	 * Strip .{bundle,git}.
+	 */
+	len = end - start;
+	strip_suffix_mem(start, &len, is_bundle ? ".bundle" : ".git");
+
+	if (!len || (len == 1 && *start == '/'))
+		die(_("No directory name could be guessed.\n"
+		      "Please specify a directory on the command line"));
+
+	if (is_bare)
+		dir = xstrfmt("%.*s.git", (int)len, start);
+	else
+		dir = xstrndup(start, len);
+	/*
+	 * Replace sequences of 'control' characters and whitespace
+	 * with one ascii space, remove leading and trailing spaces.
+	 */
+	if (*dir) {
+		char *out = dir;
+		int prev_space = 1 /* strip leading whitespace */;
+		for (end = dir; *end; ++end) {
+			char ch = *end;
+			if ((unsigned char)ch < '\x20')
+				ch = '\x20';
+			if (isspace(ch)) {
+				if (prev_space)
+					continue;
+				prev_space = 1;
+			} else
+				prev_space = 0;
+			*out++ = ch;
+		}
+		*out = '\0';
+		if (out > dir && prev_space)
+			out[-1] = '\0';
+	}
+	return dir;
+}
+
+void strip_dir_trailing_slashes(char *dir)
+{
+	char *end = dir + strlen(dir);
+
+	while (dir < end - 1 && is_dir_sep(end[-1]))
+		end--;
+	*end = '\0';
+}
+
 static int remove_dir_recurse(struct strbuf *path, int flag, int *kept_up)
 {
 	DIR *dir;
diff --git a/dir.h b/dir.h
index b3e1a54a97..76441dde2d 100644
--- a/dir.h
+++ b/dir.h
@@ -453,6 +453,9 @@  static inline int is_dot_or_dotdot(const char *name)
 
 int is_empty_dir(const char *dir);
 
+char *guess_dir_name_from_git_url(const char *repo, int is_bundle, int is_bare);
+void strip_dir_trailing_slashes(char *dir);
+
 void setup_standard_excludes(struct dir_struct *dir);
 
 char *get_sparse_checkout_filename(void);