diff mbox series

[2/3] setup: add discover_git_directory_reason()

Message ID fbba6252aeabc9a77d8529e4725feb3ea995545f.1692025937.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series scalar: two downstream improvements | expand

Commit Message

Derrick Stolee Aug. 14, 2023, 3:12 p.m. UTC
From: Derrick Stolee <derrickstolee@github.com>

There are many reasons why discovering a Git directory may fail. In
particular, 8959555cee7 (setup_git_directory(): add an owner check for
the top-level directory, 2022-03-02) added ownership checks as a
security precaution.

Callers attempting to set up a Git directory may want to inform the user
about the reason for the failure. For that, expose the enum
discovery_result from within setup.c and into cache.h where
discover_git_directory() is defined.

I initially wanted to change the return type of discover_git_directory()
to be this enum, but several callers rely upon the "zero means success".
The two problems with this are:

1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive
   results are errors.

2. There are multiple successful states, so some positive results are
   successful.

Instead of updating all callers immediately, add a new method,
discover_git_directory_reason(), and convert discover_git_directory() to
be a thin shim on top of it.

Because there are extra checks that discover_git_directory_reason() does
after setup_git_directory_gently_1(), there are other modes that can be
returned for failure states. Add these modes to the enum, but be sure to
explicitly add them as BUG() states in the switch of
setup_git_directory_gently().

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 setup.c | 32 +++++++++++---------------------
 setup.h | 32 ++++++++++++++++++++++++++++++--
 2 files changed, 41 insertions(+), 23 deletions(-)

Comments

Junio C Hamano Aug. 14, 2023, 4:29 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> There are many reasons why discovering a Git directory may fail. In
> particular, 8959555cee7 (setup_git_directory(): add an owner check for
> the top-level directory, 2022-03-02) added ownership checks as a
> security precaution.
>
> Callers attempting to set up a Git directory may want to inform the user
> about the reason for the failure. For that, expose the enum
> discovery_result from within setup.c and into cache.h where
> discover_git_directory() is defined.
>
> I initially wanted to change the return type of discover_git_directory()
> to be this enum, but several callers rely upon the "zero means success".
> The two problems with this are:
>
> 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive
>    results are errors.

True. discover_git_directory() already knows that negative return
values from setup_git_directory_gently_1() signal errors while 0 or
positive are OK.

> 2. There are multiple successful states, so some positive results are
>    successful.

Makes it sound as if some positive results are not successes, but is
that really the case?

> Instead of updating all callers immediately, add a new method,
> discover_git_directory_reason(), and convert discover_git_directory() to
> be a thin shim on top of it.

It makes sense to insulate callers who only want to know if the
discovery was successful or not (there only are two existing callers
anyway) from the details.  And turning a thin wrapper to the new API
that gives richer return codes is the way to go.  Nicely designed.

> Because there are extra checks that discover_git_directory_reason() does
> after setup_git_directory_gently_1(), there are other modes that can be
> returned for failure states. Add these modes to the enum, but be sure to
> explicitly add them as BUG() states in the switch of
> setup_git_directory_gently().

Good.

> -enum discovery_result {
> -	GIT_DIR_NONE = 0,
> -	GIT_DIR_EXPLICIT,
> -	GIT_DIR_DISCOVERED,
> -	GIT_DIR_BARE,
> -	/* these are errors */
> -	GIT_DIR_HIT_CEILING = -1,
> -	GIT_DIR_HIT_MOUNT_POINT = -2,
> -	GIT_DIR_INVALID_GITFILE = -3,
> -	GIT_DIR_INVALID_OWNERSHIP = -4,
> -	GIT_DIR_DISALLOWED_BARE = -5,
> -};

So we promote this discovery_result, that was private implementation
detail inside the setup code, to a public interface.  Is GIT_DIR_
prefix still appropriate, or would it make more sense to have a
common substring derived from the word DISCOVERY in them?

> @@ -1385,21 +1372,22 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
>  	}
>  }
>  
> -int discover_git_directory(struct strbuf *commondir,
> -			   struct strbuf *gitdir)
> +enum discovery_result discover_git_directory_reason(struct strbuf *commondir,
> +						    struct strbuf *gitdir)
>  {
>  	struct strbuf dir = STRBUF_INIT, err = STRBUF_INIT;
>  	size_t gitdir_offset = gitdir->len, cwd_len;
>  	size_t commondir_offset = commondir->len;
>  	struct repository_format candidate = REPOSITORY_FORMAT_INIT;
> +	enum discovery_result result;
>  
>  	if (strbuf_getcwd(&dir))
> -		return -1;
> +		return GIT_DIR_CWD_FAILURE;

Makes sense.

>  	cwd_len = dir.len;
> -	if (setup_git_directory_gently_1(&dir, gitdir, NULL, 0) <= 0) {
> +	if ((result = setup_git_directory_gently_1(&dir, gitdir, NULL, 0)) <= 0) {

Can we split this into two simple statements?

	result = setup_git_directory_gently_1(...);
	if (result <= 0) {

>  		strbuf_release(&dir);
> -		return -1;
> +		return result;
>  	}

OK.

> @@ -1429,11 +1417,11 @@ int discover_git_directory(struct strbuf *commondir,
>  		strbuf_setlen(commondir, commondir_offset);
>  		strbuf_setlen(gitdir, gitdir_offset);
>  		clear_repository_format(&candidate);
> -		return -1;
> +		return GIT_DIR_INVALID_FORMAT;

OK, so this thing is new.  Earlier we thought we found a good
GIT_DIR but it turns out the repository is something we cannot use.
Over time we may acquire such a "now we found it, is it really good?"
sanity checks, but for now, this is the only case that turns what
gently_1() thought was good into a bad one.  OK.

>  	}
>  
>  	clear_repository_format(&candidate);
> -	return 0;
> +	return result;

And it makes perfect sense that everybody who passed such a "post
discovery check" are OK and we return the result from gently_1().

> @@ -1516,9 +1504,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
>  		*nongit_ok = 1;
>  		break;
>  	case GIT_DIR_NONE:
> +	case GIT_DIR_CWD_FAILURE:
> +	case GIT_DIR_INVALID_FORMAT:
>  		/*
>  		 * As a safeguard against setup_git_directory_gently_1 returning
> -		 * this value, fallthrough to BUG. Otherwise it is possible to
> +		 * these values, fallthrough to BUG. Otherwise it is possible to
>  		 * set startup_info->have_repository to 1 when we did nothing to
>  		 * find a repository.
>  		 */

OK.

Not a new problem, but does anybody explicitly or implicitly return
DIR_NONE?  I didn't find any codepath that does so.  Presumably it
may have been arranged in the hope that a code structured like so:

	enum discovery_result res = GIT_DIR_NONE;

	if (some complex condition)
		res = ...;
	else if (another complex condition)
		res = ...;

	... sometime later ...
	if (res <= 0)
		we found a bad one

would ensure that "res" untouched by setup_git_directory_gently_1()
is still an error, but I am not sure if it is effective, given that
nobody uses GIT_DIR_NONE to assign or initialize anything.  And the
same effect can be had by leaving 'res' uninitialized---the compilers
are our friend.

Not a part of this review, but I wonder if it makes sense for us to
get rid of DIR_NONE.

> -int discover_git_directory(struct strbuf *commondir,
> -			   struct strbuf *gitdir);
> +static inline int discover_git_directory(struct strbuf *commondir,
> +					 struct strbuf *gitdir)
> +{
> +	return discover_git_directory_reason(commondir, gitdir) <= 0;
> +}

The _reason() thing is more or less like setup_git_directory_gently_1()
in that positives are success and everything else is an error.  And
the point of keeping this thin wrapper as discover_git_directory() is
to insulate the existing callers that discover_git_directory() returns
-1 for failure, while returning 0 for success.

So this wrapper smells very wrong.  It now flips the polarity of the
error return into a positive 1, no?  That does not achieve the goal
to insulate the callers from the change in implementation.

Other than that, as you wrote in the cover letter, I think it is an
excellent move to have an interface to expose details of what
discovery has found.

Thanks.
Junio C Hamano Aug. 14, 2023, 4:55 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

>> 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive
>>    results are errors.
>
> True. discover_git_directory() already knows that negative return
> values from setup_git_directory_gently_1() signal errors while 0 or
> positive are OK.

NOnononono.  negative are not.  0 is not returned, so if we saw one,
it would be an error.  And positives are OK.

Sorry for the confusion.
diff mbox series

Patch

diff --git a/setup.c b/setup.c
index 18927a847b8..bbf8f684d93 100644
--- a/setup.c
+++ b/setup.c
@@ -1221,19 +1221,6 @@  static const char *allowed_bare_repo_to_string(
 	return NULL;
 }
 
-enum discovery_result {
-	GIT_DIR_NONE = 0,
-	GIT_DIR_EXPLICIT,
-	GIT_DIR_DISCOVERED,
-	GIT_DIR_BARE,
-	/* these are errors */
-	GIT_DIR_HIT_CEILING = -1,
-	GIT_DIR_HIT_MOUNT_POINT = -2,
-	GIT_DIR_INVALID_GITFILE = -3,
-	GIT_DIR_INVALID_OWNERSHIP = -4,
-	GIT_DIR_DISALLOWED_BARE = -5,
-};
-
 /*
  * We cannot decide in this function whether we are in the work tree or
  * not, since the config can only be read _after_ this function was called.
@@ -1385,21 +1372,22 @@  static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
 	}
 }
 
-int discover_git_directory(struct strbuf *commondir,
-			   struct strbuf *gitdir)
+enum discovery_result discover_git_directory_reason(struct strbuf *commondir,
+						    struct strbuf *gitdir)
 {
 	struct strbuf dir = STRBUF_INIT, err = STRBUF_INIT;
 	size_t gitdir_offset = gitdir->len, cwd_len;
 	size_t commondir_offset = commondir->len;
 	struct repository_format candidate = REPOSITORY_FORMAT_INIT;
+	enum discovery_result result;
 
 	if (strbuf_getcwd(&dir))
-		return -1;
+		return GIT_DIR_CWD_FAILURE;
 
 	cwd_len = dir.len;
-	if (setup_git_directory_gently_1(&dir, gitdir, NULL, 0) <= 0) {
+	if ((result = setup_git_directory_gently_1(&dir, gitdir, NULL, 0)) <= 0) {
 		strbuf_release(&dir);
-		return -1;
+		return result;
 	}
 
 	/*
@@ -1429,11 +1417,11 @@  int discover_git_directory(struct strbuf *commondir,
 		strbuf_setlen(commondir, commondir_offset);
 		strbuf_setlen(gitdir, gitdir_offset);
 		clear_repository_format(&candidate);
-		return -1;
+		return GIT_DIR_INVALID_FORMAT;
 	}
 
 	clear_repository_format(&candidate);
-	return 0;
+	return result;
 }
 
 const char *setup_git_directory_gently(int *nongit_ok)
@@ -1516,9 +1504,11 @@  const char *setup_git_directory_gently(int *nongit_ok)
 		*nongit_ok = 1;
 		break;
 	case GIT_DIR_NONE:
+	case GIT_DIR_CWD_FAILURE:
+	case GIT_DIR_INVALID_FORMAT:
 		/*
 		 * As a safeguard against setup_git_directory_gently_1 returning
-		 * this value, fallthrough to BUG. Otherwise it is possible to
+		 * these values, fallthrough to BUG. Otherwise it is possible to
 		 * set startup_info->have_repository to 1 when we did nothing to
 		 * find a repository.
 		 */
diff --git a/setup.h b/setup.h
index 58fd2605dd2..b87d0d6fb2b 100644
--- a/setup.h
+++ b/setup.h
@@ -42,6 +42,30 @@  const char *resolve_gitdir_gently(const char *suspect, int *return_error_code);
 #define resolve_gitdir(path) resolve_gitdir_gently((path), NULL)
 
 void setup_work_tree(void);
+
+/*
+ * discover_git_directory_reason() is similar to discover_git_directory(),
+ * except it returns an enum value instead. It is important to note that
+ * a zero-valued return here is actually GIT_DIR_NONE, which is different
+ * from discover_git_directory.
+ */
+enum discovery_result {
+	GIT_DIR_NONE = 0,
+	GIT_DIR_EXPLICIT,
+	GIT_DIR_DISCOVERED,
+	GIT_DIR_BARE,
+	/* these are errors */
+	GIT_DIR_HIT_CEILING = -1,
+	GIT_DIR_HIT_MOUNT_POINT = -2,
+	GIT_DIR_INVALID_GITFILE = -3,
+	GIT_DIR_INVALID_OWNERSHIP = -4,
+	GIT_DIR_DISALLOWED_BARE = -5,
+	GIT_DIR_INVALID_FORMAT = -6,
+	GIT_DIR_CWD_FAILURE = -7,
+};
+enum discovery_result discover_git_directory_reason(struct strbuf *commondir,
+						    struct strbuf *gitdir);
+
 /*
  * Find the commondir and gitdir of the repository that contains the current
  * working directory, without changing the working directory or other global
@@ -50,8 +74,12 @@  void setup_work_tree(void);
  * both have the same result appended to the buffer.  The return value is
  * either 0 upon success and non-zero if no repository was found.
  */
-int discover_git_directory(struct strbuf *commondir,
-			   struct strbuf *gitdir);
+static inline int discover_git_directory(struct strbuf *commondir,
+					 struct strbuf *gitdir)
+{
+	return discover_git_directory_reason(commondir, gitdir) <= 0;
+}
+
 const char *setup_git_directory_gently(int *);
 const char *setup_git_directory(void);
 char *prefix_path(const char *prefix, int len, const char *path);