
[v2,2/3] setup: do not use invalid `repository_format`

Message ID a7e58385290e6789105d2d5b794f4bf8607285dd.1547488709.git.martin.agren@gmail.com
State New, archived
Series setup: add `clear_repository_format()`

Commit Message

Martin Ågren Jan. 14, 2019, 6:34 p.m. UTC
If `read_repository_format()` encounters an error, `format->version`
will be -1 and all other fields of `format` will be undefined. However,
in `setup_git_directory_gently()`, we use `repo_fmt.hash_algo`
regardless of the value of `repo_fmt.version`.

This can be observed by adding this to the end of
`read_repository_format()`:

	if (format->version == -1)
		format->hash_algo = 0; /* no-one should peek at this! */

This causes, e.g., "git branch -m q q2 without config should succeed" in
t3200 to fail with "fatal: Failed to resolve HEAD as a valid ref."
because it has moved .git/config out of the way and is now trying to use
a bad hash algorithm.

Check that `version` is non-negative before using `hash_algo`.

This patch adds no tests, but do note that if we skip this patch, the
next patch would cause existing tests to fail as outlined above.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
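A minimal sketch of the caller-side rule this patch enforces, written in plain C rather than against Git's real headers. `struct toy_repo_format`, `toy_read_format()` and `toy_pick_hash()` are hypothetical stand-ins (not Git APIs) that model only `version` and `hash_algo`. The simulated error path deliberately leaves a bogus `hash_algo` of 0, as in the experiment described in the commit message, so only the check on `version` keeps the caller on its default algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hash identifiers; not Git's real constants. */
enum toy_hash { TOY_HASH_UNKNOWN = 0, TOY_HASH_SHA1 = 1 };

/* Cut-down, hypothetical stand-in for struct repository_format. */
struct toy_repo_format {
	int version;   /* -1 on error; other fields are then undefined */
	int hash_algo;
};

/*
 * Simulates read_repository_format(): without a config, version is -1
 * and hash_algo holds junk (0 here, like the "no-one should peek at
 * this!" experiment). With a config, a valid version and hash are set.
 */
static int toy_read_format(struct toy_repo_format *fmt, int have_config)
{
	assert(fmt != NULL);
	if (!have_config) {
		fmt->version = -1;
		fmt->hash_algo = TOY_HASH_UNKNOWN;
		return -1;
	}
	fmt->version = 0;
	fmt->hash_algo = TOY_HASH_SHA1;
	return fmt->version;
}

/*
 * The guard added by the patch: trust hash_algo only when
 * version > -1; otherwise keep the caller's current algorithm.
 */
static int toy_pick_hash(const struct toy_repo_format *fmt, int current)
{
	if (fmt->version > -1)
		return fmt->hash_algo;
	return current;
}
```

Without a config, `toy_pick_hash()` returns the caller's current algorithm instead of the junk value; with a readable config, it returns the parsed one.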

Comments

Jeff King Jan. 15, 2019, 7:31 p.m. UTC | #1
On Mon, Jan 14, 2019 at 07:34:56PM +0100, Martin Ågren wrote:

> If `read_repository_format()` encounters an error, `format->version`
> will be -1 and all other fields of `format` will be undefined. However,
> in `setup_git_directory_gently()`, we use `repo_fmt.hash_algo`
> regardless of the value of `repo_fmt.version`.
> 
> This can be observed by adding this to the end of
> `read_repository_format()`:
> 
> 	if (format->version == -1)
> 		format->hash_algo = 0; /* no-one should peek at this! */
> 
> This causes, e.g., "git branch -m q q2 without config should succeed" in
> t3200 to fail with "fatal: Failed to resolve HEAD as a valid ref."
> because it has moved .git/config out of the way and is now trying to use
> a bad hash algorithm.
> 
> Check that `version` is non-negative before using `hash_algo`.
> 
> This patch adds no tests, but do note that if we skip this patch, the
> next patch would cause existing tests to fail as outlined above.

I'm still somewhat confused about how this breaks. If we move
".git/config" out of the way, then we have no version indicator and
presumably we should guess GIT_HASH_SHA1. Which is what's happening if
we fail to call repo_set_hash_algo(), no?  In other words, wouldn't
repo_set_hash_algo() always be a noop in that case?

I get why adding the code snippet above would cause that assumption to
break, but I am just not sure why we would add that code snippet. ;)

I also get why read_repository_format() doing this in patch 3 would be a
problem:

  +       if (format->version == -1) {
  +               clear_repository_format(format);
  +               format->version = -1;
  +       }

but doesn't that point out that clear_repository_format() should be
setting hash_algo to GIT_HASH_SHA1 as the default (and likewise "bare =
-1", etc, that is done in that function)?

-Peff
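A minimal sketch of the `clear()` semantics suggested above: release owned memory, then put the struct back into a usable default state (`version = -1`, `is_bare = -1`, SHA-1) rather than all-zero bytes. The type, macro and helpers are hypothetical toys loosely modeled on `struct repository_format`, `REPOSITORY_FORMAT_INIT` and `clear_repository_format()`; they are not Git's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hash identifiers; not Git's real constants. */
enum toy_hash { TOY_HASH_SHA1 = 1, TOY_HASH_SHA256 = 2 };

struct toy_repo_format {
	int version;
	int is_bare;
	int hash_algo;
	char *work_tree; /* heap-allocated, owned by the struct */
};

/* The defaults a reader should see when the config says nothing. */
#define TOY_REPO_FORMAT_INIT { \
	.version = -1, \
	.is_bare = -1, \
	.hash_algo = TOY_HASH_SHA1, \
	.work_tree = NULL, \
}

/* Free owned memory, then return to the documented defaults. */
static void toy_clear_format(struct toy_repo_format *fmt)
{
	struct toy_repo_format fresh = TOY_REPO_FORMAT_INIT;

	assert(fmt != NULL);
	free(fmt->work_tree);
	*fmt = fresh;
}

/* Hypothetical helper so the example can store owned data. */
static void toy_set_work_tree(struct toy_repo_format *fmt, const char *path)
{
	free(fmt->work_tree);
	fmt->work_tree = malloc(strlen(path) + 1);
	if (fmt->work_tree)
		strcpy(fmt->work_tree, path);
}
```

Assigning from a local `fresh` initialized with the macro keeps the defaults in one place, so initialization and `clear()` cannot drift apart.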
Martin Ågren Jan. 17, 2019, 6:31 a.m. UTC | #2
On Tue, 15 Jan 2019 at 20:31, Jeff King <peff@peff.net> wrote:
>
> On Mon, Jan 14, 2019 at 07:34:56PM +0100, Martin Ågren wrote:
>
> > This can be observed by adding this to the end of
> > `read_repository_format()`:
> >
> >       if (format->version == -1)
> >               format->hash_algo = 0; /* no-one should peek at this! */
> >
> > Check that `version` is non-negative before using `hash_algo`.

> I'm still somewhat confused about how this breaks. If we move
> ".git/config" out of the way, then we have no version indicator and
> presumably we should guess GIT_HASH_SHA1. Which is what's happening if
> we fail to call repo_set_hash_algo(), no?  In other words, wouldn't
> repo_set_hash_algo() always be a noop in that case?
>
> I get why adding the code snippet above would cause that assumption to
> break, but I am just not sure why we would add that code snippet. ;)
>
> I also get why read_repository_format() doing this in patch 3 would be a
> problem:
>
>   +       if (format->version == -1) {
>   +               clear_repository_format(format);
>   +               format->version = -1;
>   +       }
>
> but doesn't that point out that clear_repository_format() should be
> setting hash_algo to GIT_HASH_SHA1 as the default (and likewise "bare =
> -1", etc, that is done in that function)?

Something like the below on top of this series (then rebased). (The last
hunk below is a revert of this patch.)

I'd like to think of the situation before this patch above as a
situation where the API promises something and the user uses the API
beyond that. The next patch in this series changes the internals of the
API in a way that is consistent with the promise made, but which ends up
affecting an over-eager user.

What this patch above does is to make the user do what the API promise
allows them to do, i.e., no more shortcuts. What you're saying is, why
isn't the promise stronger? So the user won't have to think as much?

So in particular, why don't `clear...()` and the error path in
`read_...()` impose sane, usable defaults? My first concern is that it
means we need to make a stronger promise, which might then be hard to
back away from, if we want to. Maybe we'll never want to...

My second concern is, what should we be falling back to, going forward?
At some point, the hash indicated by `REPOSITORY_FORMAT_INIT` will be
SHA-256. Before that, and as soon as we support both hashes, what if we
pick up SHA-256 before stumbling on some other piece of the config --
should we now reset the struct to indicate SHA-1, or rather keep the
SHA-256 value, which by itself is valid? (The same could be argued now,
for something other than hash functions, but the SHA-1/256 example might
be more obvious in the context of this patch.)

My third worry is that we should then equip `clear_...()` or at least
the error path of `read_...()` with some logic to keep "as much as
possible" of what we've picked up and reset the rest, all the while
making sure we don't end up with something self-contradicting or stupid.
After all, we'll have promised the users that they can ignore any errors
and just run ahead.

Maybe I'm worrying way too much, and I shouldn't be so afraid of making
a stronger promise here and now because of vague slippery-slope thinking.

Thanks for pushing back and forcing me to articulate my thinking.

Martin


diff --git a/cache.h b/cache.h
index 3ef63d27c4..acd86e9f9f 100644
--- a/cache.h
+++ b/cache.h
@@ -974,15 +974,21 @@ struct repository_format {
 
 /**
  * Always use this to initialize a `struct repository_format`
- * to a well-defined state before calling `read_repository()`.
+ * to a well-defined, default state before calling
+ * `read_repository()`.
  */
-#define REPOSITORY_FORMAT_INIT { 0 }
+#define REPOSITORY_FORMAT_INIT (struct repository_format){ \
+				 .version = -1, \
+				 .is_bare = -1, \
+				 .hash_algo = GIT_HASH_SHA1, \
+				 .unknown_extensions = STRING_LIST_INIT_DUP, \
+			       }
 
 /*
  * Read the repository format characteristics from the config file "path" into
  * "format" struct. Returns the numeric version. On error, -1 is returned,
  * format->version is set to -1, and all other fields in the struct are
- * undefined.
+ * set to the default configuration (REPOSITORY_FORMAT_INIT).
  */
 int read_repository_format(struct repository_format *format, const char *path);
 
diff --git a/setup.c b/setup.c
index 70d9007ae5..f3ea479ad9 100644
--- a/setup.c
+++ b/setup.c
@@ -511,15 +511,9 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 int read_repository_format(struct repository_format *format, const char *path)
 {
 	clear_repository_format(format);
-	format->version = -1;
-	format->is_bare = -1;
-	format->hash_algo = GIT_HASH_SHA1;
-	string_list_init(&format->unknown_extensions, 1);
 	git_config_from_file(check_repo_format, path, format);
-	if (format->version == -1) {
+	if (format->version == -1)
 		clear_repository_format(format);
-		format->version = -1;
-	}
 	return format->version;
 }
 
@@ -528,7 +522,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
-	memset(format, 0, sizeof(*format));
+	*format = REPOSITORY_FORMAT_INIT;
 }
 
 int verify_repository_format(const struct repository_format *format,
@@ -1152,7 +1146,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository && repo_fmt.version > -1)
+		if (startup_info->have_repository)
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
 	}
Jeff King Jan. 22, 2019, 7:07 a.m. UTC | #3
On Thu, Jan 17, 2019 at 07:31:14AM +0100, Martin Ågren wrote:

> > I also get why read_repository_format() doing this in patch 3 would be a
> > problem:
> >
> >   +       if (format->version == -1) {
> >   +               clear_repository_format(format);
> >   +               format->version = -1;
> >   +       }
> >
> > but doesn't that point out that clear_repository_format() should be
> > setting hash_algo to GIT_HASH_SHA1 as the default (and likewise "bare =
> > -1", etc, that is done in that function)?
> 
> Something like the below on top of this series (then rebased). (The last
> hunk below is a revert of this patch.)

Yes, that's exactly what I had in mind. Usually our clear() functions
put the struct back into some default state from which it can be used
again. But the state after clear() here (without the patch below) is
something that nobody is ever expected to look at.

Granted, the only function which fills it in is read_...(), and it sets
those defaults itself. But it just seems to me if we're going to have to
put _something_ in the struct to initialize or clear it, it might as
well be those.

> I'd like to think of the situation before this patch above as a
> situation where the API promises something and the user uses the API
> beyond that. The next patch in this series changes the internals of the
> API in a way that is consistent with the promise made, but which ends up
> affecting an over-eager user.

As with many parts of Git, there really isn't a clear promise. :) I
don't think you're wrong at all about the current state of things. I'm
mostly basing my comments on "what would I _expect_ the promise to be
based on our general patterns". If that's far from what we promise now,
then it's a hassle to convert. But I think it's actually pretty close.

> What this patch above does is to make the user do what the API promise
> allows them to do, i.e., no more shortcuts. What you're saying is, why
> isn't the promise stronger? So the user won't have to think as much?
> 
> So in particular, why don't `clear...()` and the error path in
> `read_...()` impose sane, usable defaults? My first concern is that it
> means we need to make a stronger promise, which might then be hard to
> back away from, if we want to. Maybe we'll never want to...

I'm not too worried about that personally. I think the more likely
problem is that the API is misunderstood and misused. ;)

> My second concern is, what should we be falling back to, going forward?
> At some point, the hash indicated by `REPOSITORY_FORMAT_INIT` will be
> SHA-256. Before that, and as soon as we support both hashes, what if we
> pick up SHA-256 before stumbling on some other piece of the config --
> should we now reset the struct to indicate SHA-1, or rather keep the
> SHA-256 value, which by itself is valid? (The same could be argued now,
> for something other than hash functions, but the SHA-1/256 example might
> be more obvious in the context of this patch.)

I'd think this would _always_ be sha-1. Because it's not about "what's
the default for this program running". It's about "what have I read from
this on-disk repo config". And the rule there is "if they don't say
otherwise, it is sha1". That won't change even in a sha256 world,
because we'll maintain backwards-compatibility with legacy repositories
forever.

Now if your next question is: "does any caller misuse this as more than
looking at the repo format", I don't know the answer for sure. That
would be worth poking at (or perhaps having just poked yourself, you
might have an idea already).

> My third worry is that we should then equip `clear_...()` or at least
> the error path of `read_...()` with some logic to keep "as much as
> possible" of what we've picked up and reset the rest, all the while
> making sure we don't end up with something self-contradicting or stupid.
> After all, we'll have promised the users that they can ignore any errors
> and just run ahead.

I think clear() should always throw everything away. Saving partial bits
from the error path of read() is harder. My gut says "no", but I agree
that's a trickier question. I think the real-world thing here is: we're
reading repo config and see an extensions.* field that says "use
sha256". But then we encounter an error, or don't otherwise have a
version. What do we do?

If that's an undefined setup (and I think it is -- if you're using
extensions.* you're supposed to always set the version field), then I
don't know that it really matters that much. But throwing the whole
thing away (even if it means a buggy code path is more likely to use
sha1) seems OK to me.

> Maybe I'm worrying way too much, and I shouldn't be so afraid of making
> a stronger promise here and now because of vague slippery-slope thinking.
> 
> Thanks for pushing back and forcing me to articulate my thinking.

For the record, I can live with it either way. There are so many funky
little setup corner cases in the code already, and we don't even really
have a real-world case to dissect at this point. So the right thing may
also just be to finish this patch series as quickly as possible and move
on to something more useful. :)

-Peff
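A minimal sketch of the error-path policy discussed above: a reader that starts from defaults and picks up a hash setting while scanning, but throws everything away if no version was found, so an unversioned config always yields SHA-1. The config is simulated as an array of key/value pairs. `core.repositoryformatversion` is Git's real version key, but `extensions.toyhash` and all `toy_*` names are hypothetical placeholders, not Git's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hash identifiers; not Git's real constants. */
enum toy_hash { TOY_HASH_SHA1 = 1, TOY_HASH_SHA256 = 2 };

struct toy_repo_format {
	int version;
	int hash_algo;
};

#define TOY_REPO_FORMAT_INIT { .version = -1, .hash_algo = TOY_HASH_SHA1 }

/* One simulated config line. */
struct toy_config_entry {
	const char *key;
	const char *value;
};

static void toy_clear_format(struct toy_repo_format *fmt)
{
	struct toy_repo_format fresh = TOY_REPO_FORMAT_INIT;
	*fmt = fresh;
}

/*
 * Scan the simulated config. If no version was found, discard
 * everything read so far (including a hash setting), so the result
 * falls back to SHA-1, the rule for legacy repositories.
 */
static int toy_read_format(struct toy_repo_format *fmt,
			   const struct toy_config_entry *cfg, size_t n)
{
	size_t i;

	assert(fmt != NULL);
	toy_clear_format(fmt);
	for (i = 0; i < n; i++) {
		if (!strcmp(cfg[i].key, "core.repositoryformatversion"))
			fmt->version = atoi(cfg[i].value);
		else if (!strcmp(cfg[i].key, "extensions.toyhash") &&
			 !strcmp(cfg[i].value, "sha256"))
			fmt->hash_algo = TOY_HASH_SHA256;
	}
	if (fmt->version == -1)
		toy_clear_format(fmt);
	return fmt->version;
}
```

Reading an `extensions.*`-style setting without a version (an undefined setup) is discarded and yields SHA-1; the same setting next to a version is kept.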
Martin Ågren Jan. 22, 2019, 1:34 p.m. UTC | #4
On Tue, 22 Jan 2019 at 08:07, Jeff King <peff@peff.net> wrote:
>
> On Thu, Jan 17, 2019 at 07:31:14AM +0100, Martin Ågren wrote:
>
> > Something like the below on top of this series (then rebased). (The last
> > hunk below is a revert of this patch.)
>
> Yes, that's exactly what I had in mind. Usually our clear() functions
> put the struct back into some default state from which it can be used
> again. But the state after clear() here (without the patch below) is
> something that nobody is ever expected to look at.

> > So in particular, why don't `clear...()` and the error path in
> > `read_...()` impose sane, usable defaults? My first concern is that it
> > means we need to make a stronger promise, which might then be hard to
> > back away from, if we want to. Maybe we'll never want to...
>
> I'm not too worried about that personally. I think the more likely
> problem is that the API is misunderstood and misused. ;)

Heh. Agreed. :-)

> Now if your next question is: "does any caller misuse this as more than
> looking at the repo format", I don't know the answer for sure. That
> would be worth poking at (or perhaps having just poked yourself, you
> might have an idea already).

Not really. I've stumbled around a little, but I'll need to do that some
more.

> For the record, I can live with it either way. There are so many funky
> little setup corner cases in the code already, and we don't even really
> have a real-world case to dissect at this point. So the right thing may
> also just be to finish this patch series as quickly as possible and move
> on to something more useful. :)

I rebased the "something like this?" into this series yesterday and I
think the end result is better, but also that the way there is clearer,
mostly because this patch is then gone. I want to double-check it and
submit it tonight.

Thank you for your comments. They're really helpful.


Martin

Patch

diff --git a/setup.c b/setup.c
index bb633942bb..4d3d67c50b 100644
--- a/setup.c
+++ b/setup.c
@@ -1140,7 +1140,7 @@  const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository && repo_fmt.version > -1)
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
 	}