diff mbox series

[v2] setup: don't die if realpath(3) fails on getcwd(3)

Message ID 7c064f43ed426c9e3b54e1ae5313d6b9332a47cb.1653141169.git.kevin@kevinlocke.name (mailing list archive)
State Superseded
Headers show
Series [v2] setup: don't die if realpath(3) fails on getcwd(3) | expand

Commit Message

Kevin Locke May 21, 2022, 1:53 p.m. UTC
Prior to Git 2.35.0, git could be run from an inaccessible working
directory so long as the git repository specified by options and/or
environment variables was accessible.  For example:

    git init repo
    mkdir -p a/b
    cd a/b
    chmod u-x ..
    git -C "${PWD%/a/b}/repo" status

If this example seems a bit contrived, consider running with the
repository owner as a substitute UID (e.g. with runuser(1) or sudo(8))
without ensuring the working directory is accessible by that user.

The code added by e6f8861bd4 ("setup: introduce
startup_info->original_cwd") to preserve the working directory attempts
to normalize the path using strbuf_realpath().  If that fails, as in the
case above, it is treated as a fatal error.

This commit treats strbuf_realpath() errors as non-fatal.  If an error
occurs, setup_original_cwd() will continue without applying removal
prevention for cwd, resulting in the pre-2.35.0 behavior.  The risk
should be minimal, since git will not operate on a repository with
inaccessible ancestors, this behavior is only known to occur when cwd is
a descendant of the repository, an ancestor of cwd is inaccessible, and
no ancestors of the repository are inaccessible.

Signed-off-by: Kevin Locke <kevin@kevinlocke.name>
---

Changes since v1:
 * Set startup_info->original_cwd = NULL when strbuf_realpath() fails,
   rather than setting it to the un-normalized path.
 * Add a trace message when realpath fails to aid debugging.
 * Remove potential realpath failure cause from comment before it.
 * Improve format for reference to e6f8861bd4 in commit message.
 * Clarify when the pre-2.35.0 behavior may occur as a result of this
   commit in the commit message.
 * Remove 'Fixes:' tag from commit message.

 setup.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

Comments

Derrick Stolee May 23, 2022, 6:57 p.m. UTC | #1
On 5/21/22 9:53 AM, Kevin Locke wrote:
> -	/* Normalize the directory */
> -	strbuf_realpath(&tmp, tmp_original_cwd, 1);
> -	free((char*)tmp_original_cwd);
> -	tmp_original_cwd = NULL;
> -	startup_info->original_cwd = strbuf_detach(&tmp, NULL);
> +	/* Try to normalize the directory. */
> +	if (strbuf_realpath(&tmp, tmp_original_cwd, 0)) {

I had to look it up, but this last argument going from 1 to 0
is "die_on_error", so it makes sense why it wasn't checked by
an 'if' earlier _and_ why it needs to change now.

> +		free((char*)tmp_original_cwd);

Hm. I'm never a fan of this casting, but it existed before. It's
because tmp_original_cwd is exposed globally in cache.h, which
is _really widely_. However, there are only two uses: setup.c,
which defines it, and common-main.c, which initializes it during
process startup.

The following diff could apply before your commit, removing this
use of "const char *", but maybe it doesn't fit normal Git
coding guidelines (putting the extern directly in a *.c file):

--- >8 ---

diff --git a/cache.h b/cache.h
index aaf334e2aa4..ce9cd6fa3f0 100644
--- a/cache.h
+++ b/cache.h
@@ -1797,7 +1797,6 @@ struct startup_info {
 	const char *original_cwd;
 };
 extern struct startup_info *startup_info;
-extern const char *tmp_original_cwd;
 
 /* merge.c */
 struct commit_list;
diff --git a/common-main.c b/common-main.c
index 29fb7452f8a..e472258b83b 100644
--- a/common-main.c
+++ b/common-main.c
@@ -23,6 +23,8 @@ static void restore_sigpipe_to_default(void)
 	signal(SIGPIPE, SIG_DFL);
 }
 
+extern char *tmp_original_cwd;
+
 int main(int argc, const char **argv)
 {
 	int result;
diff --git a/setup.c b/setup.c
index 04ce33cdcd4..86986317490 100644
--- a/setup.c
+++ b/setup.c
@@ -12,7 +12,7 @@ static int work_tree_config_is_bogus;
 
 static struct startup_info the_startup_info;
 struct startup_info *startup_info = &the_startup_info;
-const char *tmp_original_cwd;
+char *tmp_original_cwd;
 
 /*
  * The input parameter must contain an absolute path, and it must already be
@@ -459,7 +459,7 @@ static void setup_original_cwd(void)
 
 	/* Normalize the directory */
 	strbuf_realpath(&tmp, tmp_original_cwd, 1);
-	free((char*)tmp_original_cwd);
+	free(tmp_original_cwd);
 	tmp_original_cwd = NULL;
 	startup_info->original_cwd = strbuf_detach(&tmp, NULL);
 
--- >8 ---

> +		tmp_original_cwd = NULL;
> +		startup_info->original_cwd = strbuf_detach(&tmp, NULL);
> +	} else {
> +		trace_printf("realpath(cwd) failed: %s\n", strerror(errno));

It could also be helpful to include a trace2 output, since
most modern tracing uses that mechanism:

		trace2_data_string("setup", the_repository,
				   "realpath-path", tmp_original_cwd);
		trace2_data_string("setup", the_repository,
				   "realpath-failure", strerror(errno));

But this is also unlikely to be necessary. We can instruct
a user to rerun their command with GIT_TRACE=1 if we see
this error in the wild.

> +		tmp_original_cwd = NULL;
> +		goto no_prevention_needed;
> +	}

So, I don't see a need for this patch to change before it
is merged. I'm curious to hear thoughts on the diff I put
inline, though.

Thanks,
-Stolee
Kevin Locke May 24, 2022, 2:02 p.m. UTC | #2
On Mon, 2022-05-23 at 14:57 -0400, Derrick Stolee wrote:
> On 5/21/22 9:53 AM, Kevin Locke wrote:
> > +		free((char*)tmp_original_cwd);
> 
> Hm. I'm never a fan of this casting, but it existed before. It's
> because tmp_original_cwd is exposed globally in cache.h, which
> is _really widely_. However, there are only two uses: setup.c,
> which defines it, and common-main.c, which initializes it during
> process startup.
> 
> The following diff could apply before your commit, removing this
> use of "const char *", but maybe it doesn't fit normal Git
> coding guidelines (putting the extern directly in a *.c file):
> 
> --- >8 ---
> 
> diff --git a/cache.h b/cache.h
> index aaf334e2aa4..ce9cd6fa3f0 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1797,7 +1797,6 @@ struct startup_info {
>  	const char *original_cwd;
>  };
>  extern struct startup_info *startup_info;
> -extern const char *tmp_original_cwd;
>  
>  /* merge.c */
>  struct commit_list;
> diff --git a/common-main.c b/common-main.c
> index 29fb7452f8a..e472258b83b 100644
> --- a/common-main.c
> +++ b/common-main.c
> @@ -23,6 +23,8 @@ static void restore_sigpipe_to_default(void)
>  	signal(SIGPIPE, SIG_DFL);
>  }
>  
> +extern char *tmp_original_cwd;
> +
>  int main(int argc, const char **argv)
>  {
>  	int result;
> diff --git a/setup.c b/setup.c
> index 04ce33cdcd4..86986317490 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -12,7 +12,7 @@ static int work_tree_config_is_bogus;
>  
>  static struct startup_info the_startup_info;
>  struct startup_info *startup_info = &the_startup_info;
> -const char *tmp_original_cwd;
> +char *tmp_original_cwd;
>  
>  /*
>   * The input parameter must contain an absolute path, and it must already be
> @@ -459,7 +459,7 @@ static void setup_original_cwd(void)
>  
>  	/* Normalize the directory */
>  	strbuf_realpath(&tmp, tmp_original_cwd, 1);
> -	free((char*)tmp_original_cwd);
> +	free(tmp_original_cwd);
>  	tmp_original_cwd = NULL;
>  	startup_info->original_cwd = strbuf_detach(&tmp, NULL);
>  
> --- >8 ---

This approach seems reasonable to me, as does casting to free().  It's
not clear to me which is preferable in this case.  How to balance the
trade-offs between exposing const interfaces, limiting (internal)
interfaces to headers, and avoiding casts might be worth discussing
and documenting a matter of project coding style.  `grep -rF 'free(('`
lists about 100 casts to free, suggesting the discussion may be
worthwhile.  Introducing a free_const() macro could be another option
to consider.

>> +		tmp_original_cwd = NULL;
>> +		startup_info->original_cwd = strbuf_detach(&tmp, NULL);
>> +	} else {
>> +		trace_printf("realpath(cwd) failed: %s\n", strerror(errno));
> 
> It could also be helpful to include a trace2 output, since
> most modern tracing uses that mechanism:
> 
> 		trace2_data_string("setup", the_repository,
> 				   "realpath-path", tmp_original_cwd);
> 		trace2_data_string("setup", the_repository,
> 				   "realpath-failure", strerror(errno));
> 
> But this is also unlikely to be necessary. We can instruct
> a user to rerun their command with GIT_TRACE=1 if we see
> this error in the wild.

That's a great idea.  I hadn't realized the difference between the
trace_* and trace2_* APIs until investigating as a result of your
suggestion.  Trace2 seems preferable for many reasons.  I'll send an
updated patch shortly.

Thanks for the review and feedback Stolee!

Cheers,
Kevin
Elijah Newren May 24, 2022, 3:20 p.m. UTC | #3
On Tue, May 24, 2022 at 7:02 AM Kevin Locke <kevin@kevinlocke.name> wrote:
>
> On Mon, 2022-05-23 at 14:57 -0400, Derrick Stolee wrote:
> > On 5/21/22 9:53 AM, Kevin Locke wrote:
> > > +           free((char*)tmp_original_cwd);
> >
> > Hm. I'm never a fan of this casting, but it existed before. It's
> > because tmp_original_cwd is exposed globally in cache.h, which
> > is _really widely_. However, there are only two uses: setup.c,
> > which defines it, and common-main.c, which initializes it during
> > process startup.
> >
> > The following diff could apply before your commit, removing this
> > use of "const char *", but maybe it doesn't fit normal Git
> > coding guidelines (putting the extern directly in a *.c file):
> >
> > --- >8 ---
> >
> > diff --git a/cache.h b/cache.h
> > index aaf334e2aa4..ce9cd6fa3f0 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1797,7 +1797,6 @@ struct startup_info {
> >       const char *original_cwd;
> >  };
> >  extern struct startup_info *startup_info;
> > -extern const char *tmp_original_cwd;
> >
> >  /* merge.c */
> >  struct commit_list;
> > diff --git a/common-main.c b/common-main.c
> > index 29fb7452f8a..e472258b83b 100644
> > --- a/common-main.c
> > +++ b/common-main.c
> > @@ -23,6 +23,8 @@ static void restore_sigpipe_to_default(void)
> >       signal(SIGPIPE, SIG_DFL);
> >  }
> >
> > +extern char *tmp_original_cwd;
> > +
> >  int main(int argc, const char **argv)
> >  {
> >       int result;
> > diff --git a/setup.c b/setup.c
> > index 04ce33cdcd4..86986317490 100644
> > --- a/setup.c
> > +++ b/setup.c
> > @@ -12,7 +12,7 @@ static int work_tree_config_is_bogus;
> >
> >  static struct startup_info the_startup_info;
> >  struct startup_info *startup_info = &the_startup_info;
> > -const char *tmp_original_cwd;
> > +char *tmp_original_cwd;
> >
> >  /*
> >   * The input parameter must contain an absolute path, and it must already be
> > @@ -459,7 +459,7 @@ static void setup_original_cwd(void)
> >
> >       /* Normalize the directory */
> >       strbuf_realpath(&tmp, tmp_original_cwd, 1);
> > -     free((char*)tmp_original_cwd);
> > +     free(tmp_original_cwd);
> >       tmp_original_cwd = NULL;
> >       startup_info->original_cwd = strbuf_detach(&tmp, NULL);
> >
> > --- >8 ---
>
> This approach seems reasonable to me, as does casting to free().  It's
> not clear to me which is preferable in this case.  How to balance the
> trade-offs between exposing const interfaces, limiting (internal)
> interfaces to headers, and avoiding casts might be worth discussing
> and documenting a matter of project coding style.  `grep -rF 'free(('`
> lists about 100 casts to free, suggesting the discussion may be
> worthwhile.  Introducing a free_const() macro could be another option
> to consider.

I'd prefer either a free_const() as you suggest (though as a separate
patch from what you are submitting here), or leaving the code as-is.
free() could have been written to take a const void* instead of just
void*, since it's not going to modify what the pointer points at.  The
reason we call free() is because the variable isn't needed anymore,
and using a non-const value after freeing is just as wrong as using a
const one after freeing, so casting away the constness cannot really
cause any new problems.  So, I think the signature of free() is just
wrong: it should have taken a const void* all along.  Unfortunately,
the wrong type signature sadly makes people feel like they have to
choose between (a) dropping the added safety of const that the
compiler can enforce for you during the lifetime of the variable, or
(b) leaking memory you no longer need.  I think it's a bad choice and
you should just typecast when free'ing, but clearly others just don't
want to see any typecasts and are willing to dispense with const on
constant variables.
Derrick Stolee May 24, 2022, 5:38 p.m. UTC | #4
On 5/24/2022 11:20 AM, Elijah Newren wrote:
> On Tue, May 24, 2022 at 7:02 AM Kevin Locke <kevin@kevinlocke.name> wrote:
>>
>> On Mon, 2022-05-23 at 14:57 -0400, Derrick Stolee wrote:
>>> On 5/21/22 9:53 AM, Kevin Locke wrote:
>>>> +           free((char*)tmp_original_cwd);
>>>
>>> Hm. I'm never a fan of this casting, but it existed before. It's
>>> because tmp_original_cwd is exposed globally in cache.h, which
>>> is _really widely_. However, there are only two uses: setup.c,
>>> which defines it, and common-main.c, which initializes it during
>>> process startup.
...>> This approach seems reasonable to me, as does casting to free().  It's
>> not clear to me which is preferable in this case.  How to balance the
>> trade-offs between exposing const interfaces, limiting (internal)
>> interfaces to headers, and avoiding casts might be worth discussing
>> and documenting a matter of project coding style.  `grep -rF 'free(('`
>> lists about 100 casts to free, suggesting the discussion may be
>> worthwhile.  Introducing a free_const() macro could be another option
>> to consider.
> 
> I'd prefer either a free_const() as you suggest (though as a separate
> patch from what you are submitting here), or leaving the code as-is.
> free() could have been written to take a const void* instead of just
> void*, since it's not going to modify what the pointer points at.  The
> reason we call free() is because the variable isn't needed anymore,
> and using a non-const value after freeing is just as wrong as using a
> const one after freeing, so casting away the constness cannot really
> cause any new problems.  So, I think the signature of free() is just
> wrong: it should have taken a const void* all along.  Unfortunately,
> the wrong type signature sadly makes people feel like they have to
> choose between (a) dropping the added safety of const that the
> compiler can enforce for you during the lifetime of the variable, or
> (b) leaking memory you no longer need.  I think it's a bad choice and
> you should just typecast when free'ing, but clearly others just don't
> want to see any typecasts and are willing to dispense with const on
> constant variables.

I mostly agree with you: if free() didn't have the const, then the
answer would be simple. We probably wouldn't also have the convention
of "const pointers are for memory we don't own".

Specifically with 'const char *' this can sometimes point to a
compiled string literal, so I tend to be more careful than usual
around these kinds of casts.

I'm willing to concede this point as it is much messier than just
the goals of this patch.

Thanks,
-Stolee
Elijah Newren May 25, 2022, 3:47 a.m. UTC | #5
On Tue, May 24, 2022 at 10:38 AM Derrick Stolee
<derrickstolee@github.com> wrote:
>
> On 5/24/2022 11:20 AM, Elijah Newren wrote:
> > On Tue, May 24, 2022 at 7:02 AM Kevin Locke <kevin@kevinlocke.name> wrote:
> >>
> >> On Mon, 2022-05-23 at 14:57 -0400, Derrick Stolee wrote:
> >>> On 5/21/22 9:53 AM, Kevin Locke wrote:
> >>>> +           free((char*)tmp_original_cwd);
> >>>
> >>> Hm. I'm never a fan of this casting, but it existed before. It's
> >>> because tmp_original_cwd is exposed globally in cache.h, which
> >>> is _really widely_. However, there are only two uses: setup.c,
> >>> which defines it, and common-main.c, which initializes it during
> >>> process startup.
> ...>> This approach seems reasonable to me, as does casting to free().  It's
> >> not clear to me which is preferable in this case.  How to balance the
> >> trade-offs between exposing const interfaces, limiting (internal)
> >> interfaces to headers, and avoiding casts might be worth discussing
> >> and documenting a matter of project coding style.  `grep -rF 'free(('`
> >> lists about 100 casts to free, suggesting the discussion may be
> >> worthwhile.  Introducing a free_const() macro could be another option
> >> to consider.
> >
> > I'd prefer either a free_const() as you suggest (though as a separate
> > patch from what you are submitting here), or leaving the code as-is.
> > free() could have been written to take a const void* instead of just
> > void*, since it's not going to modify what the pointer points at.  The
> > reason we call free() is because the variable isn't needed anymore,
> > and using a non-const value after freeing is just as wrong as using a
> > const one after freeing, so casting away the constness cannot really
> > cause any new problems.  So, I think the signature of free() is just
> > wrong: it should have taken a const void* all along.  Unfortunately,
> > the wrong type signature sadly makes people feel like they have to
> > choose between (a) dropping the added safety of const that the
> > compiler can enforce for you during the lifetime of the variable, or
> > (b) leaking memory you no longer need.  I think it's a bad choice and
> > you should just typecast when free'ing, but clearly others just don't
> > want to see any typecasts and are willing to dispense with const on
> > constant variables.
>
> I mostly agree with you: if free() didn't have the const, then the
> answer would be simple. We probably wouldn't also have the convention
> of "const pointers are for memory we don't own".
>
> Specifically with 'const char *' this can sometimes point to a
> compiled string literal, so I tend to be more careful than usual
> around these kinds of casts.

Ah, fair enough.


> I'm willing to concede this point as it is much messier than just
> the goals of this patch.

:-)
Ævar Arnfjörð Bjarmason May 27, 2022, 7:48 a.m. UTC | #6
On Tue, May 24 2022, Elijah Newren wrote:

> [...] So, I think the signature of free() is just
> wrong: it should have taken a const void* all along.  Unfortunately,
> the wrong type signature sadly makes people feel like they have to
> choose between (a) dropping the added safety of const that the
> compiler can enforce for you during the lifetime of the variable, or
> (b) leaking memory you no longer need.

Hrm, don't you mean that it would be better as:

	void free(void *const ptr);

Not:

	void free(const void *ptr);

But standard C doesn't make much (any?) use of the former form for its
library functions by convention.

c.f.:

	cdecl> explain const void *var
	declare var as pointer to const void
	cdecl> explain void *const var
	declare var as const pointer to void

I.e. the whole point of malloc() is to give us a pointer to memory that
isn't "const". If we stored that in a variable that was "void *const"
we'd save ourselves from some amount of foot-guns, but you're rarely
doing pointer arithmetic accidentally, so probably not really.

But yeah, we really should have this documented somewhere, i.e. the
cases where we "lie" and expose a "const char *" which is really (as far
as the machine is concerned) mutable.

The confusion being that we're seeking to overlay our own "no, this
isn't mutable" on the basis of our desired API boundaries, not just to
use it to inform us & the compiler about the "real" nature of the
underlying data.
Elijah Newren May 28, 2022, 1:27 a.m. UTC | #7
On Fri, May 27, 2022 at 1:02 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, May 24 2022, Elijah Newren wrote:
>
> > [...] So, I think the signature of free() is just
> > wrong: it should have taken a const void* all along.  Unfortunately,
> > the wrong type signature sadly makes people feel like they have to
> > choose between (a) dropping the added safety of const that the
> > compiler can enforce for you during the lifetime of the variable, or
> > (b) leaking memory you no longer need.
>
> Hrm, don't you mean that it would be better as:
>
>         void free(void *const ptr);
>
> Not:
>
>         void free(const void *ptr);

Nope, I definitely meant the latter; the stuff pointed to is const,
not the pointer itself.

In fact, I don't see any point at all in the former; with the free()
that exists today:

    void free(void *ptr)

I can pass it a "void * const myptr" already without problems, because
free's ptr parameter will be a copy of myptr, and thus modifying ptr
cannot affect myptr.  So such a call signature change could not
possibly provide any benefit to the outside caller.  But that call
signature change could hinder the actual implementation of free() for
some folks (particularly if a given implementation of free() keeps
extra data near the allocated block with information about the size of
the block and the next allocated block in the list).

In contrast, I cannot pass a "const void *myptr" or "const char
*myptr" to free(), but only because of the current type signature;
free() doesn't actually modify any of the contents the pointer points
to.  (And if you want to claim that free effectively does modify what
myptr points to because someone else could allocate that same memory,
remember that use-after-free is undefined regardless of whether the
data pointed to is const or not, and thus you cannot access that data
after free with or without the const.)  So, free()'s real type that it
acts on is a const void *.  Sadly, the declared type signature is
rather void *, which unnecessarily forces users to cast their types
when calling.
diff mbox series

Patch

diff --git a/setup.c b/setup.c
index a7b36f3ffb..22430a663c 100644
--- a/setup.c
+++ b/setup.c
@@ -458,11 +458,16 @@  static void setup_original_cwd(void)
 	 *     not startup_info->original_cwd.
 	 */
 
-	/* Normalize the directory */
-	strbuf_realpath(&tmp, tmp_original_cwd, 1);
-	free((char*)tmp_original_cwd);
-	tmp_original_cwd = NULL;
-	startup_info->original_cwd = strbuf_detach(&tmp, NULL);
+	/* Try to normalize the directory. */
+	if (strbuf_realpath(&tmp, tmp_original_cwd, 0)) {
+		free((char*)tmp_original_cwd);
+		tmp_original_cwd = NULL;
+		startup_info->original_cwd = strbuf_detach(&tmp, NULL);
+	} else {
+		trace_printf("realpath(cwd) failed: %s\n", strerror(errno));
+		tmp_original_cwd = NULL;
+		goto no_prevention_needed;
+	}
 
 	/*
 	 * Get our worktree; we only protect the current working directory