Message ID | 7c064f43ed426c9e3b54e1ae5313d6b9332a47cb.1653141169.git.kevin@kevinlocke.name (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] setup: don't die if realpath(3) fails on getcwd(3) | expand |
On 5/21/22 9:53 AM, Kevin Locke wrote: > - /* Normalize the directory */ > - strbuf_realpath(&tmp, tmp_original_cwd, 1); > - free((char*)tmp_original_cwd); > - tmp_original_cwd = NULL; > - startup_info->original_cwd = strbuf_detach(&tmp, NULL); > + /* Try to normalize the directory. */ > + if (strbuf_realpath(&tmp, tmp_original_cwd, 0)) { I had to look it up, but this last argument going from 1 to 0 is "die_on_error", so it makes sense why it wasn't checked by an 'if' earlier _and_ why it needs to change now. > + free((char*)tmp_original_cwd); Hm. I'm never a fan of this casting, but it existed before. It's because tmp_original_cwd is exposed globally in cache.h, which is _really widely_. However, there are only two uses: setup.c, which defines it, and common-main.c, which initializes it during process startup. The following diff could apply before your commit, removing this use of "const char *", but maybe it doesn't fit normal Git coding guidelines (putting the extern directly in a *.c file): --- >8 --- diff --git a/cache.h b/cache.h index aaf334e2aa4..ce9cd6fa3f0 100644 --- a/cache.h +++ b/cache.h @@ -1797,7 +1797,6 @@ struct startup_info { const char *original_cwd; }; extern struct startup_info *startup_info; -extern const char *tmp_original_cwd; /* merge.c */ struct commit_list; diff --git a/common-main.c b/common-main.c index 29fb7452f8a..e472258b83b 100644 --- a/common-main.c +++ b/common-main.c @@ -23,6 +23,8 @@ static void restore_sigpipe_to_default(void) signal(SIGPIPE, SIG_DFL); } +extern char *tmp_original_cwd; + int main(int argc, const char **argv) { int result; diff --git a/setup.c b/setup.c index 04ce33cdcd4..86986317490 100644 --- a/setup.c +++ b/setup.c @@ -12,7 +12,7 @@ static int work_tree_config_is_bogus; static struct startup_info the_startup_info; struct startup_info *startup_info = &the_startup_info; -const char *tmp_original_cwd; +char *tmp_original_cwd; /* * The input parameter must contain an absolute path, and it must already be @@ -459,7 +459,7 @@ static void setup_original_cwd(void) /* Normalize the directory */ strbuf_realpath(&tmp, tmp_original_cwd, 1); - free((char*)tmp_original_cwd); + free(tmp_original_cwd); tmp_original_cwd = NULL; startup_info->original_cwd = strbuf_detach(&tmp, NULL); --- >8 --- > + tmp_original_cwd = NULL; > + startup_info->original_cwd = strbuf_detach(&tmp, NULL); > + } else { > + trace_printf("realpath(cwd) failed: %s\n", strerror(errno)); It could also be helpful to include a trace2 output, since most modern tracing uses that mechanism: trace2_data_string("setup", the_repository, "realpath-path", tmp_original_cwd); trace2_data_string("setup", the_repository, "realpath-failure", strerror(errno)); But this is also unlikely to be necessary. We can instruct a user to rerun their command with GIT_TRACE=1 if we see this error in the wild. > + tmp_original_cwd = NULL; > + goto no_prevention_needed; > + } So, I don't see a need for this patch to change before it is merged. I'm curious to hear thoughts on the diff I put inline, though. Thanks, -Stolee
On Mon, 2022-05-23 at 14:57 -0400, Derrick Stolee wrote: > On 5/21/22 9:53 AM, Kevin Locke wrote: > > + free((char*)tmp_original_cwd); > > Hm. I'm never a fan of this casting, but it existed before. It's > because tmp_original_cwd is exposed globally in cache.h, which > is _really widely_. However, there are only two uses: setup.c, > which defines it, and common-main.c, which initializes it during > process startup. > > The following diff could apply before your commit, removing this > use of "const char *", but maybe it doesn't fit normal Git > coding guidelines (putting the extern directly in a *.c file): > > --- >8 --- > > diff --git a/cache.h b/cache.h > index aaf334e2aa4..ce9cd6fa3f0 100644 > --- a/cache.h > +++ b/cache.h > @@ -1797,7 +1797,6 @@ struct startup_info { > const char *original_cwd; > }; > extern struct startup_info *startup_info; > -extern const char *tmp_original_cwd; > > /* merge.c */ > struct commit_list; > diff --git a/common-main.c b/common-main.c > index 29fb7452f8a..e472258b83b 100644 > --- a/common-main.c > +++ b/common-main.c > @@ -23,6 +23,8 @@ static void restore_sigpipe_to_default(void) > signal(SIGPIPE, SIG_DFL); > } > > +extern char *tmp_original_cwd; > + > int main(int argc, const char **argv) > { > int result; > diff --git a/setup.c b/setup.c > index 04ce33cdcd4..86986317490 100644 > --- a/setup.c > +++ b/setup.c > @@ -12,7 +12,7 @@ static int work_tree_config_is_bogus; > > static struct startup_info the_startup_info; > struct startup_info *startup_info = &the_startup_info; > -const char *tmp_original_cwd; > +char *tmp_original_cwd; > > /* > * The input parameter must contain an absolute path, and it must already be > @@ -459,7 +459,7 @@ static void setup_original_cwd(void) > > /* Normalize the directory */ > strbuf_realpath(&tmp, tmp_original_cwd, 1); > - free((char*)tmp_original_cwd); > + free(tmp_original_cwd); > tmp_original_cwd = NULL; > startup_info->original_cwd = strbuf_detach(&tmp, NULL); > > --- >8 --- This approach seems reasonable to me, as does casting to free(). It's not clear to me which is preferable in this case. How to balance the trade-offs between exposing const interfaces, limiting (internal) interfaces to headers, and avoiding casts might be worth discussing and documenting a matter of project coding style. `grep -rF 'free(('` lists about 100 casts to free, suggesting the discussion may be worthwhile. Introducing a free_const() macro could be another option to consider. >> + tmp_original_cwd = NULL; >> + startup_info->original_cwd = strbuf_detach(&tmp, NULL); >> + } else { >> + trace_printf("realpath(cwd) failed: %s\n", strerror(errno)); > > It could also be helpful to include a trace2 output, since > most modern tracing uses that mechanism: > > trace2_data_string("setup", the_repository, > "realpath-path", tmp_original_cwd); > trace2_data_string("setup", the_repository, > "realpath-failure", strerror(errno)); > > But this is also unlikely to be necessary. We can instruct > a user to rerun their command with GIT_TRACE=1 if we see > this error in the wild. That's a great idea. I hadn't realized the difference between the trace_* and trace2_* APIs until investigating as a result of your suggestion. Trace2 seems preferable for many reasons. I'll send an updated patch shortly. Thanks for the review and feedback Stolee! Cheers, Kevin
On Tue, May 24, 2022 at 7:02 AM Kevin Locke <kevin@kevinlocke.name> wrote: > > On Mon, 2022-05-23 at 14:57 -0400, Derrick Stolee wrote: > > On 5/21/22 9:53 AM, Kevin Locke wrote: > > > + free((char*)tmp_original_cwd); > > > > Hm. I'm never a fan of this casting, but it existed before. It's > > because tmp_original_cwd is exposed globally in cache.h, which > > is _really widely_. However, there are only two uses: setup.c, > > which defines it, and common-main.c, which initializes it during > > process startup. > > > > The following diff could apply before your commit, removing this > > use of "const char *", but maybe it doesn't fit normal Git > > coding guidelines (putting the extern directly in a *.c file): > > > > --- >8 --- > > > > diff --git a/cache.h b/cache.h > > index aaf334e2aa4..ce9cd6fa3f0 100644 > > --- a/cache.h > > +++ b/cache.h > > @@ -1797,7 +1797,6 @@ struct startup_info { > > const char *original_cwd; > > }; > > extern struct startup_info *startup_info; > > -extern const char *tmp_original_cwd; > > > > /* merge.c */ > > struct commit_list; > > diff --git a/common-main.c b/common-main.c > > index 29fb7452f8a..e472258b83b 100644 > > --- a/common-main.c > > +++ b/common-main.c > > @@ -23,6 +23,8 @@ static void restore_sigpipe_to_default(void) > > signal(SIGPIPE, SIG_DFL); > > } > > > > +extern char *tmp_original_cwd; > > + > > int main(int argc, const char **argv) > > { > > int result; > > diff --git a/setup.c b/setup.c > > index 04ce33cdcd4..86986317490 100644 > > --- a/setup.c > > +++ b/setup.c > > @@ -12,7 +12,7 @@ static int work_tree_config_is_bogus; > > > > static struct startup_info the_startup_info; > > struct startup_info *startup_info = &the_startup_info; > > -const char *tmp_original_cwd; > > +char *tmp_original_cwd; > > > > /* > > * The input parameter must contain an absolute path, and it must already be > > @@ -459,7 +459,7 @@ static void setup_original_cwd(void) > > > > /* Normalize the directory */ > > strbuf_realpath(&tmp, tmp_original_cwd, 1); > > - free((char*)tmp_original_cwd); > > + free(tmp_original_cwd); > > tmp_original_cwd = NULL; > > startup_info->original_cwd = strbuf_detach(&tmp, NULL); > > > > --- >8 --- > > This approach seems reasonable to me, as does casting to free(). It's > not clear to me which is preferable in this case. How to balance the > trade-offs between exposing const interfaces, limiting (internal) > interfaces to headers, and avoiding casts might be worth discussing > and documenting a matter of project coding style. `grep -rF 'free(('` > lists about 100 casts to free, suggesting the discussion may be > worthwhile. Introducing a free_const() macro could be another option > to consider. I'd prefer either a free_const() as you suggest (though as a separate patch from what you are submitting here), or leaving the code as-is. free() could have been written to take a const void* instead of just void*, since it's not going to modify what the pointer points at. The reason we call free() is because the variable isn't needed anymore, and using a non-const value after freeing is just as wrong as using a const one after freeing, so casting away the constness cannot really cause any new problems. So, I think the signature of free() is just wrong: it should have taken a const void* all along. Unfortunately, the wrong type signature sadly makes people feel like they have to choose between (a) dropping the added safety of const that the compiler can enforce for you during the lifetime of the variable, or (b) leaking memory you no longer need. I think it's a bad choice and you should just typecast when free'ing, but clearly others just don't want to see any typecasts and are willing to dispense with const on constant variables.
On 5/24/2022 11:20 AM, Elijah Newren wrote: > On Tue, May 24, 2022 at 7:02 AM Kevin Locke <kevin@kevinlocke.name> wrote: >> >> On Mon, 2022-05-23 at 14:57 -0400, Derrick Stolee wrote: >>> On 5/21/22 9:53 AM, Kevin Locke wrote: >>>> + free((char*)tmp_original_cwd); >>> >>> Hm. I'm never a fan of this casting, but it existed before. It's >>> because tmp_original_cwd is exposed globally in cache.h, which >>> is _really widely_. However, there are only two uses: setup.c, >>> which defines it, and common-main.c, which initializes it during >>> process startup. ...>> This approach seems reasonable to me, as does casting to free(). It's >> not clear to me which is preferable in this case. How to balance the >> trade-offs between exposing const interfaces, limiting (internal) >> interfaces to headers, and avoiding casts might be worth discussing >> and documenting a matter of project coding style. `grep -rF 'free(('` >> lists about 100 casts to free, suggesting the discussion may be >> worthwhile. Introducing a free_const() macro could be another option >> to consider. > > I'd prefer either a free_const() as you suggest (though as a separate > patch from what you are submitting here), or leaving the code as-is. > free() could have been written to take a const void* instead of just > void*, since it's not going to modify what the pointer points at. The > reason we call free() is because the variable isn't needed anymore, > and using a non-const value after freeing is just as wrong as using a > const one after freeing, so casting away the constness cannot really > cause any new problems. So, I think the signature of free() is just > wrong: it should have taken a const void* all along. Unfortunately, > the wrong type signature sadly makes people feel like they have to > choose between (a) dropping the added safety of const that the > compiler can enforce for you during the lifetime of the variable, or > (b) leaking memory you no longer need. I think it's a bad choice and > you should just typecast when free'ing, but clearly others just don't > want to see any typecasts and are willing to dispense with const on > constant variables. I mostly agree with you: if free() didn't have the const, then the answer would be simple. We probably wouldn't also have the convention of "const pointers are for memory we don't own". Specifically with 'const char *' this can sometimes point to a compiled string literal, so I tend to be more careful than usual around these kinds of casts. I'm willing to concede this point as it is much messier than just the goals of this patch. Thanks, -Stolee
On Tue, May 24, 2022 at 10:38 AM Derrick Stolee <derrickstolee@github.com> wrote: > > On 5/24/2022 11:20 AM, Elijah Newren wrote: > > On Tue, May 24, 2022 at 7:02 AM Kevin Locke <kevin@kevinlocke.name> wrote: > >> > >> On Mon, 2022-05-23 at 14:57 -0400, Derrick Stolee wrote: > >>> On 5/21/22 9:53 AM, Kevin Locke wrote: > >>>> + free((char*)tmp_original_cwd); > >>> > >>> Hm. I'm never a fan of this casting, but it existed before. It's > >>> because tmp_original_cwd is exposed globally in cache.h, which > >>> is _really widely_. However, there are only two uses: setup.c, > >>> which defines it, and common-main.c, which initializes it during > >>> process startup. > ...>> This approach seems reasonable to me, as does casting to free(). It's > >> not clear to me which is preferable in this case. How to balance the > >> trade-offs between exposing const interfaces, limiting (internal) > >> interfaces to headers, and avoiding casts might be worth discussing > >> and documenting a matter of project coding style. `grep -rF 'free(('` > >> lists about 100 casts to free, suggesting the discussion may be > >> worthwhile. Introducing a free_const() macro could be another option > >> to consider. > > > > I'd prefer either a free_const() as you suggest (though as a separate > > patch from what you are submitting here), or leaving the code as-is. > > free() could have been written to take a const void* instead of just > > void*, since it's not going to modify what the pointer points at. The > > reason we call free() is because the variable isn't needed anymore, > > and using a non-const value after freeing is just as wrong as using a > > const one after freeing, so casting away the constness cannot really > > cause any new problems. So, I think the signature of free() is just > > wrong: it should have taken a const void* all along. Unfortunately, > > the wrong type signature sadly makes people feel like they have to > > choose between (a) dropping the added safety of const that the > > compiler can enforce for you during the lifetime of the variable, or > > (b) leaking memory you no longer need. I think it's a bad choice and > > you should just typecast when free'ing, but clearly others just don't > > want to see any typecasts and are willing to dispense with const on > > constant variables. > > I mostly agree with you: if free() didn't have the const, then the > answer would be simple. We probably wouldn't also have the convention > of "const pointers are for memory we don't own". > > Specifically with 'const char *' this can sometimes point to a > compiled string literal, so I tend to be more careful than usual > around these kinds of casts. Ah, fair enough. > I'm willing to concede this point as it is much messier than just > the goals of this patch. :-)
On Tue, May 24 2022, Elijah Newren wrote: > [...] So, I think the signature of free() is just > wrong: it should have taken a const void* all along. Unfortunately, > the wrong type signature sadly makes people feel like they have to > choose between (a) dropping the added safety of const that the > compiler can enforce for you during the lifetime of the variable, or > (b) leaking memory you no longer need. Hrm, don't you mean that it would be better as: void free(void *const ptr); Not: void free(const void *ptr); But standard C doesn't make much (any?) use of the former form for its library functions by convention. c.f.: cdecl> explain const void *var declare var as pointer to const void cdecl> explain void *const var declare var as const pointer to void I.e. the whole point of malloc() is to give us a pointer to memory that isn't "const". If we stored that in a variable that was "void *const" we'd save ourselves from some amount of foot-guns, but you're rarely doing pointer arithmetic accidentally, so probably not really. But yeah, we really should have this documented somewhere, i.e. the cases where we "lie" and expose a "const char *" which is really (as far as the machine is concerned) mutable. The confusion being that we're seeking to overlay our own "no, this isn't mutable" on the basis of our desired API boundaries, not just to use it to inform us & the compiler about the "real" nature of the underlying data.
On Fri, May 27, 2022 at 1:02 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > > On Tue, May 24 2022, Elijah Newren wrote: > > > [...] So, I think the signature of free() is just > > wrong: it should have taken a const void* all along. Unfortunately, > > the wrong type signature sadly makes people feel like they have to > > choose between (a) dropping the added safety of const that the > > compiler can enforce for you during the lifetime of the variable, or > > (b) leaking memory you no longer need. > > Hrm, don't you mean that it would be better as: > > void free(void *const ptr); > > Not: > > void free(const void *ptr); Nope, I definitely meant the latter; the stuff pointed to is const, not the pointer itself. In fact, I don't see any point at all in the former; with the free() that exists today: void free(void *ptr) I can pass it a "void * const myptr" already without problems, because free's ptr parameter will be a copy of myptr, and thus modifying ptr cannot affect myptr. So such a call signature change could not possibly provide any benefit to the outside caller. But that call signature change could hinder the actual implementation of free() for some folks (particularly if a given implementation of free() keeps extra data near the allocated block with information about the size of the block and the next allocated block in the list). In contrast, I cannot pass a "const void *myptr" or "const char *myptr" to free(), but only because of the current type signature; free() doesn't actually modify any of the contents the pointer points to. (And if you want to claim that free effectively does modify what myptr points to because someone else could allocate that same memory, remember that use-after-free is undefined regardless of whether the data pointed to is const or not, and thus you cannot access that data after free with or without the const.) So, free()'s real type that it acts on is a const void *. Sadly, the declared type signature is rather void *, which unnecessarily forces users to cast their types when calling.
diff --git a/setup.c b/setup.c index a7b36f3ffb..22430a663c 100644 --- a/setup.c +++ b/setup.c @@ -458,11 +458,16 @@ static void setup_original_cwd(void) * not startup_info->original_cwd. */ - /* Normalize the directory */ - strbuf_realpath(&tmp, tmp_original_cwd, 1); - free((char*)tmp_original_cwd); - tmp_original_cwd = NULL; - startup_info->original_cwd = strbuf_detach(&tmp, NULL); + /* Try to normalize the directory. */ + if (strbuf_realpath(&tmp, tmp_original_cwd, 0)) { + free((char*)tmp_original_cwd); + tmp_original_cwd = NULL; + startup_info->original_cwd = strbuf_detach(&tmp, NULL); + } else { + trace_printf("realpath(cwd) failed: %s\n", strerror(errno)); + tmp_original_cwd = NULL; + goto no_prevention_needed; + } /* * Get our worktree; we only protect the current working directory
Prior to Git 2.35.0, git could be run from an inaccessible working directory so long as the git repository specified by options and/or environment variables was accessible. For example: git init repo mkdir -p a/b cd a/b chmod u-x .. git -C "${PWD%/a/b}/repo" status If this example seems a bit contrived, consider running with the repository owner as a substitute UID (e.g. with runuser(1) or sudo(8)) without ensuring the working directory is accessible by that user. The code added by e6f8861bd4 ("setup: introduce startup_info->original_cwd") to preserve the working directory attempts to normalize the path using strbuf_realpath(). If that fails, as in the case above, it is treated as a fatal error. This commit treats strbuf_realpath() errors as non-fatal. If an error occurs, setup_original_cwd() will continue without applying removal prevention for cwd, resulting in the pre-2.35.0 behavior. The risk should be minimal, since git will not operate on a repository with inaccessible ancestors, this behavior is only known to occur when cwd is a descendant of the repository, an ancestor of cwd is inaccessible, and no ancestors of the repository are inaccessible. Signed-off-by: Kevin Locke <kevin@kevinlocke.name> --- Changes since v1: * Set startup_info->original_cwd = NULL when strbuf_realpath() fails, rather than setting it to the un-normalized path. * Add a trace message when realpath fails to aid debugging. * Remove potential realpath failure cause from comment before it. * Improve format for reference to e6f8861bd4 in commit message. * Clarify when the pre-2.35.0 behavior may occur as a result of this commit in the commit message. * Remove 'Fixes:' tag from commit message. setup.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)