Message ID | 20200107013640.1821227-1-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | run-command: avoid undefined behavior in exists_in_PATH | expand |
brian m. carlson wrote: > In this function, we free the pointer we get from locate_in_PATH and > then check whether it's NULL. However, this is undefined behavior if > the pointer is non-NULL, since the C standard no longer permits us to > use a valid pointer after freeing it. [...] > Noticed-by: Miriam R. <mirucam@gmail.com> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > --- > run-command.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> This API that forces the caller to deal with the allocated result when they never asked for it seems a bit inconvenient. Should we clean it up a little? Something like this (on top): -- >8 -- Subject: run-command: let caller pass in buffer to locate_in_PATH Instead of returning a buffer that the caller is responsible for freeing, use a strbuf output parameter to record the path to the searched-for program. This makes ownership a little easier to reason about, since the owning code declares the buffer. It's a good habit to follow because it allows buffer reuse when calling such a function in a loop. It also allows the caller exists_in_PATH that does not care about the path to the command to be slightly simplified, by allowing a NULL output parameter that means that locate_in_PATH should take care of allocating and freeing its temporary buffer. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- run-command.c | 51 +++++++++++++++++++++++++++++---------------------- 1 file changed, 29 insertions(+), 22 deletions(-) diff --git i/run-command.c w/run-command.c index f5e1149f9b..a6dc38396a 100644 --- i/run-command.c +++ w/run-command.c @@ -170,52 +170,57 @@ int is_executable(const char *name) * The caller should ensure that file contains no directory * separators. * - * Returns the path to the command, as found in $PATH or NULL if the - * command could not be found. The caller inherits ownership of the memory - * used to store the resultant path. + * Returns a boolean indicating whether the command was found in $PATH. + * The path to the command is recorded in the strbuf 'out', if supplied. * * This should not be used on Windows, where the $PATH search rules * are more complicated (e.g., a search for "foo" should find * "foo.exe"). */ -static char *locate_in_PATH(const char *file) +static int locate_in_PATH(const char *file, struct strbuf *out) { const char *p = getenv("PATH"); struct strbuf buf = STRBUF_INIT; - if (!p || !*p) - return NULL; + if (!out) + out = &buf; + + if (!p || !*p) { + strbuf_reset(out); + strbuf_release(&buf); + return 0; + } while (1) { const char *end = strchrnul(p, ':'); - strbuf_reset(&buf); + strbuf_reset(out); /* POSIX specifies an empty entry as the current directory. */ if (end != p) { - strbuf_add(&buf, p, end - p); - strbuf_addch(&buf, '/'); + strbuf_add(out, p, end - p); + strbuf_addch(out, '/'); } - strbuf_addstr(&buf, file); + strbuf_addstr(out, file); - if (is_executable(buf.buf)) - return strbuf_detach(&buf, NULL); + if (is_executable(out->buf)) { + strbuf_release(&buf); + return 1; + } if (!*end) break; p = end + 1; } + strbuf_reset(out); strbuf_release(&buf); - return NULL; + return 0; } static int exists_in_PATH(const char *file) { - char *r = locate_in_PATH(file); - int found = r != NULL; - free(r); - return found; + return locate_in_PATH(file, NULL); } int sane_execvp(const char *file, char * const argv[]) @@ -427,15 +432,17 @@ static int prepare_cmd(struct argv_array *out, const struct child_process *cmd) * directly. */ if (!strchr(out->argv[1], '/')) { - char *program = locate_in_PATH(out->argv[1]); - if (program) { - free((char *)out->argv[1]); - out->argv[1] = program; - } else { + struct strbuf program = STRBUF_INIT; + + if (!locate_in_PATH(out->argv[1], &program)) { argv_array_clear(out); + strbuf_release(&program); errno = ENOENT; return -1; } + + free((char *)out->argv[1]); + out->argv[1] = strbuf_detach(&program, NULL); } return 0;
On 2020-01-07 at 02:04:25, Jonathan Nieder wrote: > brian m. carlson wrote: > > > In this function, we free the pointer we get from locate_in_PATH and > > then check whether it's NULL. However, this is undefined behavior if > > the pointer is non-NULL, since the C standard no longer permits us to > > use a valid pointer after freeing it. > [...] > > Noticed-by: Miriam R. <mirucam@gmail.com> > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > > --- > > run-command.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> > > This API that forces the caller to deal with the allocated result when > they never asked for it seems a bit inconvenient. Should we clean it up > a little? Something like this (on top): > > -- >8 -- > Subject: run-command: let caller pass in buffer to locate_in_PATH > > Instead of returning a buffer that the caller is responsible for > freeing, use a strbuf output parameter to record the path to the > searched-for program. > > This makes ownership a little easier to reason about, since the owning > code declares the buffer. It's a good habit to follow because it > allows buffer reuse when calling such a function in a loop. > > It also allows the caller exists_in_PATH that does not care about the > path to the command to be slightly simplified, by allowing a NULL > output parameter that means that locate_in_PATH should take care of > allocating and freeing its temporary buffer. Sure, I think this is a nice improvement. The user can reuse the buffer in a loop if they want by resetting it, which as you point out would be convenient (and potentially more efficient). And we're already using one internally, so there's no reason not to just pass one in.
On Mon, Jan 6, 2020 at 6:04 PM Jonathan Nieder <jrnieder@gmail.com> wrote: > > brian m. carlson wrote: > > > In this function, we free the pointer we get from locate_in_PATH and > > then check whether it's NULL. However, this is undefined behavior if > > the pointer is non-NULL, since the C standard no longer permits us to > > use a valid pointer after freeing it. > [...] > > Noticed-by: Miriam R. <mirucam@gmail.com> > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > > --- > > run-command.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> > > This API that forces the caller to deal with the allocated result when > they never asked for it seems a bit inconvenient. Should we clean it up > a little? Something like this (on top): > > -- >8 -- > Subject: run-command: let caller pass in buffer to locate_in_PATH > > Instead of returning a buffer that the caller is responsible for > freeing, use a strbuf output parameter to record the path to the > searched-for program. > > This makes ownership a little easier to reason about, since the owning > code declares the buffer. It's a good habit to follow because it > allows buffer reuse when calling such a function in a loop. > > It also allows the caller exists_in_PATH that does not care about the > path to the command to be slightly simplified, by allowing a NULL > output parameter that means that locate_in_PATH should take care of > allocating and freeing its temporary buffer. > > Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> > --- > run-command.c | 51 +++++++++++++++++++++++++++++---------------------- > 1 file changed, 29 insertions(+), 22 deletions(-) > > diff --git i/run-command.c w/run-command.c > index f5e1149f9b..a6dc38396a 100644 > --- i/run-command.c > +++ w/run-command.c > @@ -170,52 +170,57 @@ int is_executable(const char *name) > * The caller should ensure that file contains no directory > * separators. > * > - * Returns the path to the command, as found in $PATH or NULL if the > - * command could not be found. The caller inherits ownership of the memory > - * used to store the resultant path. > + * Returns a boolean indicating whether the command was found in $PATH. > + * The path to the command is recorded in the strbuf 'out', if supplied. > * > * This should not be used on Windows, where the $PATH search rules > * are more complicated (e.g., a search for "foo" should find > * "foo.exe"). > */ > -static char *locate_in_PATH(const char *file) > +static int locate_in_PATH(const char *file, struct strbuf *out) > { > const char *p = getenv("PATH"); > struct strbuf buf = STRBUF_INIT; > > - if (!p || !*p) > - return NULL; > + if (!out) > + out = &buf; > + > + if (!p || !*p) { > + strbuf_reset(out); Since the while loop and this block both call strbuf_reset(out);, is there a reason not to do: if (out) strbuf_reset(out); } else { out = &buf; } above? The loop below would still need its own reset, but it could do that when it decides to loop. > + strbuf_release(&buf); > + return 0; > + } > > while (1) { > const char *end = strchrnul(p, ':'); > > - strbuf_reset(&buf); > + strbuf_reset(out); This reset would be removed > > /* POSIX specifies an empty entry as the current directory. */ > if (end != p) { > - strbuf_add(&buf, p, end - p); > - strbuf_addch(&buf, '/'); > + strbuf_add(out, p, end - p); > + strbuf_addch(out, '/'); > } > - strbuf_addstr(&buf, file); > + strbuf_addstr(out, file); > > - if (is_executable(buf.buf)) > - return strbuf_detach(&buf, NULL); > + if (is_executable(out->buf)) { > + strbuf_release(&buf); > + return 1; > + } > We'd call strbuf_reset(out); here instead, before we break or loop. > if (!*end) > break; > p = end + 1 > } > > + strbuf_reset(out); And we could leave this one out; if we make it here, we'd know the buffer was already reset. > strbuf_release(&buf); > - return NULL; > + return 0; > } > > static int exists_in_PATH(const char *file) > { > - char *r = locate_in_PATH(file); > - int found = r != NULL; > - free(r); > - return found; > + return locate_in_PATH(file, NULL); > } > > int sane_execvp(const char *file, char * const argv[]) > @@ -427,15 +432,17 @@ static int prepare_cmd(struct argv_array *out, const struct child_process *cmd) > * directly. > */ > if (!strchr(out->argv[1], '/')) { > - char *program = locate_in_PATH(out->argv[1]); > - if (program) { > - free((char *)out->argv[1]); > - out->argv[1] = program; > - } else { > + struct strbuf program = STRBUF_INIT; > + > + if (!locate_in_PATH(out->argv[1], &program)) { > argv_array_clear(out); > + strbuf_release(&program); > errno = ENOENT; > return -1; > } > + > + free((char *)out->argv[1]); > + out->argv[1] = strbuf_detach(&program, NULL); > } > > return 0; Just a thought. (Pardon the noise from the peanut gallery!) Best regards, Bryan Turner
On Mon, Jan 6, 2020 at 7:40 PM Bryan Turner <bturner@atlassian.com> wrote: > > On Mon, Jan 6, 2020 at 6:04 PM Jonathan Nieder <jrnieder@gmail.com> wrote: > > > > brian m. carlson wrote: > > > > > In this function, we free the pointer we get from locate_in_PATH and > > > then check whether it's NULL. However, this is undefined behavior if > > > the pointer is non-NULL, since the C standard no longer permits us to > > > use a valid pointer after freeing it. > > [...] > > > Noticed-by: Miriam R. <mirucam@gmail.com> > > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > > > --- > > > run-command.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> > > > > This API that forces the caller to deal with the allocated result when > > they never asked for it seems a bit inconvenient. Should we clean it up > > a little? Something like this (on top): > > > > -- >8 -- > > Subject: run-command: let caller pass in buffer to locate_in_PATH > > > > Instead of returning a buffer that the caller is responsible for > > freeing, use a strbuf output parameter to record the path to the > > searched-for program. > > > > This makes ownership a little easier to reason about, since the owning > > code declares the buffer. It's a good habit to follow because it > > allows buffer reuse when calling such a function in a loop. > > > > It also allows the caller exists_in_PATH that does not care about the > > path to the command to be slightly simplified, by allowing a NULL > > output parameter that means that locate_in_PATH should take care of > > allocating and freeing its temporary buffer. > > > > Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> > > --- > > run-command.c | 51 +++++++++++++++++++++++++++++---------------------- > > 1 file changed, 29 insertions(+), 22 deletions(-) > > > > diff --git i/run-command.c w/run-command.c > > index f5e1149f9b..a6dc38396a 100644 > > --- i/run-command.c > > +++ w/run-command.c > > @@ -170,52 +170,57 @@ int is_executable(const char *name) > > * The caller should ensure that file contains no directory > > * separators. > > * > > - * Returns the path to the command, as found in $PATH or NULL if the > > - * command could not be found. The caller inherits ownership of the memory > > - * used to store the resultant path. > > + * Returns a boolean indicating whether the command was found in $PATH. > > + * The path to the command is recorded in the strbuf 'out', if supplied. > > * > > * This should not be used on Windows, where the $PATH search rules > > * are more complicated (e.g., a search for "foo" should find > > * "foo.exe"). > > */ > > -static char *locate_in_PATH(const char *file) > > +static int locate_in_PATH(const char *file, struct strbuf *out) > > { > > const char *p = getenv("PATH"); > > struct strbuf buf = STRBUF_INIT; > > > > - if (!p || !*p) > > - return NULL; > > + if (!out) > > + out = &buf; > > + > > + if (!p || !*p) { > > + strbuf_reset(out); > > Since the while loop and this block both call strbuf_reset(out);, is > there a reason not to do: > if (out) > strbuf_reset(out); > } else { > out = &buf; > } > above? The loop below would still need its own reset, but it could do > that when it decides to loop. Ignore my incorrect braces; force of habit. I meant: if (out) strbuf_reset(out); else out = &buf; > > > + strbuf_release(&buf); > > + return 0; > > + } > > > > while (1) { > > const char *end = strchrnul(p, ':'); > > > > - strbuf_reset(&buf); > > + strbuf_reset(out); > > This reset would be removed > > > > > /* POSIX specifies an empty entry as the current directory. */ > > if (end != p) { > > - strbuf_add(&buf, p, end - p); > > - strbuf_addch(&buf, '/'); > > + strbuf_add(out, p, end - p); > > + strbuf_addch(out, '/'); > > } > > - strbuf_addstr(&buf, file); > > + strbuf_addstr(out, file); > > > > - if (is_executable(buf.buf)) > > - return strbuf_detach(&buf, NULL); > > + if (is_executable(out->buf)) { > > + strbuf_release(&buf); > > + return 1; > > + } > > > > We'd call strbuf_reset(out); here instead, before we break or loop. > > > if (!*end) > > break; > > p = end + 1 > > } > > > > + strbuf_reset(out); > > And we could leave this one out; if we make it here, we'd know the > buffer was already reset. > > > strbuf_release(&buf); > > - return NULL; > > + return 0; > > } > > > > static int exists_in_PATH(const char *file) > > { > > - char *r = locate_in_PATH(file); > > - int found = r != NULL; > > - free(r); > > - return found; > > + return locate_in_PATH(file, NULL); > > } > > > > int sane_execvp(const char *file, char * const argv[]) > > @@ -427,15 +432,17 @@ static int prepare_cmd(struct argv_array *out, const struct child_process *cmd) > > * directly. > > */ > > if (!strchr(out->argv[1], '/')) { > > - char *program = locate_in_PATH(out->argv[1]); > > - if (program) { > > - free((char *)out->argv[1]); > > - out->argv[1] = program; > > - } else { > > + struct strbuf program = STRBUF_INIT; > > + > > + if (!locate_in_PATH(out->argv[1], &program)) { > > argv_array_clear(out); > > + strbuf_release(&program); > > errno = ENOENT; > > return -1; > > } > > + > > + free((char *)out->argv[1]); > > + out->argv[1] = strbuf_detach(&program, NULL); > > } > > > > return 0; > > Just a thought. (Pardon the noise from the peanut gallery!) > > Best regards, > Bryan Turner
On Tue, Jan 07, 2020 at 01:36:40AM +0000, brian m. carlson wrote: > In this function, we free the pointer we get from locate_in_PATH and > then check whether it's NULL. However, this is undefined behavior if > the pointer is non-NULL, since the C standard no longer permits us to > use a valid pointer after freeing it. > > The only case in which the C standard would permit this to be defined > behavior is if r were NULL, since it states that in such a case "no > action occurs" as a result of calling free. > > It's easy to suggest that this is not likely to be a problem, but we > know that GCC does aggressively exploit the fact that undefined > behavior can never occur to optimize and rewrite code, even when that's > contrary to the expectations of the programmer. It is, in fact, very > common for it to omit NULL pointer checks, just as we have here. OK, I agree it makes sense to be on the safe side here (and the patch is obviously the right fix). > Noticed-by: Miriam R. <mirucam@gmail.com> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> I think Miriam actually posted the same patch in her initial email: https://lore.kernel.org/git/CAN7CjDDBA0ZoCG9aaQf5rg3gxqny=EjR6v6jE1mnxvUJQSF_0Q@mail.gmail.com/ I don't know how we want to handle authorship. -Peff
On Mon, Jan 06, 2020 at 06:04:25PM -0800, Jonathan Nieder wrote: > -- >8 -- > Subject: run-command: let caller pass in buffer to locate_in_PATH > > Instead of returning a buffer that the caller is responsible for > freeing, use a strbuf output parameter to record the path to the > searched-for program. > > This makes ownership a little easier to reason about, since the owning > code declares the buffer. It's a good habit to follow because it > allows buffer reuse when calling such a function in a loop. > > It also allows the caller exists_in_PATH that does not care about the > path to the command to be slightly simplified, by allowing a NULL > output parameter that means that locate_in_PATH should take care of > allocating and freeing its temporary buffer. > > Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> > --- > run-command.c | 51 +++++++++++++++++++++++++++++---------------------- > 1 file changed, 29 insertions(+), 22 deletions(-) I dunno. Now the rules inside locate_in_PATH() are more complicated, and we have an unusual boolean return from the function. I admit I don't overly care either way, as there are literally two callers (and not likely to be more). So I'd probably just leave it alone, but I'm not opposed to the patch if people think it's a good idea. -Peff
Jeff King <peff@peff.net> writes: > On Tue, Jan 07, 2020 at 01:36:40AM +0000, brian m. carlson wrote: > >> In this function, we free the pointer we get from locate_in_PATH and >> then check whether it's NULL. However, this is undefined behavior if >> the pointer is non-NULL, since the C standard no longer permits us to >> use a valid pointer after freeing it. >> >> The only case in which the C standard would permit this to be defined >> behavior is if r were NULL, since it states that in such a case "no >> action occurs" as a result of calling free. >> >> It's easy to suggest that this is not likely to be a problem, but we >> know that GCC does aggressively exploit the fact that undefined >> behavior can never occur to optimize and rewrite code, even when that's >> contrary to the expectations of the programmer. It is, in fact, very >> common for it to omit NULL pointer checks, just as we have here. > > OK, I agree it makes sense to be on the safe side here (and the patch is > obviously the right fix). > >> Noticed-by: Miriam R. <mirucam@gmail.com> >> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > > I think Miriam actually posted the same patch in her initial email: > > https://lore.kernel.org/git/CAN7CjDDBA0ZoCG9aaQf5rg3gxqny=EjR6v6jE1mnxvUJQSF_0Q@mail.gmail.com/ > > I don't know how we want to handle authorship. I think the explanation in the log message has as much value as, if not more than, the actual patch text, in this case. Noticed-by: may be striking the right balance. Thanks, all.
On 2020-01-07 at 11:01:45, Jeff King wrote: > > Noticed-by: Miriam R. <mirucam@gmail.com> > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > > I think Miriam actually posted the same patch in her initial email: > > https://lore.kernel.org/git/CAN7CjDDBA0ZoCG9aaQf5rg3gxqny=EjR6v6jE1mnxvUJQSF_0Q@mail.gmail.com/ > > I don't know how we want to handle authorship. I don't feel strongly about holding authorship; I'm happy to have her name on it instead of mine since she did propose a solution. I just care that we fix it.
El mié., 8 ene. 2020 a las 3:47, brian m. carlson (<sandals@crustytoothpaste.net>) escribió: > > On 2020-01-07 at 11:01:45, Jeff King wrote: > > > Noticed-by: Miriam R. <mirucam@gmail.com> > > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > > > > I think Miriam actually posted the same patch in her initial email: > > > > https://lore.kernel.org/git/CAN7CjDDBA0ZoCG9aaQf5rg3gxqny=EjR6v6jE1mnxvUJQSF_0Q@mail.gmail.com/ > > > > I don't know how we want to handle authorship. > > I don't feel strongly about holding authorship; I'm happy to have her > name on it instead of mine since she did propose a solution. I just > care that we fix it. > -- Hi, my mentor (Christian Couder <chriscool@tuxfamily.org>) was who detected the problem and sent me the solution, I only asked the community. I think his name should be in the patch instead of mine. Best, Miriam > brian m. carlson: Houston, Texas, US > OpenPGP: https://keybase.io/bk2204
On Wed, Jan 8, 2020 at 10:16 AM Miriam R. <mirucam@gmail.com> wrote: > > El mié., 8 ene. 2020 a las 3:47, brian m. carlson > (<sandals@crustytoothpaste.net>) escribió: > > > I think Miriam actually posted the same patch in her initial email: > > > > > > https://lore.kernel.org/git/CAN7CjDDBA0ZoCG9aaQf5rg3gxqny=EjR6v6jE1mnxvUJQSF_0Q@mail.gmail.com/ > > > > > > I don't know how we want to handle authorship. > > > > I don't feel strongly about holding authorship; I'm happy to have her > > name on it instead of mine since she did propose a solution. I just > > care that we fix it. > my mentor (Christian Couder <chriscool@tuxfamily.org>) was who > detected the problem and sent me the solution, I only asked the > community. I think his name should be in the patch instead of mine. I am ok with not having my name anywhere and leaving the patch as is.
diff --git a/run-command.c b/run-command.c index 9942f120a9..f5e1149f9b 100644 --- a/run-command.c +++ b/run-command.c @@ -213,8 +213,9 @@ static char *locate_in_PATH(const char *file) static int exists_in_PATH(const char *file) { char *r = locate_in_PATH(file); + int found = r != NULL; free(r); - return r != NULL; + return found; } int sane_execvp(const char *file, char * const argv[])
In this function, we free the pointer we get from locate_in_PATH and then check whether it's NULL. However, this is undefined behavior if the pointer is non-NULL, since the C standard no longer permits us to use a valid pointer after freeing it. The only case in which the C standard would permit this to be defined behavior is if r were NULL, since it states that in such a case "no action occurs" as a result of calling free. It's easy to suggest that this is not likely to be a problem, but we know that GCC does aggressively exploit the fact that undefined behavior can never occur to optimize and rewrite code, even when that's contrary to the expectations of the programmer. It is, in fact, very common for it to omit NULL pointer checks, just as we have here. Since it's easy to fix, let's do so, and avoid a potential headache in the future. Noticed-by: Miriam R. <mirucam@gmail.com> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- run-command.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)