[v2,2/3] strbuf: set errno to 0 after strbuf_getcwd

Message ID 0ed09e9abb85e73a80d044c1ddaed303517752ac.1722632287.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Series Small fixes for issues detected during internal CI runs

Commit Message

Kyle Lippincott Aug. 2, 2024, 8:58 p.m. UTC
From: Kyle Lippincott <spectral@google.com>

If the loop executes more than once due to cwd being longer than 128
bytes, then `errno = ERANGE` might persist outside of this function.
This technically shouldn't be a problem, as all locations where the
value in `errno` is tested should either (a) call a function that's
guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior
to calling the function that only conditionally sets errno, such as the
`strtod` function. In the case of functions in category (b), it's easy
to forget to do that.

Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
This matches the behavior in functions like `run_transaction_hook`
(refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).

Signed-off-by: Kyle Lippincott <spectral@google.com>
---
 strbuf.c | 1 +
 1 file changed, 1 insertion(+)
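
For readers unfamiliar with category (b), here is a minimal sketch of the pattern the commit message describes for `strtod`: clear `errno` before the call, then test it afterwards. The helper name `parse_double` is hypothetical and is not part of git.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a git function): parse a whole string as a
 * double. Returns 0 on success and -1 on failure.
 */
static int parse_double(const char *s, double *out)
{
	char *end;
	double value;

	errno = 0;		/* strtod() only sets errno on a range error */
	value = strtod(s, &end);
	if (end == s || *end != '\0')
		return -1;	/* nothing parsed, or trailing junk */
	if (errno == ERANGE)
		return -1;	/* overflow or underflow */
	*out = value;
	return 0;
}
```

Without the `errno = 0;` line, a stale `ERANGE` left behind by an earlier call (such as the retry loop in `strbuf_getcwd`) would make a valid input like "1.5" look like a range error.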

Comments

Junio C Hamano Aug. 2, 2024, 9:32 p.m. UTC | #1
"Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Kyle Lippincott <spectral@google.com>
>
> If the loop executes more than once due to cwd being longer than 128
> bytes, then `errno = ERANGE` might persist outside of this function.
> This technically shouldn't be a problem, as all locations where the
> value in `errno` is tested should either (a) call a function that's
> guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior
> to calling the function that only conditionally sets errno, such as the
> `strtod` function. In the case of functions in category (b), it's easy
> to forget to do that.
>
> Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> This matches the behavior in functions like `run_transaction_hook`
> (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).

I am still uneasy to see this unconditional clearing, which looks
more like spreading the bad practice from two places you identified
than following good behaviour modelled after these two places.

But I'll let it pass.

As long as our programmers understand that across strbuf_getcwd(),
errno will *not* be preserved, even if the function returns success,
it would be OK.  As the usual convention around errno is that a
successful call would leave errno intact, not clear it to 0, it
would make it a bit harder to learn our API for newcomers, though.

Thanks.

> Signed-off-by: Kyle Lippincott <spectral@google.com>
> ---
>  strbuf.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/strbuf.c b/strbuf.c
> index 3d2189a7f64..b94ef040ab0 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb)
>  		strbuf_grow(sb, guessed_len);
>  		if (getcwd(sb->buf, sb->alloc)) {
>  			strbuf_setlen(sb, strlen(sb->buf));
> +			errno = 0;
>  			return 0;
>  		}
Eric Sunshine Aug. 2, 2024, 9:54 p.m. UTC | #2
On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> > [...]
> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> > This matches the behavior in functions like `run_transaction_hook`
> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
>
> I am still uneasy to see this unconditional clearing, which looks
> more like spreading the bad practice from two places you identified
> than following good behaviour modelled after these two places.
>
> But I'll let it pass.
>
> As long as our programmers understand that across strbuf_getcwd(),
> errno will *not* be preserved, even if the function returns success,
> it would be OK.  As the usual convention around errno is that a
> successful call would leave errno intact, not clear it to 0, it
> would make it a bit harder to learn our API for newcomers, though.

For what it's worth, I share your misgivings about this change and
consider the suggestion[*] to make it save/restore `errno` upon
success more sensible. It would also be a welcome change to see the
function documentation in strbuf.h updated to mention that it follows
the usual convention of leaving `errno` untouched upon success and
clobbered upon error.

[*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/
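
A rough sketch of what the save/restore suggestion could look like, for illustration only. This is not git's `strbuf_getcwd`; the function name `getcwd_preserving_errno`, the starting size of 16, and the malloc-based buffer are all assumptions made to keep the example self-contained.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: stores a malloc'd copy of the current directory
 * in *out. Returns 0 on success with errno restored to its value on
 * entry, or -1 on failure with errno describing the error.
 */
static int getcwd_preserving_errno(char **out)
{
	int saved_errno = errno;
	size_t len = 16;

	for (;;) {
		char *buf = malloc(len);
		int err;

		if (!buf)
			return -1;
		if (getcwd(buf, len)) {
			*out = buf;
			errno = saved_errno;	/* hide the ERANGE retries */
			return 0;
		}
		err = errno;			/* free() may change errno */
		free(buf);
		if (err != ERANGE) {
			errno = err;
			return -1;
		}
		len *= 2;			/* buffer too small, retry */
	}
}
```

The cost of this approach is that every such wrapper has to remember to do the same thing, and has to document that it does.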
Kyle Lippincott Aug. 2, 2024, 11:51 p.m. UTC | #3
On Fri, Aug 2, 2024 at 2:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Kyle Lippincott <spectral@google.com>
> >
> > If the loop executes more than once due to cwd being longer than 128
> > bytes, then `errno = ERANGE` might persist outside of this function.
> > This technically shouldn't be a problem, as all locations where the
> > value in `errno` is tested should either (a) call a function that's
> > guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior
> > to calling the function that only conditionally sets errno, such as the
> > `strtod` function. In the case of functions in category (b), it's easy
> > to forget to do that.
> >
> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> > This matches the behavior in functions like `run_transaction_hook`
> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
>
> I am still uneasy to see this unconditional clearing, which looks
> more like spreading the bad practice from two places you identified
> than following good behaviour modelled after these two places.
>
> But I'll let it pass.
>
> As long as our programmers understand that across strbuf_getcwd(),
> errno will *not* be preserved, even if the function returns success,
> it would be OK.  As the usual convention around errno is that a
> successful call would leave errno intact, not clear it to 0, it
> would make it a bit harder to learn our API for newcomers, though.

I'm sympathetic to that argument. If you'd prefer to not have this
patch, I'm fine with it not landing, and instead at some future date I
may try to work on those #leftoverbits from the previous patch (to
make a safer wrapper around strtoX, and ban the use of the unwrapped
versions), or someone else can if they beat me to it.

Since this is wrapping a POSIX function, and POSIX has things to say
about this (see below), I agree that it shouldn't set it to 0, and I
withdraw this patch.

I'm including my references below mostly because with the information
I just acquired, I think that any attempt to _preserve_ errno is also
folly. No function we write, unless we explicitly state that it _will_
preserve errno, should feel obligated to do so. There are just too many
cases where errno _could_ be modified according to the various
specifications (C99 and POSIX).

---

Perhaps it's because I'm not all that experienced with C, but when I
wrote C a couple of decades ago, I operated in a mode where basically
every function was actively hostile. If I wanted errno preserved across
a function call, then it was up to me (the caller) to do so, regardless
of what the current implementation of that function says will happen,
because that can change at any point. Unless the function is
documented as errno-preserving, I'm going to treat it as
errno-hostile. In practice, this didn't really matter much, as I've
never found `if (some_func()) { if (!some_other_func()) { /* use errno
from `some_func` */ } }` logic to happen often, but maybe it does in
"real" programs; I was just a hobbyist self-teaching at the time.

The C standard has a very precise definition of how the library
functions defined in the C specification will act. It guarantees:
- the library functions defined in the specification will never set errno to 0.
- the library functions defined in the specification may set the value
to non-zero whether an error occurs or not, "provided the use of errno
is not documented in the description of the function in this
International Standard". What this means is that (a) if the function
as defined in the C standard mentions errno, it can only set the
values as specified there, and (b) if the function as defined in the C
standard does _not_ mention errno, such as `fopen` or `strstr`, it can
do _whatever it wants_ to errno, even on success, _except_ set it to
0.

POSIX has similar language
(https://pubs.opengroup.org/onlinepubs/009695399/functions/errno.html),
with some key differences:
- The value of errno should only be examined when it is indicated to
be valid by a function's return value.
- The setting of errno after a successful call to a function is
unspecified unless the description of that function specifies that
errno shall not be modified.

This means that unlike the C specification, which says that if a
function doesn't describe its use of errno it can do anything it wants
to errno [except set it to 0], in POSIX, a function can do anything it
wants to errno [except set it to 0] at any time.

What this means in practice is that errno should never be assumed to
be preserved across calls to POSIX functions (like getcwd). Also,
strbuf_getcwd calls free, malloc, and realloc, none of which mention
errno in the C specification, so they can do whatever they want to it
[except set it to 0]. That I was able to find one single function that
was causing problems is luck, and not guaranteed by any specification.

Kind of makes me want to try writing an actively hostile C99 and POSIX
environment, and see how many things break with it. :) C99 spec
doesn't say anything about malloc setting errno? Ok! malloc now sets
errno to ENOENT on tuesdays [in GMT because I'm not a monster], but
only on success. On any other day, it'll set it to ERANGE, regardless
of success or failure.
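
As a small illustration of the "hostile environment" idea and of the POSIX rule quoted above (examine errno only when the return value says it is valid), here is a sketch. Every name in it is hypothetical; `hostile_success` simply stands in for any function that succeeds while leaving errno non-zero, which the text above argues is permitted.

```c
#include <errno.h>

/* Hypothetical: succeeds, but leaves errno non-zero anyway. */
static int hostile_success(void)
{
	errno = ERANGE;
	return 0;
}

/* Hypothetical: fails and reports the reason through errno. */
static int hostile_failure(void)
{
	errno = ENOENT;
	return -1;
}

/* Fragile: infers failure from errno alone, ignoring the return value. */
static int status_from_errno(int (*fn)(void))
{
	errno = 0;
	fn();
	return errno ? -1 : 0;
}

/* Robust: consults errno only after the return value signals failure. */
static int status_from_return(int (*fn)(void))
{
	if (fn() < 0)
		return errno;	/* errno is meaningful only on this path */
	return 0;
}
```

With these definitions, `status_from_errno(hostile_success)` reports a failure that never happened, while `status_from_return(hostile_success)` reports success and `status_from_return(hostile_failure)` yields `ENOENT`.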

Junio C Hamano Aug. 5, 2024, 3:51 p.m. UTC | #4
Eric Sunshine <sunshine@sunshineco.com> writes:

> On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>> > [...]
>> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
>> > This matches the behavior in functions like `run_transaction_hook`
>> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
>>
>> I am still uneasy to see this unconditional clearing, which looks
>> more like spreading the bad practice from two places you identified
>> than following good behaviour modelled after these two places.
>>
>> But I'll let it pass.
>>
>> As long as our programmers understand that across strbuf_getcwd(),
>> errno will *not* be preserved, even if the function returns success,
>> it would be OK.  As the usual convention around errno is that a
>> successful call would leave errno intact, not clear it to 0, it
>> would make it a bit harder to learn our API for newcomers, though.
>
> For what it's worth, I share your misgivings about this change and
> consider the suggestion[*] to make it save/restore `errno` upon
> success more sensible. It would also be a welcome change to see the
> function documentation in strbuf.h updated to mention that it follows
> the usual convention of leaving `errno` untouched upon success and
> clobbered upon error.
>
> [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/

Yup, of course save/restore would be safer, and probably easier to
reason about for many people.

Thanks.
Kyle Lippincott Aug. 5, 2024, 5:12 p.m. UTC | #5
On Fri, Aug 2, 2024 at 4:51 PM Kyle Lippincott <spectral@google.com> wrote:
>
> On Fri, Aug 2, 2024 at 2:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> > > From: Kyle Lippincott <spectral@google.com>
> > >
> > > If the loop executes more than once due to cwd being longer than 128
> > > bytes, then `errno = ERANGE` might persist outside of this function.
> > > This technically shouldn't be a problem, as all locations where the
> > > value in `errno` is tested should either (a) call a function that's
> > > guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior
> > > to calling the function that only conditionally sets errno, such as the
> > > `strtod` function. In the case of functions in category (b), it's easy
> > > to forget to do that.
> > >
> > > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> > > This matches the behavior in functions like `run_transaction_hook`
> > > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
> >
> > I am still uneasy to see this unconditional clearing, which looks
> > more like spreading the bad practice from two places you identified
> > than following good behaviour modelled after these two places.
> >
> > But I'll let it pass.
> >
> > As long as our programmers understand that across strbuf_getcwd(),
> > errno will *not* be preserved, even if the function returns success,
> > it would be OK.  As the usual convention around errno is that a
> > successful call would leave errno intact, not clear it to 0, it
> > would make it a bit harder to learn our API for newcomers, though.
>
> I'm sympathetic to that argument. If you'd prefer to not have this
> patch, I'm fine with it not landing, and instead at some future date I
> may try to work on those #leftoverbits from the previous patch (to
> make a safer wrapper around strtoX, and ban the use of the unwrapped
> versions), or someone else can if they beat me to it.
>
> Since this is wrapping a posix function, and posix has things to say
> about this (see below), I agree that it shouldn't set it to 0, and
> withdraw this patch.

Dropped this patch in the reroll that (I think) I just sent.

Patrick Steinhardt Aug. 6, 2024, 6:26 a.m. UTC | #6
On Mon, Aug 05, 2024 at 08:51:50AM -0700, Junio C Hamano wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> 
> > On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> >> > [...]
> >> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> >> > This matches the behavior in functions like `run_transaction_hook`
> >> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
> >>
> >> I am still uneasy to see this unconditional clearing, which looks
> >> more like spreading the bad practice from two places you identified
> >> than following good behaviour modelled after these two places.
> >>
> >> But I'll let it pass.
> >>
> >> As long as our programmers understand that across strbuf_getcwd(),
> >> errno will *not* be preserved, even if the function returns success,
> >> it would be OK.  As the usual convention around errno is that a
> >> successful call would leave errno intact, not clear it to 0, it
> >> would make it a bit harder to learn our API for newcomers, though.
> >
> > For what it's worth, I share your misgivings about this change and
> > consider the suggestion[*] to make it save/restore `errno` upon
> > success more sensible. It would also be a welcome change to see the
> > function documentation in strbuf.h updated to mention that it follows
> > the usual convention of leaving `errno` untouched upon success and
> > clobbered upon error.
> >
> > [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/
> 
> Yup, of course save/restore would be safer, and probably easier to
> reason about for many people.

Is it really all that reasonable? We're essentially partitioning our set
of APIs into two sets, where one set knows to keep `errno` intact
whereas another set doesn't. In such a world, you have to be very
careful about which APIs you are calling in a function that wants to
keep `errno` intact, which to me sounds like a maintenance headache.

I'd claim that most callers never care about `errno` at all. For the
callers that do, I feel it is way more fragile to rely on whether or not
a called function leaves `errno` intact or not. For one, it's fragile
because that may easily change due to a bug. Second, it is fragile
because the dependency on `errno` is not explicitly documented via code,
but rather an implicit dependency.

So isn't it more reasonable to rather make the few callers that do
require `errno` to be left intact to save it? It makes the dependency
explicit, avoids splitting our functions into two sets and allows us to
just ignore this issue for the majority of functions that couldn't care
less about `errno`.
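
A sketch of the caller-side alternative described above, where the one caller that needs the value saves it explicitly. The names `report` and `open_or_errno` are hypothetical; `report` assigns to errno on purpose so that the clobbering is deterministic in this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical logging helper. Any function that makes library calls
 * may change errno; the assignment below simulates that.
 */
static void report(const char *msg)
{
	fprintf(stderr, "%s\n", msg);
	errno = EIO;
}

/*
 * Hypothetical caller: returns 0 and stores the descriptor in *fd_out
 * on success, or the errno value from open() on failure.
 */
static int open_or_errno(const char *path, int *fd_out)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		int saved_errno = errno;	/* save before calling anything else */

		report("open failed");
		return saved_errno;
	}
	*fd_out = fd;
	return 0;
}
```

The dependency on errno is visible at the one place that cares, and `report` needs no errno contract at all.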

Patrick
Kyle Lippincott Aug. 6, 2024, 7:04 a.m. UTC | #7
On Mon, Aug 5, 2024 at 11:26 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Mon, Aug 05, 2024 at 08:51:50AM -0700, Junio C Hamano wrote:
> > Eric Sunshine <sunshine@sunshineco.com> writes:
> >
> > > On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> > >> > [...]
> > >> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> > >> > This matches the behavior in functions like `run_transaction_hook`
> > >> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
> > >>
> > >> I am still uneasy to see this unconditional clearing, which looks
> > >> more like spreading the bad practice from two places you identified
> > >> than following good behaviour modelled after these two places.
> > >>
> > >> But I'll let it pass.
> > >>
> > >> As long as our programmers understand that across strbuf_getcwd(),
> > >> errno will *not* be preserved, even if the function returns success,
> > >> it would be OK.  As the usual convention around errno is that a
> > >> successful call would leave errno intact, not clear it to 0, it
> > >> would make it a bit harder to learn our API for newcomers, though.
> > >
> > > For what it's worth, I share your misgivings about this change and
> > > consider the suggestion[*] to make it save/restore `errno` upon
> > > success more sensible. It would also be a welcome change to see the
> > > function documentation in strbuf.h updated to mention that it follows
> > > the usual convention of leaving `errno` untouched upon success and
> > > clobbered upon error.
> > >
> > > [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/
> >
> > Yup, of course save/restore would be safer, and probably easier to
> > reason about for many people.
>
> Is it really all that reasonable? We're essentially partitioning our set
> of APIs into two sets, where one set knows to keep `errno` intact
> whereas another set doesn't. In such a world, you have to be very
> careful about which APIs you are calling in a function that wants to
> keep `errno` intact, which to me sounds like a maintenance headache.
>
> I'd claim that most callers never care about `errno` at all. For the
> callers that do, I feel it is way more fragile to rely on whether or not
> a called function leaves `errno` intact or not. For one, it's fragile
> because that may easily change due to a bug. Second, it is fragile
> because the dependency on `errno` is not explicitly documented via code,
> but rather an implicit dependency.
>
> So isn't it more reasonable to rather make the few callers that do
> require `errno` to be left intact to save it? It makes the dependency
> explicit, avoids splitting our functions into two sets and allows us to
> just ignore this issue for the majority of functions that couldn't care
> less about `errno`.

100% agreed. The C language specification says you can't rely on errno
persisting across function calls, and that the caller must preserve it
if it needs that behavior for some reason. The POSIX specification
says you can't either except in very rare circumstances where it
guarantees errno will not change. The Linux man page for errno says
you can't rely on errno not changing, even for printf:
https://man7.org/linux/man-pages/man3/errno.3.html

       A common mistake is to do

           if (somecall() == -1) {
               printf("somecall() failed\n");
               if (errno == ...) { ... }
           }

       where errno no longer needs to have the value it had upon return
       from somecall() (i.e., it may have been changed by the
       printf(3)).  If the value of errno should be preserved across a
       library call, it must be saved:

           if (somecall() == -1) {
               int errsv = errno;
               printf("somecall() failed\n");
               if (errsv == ...) { ... }
           }

Basically: errno is _extremely_ volatile. One should assume that
_every_ function call is going to change it, even if they return
successfully. The only thing that can't happen is that the functions
defined in the C and POSIX standards set errno to 0, which is why I
withdrew the patch (since it's a wrapper around a function defined in
POSIX). But in general, I don't see any reason for any of the
functions we write to be errno preserving, especially since any call
to malloc, printf, trace functionality, etc. may modify errno.

>
> Patrick
Patch

diff --git a/strbuf.c b/strbuf.c
index 3d2189a7f64..b94ef040ab0 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -601,6 +601,7 @@  int strbuf_getcwd(struct strbuf *sb)
 		strbuf_grow(sb, guessed_len);
 		if (getcwd(sb->buf, sb->alloc)) {
 			strbuf_setlen(sb, strlen(sb->buf));
+			errno = 0;
 			return 0;
 		}