Message ID | df774306-f29b-4a75-a282-59db89812b9a@web.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | apply: replace mksnpath() with a mkpathdup() call | expand |
René Scharfe <l.s.r@web.de> writes: > Support paths longer than PATH_MAX in create_one_file() (which is not a > hard limit e.g. on Linux) by calling mkpathdup() instead of mksnpath(). > Remove the latter, as it becomes unused by this change. Resist the > temptation of using the more convenient mkpath() to avoid introducing a > dependency on a static variable deep inside the apply machinery. > > Suggested-by: Jeff King <peff@peff.net> > Signed-off-by: René Scharfe <l.s.r@web.de> > --- > apply.c | 13 +++++++++---- > path.c | 17 ----------------- > path.h | 6 ------ > 3 files changed, 9 insertions(+), 27 deletions(-) As a patch to "apply", this is perfectly good, but in the larger picture, the value of the patch is in getting rid of mksnpath, isn't it? I am tempted to - queue it on rs/retire-mksnpath topic branch - remove the mention of mksnpath in contrib/vscode/init.sh (I think it is registering non-words to spell checker, so having an extra entry there would not hurt, but it would more likely to be a typo of mkpath than before this patch retires mksnpath function). - ask you to come up with a title with more focus on mksnpath than on apply. Thanks. > diff --git a/apply.c b/apply.c > index 432837a674..4793b05f3d 100644 > --- a/apply.c > +++ b/apply.c > @@ -4502,17 +4502,22 @@ static int create_one_file(struct apply_state *state, > unsigned int nr = getpid(); > > for (;;) { > - char newpath[PATH_MAX]; > - mksnpath(newpath, sizeof(newpath), "%s~%u", path, nr); > + char *newpath = mkpathdup("%s~%u", path, nr); > res = try_create_file(state, newpath, mode, buf, size); > - if (res < 0) > + if (res < 0) { > + free(newpath); > return -1; > + } > if (!res) { > - if (!rename(newpath, path)) > + if (!rename(newpath, path)) { > + free(newpath); > return 0; > + } > unlink_or_warn(newpath); > + free(newpath); > break; > } > + free(newpath); > if (errno != EEXIST) > break; > ++nr; > diff --git a/path.c b/path.c > index 8bb223c92c..67229edb9c 100644 > --- a/path.c > +++ b/path.c > @@ -28,8 +28,6 @@ static int get_st_mode_bits(const char *path, int *mode) > return 0; > } > > -static char bad_path[] = "/bad-path/"; > - > static struct strbuf *get_pathname(void) > { > static struct strbuf pathname_array[4] = { > @@ -59,21 +57,6 @@ static void strbuf_cleanup_path(struct strbuf *sb) > strbuf_remove(sb, 0, path - sb->buf); > } > > -char *mksnpath(char *buf, size_t n, const char *fmt, ...) > -{ > - va_list args; > - unsigned len; > - > - va_start(args, fmt); > - len = vsnprintf(buf, n, fmt, args); > - va_end(args); > - if (len >= n) { > - strlcpy(buf, bad_path, n); > - return buf; > - } > - return (char *)cleanup_path(buf); > -} > - > static int dir_prefix(const char *buf, const char *dir) > { > int len = strlen(dir); > diff --git a/path.h b/path.h > index e053effef2..ea96487b29 100644 > --- a/path.h > +++ b/path.h > @@ -23,12 +23,6 @@ const char *mkpath(const char *fmt, ...) > char *mkpathdup(const char *fmt, ...) > __attribute__((format (printf, 1, 2))); > > -/* > - * Construct a path and place the result in the provided buffer `buf`. > - */ > -char *mksnpath(char *buf, size_t n, const char *fmt, ...) > - __attribute__((format (printf, 3, 4))); > - > /* > * The `git_common_path` family of functions will construct a path into a > * repository's common git directory, which is shared by all worktrees. > -- > 2.44.0
On Thu, Apr 04, 2024 at 11:08:38PM +0200, René Scharfe wrote: > Support paths longer than PATH_MAX in create_one_file() (which is not a > hard limit e.g. on Linux) by calling mkpathdup() instead of mksnpath(). > Remove the latter, as it becomes unused by this change. Resist the > temptation of using the more convenient mkpath() to avoid introducing a > dependency on a static variable deep inside the apply machinery. > > Suggested-by: Jeff King <peff@peff.net> > Signed-off-by: René Scharfe <l.s.r@web.de> Heh, so obviously I had the same patch stewing. But one thing that gave me pause is: do we need to worry about preserving errno across free() boundaries? Traditionally touching errno was something free() was allowed to do, and there were definitely cases where glibc would do so (mostly due to munmap). But recent versions of POSIX clarify that it should not touch errno, and glibc as of 2.33 does not (which dates to 2021). This issue from 2015 talks about "the next version of POSIX" making that change: https://sourceware.org/bugzilla/show_bug.cgi?id=17924 but it looks to me from: https://www.austingroupbugs.net/view.php?id=385 that the change wasn't accepted there until 2020 (and AFAICT that hasn't resulted in a new published spec yet). Now it would be pretty darn useful to not have to worry about preserving errno. It's subtle code that's hard to remember is needed, and sometimes hard to get right depending on the rest of the flow control. Years like "2020" and "2021" are too recent for us to say "oh, that's ancient history, we don't have to care anymore". But I wonder if we can be a little cavalier here for two reasons: - it's rare; for the most part free() isn't going to touch errno. Failures from munmap() are pretty rare, and small allocations like this are probably done with sbrk() anyway. Of course that's just talking about glibc, and there are other platforms. But it still seems like it should be a rarity for any free() implementation to fail or to want to touch errno. - the stakes are usually pretty low; the outcome in most cases would be a misleading error message as we clobber errno. But note that it does sometimes affect control flow (this patch is an example; we are checking for EEXIST to break out of the loop). So would it be that unreasonable to assume the modern, desired behavior, and mostly shrug our shoulders for other platforms? We could even provide: #ifdef FREE_CLOBBERS_ERRNO void git_free(void *ptr) { int saved_errno = errno; free(ptr); errno = saved_errno; } #define free(ptr) git_free(ptr) #endif if we really wanted to be careful, though it's hard to know which platforms even need it (and again, it's so rare to come up in practice that I'd suspect people could go for a long time before realizing their platform was a problem). I guess the flip side is that we could use the function above by default, and disable it selectively (the advantage of disabling it being that it's slightly more efficient; maybe that's not even measurable?). > for (;;) { > - char newpath[PATH_MAX]; > - mksnpath(newpath, sizeof(newpath), "%s~%u", path, nr); > + char *newpath = mkpathdup("%s~%u", path, nr); > res = try_create_file(state, newpath, mode, buf, size); > - if (res < 0) > + if (res < 0) { > + free(newpath); > return -1; > + } > if (!res) { > - if (!rename(newpath, path)) > + if (!rename(newpath, path)) { > + free(newpath); > return 0; > + } > unlink_or_warn(newpath); > + free(newpath); > break; > } > + free(newpath); > if (errno != EEXIST) > break; > ++nr; At any rate, you can probably see the places where free() clobbering errno would be a problem here. Our return when "res < 0" (though I don't think any of the callers actually care about errno after that), the check for EEXIST at the bottom of the loop, and after we break out of the loop, we use error_errno() to report it. -Peff
Jeff King <peff@peff.net> writes: >> + free(newpath); >> if (errno != EEXIST) >> break; >> ++nr; > > At any rate, you can probably see the places where free() clobbering > errno would be a problem here. Our return when "res < 0" (though I don't > think any of the callers actually care about errno after that), the > check for EEXIST at the bottom of the loop, and after we break out of > the loop, we use error_errno() to report it. Yeah, a failing free() is unlikely to set errno to EEXIST ;-)
Am 05.04.24 um 00:53 schrieb Jeff King: > On Thu, Apr 04, 2024 at 11:08:38PM +0200, René Scharfe wrote: > >> Support paths longer than PATH_MAX in create_one_file() (which is not a >> hard limit e.g. on Linux) by calling mkpathdup() instead of mksnpath(). >> Remove the latter, as it becomes unused by this change. Resist the >> temptation of using the more convenient mkpath() to avoid introducing a >> dependency on a static variable deep inside the apply machinery. >> >> Suggested-by: Jeff King <peff@peff.net> >> Signed-off-by: René Scharfe <l.s.r@web.de> > > Heh, so obviously I had the same patch stewing. But one thing that gave > me pause is: do we need to worry about preserving errno across free() > boundaries? > > Traditionally touching errno was something free() was allowed to do, and > there were definitely cases where glibc would do so (mostly due to > munmap). But recent versions of POSIX clarify that it should not touch > errno, and glibc as of 2.33 does not (which dates to 2021). > > This issue from 2015 talks about "the next version of POSIX" making that > change: > > https://sourceware.org/bugzilla/show_bug.cgi?id=17924 > > but it looks to me from: > > https://www.austingroupbugs.net/view.php?id=385 > > that the change wasn't accepted there until 2020 (and AFAICT that hasn't > resulted in a new published spec yet). Horrible. OS implementation details peeking through an API that has no return value and no defined errors is outright Lovecraftian. > Now it would be pretty darn useful to not have to worry about preserving > errno. It's subtle code that's hard to remember is needed, and sometimes > hard to get right depending on the rest of the flow control. Yes. > Years like "2020" and "2021" are too recent for us to say "oh, that's > ancient history, we don't have to care anymore". But I wonder if we can > be a little cavalier here for two reasons: > > - it's rare; for the most part free() isn't going to touch errno. > Failures from munmap() are pretty rare, and small allocations like > this are probably done with sbrk() anyway. Of course that's just > talking about glibc, and there are other platforms. But it still > seems like it should be a rarity for any free() implementation to > fail or to want to touch errno. > > - the stakes are usually pretty low; the outcome in most cases would > be a misleading error message as we clobber errno. But note that it > does sometimes affect control flow (this patch is an example; we are > checking for EEXIST to break out of the loop). > > So would it be that unreasonable to assume the modern, desired behavior, > and mostly shrug our shoulders for other platforms? We could even > provide: > > #ifdef FREE_CLOBBERS_ERRNO > void git_free(void *ptr) > { > int saved_errno = errno; > free(ptr); > errno = saved_errno; > } > #define free(ptr) git_free(ptr) > #endif > > if we really wanted to be careful, though it's hard to know which > platforms even need it (and again, it's so rare to come up in practice > that I'd suspect people could go for a long time before realizing their > platform was a problem). I guess the flip side is that we could use the > function above by default, and disable it selectively (the advantage of > disabling it being that it's slightly more efficient; maybe that's not > even measurable?). I think performing this ritual automatically every time on all platforms (i.e. to use git_free() unconditionally) to provide sane errno values around free(3) calls is better than having to worry about attacks from the deep. But then again I'm easily scared and still a bit in shock, so perhaps I'm overreacting. René
On Fri, Apr 05, 2024 at 12:52:35PM +0200, René Scharfe wrote: > > So would it be that unreasonable to assume the modern, desired behavior, > > and mostly shrug our shoulders for other platforms? We could even > > provide: > > > > #ifdef FREE_CLOBBERS_ERRNO > > void git_free(void *ptr) > > { > > int saved_errno = errno; > > free(ptr); > > errno = saved_errno; > > } > > #define free(ptr) git_free(ptr) > > #endif > > > > if we really wanted to be careful, though it's hard to know which > > platforms even need it (and again, it's so rare to come up in practice > > that I'd suspect people could go for a long time before realizing their > > platform was a problem). I guess the flip side is that we could use the > > function above by default, and disable it selectively (the advantage of > > disabling it being that it's slightly more efficient; maybe that's not > > even measurable?). > > I think performing this ritual automatically every time on all platforms > (i.e. to use git_free() unconditionally) to provide sane errno values > around free(3) calls is better than having to worry about attacks from > the deep. But then again I'm easily scared and still a bit in shock, so > perhaps I'm overreacting. You may be right. The main reason not to do it is the extra assignments (and call to errno, which I think can be a function hidden behind a macro these days). But I kind of doubt it is measurable, and we expect malloc/free to be a bit heavyweight (compared to regular instructions) anyway. So it is probably me being overly cautious about the performance side. The other reason is that macros (especially of system names) can create their own headaches. We could require xfree() everywhere as a complement to xmalloc (or maybe even just places where the errno preservation seems useful), but that's one more thing to remember. With the patch below you can see both in action: - hyperfine seems to show the git_free() version as ~1% slower to run "git log" (which I picked as something that probably does a reasonable number of mallocs). Frankly, I'm still suspicious that it could have such an impact. Maybe inlining git_free() would help? - usually #defining free(ptr) with an argument will avoid confusion with the word "free" as a token. But in this case there's a function pointer which is then called as struct->free(ptr), causing confusion. diff --git a/convert.c b/convert.c index 35b25eb3cb..dfb31ee3a3 100644 --- a/convert.c +++ b/convert.c @@ -1549,7 +1549,7 @@ typedef void (*free_fn)(struct stream_filter *); struct stream_filter_vtbl { filter_fn filter; - free_fn free; + free_fn free_stream; }; struct stream_filter { @@ -1582,7 +1582,7 @@ static void null_free_fn(struct stream_filter *filter UNUSED) static struct stream_filter_vtbl null_vtbl = { .filter = null_filter_fn, - .free = null_free_fn, + .free_stream = null_free_fn, }; static struct stream_filter null_filter_singleton = { @@ -1691,7 +1691,7 @@ static void lf_to_crlf_free_fn(struct stream_filter *filter) static struct stream_filter_vtbl lf_to_crlf_vtbl = { .filter = lf_to_crlf_filter_fn, - .free = lf_to_crlf_free_fn, + .free_stream = lf_to_crlf_free_fn, }; static struct stream_filter *lf_to_crlf_filter(void) @@ -1787,7 +1787,7 @@ static void cascade_free_fn(struct stream_filter *filter) static struct stream_filter_vtbl cascade_vtbl = { .filter = cascade_filter_fn, - .free = cascade_free_fn, + .free_stream = cascade_free_fn, }; static struct stream_filter *cascade_filter(struct stream_filter *one, @@ -1939,7 +1939,7 @@ static void ident_free_fn(struct stream_filter *filter) static struct stream_filter_vtbl ident_vtbl = { .filter = ident_filter_fn, - .free = ident_free_fn, + .free_stream = ident_free_fn, }; static struct stream_filter *ident_filter(const struct object_id *oid) @@ -1992,7 +1992,7 @@ struct stream_filter *get_stream_filter(struct index_state *istate, void free_stream_filter(struct stream_filter *filter) { - filter->vtbl->free(filter); + filter->vtbl->free_stream(filter); } int stream_filter(struct stream_filter *filter, diff --git a/git-compat-util.h b/git-compat-util.h index 7c2a6538e5..324c47fdff 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1094,6 +1094,9 @@ static inline void move_array(void *dst, const void *src, size_t n, size_t size) COPY_ARRAY(ALLOC_ARRAY((dst), dup_array_n_), (src), dup_array_n_); \ } while (0) +void git_free(void *ptr); +#define free(ptr) git_free(ptr) + /* * These functions help you allocate structs with flex arrays, and copy * the data directly into the array. For example, if you had: diff --git a/wrapper.c b/wrapper.c index eeac3741cf..cfffb177e3 100644 --- a/wrapper.c +++ b/wrapper.c @@ -822,3 +822,14 @@ uint32_t git_rand(void) return result; } + +/* this should probably go in its own file, otherwise anything + * added below here uses the bare free() + */ +#undef free +void git_free(void *ptr) +{ + int saved_errno = errno; + free(ptr); + errno = saved_errno; +}
On Fri, Apr 05, 2024 at 01:35:17PM -0400, Jeff King wrote: > With the patch below you can see both in action: > > - hyperfine seems to show the git_free() version as ~1% slower to run > "git log" (which I picked as something that probably does a > reasonable number of mallocs). Frankly, I'm still suspicious that it > could have such an impact. Maybe inlining git_free() would help? This may have been noise. Weirdly it seemed consistent from run to run using hyperfine. But switching my CPU governor from "powersave" to "performance", I got a run where the new code is fast. -Peff
Am 05.04.24 um 19:35 schrieb Jeff King: > On Fri, Apr 05, 2024 at 12:52:35PM +0200, René Scharfe wrote: > >>> So would it be that unreasonable to assume the modern, desired behavior, >>> and mostly shrug our shoulders for other platforms? We could even >>> provide: >>> >>> #ifdef FREE_CLOBBERS_ERRNO >>> void git_free(void *ptr) >>> { >>> int saved_errno = errno; >>> free(ptr); >>> errno = saved_errno; >>> } >>> #define free(ptr) git_free(ptr) >>> #endif >>> >>> if we really wanted to be careful, though it's hard to know which >>> platforms even need it (and again, it's so rare to come up in practice >>> that I'd suspect people could go for a long time before realizing their >>> platform was a problem). I guess the flip side is that we could use the >>> function above by default, and disable it selectively (the advantage of >>> disabling it being that it's slightly more efficient; maybe that's not >>> even measurable?). >> >> I think performing this ritual automatically every time on all platforms >> (i.e. to use git_free() unconditionally) to provide sane errno values >> around free(3) calls is better than having to worry about attacks from >> the deep. But then again I'm easily scared and still a bit in shock, so >> perhaps I'm overreacting. I calmed down a bit now. And ask myself how widespread the issue actually is. Used the following silly Coccinelle rule to find functions that use errno after a free(3) call: @@ @@ - free(...); ... errno Found only a handful of places, and they all set errno explicitly, so they are fine. No idea how to match any use of errno except assignment. And no idea how to find indirect callers of free(3) that use errno with no potential assignment in between. > You may be right. The main reason not to do it is the extra assignments > (and call to errno, which I think can be a function hidden behind a > macro these days). But I kind of doubt it is measurable, and we expect > malloc/free to be a bit heavyweight (compared to regular instructions) > anyway. So it is probably me being overly cautious about the performance > side. > > The other reason is that macros (especially of system names) can create > their own headaches. We could require xfree() everywhere as a > complement to xmalloc (or maybe even just places where the errno > preservation seems useful), but that's one more thing to remember. An xfree() to go along with xmalloc()/xrealloc()/xcalloc()/xstrdup() would fit in nicely and might be easier to remember than free() after a while. Having to convert thousands of calls is unappealing, though. > With the patch below you can see both in action: > > - hyperfine seems to show the git_free() version as ~1% slower to run > "git log" (which I picked as something that probably does a > reasonable number of mallocs). Frankly, I'm still suspicious that it > could have such an impact. Maybe inlining git_free() would help? I get a slowdown of ca. 0.5%: Benchmark 1: ./git_2.44.0 log v2.44.0 Time (mean ± σ): 705.8 ms ± 1.7 ms [User: 674.0 ms, System: 28.3 ms] Range (min … max): 702.3 ms … 709.2 ms 10 runs Benchmark 2: ./git log v2.44.0 Time (mean ± σ): 708.1 ms ± 2.0 ms [User: 676.4 ms, System: 28.7 ms] Range (min … max): 705.0 ms … 710.9 ms 10 runs Hmm. What if we do the opposite and poison errno in git_free() instead of carrying over the original value? That's only half the work. Benchmark 1: ./git_2.44.0 log v2.44.0 Time (mean ± σ): 704.2 ms ± 1.3 ms [User: 674.2 ms, System: 27.6 ms] Range (min … max): 702.2 ms … 705.8 ms 10 runs Benchmark 2: ./git log v2.44.0 Time (mean ± σ): 706.3 ms ± 0.9 ms [User: 676.3 ms, System: 27.5 ms] Range (min … max): 704.8 ms … 708.2 ms 10 runs So that's slightly better. But the measurements are quite noisy. Found four places that did not expect free(3) to mess up their errno by running the test suite with that. Patch below. > - usually #defining free(ptr) with an argument will avoid confusion > with the word "free" as a token. But in this case there's a function > pointer which is then called as struct->free(ptr), causing > confusion. In other contexts we use "clear" or "release" to name functions like that. René compat/precompose_utf8.c | 2 ++ git.c | 3 +++ lockfile.c | 3 +++ tempfile.c | 2 ++ 4 files changed, 10 insertions(+) diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c index 0bd5c24250..4da80b40ce 100644 --- a/compat/precompose_utf8.c +++ b/compat/precompose_utf8.c @@ -114,8 +114,10 @@ PREC_DIR *precompose_utf8_opendir(const char *dirname) prec_dir->dirp = opendir(dirname); if (!prec_dir->dirp) { + int save_errno = errno; free(prec_dir->dirent_nfc); free(prec_dir); + errno = save_errno; return NULL; } else { int ret_errno = errno; diff --git a/git.c b/git.c index 654d615a18..275674f873 100644 --- a/git.c +++ b/git.c @@ -773,6 +773,7 @@ static int run_argv(int *argcp, const char ***argv) int done_alias = 0; struct string_list cmd_list = STRING_LIST_INIT_NODUP; struct string_list_item *seen; + int save_errno; while (1) { /* @@ -853,7 +854,9 @@ static int run_argv(int *argcp, const char ***argv) done_alias = 1; } + save_errno = errno; string_list_clear(&cmd_list, 0); + errno = save_errno; return done_alias; } diff --git a/lockfile.c b/lockfile.c index 1d5ed01682..b8401f5059 100644 --- a/lockfile.c +++ b/lockfile.c @@ -76,6 +76,7 @@ static int lock_file(struct lock_file *lk, const char *path, int flags, int mode) { struct strbuf filename = STRBUF_INIT; + int save_errno; strbuf_addstr(&filename, path); if (!(flags & LOCK_NO_DEREF)) @@ -83,7 +84,9 @@ static int lock_file(struct lock_file *lk, const char *path, int flags, strbuf_addstr(&filename, LOCK_SUFFIX); lk->tempfile = create_tempfile_mode(filename.buf, mode); + save_errno = errno; strbuf_release(&filename); + errno = save_errno; return lk->tempfile ? lk->tempfile->fd : -1; } diff --git a/tempfile.c b/tempfile.c index ed88cf8431..029b760fa9 100644 --- a/tempfile.c +++ b/tempfile.c @@ -144,7 +144,9 @@ struct tempfile *create_tempfile_mode(const char *path, int mode) tempfile->fd = open(tempfile->filename.buf, O_RDWR | O_CREAT | O_EXCL, mode); if (tempfile->fd < 0) { + int save_errno = errno; deactivate_tempfile(tempfile); + errno = save_errno; return NULL; } activate_tempfile(tempfile);
On Sat, Apr 06, 2024 at 07:45:17PM +0200, René Scharfe wrote: > I calmed down a bit now. And ask myself how widespread the issue actually > is. Used the following silly Coccinelle rule to find functions that use > errno after a free(3) call: > > @@ > @@ > - free(...); > ... > errno > > Found only a handful of places, and they all set errno explicitly, so > they are fine. Is that enough, though? Imagine I have: int foo(const char *x) { char *y = xstfrmt("something/%s", x); int fd = open(y, ...); free(y); return fd; } Then errno is important if some caller of foo() checks errno after foo() returns an error. And that caller might not even be in the same file. In other words, it really depends on the contract of foo() with respect to errno. And I don't think there is a way in C to specify that contract in away that the compiler can understand. > No idea how to match any use of errno except assignment. And no idea how > to find indirect callers of free(3) that use errno with no potential > assignment in between. Yeah, I guess the indirect callers of free() are really the flip-side. My example was indirect users of errno. ;) > > The other reason is that macros (especially of system names) can create > > their own headaches. We could require xfree() everywhere as a > > complement to xmalloc (or maybe even just places where the errno > > preservation seems useful), but that's one more thing to remember. > > An xfree() to go along with xmalloc()/xrealloc()/xcalloc()/xstrdup() would > fit in nicely and might be easier to remember than free() after a while. > Having to convert thousands of calls is unappealing, though. My biggest concern with it is that we'd have to remember to use it, and there's not good compiler enforcement. But I guess coccinelle can help us there. My secondary concern is that it might make people think that xmalloc() and xfree() are always paired, and thus you can do clever things in one as long as the other matches it. But we sometimes free memory from system routines like getline(). Maybe a scary comment would be enough? > Found four places that did not expect free(3) to mess up their errno by > running the test suite with that. Patch below. These are perhaps worth fixing (though not if we come up with a universal solution). But I'd be surprised if they are the only ones. By its nature, this problem only manifests when there are actual errors, and our test suite is mostly checking happy paths. So I'd assume there are a ton of "if (ret < 0) { free(foo); return -1; }" spots that are simply not exercised by the test suite at all. -Peff
Am 07.04.24 um 03:18 schrieb Jeff King: > On Sat, Apr 06, 2024 at 07:45:17PM +0200, René Scharfe wrote: > >> I calmed down a bit now. And ask myself how widespread the issue actually >> is. Used the following silly Coccinelle rule to find functions that use >> errno after a free(3) call: >> >> @@ >> @@ >> - free(...); >> ... >> errno >> >> Found only a handful of places, and they all set errno explicitly, so >> they are fine. > > Is that enough, though? Imagine I have: > > int foo(const char *x) > { > char *y = xstfrmt("something/%s", x); > int fd = open(y, ...); > free(y); > return fd; > } > > Then errno is important if some caller of foo() checks errno after foo() > returns an error. And that caller might not even be in the same file. Yes. > In other words, it really depends on the contract of foo() with respect > to errno. And I don't think there is a way in C to specify that > contract in away that the compiler can understand. The context attribute of sparse could be used, I guess. We'd have to duplicate the declaration of all library functions, though, to tag them with a may_have_changed_errno attribute. And we'd need to clear it on success, so we'd have to modify all callers. On the flip side it would allow detecting consecutive errno changes, e.g. by two I/O functions with no error checking in between. But the implementation effort seems way too high. It would be easier if the error information was in one place instead of one bit in the return value (NULL or less than 0) and the rest in errno. >> No idea how to match any use of errno except assignment. And no idea how >> to find indirect callers of free(3) that use errno with no potential >> assignment in between. > > Yeah, I guess the indirect callers of free() are really the flip-side. > My example was indirect users of errno. ;) Indeed. >>> The other reason is that macros (especially of system names) can create >>> their own headaches. We could require xfree() everywhere as a >>> complement to xmalloc (or maybe even just places where the errno >>> preservation seems useful), but that's one more thing to remember. >> >> An xfree() to go along with xmalloc()/xrealloc()/xcalloc()/xstrdup() would >> fit in nicely and might be easier to remember than free() after a while. >> Having to convert thousands of calls is unappealing, though. > > My biggest concern with it is that we'd have to remember to use it, and > there's not good compiler enforcement. But I guess coccinelle can help > us there.q Yes, converting all free(3) calls is easy with Coccinelle, and the same semantic patch can be used to enforce the use of xfree(). Still the initial diff would be huge (2000+ changed lines in 300+ files). > My secondary concern is that it might make people think that xmalloc() > and xfree() are always paired, and thus you can do clever things in one > as long as the other matches it. But we sometimes free memory from > system routines like getline(). Maybe a scary comment would be enough? Amazing foresight! Currently we only use getline(3) (which can act like realloc(3)) in contrib/credential/, though. Any others? The "x" prefix doesn't promise exclusivity (hermetic seal?), but it certainly suggests there are some shared feature between the allocator functions and xfree(), while they only have in common that the are wrapped. We could call the latter errno_preserving_free() instead or so. I'm more worried about the overhead. For a 0-1% slowdown (in the case of git log) git_free() or xfree() give us accurate error numbers on all platforms. Non-misleading error codes are worth a lot (seen my share of cursed messages), but xfree() is only necessary in the error handling code. The happy path can call free(3) directly without harm. But how to classify call sites accordingly? It's easy for programmers once they know it's necessary. Is there a way in C, macro or Coccinelle to have our cake and only eat it if really needed? I don't see any. :-| >> Found four places that did not expect free(3) to mess up their errno by >> running the test suite with that. Patch below. > > These are perhaps worth fixing (though not if we come up with a > universal solution). But I'd be surprised if they are the only ones. By > its nature, this problem only manifests when there are actual errors, > and our test suite is mostly checking happy paths. So I'd assume there > are a ton of "if (ret < 0) { free(foo); return -1; }" spots that are > simply not exercised by the test suite at all. Makes sense. René
On Sun, Apr 14, 2024 at 05:17:23PM +0200, René Scharfe wrote: > > In other words, it really depends on the contract of foo() with respect > > to errno. And I don't think there is a way in C to specify that > > contract in away that the compiler can understand. > > The context attribute of sparse could be used, I guess. We'd have to > duplicate the declaration of all library functions, though, to tag them > with a may_have_changed_errno attribute. And we'd need to clear it on > success, so we'd have to modify all callers. On the flip side it would > allow detecting consecutive errno changes, e.g. by two I/O functions > with no error checking in between. But the implementation effort seems > way too high. > > It would be easier if the error information was in one place instead of > one bit in the return value (NULL or less than 0) and the rest in errno. Yeah. In a world where every failing function returned the equivalent of -errno, you could just do: ret = xread(fd, ...); /* actually returns -errno! */ if (ret < 0) { close(fd); return ret; /* open's errno preserved via ret */ } We could live in that world if we wrapped all of the syscalls with xread(), etc, that behaved that way. I don't know if it would run into weird gotchas, though, or if the result would simply look foreign to most POSIX/C programmers. I also wonder if it might end up having the same 1% overhead that the free() checks did, as it involves more juggling of stack values. > >> An xfree() to go along with xmalloc()/xrealloc()/xcalloc()/xstrdup() would > >> fit in nicely and might be easier to remember than free() after a while. > >> Having to convert thousands of calls is unappealing, though. > > > > My biggest concern with it is that we'd have to remember to use it, and > > there's not good compiler enforcement. But I guess coccinelle can help > > us there.q > > Yes, converting all free(3) calls is easy with Coccinelle, and the same > semantic patch can be used to enforce the use of xfree(). Still the > initial diff would be huge (2000+ changed lines in 300+ files). True. It's one more weird thing to remember, but for the most part we are there already with xmalloc(). > > My secondary concern is that it might make people think that xmalloc() > > and xfree() are always paired, and thus you can do clever things in one > > as long as the other matches it. But we sometimes free memory from > > system routines like getline(). Maybe a scary comment would be enough? > > Amazing foresight! Currently we only use getline(3) (which can act like > realloc(3)) in contrib/credential/, though. Any others? Heh, I actually meant to say getdelim(), which obviously is closely related. There's only one call to it, but as it underlies strbuf_getwholeline(), many strbufs will use it. And as you might have guessed, my "amazing foresight" came from being bitten by the same thing already. ;) Annoyed at how long git took to run a large workload under massif, I once wrote a hacky peak-heap profiler that wraps the system malloc(). Naturally it needs to match frees to their original allocations so we know how big each one was. Imagine my surprise when I saw many frees without a matching allocation. > The "x" prefix doesn't promise exclusivity (hermetic seal?), but it > certainly suggests there are some shared feature between the allocator > functions and xfree(), while they only have in common that the are > wrapped. We could call the latter errno_preserving_free() instead or so. That's a mouthful, for sure. I think it is OK to suggest they are related as long as there is a comment in xfree() mentioning that we might see pointers that didn't come from our x*() functions. That's where somebody would have to add code that violates the assumption. > I'm more worried about the overhead. For a 0-1% slowdown (in the case of > git log) git_free() or xfree() give us accurate error numbers on all > platforms. Non-misleading error codes are worth a lot (seen my share of > cursed messages), but xfree() is only necessary in the error handling > code. The happy path can call free(3) directly without harm. > > But how to classify call sites accordingly? It's easy for programmers > once they know it's necessary. Is there a way in C, macro or Coccinelle > to have our cake and only eat it if really needed? I don't see any. :-| I don't see a way either. I'd be willing to consider the 0-1% slowdown if it buys us simplicity, though. I'm also not even sure it's real. There seemed to be a fairly large variation in results, and switching my powersaving functions around got me cases where the new code was actually faster. If we think there's a real benefit, we should make sure the costs aren't just illusory. -Peff
diff --git a/apply.c b/apply.c index 432837a674..4793b05f3d 100644 --- a/apply.c +++ b/apply.c @@ -4502,17 +4502,22 @@ static int create_one_file(struct apply_state *state, unsigned int nr = getpid(); for (;;) { - char newpath[PATH_MAX]; - mksnpath(newpath, sizeof(newpath), "%s~%u", path, nr); + char *newpath = mkpathdup("%s~%u", path, nr); res = try_create_file(state, newpath, mode, buf, size); - if (res < 0) + if (res < 0) { + free(newpath); return -1; + } if (!res) { - if (!rename(newpath, path)) + if (!rename(newpath, path)) { + free(newpath); return 0; + } unlink_or_warn(newpath); + free(newpath); break; } + free(newpath); if (errno != EEXIST) break; ++nr; diff --git a/path.c b/path.c index 8bb223c92c..67229edb9c 100644 --- a/path.c +++ b/path.c @@ -28,8 +28,6 @@ static int get_st_mode_bits(const char *path, int *mode) return 0; } -static char bad_path[] = "/bad-path/"; - static struct strbuf *get_pathname(void) { static struct strbuf pathname_array[4] = { @@ -59,21 +57,6 @@ static void strbuf_cleanup_path(struct strbuf *sb) strbuf_remove(sb, 0, path - sb->buf); } -char *mksnpath(char *buf, size_t n, const char *fmt, ...) -{ - va_list args; - unsigned len; - - va_start(args, fmt); - len = vsnprintf(buf, n, fmt, args); - va_end(args); - if (len >= n) { - strlcpy(buf, bad_path, n); - return buf; - } - return (char *)cleanup_path(buf); -} - static int dir_prefix(const char *buf, const char *dir) { int len = strlen(dir); diff --git a/path.h b/path.h index e053effef2..ea96487b29 100644 --- a/path.h +++ b/path.h @@ -23,12 +23,6 @@ const char *mkpath(const char *fmt, ...) char *mkpathdup(const char *fmt, ...) __attribute__((format (printf, 1, 2))); -/* - * Construct a path and place the result in the provided buffer `buf`. - */ -char *mksnpath(char *buf, size_t n, const char *fmt, ...) - __attribute__((format (printf, 3, 4))); - /* * The `git_common_path` family of functions will construct a path into a * repository's common git directory, which is shared by all worktrees.
Support paths longer than PATH_MAX in create_one_file() (which is not a hard limit e.g. on Linux) by calling mkpathdup() instead of mksnpath(). Remove the latter, as it becomes unused by this change. Resist the temptation of using the more convenient mkpath() to avoid introducing a dependency on a static variable deep inside the apply machinery. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> --- apply.c | 13 +++++++++---- path.c | 17 ----------------- path.h | 6 ------ 3 files changed, 9 insertions(+), 27 deletions(-) -- 2.44.0