Message ID | YwXw2ytUlrXSSRh7@coredump.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | curiosities with tempfile.active | expand |
Am 24.08.22 um 11:35 schrieb Jeff King: > But here's the interesting twist. Commit 2c2db194bd (tempfile: add > mks_tempfile_dt(), 2022-04-20) recently taught remove_tempfiles() to > drop a surrounding directory. And it does so by munging the filename > buffer. It has to, because we can't allocate a new buffer in a signal > handler. > > But is it also idempotent(-ish)? The directory removal itself is, > because it checks: > > tempfile->filename.buf[tempfile->directorylen] == '/' > > before overwriting that byte with a NUL, so it should only be true once > (though I note this violates the usual "volatile" rules for signal > handlers, it's probably OK in practice since we know the NUL will be > written before we call rmdir()). > > But after that happens, the filename buffer now contains a C string that > points to the directory. And if we end up in remove_tempfiles() again > due to another signal, we'll try to unlink(p->filename.buf), which will > now unlink the directory! I'm not sure how that behaves on all systems. > On Linux it's a noop. And if it just deleted the directory, that would > be OK. I seem to recall on old versions of SunOS it did bad things > (maybe because it really did unlink it, but without checking for > orphaned entries inside). https://pubs.opengroup.org/onlinepubs/9699919799/functions/unlink.html says of the unlink(2) parameter: "The path argument shall not name a directory unless the process has appropriate privileges and the implementation supports using unlink() on directories." So we better not do that.. --- >8 --- Subject: [PATCH] tempfile: avoid directory cleanup race The temporary directory created by mks_tempfile_dt() is deleted by first deleting the file within, then truncating the filename strbuf and passing the resulting string to rmdir(2). When the cleanup routine is invoked concurrently by a signal handler we can end up passing the now truncated string to unlink(2), however, which could cause problems on some systems. Avoid that issue by remembering the directory name separately. This way the paths stay unchanged. A signal handler can still race with normal cleanup, but deleting the same files and directories twice is harmless. Reported-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> --- tempfile.c | 14 ++++++-------- tempfile.h | 2 +- 2 files changed, 7 insertions(+), 9 deletions(-) diff --git a/tempfile.c b/tempfile.c index 2024c82691..7414c81e31 100644 --- a/tempfile.c +++ b/tempfile.c @@ -59,14 +59,11 @@ static VOLATILE_LIST_HEAD(tempfile_list); static void remove_template_directory(struct tempfile *tempfile, int in_signal_handler) { - if (tempfile->directorylen > 0 && - tempfile->directorylen < tempfile->filename.len && - tempfile->filename.buf[tempfile->directorylen] == '/') { - strbuf_setlen(&tempfile->filename, tempfile->directorylen); + if (tempfile->directory) { if (in_signal_handler) - rmdir(tempfile->filename.buf); + rmdir(tempfile->directory); else - rmdir_or_warn(tempfile->filename.buf); + rmdir_or_warn(tempfile->directory); } } @@ -115,7 +112,7 @@ static struct tempfile *new_tempfile(void) tempfile->owner = 0; INIT_LIST_HEAD(&tempfile->list); strbuf_init(&tempfile->filename, 0); - tempfile->directorylen = 0; + tempfile->directory = NULL; return tempfile; } @@ -141,6 +138,7 @@ static void deactivate_tempfile(struct tempfile *tempfile) { tempfile->active = 0; strbuf_release(&tempfile->filename); + free(tempfile->directory); volatile_list_del(&tempfile->list); free(tempfile); } @@ -254,7 +252,7 @@ struct tempfile *mks_tempfile_dt(const char *directory_template, tempfile = new_tempfile(); strbuf_swap(&tempfile->filename, &sb); - tempfile->directorylen = directorylen; + tempfile->directory = xmemdupz(tempfile->filename.buf, directorylen); tempfile->fd = fd; activate_tempfile(tempfile); return tempfile; diff --git a/tempfile.h b/tempfile.h index d7804a214a..5b9e8743dd 100644 --- a/tempfile.h +++ b/tempfile.h @@ -82,7 +82,7 @@ struct tempfile { FILE *volatile fp; volatile pid_t owner; struct strbuf filename; - size_t directorylen; + char *directory; }; /* -- 2.37.2
On Sat, Aug 27, 2022 at 12:46:29AM +0200, René Scharfe wrote: > https://pubs.opengroup.org/onlinepubs/9699919799/functions/unlink.html > says of the unlink(2) parameter: "The path argument shall not name a > directory unless the process has appropriate privileges and the > implementation supports using unlink() on directories." So we better > not do that.. Yeah, I saw that. It's a bit vague, and if the call returns ENOSYS or EISDIR, that would be perfectly fine. It's the "what happens on the implementations that do support it..." part that I'm more worried about. :) That said... > --- >8 --- > Subject: [PATCH] tempfile: avoid directory cleanup race > > The temporary directory created by mks_tempfile_dt() is deleted by first > deleting the file within, then truncating the filename strbuf and > passing the resulting string to rmdir(2). When the cleanup routine is > invoked concurrently by a signal handler we can end up passing the now > truncated string to unlink(2), however, which could cause problems on > some systems. > > Avoid that issue by remembering the directory name separately. This way > the paths stay unchanged. A signal handler can still race with normal > cleanup, but deleting the same files and directories twice is harmless. Right, I'm a little embarrassed I didn't immediately come up with this same fix. This is definitely the right thing to do. The more we can treat data from signal-handlers are write-once, the better. There's a slight extra memory cost to store the directory name twice, but it's a drop in the bucket overall. > tempfile.c | 14 ++++++-------- > tempfile.h | 2 +- The patch itself looks good to me. -Peff
On Sat, Aug 27, 2022 at 6:05 AM Jeff King <peff@peff.net> wrote: > Yeah, I saw that. It's a bit vague, and if the call returns ENOSYS or > EISDIR, that would be perfectly fine. It's the "what happens on the > implementations that do support it..." part that I'm more worried about. :) The history here is that pre-4.2BSD, Unix systems had no mkdir system call. You used mknod() to make a truly empty directory and the link() to create the "." and ".." entries within it, and all three of these operations were restricted to the super-user. There was no rmdir either, so again, unlink() as the super-user was permitted to do the job (with three calls to unlink the "." and ".." entries first and then remove the directory). Unlinking a directory when it still contains "." leaves the link count at 1 and there's no GC, so it sits around occupying an inode. If you have a mkdir() system call and don't need backwards compatibility, you get to have these return EISDIR errors... Chris
On Sat, Aug 27, 2022 at 02:47:45PM -0700, Chris Torek wrote: > On Sat, Aug 27, 2022 at 6:05 AM Jeff King <peff@peff.net> wrote: > > Yeah, I saw that. It's a bit vague, and if the call returns ENOSYS or > > EISDIR, that would be perfectly fine. It's the "what happens on the > > implementations that do support it..." part that I'm more worried about. :) > > The history here is that pre-4.2BSD, Unix systems had no mkdir > system call. You used mknod() to make a truly empty directory and > the link() to create the "." and ".." entries within it, and all three of > these operations were restricted to the super-user. There was no > rmdir either, so again, unlink() as the super-user was permitted to > do the job (with three calls to unlink the "." and ".." entries first and > then remove the directory). > > Unlinking a directory when it still contains "." leaves the link count > at 1 and there's no GC, so it sits around occupying an inode. Thanks, that matches the sense of unease I had in the back of my mind. I seem to recall that maybe older versions of SunOS exhibited this, but that feels like a lifetime ago. At any rate, we should avoid that unlink() call, and René's patch neatly does so. -Peff
diff --git a/tempfile.c b/tempfile.c index 2024c82691..6134b73972 100644 --- a/tempfile.c +++ b/tempfile.c @@ -89,8 +89,6 @@ static void remove_tempfiles(int in_signal_handler) else unlink_or_warn(p->filename.buf); remove_template_directory(p, in_signal_handler); - - p->active = 0; } } @@ -111,7 +109,6 @@ static struct tempfile *new_tempfile(void) struct tempfile *tempfile = xmalloc(sizeof(*tempfile)); tempfile->fd = -1; tempfile->fp = NULL; - tempfile->active = 0; tempfile->owner = 0; INIT_LIST_HEAD(&tempfile->list); strbuf_init(&tempfile->filename, 0); @@ -123,9 +120,6 @@ static void activate_tempfile(struct tempfile *tempfile) { static int initialized; - if (is_tempfile_active(tempfile)) - BUG("activate_tempfile called for active object"); - if (!initialized) { sigchain_push_common(remove_tempfiles_on_signal); atexit(remove_tempfiles_on_exit); @@ -134,12 +128,10 @@ static void activate_tempfile(struct tempfile *tempfile) volatile_list_add(&tempfile->list, &tempfile_list); tempfile->owner = getpid(); - tempfile->active = 1; } static void deactivate_tempfile(struct tempfile *tempfile) { - tempfile->active = 0; strbuf_release(&tempfile->filename); volatile_list_del(&tempfile->list); free(tempfile); diff --git a/tempfile.h b/tempfile.h index d7804a214a..f0bf59dbac 100644 --- a/tempfile.h +++ b/tempfile.h @@ -77,7 +77,6 @@ struct tempfile { volatile struct volatile_list_head list; - volatile sig_atomic_t active; volatile int fd; FILE *volatile fp; volatile pid_t owner; @@ -221,7 +220,7 @@ FILE *fdopen_tempfile(struct tempfile *tempfile, const char *mode); static inline int is_tempfile_active(struct tempfile *tempfile) { - return tempfile && tempfile->active; + return !!tempfile; } /*