Message ID | 20250103-b4-pks-object-file-racy-collision-check-v1-2-6ef9e2da1f87@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | object-file: retry linking file into place when occluding file vanishes | expand |
Patrick Steinhardt <ps@pks.im> writes: > In a preceding commit, we have adapted `check_collision()` to ignore > the case where either of the colliding files vanishes. This should be > safe in general when we assume that the contents of these two files were > the same. But the check is all about detecting collisions, so that > assumption may be too optimistic. > > Adapt the code to retry linking the object into place when we see that > the destination file has racily vanished. This should generally succeed > as we have just observed that the destination file does not exist > anymore, except in the very unlikely event that it gets recreated by > another concurrent process again. OK. The way the new code is structured, we are set to infinitely retry as long as our link() fails with EEXIST yet check_collision() finds that the destination is missing. It makes me somewhat uneasy, but such a condition needs an actively racing attacker that can perfectly time the creation and the removal of the destination, so it would probably be OK if this code lacks "we retry only once and fail" safety valve. > Furthermore, stop treating `ENOENT` specially for the source file. It > shouldn't happen that the source vanishes as we're using a fresh > temporary file for it, so if it does vanish it indicates an actual > error. Makes sense. > Suggested-by: Jeff King <peff@peff.net> > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > object-file.c | 23 +++++++++++++++++------ > 1 file changed, 17 insertions(+), 6 deletions(-) Will queue.
On Fri, Jan 03, 2025 at 09:19:55AM +0100, Patrick Steinhardt wrote: > In a preceding commit, we have adapted `check_collision()` to ignore If it's in next and we are building on top, I think we can mention it by name: 0ad3d65652 (object-file: fix race in object collision check, 2024-12-30). > the case where either of the colliding files vanishes. This should be > safe in general when we assume that the contents of these two files were > the same. But the check is all about detecting collisions, so that > assumption may be too optimistic. I found this a little vague about what "too optimistic" means. ;) Maybe something like: Prior to 0ad3d65652, callers could expect that a successful return from finalize_object_file() means that either the file was moved into place, or the identical bytes were already present. If neither of those happens, we'd return an error. Since that commit, if the destination file disappears between our link() call and the collision check, we'd return success without actually checking the contents, and without retrying the link. This solves the common case that the files were indeed the same, but it means that we may corrupt the repository if they weren't (this implies a hash collision, but the whole point of this function is protecting against hash collisions). We can't be pessimistic and assume they're different; that hurts the common case that 0ad3d65652 was trying to fix. But after seeing that the destination file went away, we can retry linking again... > Furthermore, stop treating `ENOENT` specially for the source file. It > shouldn't happen that the source vanishes as we're using a fresh > temporary file for it, so if it does vanish it indicates an actual > error. OK. I think this is worth doing, but I'd probably have put it into its own commit. > @@ -1987,6 +1988,8 @@ static int check_collision(const char *source, const char *dest) > if (fd_dest < 0) { > if (errno != ENOENT) > ret = error_errno(_("unable to open %s"), dest); > + else > + ret = CHECK_COLLISION_DEST_VANISHED; > goto out; > } OK. We're depending on error() never using "-2" itself, but I think that is a reasonable thing. > @@ -2034,8 +2037,10 @@ int finalize_object_file(const char *tmpfile, const char *filename) > int finalize_object_file_flags(const char *tmpfile, const char *filename, > enum finalize_object_file_flags flags) > { > - struct stat st; > - int ret = 0; > + int ret; > + > +retry: > + ret = 0; > > if (object_creation_mode == OBJECT_CREATION_USES_RENAMES) > goto try_rename; > @@ -2056,6 +2061,8 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename, > * left to unlink. > */ > if (ret && ret != EEXIST) { > + struct stat st; > + OK, we move the stat struct here where it's needed. I think that's somewhat orthogonal to your patch, but reduced scoping does help make the goto's less confusing. I suspect there's a way to write this as a loop that would be more structured, but it would be a bigger refactor. Bonus points if it also get rid of the try_rename goto, too. ;) I'm OK punting on that, though. > @@ -2071,9 +2078,13 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename, > errno = saved_errno; > return error_errno(_("unable to write file %s"), filename); > } > - if (!(flags & FOF_SKIP_COLLISION_CHECK) && > - check_collision(tmpfile, filename)) > + if (!(flags & FOF_SKIP_COLLISION_CHECK)) { > + ret = check_collision(tmpfile, filename); > + if (ret == CHECK_COLLISION_DEST_VANISHED) > + goto retry; > + else if (ret) > return -1; > + } > unlink_or_warn(tmpfile); > } I share Junio's uneasiness with looping forever based on external input from the filesystem (even though you _should_ eventually win the race, that's not guaranteed, and of course a weird filesystem might confuse us). Could we put a stop-gap in it like: diff --git a/object-file.c b/object-file.c index 88432cc9c0..262a2f3df2 100644 --- a/object-file.c +++ b/object-file.c @@ -2038,6 +2038,7 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename, enum finalize_object_file_flags flags) { int ret; + int retries = 0; retry: ret = 0; @@ -2080,8 +2081,11 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename, } if (!(flags & FOF_SKIP_COLLISION_CHECK)) { ret = check_collision(tmpfile, filename); - if (ret == CHECK_COLLISION_DEST_VANISHED) + if (ret == CHECK_COLLISION_DEST_VANISHED) { + if (retries++ > 5) + return error(_("unable to write repeatedly vanishing file %s"), filename); goto retry; + } else if (ret) return -1; } Otherwise, I think the logic looks good. -Peff
On Fri, Jan 03, 2025 at 02:40:58PM -0500, Jeff King wrote: > I suspect there's a way to write this as a loop that would be more > structured, but it would be a bigger refactor. Bonus points if it also > get rid of the try_rename goto, too. ;) > > I'm OK punting on that, though. For fun, here's a version without any goto's in it, that should behave the same. But it would be very easy to miss a case. So I don't know if it is worth the regression risk, and I don't blame you if you delete this message without looking carefully. ;) Diff is kind of hard to read, so you may want to apply (on top of your patches) and just look at the post-image. diff --git a/object-file.c b/object-file.c index 88432cc9c0..923d75a889 100644 --- a/object-file.c +++ b/object-file.c @@ -2037,58 +2037,66 @@ int finalize_object_file(const char *tmpfile, const char *filename) int finalize_object_file_flags(const char *tmpfile, const char *filename, enum finalize_object_file_flags flags) { - int ret; + int tries = 5; -retry: - ret = 0; + while (tries-- > 0) { + int ret = 0; + if (object_creation_mode != OBJECT_CREATION_USES_RENAMES) { + if (!link(tmpfile, filename)) { + unlink_or_warn(tmpfile); + break; + } + ret = errno; + } - if (object_creation_mode == OBJECT_CREATION_USES_RENAMES) - goto try_rename; - else if (link(tmpfile, filename)) - ret = errno; - else - unlink_or_warn(tmpfile); + /* + * Coda hack - coda doesn't like cross-directory links, + * so we fall back to a rename, which will mean that it + * won't be able to check collisions, but that's not a + * big deal. + * + * The same holds for FAT formatted media. + * + * When this succeeds, we just return. We have nothing + * left to unlink. + */ + if (!ret || ret == EEXIST) { + struct stat st; - /* - * Coda hack - coda doesn't like cross-directory links, - * so we fall back to a rename, which will mean that it - * won't be able to check collisions, but that's not a - * big deal. - * - * The same holds for FAT formatted media. - * - * When this succeeds, we just return. We have nothing - * left to unlink. - */ - if (ret && ret != EEXIST) { - struct stat st; + if (!stat(filename, &st)) + ret = EEXIST; + else if (!rename(tmpfile, filename)) + break; + else + ret = errno; + } - try_rename: - if (!stat(filename, &st)) - ret = EEXIST; - else if (!rename(tmpfile, filename)) - goto out; - else - ret = errno; - } - if (ret) { + /* Do not retry most filesystem errors */ if (ret != EEXIST) { int saved_errno = errno; unlink_or_warn(tmpfile); errno = saved_errno; return error_errno(_("unable to write file %s"), filename); } - if (!(flags & FOF_SKIP_COLLISION_CHECK)) { - ret = check_collision(tmpfile, filename); - if (ret == CHECK_COLLISION_DEST_VANISHED) - goto retry; - else if (ret) - return -1; + + ret = (flags & FOF_SKIP_COLLISION_CHECK) ? 0 : + check_collision(tmpfile, filename); + + if (!ret) { + /* Same contents (or we are allowed to assume such). */ + unlink_or_warn(tmpfile); + break; } - unlink_or_warn(tmpfile); + + if (ret != CHECK_COLLISION_DEST_VANISHED) + return -1; /* check_collision() already complained */ + + /* loop again to retry vanished destination */ } -out: + if (tries < 0) + return error(_("unable to write repeatedly vanishing file %s"), filename); + if (adjust_shared_perm(filename)) return error(_("unable to set permission to '%s'"), filename); return 0; -Peff
Jeff King <peff@peff.net> writes: > I share Junio's uneasiness with looping forever based on external input > from the filesystem (even though you _should_ eventually win the race, > that's not guaranteed, and of course a weird filesystem might confuse > us). Yeah, "a weird filesystem" would be a lot more plausible than a determined and accurate attacker to break it. The only thing they have to do is to yield EEXIST when failing link() for some other reason. > Could we put a stop-gap in it like: > > diff --git a/object-file.c b/object-file.c > index 88432cc9c0..262a2f3df2 100644 > --- a/object-file.c > +++ b/object-file.c > @@ -2038,6 +2038,7 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename, > enum finalize_object_file_flags flags) > { > int ret; > + int retries = 0; > > retry: > ret = 0; > @@ -2080,8 +2081,11 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename, > } > if (!(flags & FOF_SKIP_COLLISION_CHECK)) { > ret = check_collision(tmpfile, filename); > - if (ret == CHECK_COLLISION_DEST_VANISHED) > + if (ret == CHECK_COLLISION_DEST_VANISHED) { > + if (retries++ > 5) > + return error(_("unable to write repeatedly vanishing file %s"), filename); > goto retry; > + } > else if (ret) > return -1; > } Sounds sensible. > Otherwise, I think the logic looks good. > > -Peff Thanks.
Jeff King <peff@peff.net> writes: > Diff is kind of hard to read, so you may want to apply (on top of your > patches) and just look at the post-image. > > diff --git a/object-file.c b/object-file.c > index 88432cc9c0..923d75a889 100644 > --- a/object-file.c > +++ b/object-file.c > @@ -2037,58 +2037,66 @@ int finalize_object_file(const char *tmpfile, const char *filename) > int finalize_object_file_flags(const char *tmpfile, const char *filename, > enum finalize_object_file_flags flags) > { > + int tries = 5; > > + while (tries-- > 0) { > + int ret = 0; > + if (object_creation_mode != OBJECT_CREATION_USES_RENAMES) { Platforms that do not want hardlinks set CREATION_USES_RENAMES flag. We skip this block on them. > + if (!link(tmpfile, filename)) { > + unlink_or_warn(tmpfile); > + break; If we successfully hardlink, we remove the temporary and happily leave the retry loop. > + } > + ret = errno; > + } > > + /* > + * Coda hack - coda doesn't like cross-directory links, > + * so we fall back to a rename, which will mean that it > + * won't be able to check collisions, but that's not a > + * big deal. > + * > + * The same holds for FAT formatted media. > + * > + * When this succeeds, we just return. We have nothing > + * left to unlink. > + */ > + if (!ret || ret == EEXIST) { > + struct stat st; Either we skipped the hardlink step (then ret==0), or tried to hardlink and saw EEXIST, we come here and try renaming. > + if (!stat(filename, &st)) > + ret = EEXIST; We check if the destination already exists, and do the same thing as the case where hardlink failed due to EEXIST, if that is the case. Otherwise, any failure of stat() we assume we are free to rename the new thing there. It is a bit strange that we do not check ENOENT here. The reason why this stat() fails is not all that interesting, because it is subject to a TOCTOU race, and the case we are more interested in is when this stat() succeeds, which positively tells us that there is something at that path (hence we do not have to trigger a failure from rename() to notice a potential collision). Wait, what if stat() succeeds and !S_ISREG(st.st_mode)? But that's the original code for "Coda hack", and that is not something we are trying to "fix" at this point (yet). > + else if (!rename(tmpfile, filename)) > + break; If we manage to rename(), we happily leave the retry loop. Unlike the hardlink case, there is no tmpfile to unlink. Good. > + else > + ret = errno; Here errno is guaranteed from the failure of rename(). If the destination was created immediately after we got ENOENT from stat(), it is likely rename() gave us EEXIST, which we would check for collission and retry. > + } > > + /* Do not retry most filesystem errors */ > if (ret != EEXIST) { > int saved_errno = errno; > unlink_or_warn(tmpfile); > errno = saved_errno; > return error_errno(_("unable to write file %s"), filename); Sensible. > } > + > + ret = (flags & FOF_SKIP_COLLISION_CHECK) ? 0 : > + check_collision(tmpfile, filename); > + > + if (!ret) { > + /* Same contents (or we are allowed to assume such). */ > + unlink_or_warn(tmpfile); > + break; > } > + > + if (ret != CHECK_COLLISION_DEST_VANISHED) > + return -1; /* check_collision() already complained */ > + > + /* loop again to retry vanished destination */ OK. > } > > + if (tries < 0) > + return error(_("unable to write repeatedly vanishing file %s"), filename); > + OK. > if (adjust_shared_perm(filename)) > return error(_("unable to set permission to '%s'"), filename); > return 0;
On Fri, Jan 03, 2025 at 02:40:58PM -0500, Jeff King wrote: > On Fri, Jan 03, 2025 at 09:19:55AM +0100, Patrick Steinhardt wrote: > > > In a preceding commit, we have adapted `check_collision()` to ignore > > If it's in next and we are building on top, I think we can mention it by > name: 0ad3d65652 (object-file: fix race in object collision check, > 2024-12-30). Okay, good to know. I wasn't sure whether the patches might get rewound when `next` gets rewound. > > the case where either of the colliding files vanishes. This should be > > safe in general when we assume that the contents of these two files were > > the same. But the check is all about detecting collisions, so that > > assumption may be too optimistic. > > I found this a little vague about what "too optimistic" means. ;) > Maybe something like: > > Prior to 0ad3d65652, callers could expect that a successful return > from finalize_object_file() means that either the file was moved into > place, or the identical bytes were already present. If neither of > those happens, we'd return an error. > > Since that commit, if the destination file disappears between our > link() call and the collision check, we'd return success without > actually checking the contents, and without retrying the link. This > solves the common case that the files were indeed the same, but it > means that we may corrupt the repository if they weren't (this implies > a hash collision, but the whole point of this function is protecting > against hash collisions). > > We can't be pessimistic and assume they're different; that hurts the > common case that 0ad3d65652 was trying to fix. But after seeing that > the destination file went away, we can retry linking again... Well, as usual your commit messages are something to aspire to :) Thanks. > > Furthermore, stop treating `ENOENT` specially for the source file. It > > shouldn't happen that the source vanishes as we're using a fresh > > temporary file for it, so if it does vanish it indicates an actual > > error. > > OK. I think this is worth doing, but I'd probably have put it into its > own commit. Fair enough, can do. > > @@ -2034,8 +2037,10 @@ int finalize_object_file(const char *tmpfile, const char *filename) > > int finalize_object_file_flags(const char *tmpfile, const char *filename, > > enum finalize_object_file_flags flags) > > { > > - struct stat st; > > - int ret = 0; > > + int ret; > > + > > +retry: > > + ret = 0; > > > > if (object_creation_mode == OBJECT_CREATION_USES_RENAMES) > > goto try_rename; > > @@ -2056,6 +2061,8 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename, > > * left to unlink. > > */ > > if (ret && ret != EEXIST) { > > + struct stat st; > > + > > OK, we move the stat struct here where it's needed. I think that's > somewhat orthogonal to your patch, but reduced scoping does help make > the goto's less confusing. > > I suspect there's a way to write this as a loop that would be more > structured, but it would be a bigger refactor. Bonus points if it also > get rid of the try_rename goto, too. ;) > > I'm OK punting on that, though. Yeah, I'll punt on it for now. I don't love the resulting structure, but it's also not that uncommon in our codebase. > > @@ -2071,9 +2078,13 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename, > > errno = saved_errno; > > return error_errno(_("unable to write file %s"), filename); > > } > > - if (!(flags & FOF_SKIP_COLLISION_CHECK) && > > - check_collision(tmpfile, filename)) > > + if (!(flags & FOF_SKIP_COLLISION_CHECK)) { > > + ret = check_collision(tmpfile, filename); > > + if (ret == CHECK_COLLISION_DEST_VANISHED) > > + goto retry; > > + else if (ret) > > return -1; > > + } > > unlink_or_warn(tmpfile); > > } > > I share Junio's uneasiness with looping forever based on external input > from the filesystem (even though you _should_ eventually win the race, > that's not guaranteed, and of course a weird filesystem might confuse > us). Could we put a stop-gap in it like: Fair enough. I was also wondering about whether or not I should have a retry counter when writing it but couldn't think about a (sane) scenario where it would be needed. But yeah, filesystems can be weird, and it's not a lot of code either, so I'll add it. Patrick
On Fri, Jan 03, 2025 at 02:59:42PM -0500, Jeff King wrote: > On Fri, Jan 03, 2025 at 02:40:58PM -0500, Jeff King wrote: > > > I suspect there's a way to write this as a loop that would be more > > structured, but it would be a bigger refactor. Bonus points if it also > > get rid of the try_rename goto, too. ;) > > > > I'm OK punting on that, though. > > For fun, here's a version without any goto's in it, that should behave > the same. But it would be very easy to miss a case. So I don't know if > it is worth the regression risk, and I don't blame you if you delete > this message without looking carefully. ;) > > Diff is kind of hard to read, so you may want to apply (on top of your > patches) and just look at the post-image. Thanks. For now though I think I prefer to go with the simpler diff that uses goto, as it feels less risky close to v2.48. We can still refactor this in the next release cycle. Patrick
On Mon, Jan 06, 2025 at 12:11:47PM +0100, Patrick Steinhardt wrote: > On Fri, Jan 03, 2025 at 02:59:42PM -0500, Jeff King wrote: > > On Fri, Jan 03, 2025 at 02:40:58PM -0500, Jeff King wrote: > > > > > I suspect there's a way to write this as a loop that would be more > > > structured, but it would be a bigger refactor. Bonus points if it also > > > get rid of the try_rename goto, too. ;) > > > > > > I'm OK punting on that, though. > > > > For fun, here's a version without any goto's in it, that should behave > > the same. But it would be very easy to miss a case. So I don't know if > > it is worth the regression risk, and I don't blame you if you delete > > this message without looking carefully. ;) > > > > Diff is kind of hard to read, so you may want to apply (on top of your > > patches) and just look at the post-image. > > Thanks. For now though I think I prefer to go with the simpler diff that > uses goto, as it feels less risky close to v2.48. We can still refactor > this in the next release cycle. Sounds good. I looked over your v2 and it seems fine to me. -Peff
diff --git a/object-file.c b/object-file.c index e1989236ca87e565dea4d003f57882f257889ecf..88432cc9c07c3c56ce31a298a0ee90e5b5acbaff 100644 --- a/object-file.c +++ b/object-file.c @@ -1970,6 +1970,8 @@ static void write_object_file_prepare_literally(const struct git_hash_algo *algo hash_object_body(algo, &c, buf, len, oid, hdr, hdrlen); } +#define CHECK_COLLISION_DEST_VANISHED -2 + static int check_collision(const char *source, const char *dest) { char buf_source[4096], buf_dest[4096]; @@ -1978,8 +1980,7 @@ static int check_collision(const char *source, const char *dest) fd_source = open(source, O_RDONLY); if (fd_source < 0) { - if (errno != ENOENT) - ret = error_errno(_("unable to open %s"), source); + ret = error_errno(_("unable to open %s"), source); goto out; } @@ -1987,6 +1988,8 @@ static int check_collision(const char *source, const char *dest) if (fd_dest < 0) { if (errno != ENOENT) ret = error_errno(_("unable to open %s"), dest); + else + ret = CHECK_COLLISION_DEST_VANISHED; goto out; } @@ -2034,8 +2037,10 @@ int finalize_object_file(const char *tmpfile, const char *filename) int finalize_object_file_flags(const char *tmpfile, const char *filename, enum finalize_object_file_flags flags) { - struct stat st; - int ret = 0; + int ret; + +retry: + ret = 0; if (object_creation_mode == OBJECT_CREATION_USES_RENAMES) goto try_rename; @@ -2056,6 +2061,8 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename, * left to unlink. */ if (ret && ret != EEXIST) { + struct stat st; + try_rename: if (!stat(filename, &st)) ret = EEXIST; @@ -2071,9 +2078,13 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename, errno = saved_errno; return error_errno(_("unable to write file %s"), filename); } - if (!(flags & FOF_SKIP_COLLISION_CHECK) && - check_collision(tmpfile, filename)) + if (!(flags & FOF_SKIP_COLLISION_CHECK)) { + ret = check_collision(tmpfile, filename); + if (ret == CHECK_COLLISION_DEST_VANISHED) + goto retry; + else if (ret) return -1; + } unlink_or_warn(tmpfile); }
In a preceding commit, we have adapted `check_collision()` to ignore the case where either of the colliding files vanishes. This should be safe in general when we assume that the contents of these two files were the same. But the check is all about detecting collisions, so that assumption may be too optimistic. Adapt the code to retry linking the object into place when we see that the destination file has racily vanished. This should generally succeed as we have just observed that the destination file does not exist anymore, except in the very unlikely event that it gets recreated by another concurrent process again. Furthermore, stop treating `ENOENT` specially for the source file. It shouldn't happen that the source vanishes as we're using a fresh temporary file for it, so if it does vanish it indicates an actual error. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> --- object-file.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-)