Message ID | 713e51a25c1c4cfa830db97f71cd7c39e85864d4.1705585037.git.ps@pks.im (mailing list archive) |
---|---|
State | Accepted |
Commit | 4f36b8597c8f1c20e465ade4159896fb592e34c0 |
Headers | show |
Series | reftable/stack: fix race in up-to-date check | expand |
Patrick Steinhardt <ps@pks.im> writes: > This should address the race in a POSIX-compliant way. The only real > downside is that this mechanism cannot be used on non-POSIX-compliant > systems like Windows. But we at least have the second-level caching > mechanism in place that compares contents of "files.list" with the > currently loaded list of tables. OK. > + /* > + * Cache stat information in case it provides a useful signal to us. > + * According to POSIX, "The st_ino and st_dev fields taken together > + * uniquely identify the file within the system." That being said, > + * Windows is not POSIX compliant and we do not have these fields > + * available. So the information we have there is insufficient to > + * determine whether two file descriptors point to the same file. > + * > + * While we could fall back to using other signals like the file's > + * mtime, those are not sufficient to avoid races. We thus refrain from > + * using the stat cache on such systems and fall back to the secondary > + * caching mechanism, which is to check whether contents of the file > + * have changed. OK. > + * > + * On other systems which are POSIX compliant we must keep the file > + * descriptor open. This is to avoid a race condition where two > + * processes access the reftable stack at the same point in time: > + * > + * 1. A reads the reftable stack and caches its stat info. > + * > + * 2. B updates the stack, appending a new table to "tables.list". > + * This will both use a new inode and result in a different file > + * size, thus invalidating A's cache in theory. > + * > + * 3. B decides to auto-compact the stack and merges two tables. The > + * file size now matches what A has cached again. Furthermore, the > + * filesystem may decide to recycle the inode number of the file > + * we have replaced in (2) because it is not in use anymore. > + * > + * 4. A reloads the reftable stack. Neither the inode number nor the > + * file size changed. If the timestamps did not change either then > + * we think the cached copy of our stack is up-to-date. > + * > + * By keeping the file descriptor open the inode number cannot be > + * recycled, mitigating the race. > + */ This is nasty. Well diagnosed and fixed. Will queue. Thanks.
On Thu, Jan 18, 2024 at 02:41:56PM +0100, Patrick Steinhardt wrote: > Refactor the code to stop using `stat_validity_check()`. Instead, we > manually stat(3P) the file descriptors to make relevant information > available. On Windows and MSYS2 the result will have both `st_dev` and > `st_ino` set to 0, which allows us to address the first issue by not > using the stat-based cache in that case. It also allows us to make sure > that we always compare `st_dev` and `st_ino`, addressing the second > issue. I didn't think too hard about the details, but does this mean that every user of stat_validity_check() has the same issue? The other big one is packed-refs (for which the code was originally written). Should this fix go into that API? -Peff
On Fri, Jan 19, 2024 at 08:05:59PM -0500, Jeff King wrote: > On Thu, Jan 18, 2024 at 02:41:56PM +0100, Patrick Steinhardt wrote: > > > Refactor the code to stop using `stat_validity_check()`. Instead, we > > manually stat(3P) the file descriptors to make relevant information > > available. On Windows and MSYS2 the result will have both `st_dev` and > > `st_ino` set to 0, which allows us to address the first issue by not > > using the stat-based cache in that case. It also allows us to make sure > > that we always compare `st_dev` and `st_ino`, addressing the second > > issue. > > I didn't think too hard about the details, but does this mean that > every user of stat_validity_check() has the same issue? The other big > one is packed-refs (for which the code was originally written). Should > this fix go into that API? In theory, the issue is the same for the `packed-refs` file. But in practice it's much less likely to be an issue: - The file gets rewritten a lot less frequently than the "tables.list" file, making the race less likely in the first place. It can only happen when git-pack-refs(1) races with concurrent readers, whereas it can happen for any two concurrent process with reftables. - Due to entries in the `packed-refs` being of variable length it's less likely that the size will be exactly the same after the file has been rewritten. For reftables we have constant-length entries in the "tables.list", so it's likely to happen there. - It is very unlikely that we have the same issue with inode reuse with the packed-refs file. The only reason why we have it with the reftable backend is that it is very likely that we end up writing to "tables.list" twice, once for the normal update and once for auto compaction. So overall, I doubt that it's all that critical in practice for the packed-refs backend. It _is_ possible to happen, but chances are significantly lower. I cannot recall a single report of this issue, which underpins how unlikely it seems to be for the files backend. Also, applying the same fix for the packed-refs would essentially mean that the caching mechanism is now ineffective on Windows systems where we do not have proper `st_dev` and `st_ino` values available. I think this is a no-go in the context of packed-refs because we don't have a second-level caching mechanism like we do in the reftable backend. It's not great that we have to reread the "tables.list" file on Windows systems for now, but at least it's better than having to reread the complete "packed-refs" file like we'd have to do with the files backend. Patrick
On Mon, Jan 22, 2024 at 11:32:14AM +0100, Patrick Steinhardt wrote: > > I didn't think too hard about the details, but does this mean that > > every user of stat_validity_check() has the same issue? The other big > > one is packed-refs (for which the code was originally written). Should > > this fix go into that API? > > In theory, the issue is the same for the `packed-refs` file. But in > practice it's much less likely to be an issue: Thanks for laying this all out. It does concern me a little that there's still a possible race, because they can be so hard to catch and debug in practice. But I think you make a compelling argument that it's probably not happening a lot in practice, and especially... > Also, applying the same fix for the packed-refs would essentially mean > that the caching mechanism is now ineffective on Windows systems where > we do not have proper `st_dev` and `st_ino` values available. I think > this is a no-go in the context of packed-refs because we don't have a > second-level caching mechanism like we do in the reftable backend. It's > not great that we have to reread the "tables.list" file on Windows > systems for now, but at least it's better than having to reread the > complete "packed-refs" file like we'd have to do with the files backend. ...here that the performance profile is so different. If the "fix" means re-reading the whole packed-refs file constantly, that's going to be quite noticeable. Given that this race has been here forever-ish, I agree with you that we should leave it be. -Peff
diff --git a/reftable/stack.c b/reftable/stack.c index 705cfb6caa..77a387a86c 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -66,6 +66,7 @@ int reftable_new_stack(struct reftable_stack **dest, const char *dir, strbuf_addstr(&list_file_name, "/tables.list"); p->list_file = strbuf_detach(&list_file_name, NULL); + p->list_fd = -1; p->reftable_dir = xstrdup(dir); p->config = config; @@ -175,7 +176,12 @@ void reftable_stack_destroy(struct reftable_stack *st) st->readers_len = 0; FREE_AND_NULL(st->readers); } - stat_validity_clear(&st->list_validity); + + if (st->list_fd >= 0) { + close(st->list_fd); + st->list_fd = -1; + } + FREE_AND_NULL(st->list_file); FREE_AND_NULL(st->reftable_dir); reftable_free(st); @@ -375,11 +381,59 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st, sleep_millisec(delay); } - stat_validity_update(&st->list_validity, fd); - out: - if (err) - stat_validity_clear(&st->list_validity); + /* + * Invalidate the stat cache. It is sufficient to only close the file + * descriptor and keep the cached stat info because we never use the + * latter when the former is negative. + */ + if (st->list_fd >= 0) { + close(st->list_fd); + st->list_fd = -1; + } + + /* + * Cache stat information in case it provides a useful signal to us. + * According to POSIX, "The st_ino and st_dev fields taken together + * uniquely identify the file within the system." That being said, + * Windows is not POSIX compliant and we do not have these fields + * available. So the information we have there is insufficient to + * determine whether two file descriptors point to the same file. + * + * While we could fall back to using other signals like the file's + * mtime, those are not sufficient to avoid races. We thus refrain from + * using the stat cache on such systems and fall back to the secondary + * caching mechanism, which is to check whether contents of the file + * have changed. + * + * On other systems which are POSIX compliant we must keep the file + * descriptor open. This is to avoid a race condition where two + * processes access the reftable stack at the same point in time: + * + * 1. A reads the reftable stack and caches its stat info. + * + * 2. B updates the stack, appending a new table to "tables.list". + * This will both use a new inode and result in a different file + * size, thus invalidating A's cache in theory. + * + * 3. B decides to auto-compact the stack and merges two tables. The + * file size now matches what A has cached again. Furthermore, the + * filesystem may decide to recycle the inode number of the file + * we have replaced in (2) because it is not in use anymore. + * + * 4. A reloads the reftable stack. Neither the inode number nor the + * file size changed. If the timestamps did not change either then + * we think the cached copy of our stack is up-to-date. + * + * By keeping the file descriptor open the inode number cannot be + * recycled, mitigating the race. + */ + if (!err && fd >= 0 && !fstat(fd, &st->list_st) && + st->list_st.st_dev && st->list_st.st_ino) { + st->list_fd = fd; + fd = -1; + } + if (fd >= 0) close(fd); free_names(names); @@ -396,8 +450,39 @@ static int stack_uptodate(struct reftable_stack *st) int err; int i = 0; - if (stat_validity_check(&st->list_validity, st->list_file)) - return 0; + /* + * When we have cached stat information available then we use it to + * verify whether the file has been rewritten. + * + * Note that we explicitly do not want to use `stat_validity_check()` + * and friends here because they may end up not comparing the `st_dev` + * and `st_ino` fields. These functions thus cannot guarantee that we + * indeed still have the same file. + */ + if (st->list_fd >= 0) { + struct stat list_st; + + if (stat(st->list_file, &list_st) < 0) { + /* + * It's fine for "tables.list" to not exist. In that + * case, we have to refresh when the loaded stack has + * any readers. + */ + if (errno == ENOENT) + return !!st->readers_len; + return REFTABLE_IO_ERROR; + } + + /* + * When "tables.list" refers to the same file we can assume + * that it didn't change. This is because we always use + * rename(3P) to update the file and never write to it + * directly. + */ + if (st->list_st.st_dev == list_st.st_dev && + st->list_st.st_ino == list_st.st_ino) + return 0; + } err = read_lines(st->list_file, &names); if (err < 0) diff --git a/reftable/stack.h b/reftable/stack.h index 3f80cc598a..c1e3efa899 100644 --- a/reftable/stack.h +++ b/reftable/stack.h @@ -14,8 +14,10 @@ license that can be found in the LICENSE file or at #include "reftable-stack.h" struct reftable_stack { - struct stat_validity list_validity; + struct stat list_st; char *list_file; + int list_fd; + char *reftable_dir; int disable_auto_compact; diff --git a/reftable/system.h b/reftable/system.h index 2cc7adf271..6b74a81514 100644 --- a/reftable/system.h +++ b/reftable/system.h @@ -12,7 +12,6 @@ license that can be found in the LICENSE file or at /* This header glues the reftable library to the rest of Git */ #include "git-compat-util.h" -#include "statinfo.h" #include "strbuf.h" #include "hash-ll.h" /* hash ID, sizes.*/ #include "dir.h" /* remove_dir_recursively, for tests.*/
In 6fdfaf15a0 (reftable/stack: use stat info to avoid re-reading stack list, 2024-01-11) we have introduced a new mechanism to avoid re-reading the table list in case stat(3P) figures out that the stack didn't change since the last time we read it. While this change significantly improved performance when writing many refs, it can unfortunately lead to false negatives in very specific scenarios. Given two processes A and B, there is a feasible sequence of events that cause us to accidentally treat the table list as up-to-date even though it changed: 1. A reads the reftable stack and caches its stat info. 2. B updates the stack, appending a new table to "tables.list". This will both use a new inode and result in a different file size, thus invalidating A's cache in theory. 3. B decides to auto-compact the stack and merges two tables. The file size now matches what A has cached again. Furthermore, the filesystem may decide to recycle the inode number of the file we have replaced in (2) because it is not in use anymore. 4. A reloads the reftable stack. Neither the inode number nor the file size changed. If the timestamps did not change either then we think the cached copy of our stack is up-to-date. In fact, the commit introduced three related issues: - Non-POSIX compliant systems may not report proper `st_dev` and `st_ino` values in stat(3P), which made us rely solely on the file's potentially coarse-grained mtime and ctime. - `stat_validity_check()` and friends may end up not comparing `st_dev` and `st_ino` depending on the "core.checkstat" config, again reducing the signal to the mtime and ctime. - `st_ino` can be recycled, rendering the check moot even on POSIX-compliant systems. Given that POSIX defines that "The st_ino and st_dev fields taken together uniquely identify the file within the system", these issues led to the most important signal to establish file identity to be ignored or become useless in some cases. Refactor the code to stop using `stat_validity_check()`. Instead, we manually stat(3P) the file descriptors to make relevant information available. On Windows and MSYS2 the result will have both `st_dev` and `st_ino` set to 0, which allows us to address the first issue by not using the stat-based cache in that case. It also allows us to make sure that we always compare `st_dev` and `st_ino`, addressing the second issue. The third issue of inode recycling can be addressed by keeping the file descriptor of "files.list" open during the lifetime of the reftable stack. As the file will still exist on disk even though it has been unlinked it is impossible for its inode to be recycled as long as the file descriptor is still open. This should address the race in a POSIX-compliant way. The only real downside is that this mechanism cannot be used on non-POSIX-compliant systems like Windows. But we at least have the second-level caching mechanism in place that compares contents of "files.list" with the currently loaded list of tables. This new mechanism performs roughly the same as the previous one that relied on `stat_validity_check()`: Benchmark 1: update-ref: create many refs (HEAD~) Time (mean ± σ): 4.754 s ± 0.026 s [User: 2.204 s, System: 2.549 s] Range (min … max): 4.694 s … 4.802 s 20 runs Benchmark 2: update-ref: create many refs (HEAD) Time (mean ± σ): 4.721 s ± 0.020 s [User: 2.194 s, System: 2.527 s] Range (min … max): 4.691 s … 4.753 s 20 runs Summary update-ref: create many refs (HEAD~) ran 1.01 ± 0.01 times faster than update-ref: create many refs (HEAD) Signed-off-by: Patrick Steinhardt <ps@pks.im> --- reftable/stack.c | 99 +++++++++++++++++++++++++++++++++++++++++++---- reftable/stack.h | 4 +- reftable/system.h | 1 - 3 files changed, 95 insertions(+), 9 deletions(-)