diff mbox series

[v2,3/5] reftable/stack: use stat info to avoid re-reading stack list

Message ID c0f7cec95bd78231bbbce2a30662b71fe00b091d.1704966670.git.ps@pks.im (mailing list archive)
State Accepted
Commit 6fdfaf15a0c60572ac58c979a46e34634051e12f
Headers show
Series reftable: optimize I/O patterns | expand

Commit Message

Patrick Steinhardt Jan. 11, 2024, 10:06 a.m. UTC
Whenever we call into the refs interfaces we potentially have to reload
refs in case they have been concurrently modified, either in-process or
externally. While this happens somewhat automatically for loose refs
because we simply try to re-read the files, the "packed" backend will
reload its snapshot of the packed-refs file in case its stat info has
changed since last reading it.

In the reftable backend we have a similar mechanism that is provided by
`reftable_stack_reload()`. This function will read the list of stacks
from "tables.list" and, if they have changed from the currently stored
list, reload the stacks. This is heavily inefficient though, as we have
to check whether the stack is up-to-date on basically every read and
thus keep on re-reading the file all the time even if it didn't change
at all.

We can do better and use the same stat(3P)-based mechanism that the
"packed" backend uses. Instead of reading the file, we will only open
the file descriptor, fstat(3P) it, and then compare the info against the
cached value from the last time we have updated the stack. This should
always work alright because "tables.list" is updated atomically via a
rename, so even if the ctime or mtime wasn't granular enough to identify
a change, at least the inode number or file size should have changed.

This change significantly speeds up operations where many refs are read,
like when using git-update-ref(1). The following benchmark creates N
refs in an otherwise-empty repository via `git update-ref --stdin`:

  Benchmark 1: update-ref: create many refs (refcount = 1, revision = HEAD~)
    Time (mean ± σ):       5.1 ms ±   0.2 ms    [User: 2.4 ms, System: 2.6 ms]
    Range (min … max):     4.8 ms …   7.2 ms    109 runs

  Benchmark 2: update-ref: create many refs (refcount = 100, revision = HEAD~)
    Time (mean ± σ):      19.1 ms ±   0.9 ms    [User: 8.9 ms, System: 9.9 ms]
    Range (min … max):    18.4 ms …  26.7 ms    72 runs

  Benchmark 3: update-ref: create many refs (refcount = 10000, revision = HEAD~)
    Time (mean ± σ):      1.336 s ±  0.018 s    [User: 0.590 s, System: 0.724 s]
    Range (min … max):    1.314 s …  1.373 s    10 runs

  Benchmark 4: update-ref: create many refs (refcount = 1, revision = HEAD)
    Time (mean ± σ):       5.1 ms ±   0.2 ms    [User: 2.4 ms, System: 2.6 ms]
    Range (min … max):     4.8 ms …   7.2 ms    109 runs

  Benchmark 5: update-ref: create many refs (refcount = 100, revision = HEAD)
    Time (mean ± σ):      14.8 ms ±   0.2 ms    [User: 7.1 ms, System: 7.5 ms]
    Range (min … max):    14.2 ms …  15.2 ms    82 runs

  Benchmark 6: update-ref: create many refs (refcount = 10000, revision = HEAD)
    Time (mean ± σ):     927.6 ms ±   5.3 ms    [User: 437.8 ms, System: 489.5 ms]
    Range (min … max):   919.4 ms … 936.4 ms    10 runs

  Summary
    update-ref: create many refs (refcount = 1, revision = HEAD) ran
      1.00 ± 0.07 times faster than update-ref: create many refs (refcount = 1, revision = HEAD~)
      2.89 ± 0.14 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
      3.74 ± 0.25 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
    181.26 ± 8.30 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
    261.01 ± 12.35 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c  | 12 +++++++++++-
 reftable/stack.h  |  1 +
 reftable/system.h |  1 +
 3 files changed, 13 insertions(+), 1 deletion(-)
diff mbox series

Patch

diff --git a/reftable/stack.c b/reftable/stack.c
index b1ee247601..c28d82299d 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -175,6 +175,7 @@  void reftable_stack_destroy(struct reftable_stack *st)
 		st->readers_len = 0;
 		FREE_AND_NULL(st->readers);
 	}
+	stat_validity_clear(&st->list_validity);
 	FREE_AND_NULL(st->list_file);
 	FREE_AND_NULL(st->reftable_dir);
 	reftable_free(st);
@@ -374,7 +375,11 @@  static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
 		sleep_millisec(delay);
 	}
 
+	stat_validity_update(&st->list_validity, fd);
+
 out:
+	if (err)
+		stat_validity_clear(&st->list_validity);
 	if (fd >= 0)
 		close(fd);
 	free_names(names);
@@ -388,8 +393,13 @@  static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
 static int stack_uptodate(struct reftable_stack *st)
 {
 	char **names = NULL;
-	int err = read_lines(st->list_file, &names);
+	int err;
 	int i = 0;
+
+	if (stat_validity_check(&st->list_validity, st->list_file))
+		return 0;
+
+	err = read_lines(st->list_file, &names);
 	if (err < 0)
 		return err;
 
diff --git a/reftable/stack.h b/reftable/stack.h
index f57005846e..3f80cc598a 100644
--- a/reftable/stack.h
+++ b/reftable/stack.h
@@ -14,6 +14,7 @@  license that can be found in the LICENSE file or at
 #include "reftable-stack.h"
 
 struct reftable_stack {
+	struct stat_validity list_validity;
 	char *list_file;
 	char *reftable_dir;
 	int disable_auto_compact;
diff --git a/reftable/system.h b/reftable/system.h
index 6b74a81514..2cc7adf271 100644
--- a/reftable/system.h
+++ b/reftable/system.h
@@ -12,6 +12,7 @@  license that can be found in the LICENSE file or at
 /* This header glues the reftable library to the rest of Git */
 
 #include "git-compat-util.h"
+#include "statinfo.h"
 #include "strbuf.h"
 #include "hash-ll.h" /* hash ID, sizes.*/
 #include "dir.h" /* remove_dir_recursively, for tests.*/