diff mbox series

[RFC,v2,4/7] update-index: have the index fsync() flush the loose objects

Message ID RFC-patch-v2-4.7-61f4f3d7ef4-20220323T140753Z-avarab@gmail.com (mailing list archive)
State New, archived
Series bottom-up ns/batched-fsync & "plugging" in object-file.c

Commit Message

Ævar Arnfjörð Bjarmason March 23, 2022, 2:18 p.m. UTC
As with unpack-objects in a preceding commit, have update-index.c make
use of the HASH_N_OBJECTS{,_{FIRST,LAST}} flags. We now have a "batch"
mode again for "update-index".

Adding the t/* directory from git.git on a Linux ramdisk is a bit
faster than with the tmp-objdir indirection:

	$ git hyperfine -L rev ns/batched-fsync,HEAD -s 'make CFLAGS=-O3 && rm -rf repo && git init repo && cp -R t repo/ && git ls-files -- t >repo/.git/to-add.txt' -p 'rm -rf repo/.git/objects/* repo/.git/index' './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' --warmup 1 -r 10

	Benchmark 1: ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'ns/batched-fsync
	  Time (mean ± σ):     281.1 ms ±   2.6 ms    [User: 186.2 ms, System: 92.3 ms]
	  Range (min … max):   278.3 ms … 287.0 ms    10 runs

	Benchmark 2: ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'HEAD
	  Time (mean ± σ):     265.9 ms ±   2.6 ms    [User: 181.7 ms, System: 82.1 ms]
	  Range (min … max):   262.0 ms … 270.3 ms    10 runs

	Summary
	  './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'HEAD' ran
	    1.06 ± 0.01 times faster than './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'ns/batched-fsync'

And as before, running that with "strace --summary-only" slows things
down somewhat (probably mimicking slower I/O). I then get:

	Summary
	  'strace --summary-only ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'HEAD' ran
	    1.19 ± 0.03 times faster than 'strace --summary-only ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin <repo/.git/to-add.txt' in 'ns/batched-fsync'

This one has a twist though: instead of fsync()-ing on the last object
we write, we don't do that at all, and instead defer the fsync() until
we write the index itself. This is outlined in [1] (as "METHOD THREE").

Because of this, under FSYNC_METHOD_BATCH we'll write the N
objects (possibly only one, because we're lazy) with HASH_N_OBJECTS,
and we now even support doing this via N arguments on the command-line.

Then we won't fsync() any of them, but we will rename them
in-place (which, if we were still using the tmp-objdir, would leave
them "staged" in the tmp-objdir).

We'll then have the fsync() for the index update "flush" that out, and
thus avoid two fsync() calls when one will do.

Running this with the "git hyperfine" command mentioned in a preceding
commit with "strace --summary-only" shows that we do 1 fsync() now
instead of 2, and have one more sync_file_range(), as expected.

We also go from ~51k syscalls to ~39k, with ~2x the number of link()
and unlink() in ns/batched-fsync, and of course one fsync() instead of
two.

The flow of this code isn't quite set up for re-plugging the
tmp-objdir back in. In particular we no longer pass
HASH_N_OBJECTS_FIRST (though doing so would be trivial), and there's no
HASH_N_OBJECTS_LAST.

So this and other callers would need some light transaction-y API, or
would otherwise need to pass a "yes, I'd like to flush it" flag down
to finalize_hashfile(); doing either would be trivial.

And since we've started structuring it this way, it becomes easy to
insert any number of steps down the line that "bulk fsync" before the
final fsync(). Right now we write some objects and fsync() on the
index, but between those two we could do any number of other things
and defer the fsync() for all of them.

This sort of thing might be especially interesting for "git repack"
when it writes e.g. a *.bitmap, *.rev, *.pack and *.idx. In that case
we could skip the fsync() on all of those, and only do it on the *.idx
before renaming it in-place. I *think* nothing cares about a *.pack
without an *.idx, but even then we could fsync the *.idx, rename the
*.pack, rename the *.idx and still safely do only one fsync(). See
"git show --first-parent" on 62874602032 (Merge branch
'tb/pack-finalize-ordering' into maint, 2021-10-12) for a good
overview of the code involved in that.

1. https://lore.kernel.org/git/220323.86sfr9ndpr.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/update-index.c |  7 ++++---
 cache.h                |  1 +
 read-cache.c           | 29 ++++++++++++++++++++++++++++-
 3 files changed, 33 insertions(+), 4 deletions(-)

Comments

Neeraj Singh March 23, 2022, 8:30 p.m. UTC | #1
On Wed, Mar 23, 2022 at 7:18 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> [...]

In the long run, we should attach the "need to fsync the index" to an
ongoing 'repo-transaction' so that we can composably sync at the best
point regardless of what the top-level git operation does.

Patch

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 34aaaa16c20..6cfec6efb38 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -1142,7 +1142,7 @@  int cmd_update_index(int argc, const char **argv, const char *prefix)
 
 			setup_work_tree();
 			p = prefix_path(prefix, prefix_length, path);
-			update_one(p, 0);
+			update_one(p, HASH_N_OBJECTS);
 			if (set_executable_bit)
 				chmod_path(set_executable_bit, p);
 			free(p);
@@ -1187,7 +1187,7 @@  int cmd_update_index(int argc, const char **argv, const char *prefix)
 				strbuf_swap(&buf, &unquoted);
 			}
 			p = prefix_path(prefix, prefix_length, buf.buf);
-			update_one(p, 0);
+			update_one(p, HASH_N_OBJECTS);
 			if (set_executable_bit)
 				chmod_path(set_executable_bit, p);
 			free(p);
@@ -1263,7 +1263,8 @@  int cmd_update_index(int argc, const char **argv, const char *prefix)
 				exit(128);
 			unable_to_lock_die(get_index_file(), lock_error);
 		}
-		if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
+		if (write_locked_index(&the_index, &lock_file,
+				       COMMIT_LOCK | WLI_NEED_LOOSE_FSYNC))
 			die("Unable to write new index file");
 	}
 
diff --git a/cache.h b/cache.h
index 2f3831fa853..7542e009a34 100644
--- a/cache.h
+++ b/cache.h
@@ -751,6 +751,7 @@  void ensure_full_index(struct index_state *istate);
 /* For use with `write_locked_index()`. */
 #define COMMIT_LOCK		(1 << 0)
 #define SKIP_IF_UNCHANGED	(1 << 1)
+#define WLI_NEED_LOOSE_FSYNC	(1 << 2)
 
 /*
  * Write the index while holding an already-taken lock. Close the lock,
diff --git a/read-cache.c b/read-cache.c
index 3e0e7d41837..275f6308c32 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2860,6 +2860,33 @@  static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	int ieot_entries = 1;
 	struct index_entry_offset_table *ieot = NULL;
 	int nr, nr_threads;
+	unsigned int wflags = FSYNC_COMPONENT_INDEX;
+
+
+	/*
+	 * TODO: This is abuse of the API recently modified
+	 * finalize_hashfile() which reveals a shortcoming of its
+	 * "fsync" design.
+	 * 
+	 * I.e. It expects a "enum fsync_component component" label,
+	 * but here we're passing it an OR of the two, knowing that
+	 * it'll call fsync_component_or_die() which (in
+	 * write-or-die.c) will do "(fsync_components & wflags)" (to
+	 * our "wflags" here).
+	 *
+	 * But the API really should be changed to explicitly take
+	 * such flags, because in this case we'd like to fsync() the
+	 * index if we're in the bulk mode, *even if* our
+	 * "core.fsync=index" isn't configured.
+	 *
+	 * That's because at this point we've been queuing up object
+	 * writes that we didn't fsync(), and are going to use this
+	 * fsync() to "flush" the whole thing. Doing it this way
+	 * avoids redundantly calling fsync() twice when once will do.
+	 */
+	if (fsync_method == FSYNC_METHOD_BATCH && 
+	    flags & WLI_NEED_LOOSE_FSYNC)
+		wflags |= FSYNC_COMPONENT_LOOSE_OBJECT;
 
 	f = hashfd(tempfile->fd, tempfile->filename.buf);
 
@@ -3094,7 +3121,7 @@  static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	if (!alternate_index_output && (flags & COMMIT_LOCK))
 		csum_fsync_flag = CSUM_FSYNC;
 
-	finalize_hashfile(f, istate->oid.hash, FSYNC_COMPONENT_INDEX,
+	finalize_hashfile(f, istate->oid.hash, wflags,
 			  CSUM_HASH_IN_STREAM | csum_fsync_flag);
 
 	if (close_tempfile_gently(tempfile)) {