diff mbox series

[RFC,v2,2/7] object-file: pass down unpack-objects.c flags for "bulk" checkin

Message ID RFC-patch-v2-2.7-c6f776fc2bc-20220323T140753Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series bottom-up ns/batched-fsync & "plugging" in object-file.c | expand

Commit Message

Ævar Arnfjörð Bjarmason March 23, 2022, 2:18 p.m. UTC
Remove much of this as a POC for exploring some of what I mentioned in
https://lore.kernel.org/git/220322.86mthinxnn.gmgdl@evledraar.gmail.com/

This commit is obviously not what we *should* do as end-state, but
demonstrates what's needed (I think) for a bare-minimum implementation
of just the "bulk" syncing method for loose objects without the part
where we do the tmp-objdir.c dance.

Performance with this is already quite promising. Benchmarking with:

	git hyperfine -L rev ns/batched-fsync,HEAD -s 'make CFLAGS=-O3' \
	    	-p 'rm -rf r.git && git init --bare r.git' \
		'./git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack'

I.e. unpacking a small packfile (my dotfiles) yields, on a Linux
ramdisk:

	Benchmark 1: ./git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack' in 'ns/batched-fsync
	  Time (mean ± σ):     815.9 ms ±   8.2 ms    [User: 522.9 ms, System: 287.9 ms]
	  Range (min … max):   805.6 ms … 835.9 ms    10 runs

	Benchmark 2: ./git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack' in 'HEAD
	  Time (mean ± σ):     779.4 ms ±  15.4 ms    [User: 505.7 ms, System: 270.2 ms]
	  Range (min … max):   763.1 ms … 813.9 ms    10 runs

	Summary
	  './git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack' in 'HEAD' ran
	    1.05 ± 0.02 times faster than './git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack' in 'ns/batched-fsync'

Doing the same with "strace --summary-only", which probably helps to
emulate cases with slower syscalls is ~15% faster than using the
tmp-objdir indirection:

	Summary
	  'strace --summary-only ./git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack' in 'HEAD' ran
	    1.16 ± 0.01 times faster than 'strace --summary-only ./git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack' in 'ns/batched-fsync'

Which makes sense in terms of syscalls. In my case HEAD has ~101k
calls, and the parent topic is making ~129k calls, with around 2x the
number of unlink(), link() as expected.

Of course some users will want to use the tmp-objdir.c method. So a
version of this commit could be rewritten to come earlier in the
series, with the "bulk" on top being optional.

It seems to me that it's a much better strategy to do this whole thing
in close_loose_object() after passing down the new HASH_N_OBJECTS /
HASH_N_OBJECTS_FIRST / HASH_N_OBJECTS_LAST flags.

Doing that for the "builtin/add.c" and "builtin/unpack-objects.c" code
having its {un,}plug_bulk_checkin() removed here is then just a matter
of passing down a similar set of flags indicating whether we're
dealing with N objects, and if so if we're dealing with the last one
or not.

As we'll see in subsequent commits doing it this way also effortlessly
integrates with other HASH_* flags. E.g. for "update-index" the code
being rm'd here doesn't handle the interaction with
"HASH_WRITE_OBJECT" properly, but once we've moved all this sync
bootstrapping logic to close_loose_object() we'll never get to it if
we're not actually writing something.

This code currently doesn't use the HASH_N_OBJECTS_FIRST flag, but
that's what we'd use later to optionally call tmp_objdir_create().

Aside: This also changes logic that was a bit confusing and repetitive
in close_loose_object(). Previously we'd first call
batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT) which is just as
shorthand for:

	fsync_components & FSYNC_COMPONENT_LOOSE_OBJECT &&
	fsync_method == FSYNC_METHOD_BATCH

We'd then proceed to call
fsync_component_or_die(FSYNC_COMPONENT_LOOSE_OBJECT) later in the same
function, which is just a way of calling fsync_or_die() if:

	fsync_components & FSYNC_COMPONENT_LOOSE_OBJECT

Now we instead just define a local "fsync_loose" variable by checking
"fsync_components & FSYNC_COMPONENT_LOOSE_OBJECT", which shows us that
the previous case of fsync_component_or_die(...)" could just be added
to the existing "fsync_object_files > 0" branch.

Note: This commit reverts much of "core.fsyncmethod: batched disk
flushes for loose-objects". We'll set up new structures to bring what
it was doing back in a different way. I.e. to do the tmp-objdir
plug-in in object-file.c

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/unpack-objects.c |  2 --
 builtin/update-index.c   |  4 ---
 bulk-checkin.c           | 74 ----------------------------------------
 bulk-checkin.h           |  3 --
 cache.h                  |  5 ---
 object-file.c            | 37 ++++++++++++++------
 6 files changed, 26 insertions(+), 99 deletions(-)

Comments

Neeraj Singh March 23, 2022, 8:25 p.m. UTC | #1
On Wed, Mar 23, 2022 at 7:18 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> Remove much of this as a POC for exploring some of what I mentioned in
> https://lore.kernel.org/git/220322.86mthinxnn.gmgdl@evledraar.gmail.com/
>
> This commit is obviously not what we *should* do as end-state, but
> demonstrates what's needed (I think) for a bare-minimum implementation
> of just the "bulk" syncing method for loose objects without the part
> where we do the tmp-objdir.c dance.
>
> Performance with this is already quite promising. Benchmarking with:
>
>         git hyperfine -L rev ns/batched-fsync,HEAD -s 'make CFLAGS=-O3' \
>                 -p 'rm -rf r.git && git init --bare r.git' \
>                 './git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack'
>
> I.e. unpacking a small packfile (my dotfiles) yields, on a Linux
> ramdisk:
>
>         Benchmark 1: ./git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack' in 'ns/batched-fsync
>           Time (mean ± σ):     815.9 ms ±   8.2 ms    [User: 522.9 ms, System: 287.9 ms]
>           Range (min … max):   805.6 ms … 835.9 ms    10 runs
>
>         Benchmark 2: ./git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack' in 'HEAD
>           Time (mean ± σ):     779.4 ms ±  15.4 ms    [User: 505.7 ms, System: 270.2 ms]
>           Range (min … max):   763.1 ms … 813.9 ms    10 runs
>
>         Summary
>           './git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack' in 'HEAD' ran
>             1.05 ± 0.02 times faster than './git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack' in 'ns/batched-fsync'
>
> Doing the same with "strace --summary-only", which probably helps to
> emulate cases with slower syscalls is ~15% faster than using the
> tmp-objdir indirection:
>
>         Summary
>           'strace --summary-only ./git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack' in 'HEAD' ran
>             1.16 ± 0.01 times faster than 'strace --summary-only ./git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects </tmp/pack-dotfiles.pack' in 'ns/batched-fsync'
>
> Which makes sense in terms of syscalls. In my case HEAD has ~101k
> calls, and the parent topic is making ~129k calls, with around 2x the
> number of unlink(), link() as expected.
>
> Of course some users will want to use the tmp-objdir.c method. So a
> version of this commit could be rewritten to come earlier in the
> series, with the "bulk" on top being optional.
>
> It seems to me that it's a much better strategy to do this whole thing
> in close_loose_object() after passing down the new HASH_N_OBJECTS /
> HASH_N_OBJECTS_FIRST / HASH_N_OBJECTS_LAST flags.
>
> Doing that for the "builtin/add.c" and "builtin/unpack-objects.c" code
> having its {un,}plug_bulk_checkin() removed here is then just a matter
> of passing down a similar set of flags indicating whether we're
> dealing with N objects, and if so if we're dealing with the last one
> or not.
>
> As we'll see in subsequent commits doing it this way also effortlessly
> integrates with other HASH_* flags. E.g. for "update-index" the code
> being rm'd here doesn't handle the interaction with
> "HASH_WRITE_OBJECT" properly, but once we've moved all this sync
> bootstrapping logic to close_loose_object() we'll never get to it if
> we're not actually writing something.
>
> This code currently doesn't use the HASH_N_OBJECTS_FIRST flag, but
> that's what we'd use later to optionally call tmp_objdir_create().
>
> Aside: This also changes logic that was a bit confusing and repetitive
> in close_loose_object(). Previously we'd first call
> batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT) which is just as
> shorthand for:
>
>         fsync_components & FSYNC_COMPONENT_LOOSE_OBJECT &&
>         fsync_method == FSYNC_METHOD_BATCH
>
> We'd then proceed to call
> fsync_component_or_die(FSYNC_COMPONENT_LOOSE_OBJECT) later in the same
> function, which is just a way of calling fsync_or_die() if:
>
>         fsync_components & FSYNC_COMPONENT_LOOSE_OBJECT
>
> Now we instead just define a local "fsync_loose" variable by checking
> "fsync_components & FSYNC_COMPONENT_LOOSE_OBJECT", which shows us that
> the previous case of fsync_component_or_die(...)" could just be added
> to the existing "fsync_object_files > 0" branch.
>
> Note: This commit reverts much of "core.fsyncmethod: batched disk
> flushes for loose-objects". We'll set up new structures to bring what
> it was doing back in a different way. I.e. to do the tmp-objdir
> plug-in in object-file.c
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/unpack-objects.c |  2 --
>  builtin/update-index.c   |  4 ---
>  bulk-checkin.c           | 74 ----------------------------------------
>  bulk-checkin.h           |  3 --
>  cache.h                  |  5 ---
>  object-file.c            | 37 ++++++++++++++------
>  6 files changed, 26 insertions(+), 99 deletions(-)
>
> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index ec40c6fd966..93da436581b 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -507,7 +507,6 @@ static void unpack_all(void)
>         if (!quiet)
>                 progress = start_progress(_("Unpacking objects"), nr_objects);
>         CALLOC_ARRAY(obj_list, nr_objects);
> -       plug_bulk_checkin();
>         oflags = nr_objects > 1 ? HASH_N_OBJECTS : 0;
>         for (i = 0; i < nr_objects; i++) {
>                 int nth = i + 1;
> @@ -517,7 +516,6 @@ static void unpack_all(void)
>                 unpack_one(i, oflags | f);
>                 display_progress(progress, nth);
>         }
> -       unplug_bulk_checkin();
>         stop_progress(&progress);
>
>         if (delta_list)
> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index cbd2b0d633b..95ed3c47b2e 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -1118,8 +1118,6 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>         parse_options_start(&ctx, argc, argv, prefix,
>                             options, PARSE_OPT_STOP_AT_NON_OPTION);
>
> -       /* optimize adding many objects to the object database */
> -       plug_bulk_checkin();
>         while (ctx.argc) {
>                 if (parseopt_state != PARSE_OPT_DONE)
>                         parseopt_state = parse_options_step(&ctx, options,
> @@ -1194,8 +1192,6 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>                 strbuf_release(&buf);
>         }
>
> -       /* by now we must have added all of the new objects */
> -       unplug_bulk_checkin();
>         if (split_index > 0) {
>                 if (git_config_get_split_index() == 0)
>                         warning(_("core.splitIndex is set to false; "
> diff --git a/bulk-checkin.c b/bulk-checkin.c
> index a0dca79ba6a..577b135e39c 100644
> --- a/bulk-checkin.c
> +++ b/bulk-checkin.c
> @@ -3,20 +3,15 @@
>   */
>  #include "cache.h"
>  #include "bulk-checkin.h"
> -#include "lockfile.h"
>  #include "repository.h"
>  #include "csum-file.h"
>  #include "pack.h"
>  #include "strbuf.h"
> -#include "string-list.h"
> -#include "tmp-objdir.h"
>  #include "packfile.h"
>  #include "object-store.h"
>
>  static int bulk_checkin_plugged;
>
> -static struct tmp_objdir *bulk_fsync_objdir;
> -
>  static struct bulk_checkin_state {
>         char *pack_tmp_name;
>         struct hashfile *f;
> @@ -85,40 +80,6 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
>         reprepare_packed_git(the_repository);
>  }
>
> -/*
> - * Cleanup after batch-mode fsync_object_files.
> - */
> -static void do_batch_fsync(void)
> -{
> -       struct strbuf temp_path = STRBUF_INIT;
> -       struct tempfile *temp;
> -
> -       if (!bulk_fsync_objdir)
> -               return;
> -
> -       /*
> -        * Issue a full hardware flush against a temporary file to ensure
> -        * that all objects are durable before any renames occur. The code in
> -        * fsync_loose_object_bulk_checkin has already issued a writeout
> -        * request, but it has not flushed any writeback cache in the storage
> -        * hardware or any filesystem logs. This fsync call acts as a barrier
> -        * to ensure that the data in each new object file is durable before
> -        * the final name is visible.
> -        */
> -       strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
> -       temp = xmks_tempfile(temp_path.buf);
> -       fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
> -       delete_tempfile(&temp);
> -       strbuf_release(&temp_path);
> -
> -       /*
> -        * Make the object files visible in the primary ODB after their data is
> -        * fully durable.
> -        */
> -       tmp_objdir_migrate(bulk_fsync_objdir);
> -       bulk_fsync_objdir = NULL;
> -}
> -
>  static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
>  {
>         int i;
> @@ -313,26 +274,6 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
>         return 0;
>  }
>
> -void prepare_loose_object_bulk_checkin(void)
> -{
> -       if (bulk_checkin_plugged && !bulk_fsync_objdir)
> -               bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
> -}
> -
> -void fsync_loose_object_bulk_checkin(int fd, const char *filename)
> -{
> -       /*
> -        * If we have a plugged bulk checkin, we issue a call that
> -        * cleans the filesystem page cache but avoids a hardware flush
> -        * command. Later on we will issue a single hardware flush
> -        * before as part of do_batch_fsync.
> -        */
> -       if (!bulk_fsync_objdir ||
> -           git_fsync(fd, FSYNC_WRITEOUT_ONLY) < 0) {
> -               fsync_or_die(fd, filename);
> -       }
> -}
> -
>  int index_bulk_checkin(struct object_id *oid,
>                        int fd, size_t size, enum object_type type,
>                        const char *path, unsigned flags)
> @@ -347,19 +288,6 @@ int index_bulk_checkin(struct object_id *oid,
>  void plug_bulk_checkin(void)
>  {
>         assert(!bulk_checkin_plugged);
> -
> -       /*
> -        * A temporary object directory is used to hold the files
> -        * while they are not fsynced.
> -        */
> -       if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT)) {
> -               bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
> -               if (!bulk_fsync_objdir)
> -                       die(_("Could not create temporary object directory for core.fsyncMethod=batch"));
> -
> -               tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0);
> -       }
> -
>         bulk_checkin_plugged = 1;
>  }
>
> @@ -369,6 +297,4 @@ void unplug_bulk_checkin(void)
>         bulk_checkin_plugged = 0;
>         if (bulk_checkin_state.f)
>                 finish_bulk_checkin(&bulk_checkin_state);
> -
> -       do_batch_fsync();
>  }
> diff --git a/bulk-checkin.h b/bulk-checkin.h
> index 181d3447ff9..b26f3dc3b74 100644
> --- a/bulk-checkin.h
> +++ b/bulk-checkin.h
> @@ -6,9 +6,6 @@
>
>  #include "cache.h"
>
> -void prepare_loose_object_bulk_checkin(void);
> -void fsync_loose_object_bulk_checkin(int fd, const char *filename);
> -
>  int index_bulk_checkin(struct object_id *oid,
>                        int fd, size_t size, enum object_type type,
>                        const char *path, unsigned flags);
> diff --git a/cache.h b/cache.h
> index 72c91c91286..2f3831fa853 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1772,11 +1772,6 @@ void fsync_or_die(int fd, const char *);
>  int fsync_component(enum fsync_component component, int fd);
>  void fsync_component_or_die(enum fsync_component component, int fd, const char *msg);
>
> -static inline int batch_fsync_enabled(enum fsync_component component)
> -{
> -       return (fsync_components & component) && (fsync_method == FSYNC_METHOD_BATCH);
> -}
> -
>  ssize_t read_in_full(int fd, void *buf, size_t count);
>  ssize_t write_in_full(int fd, const void *buf, size_t count);
>  ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset);
> diff --git a/object-file.c b/object-file.c
> index cd0ddb49e4b..dbeb3df502d 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1886,19 +1886,37 @@ void hash_object_file(const struct git_hash_algo *algo, const void *buf,
>         hash_object_file_literally(algo, buf, len, type_name(type), oid);
>  }
>
> +static void sync_loose_object_batch(int fd, const char *filename,
> +                                   const unsigned oflags)
> +{
> +       const int last = oflags & HASH_N_OBJECTS_LAST;
> +
> +       /*
> +        * We're doing a sync_file_range() (or equivalent) for 1..N-1
> +        * objects, and then a "real" fsync() for N. On some OS's
> +        * enabling core.fsync=loose-object && core.fsyncMethod=batch
> +        * improves the performance by a lot.
> +        */
> +       if (last || (!last && git_fsync(fd, FSYNC_WRITEOUT_ONLY) < 0))
> +               fsync_or_die(fd, filename);
> +}
> +
>  /* Finalize a file on disk, and close it. */
> -static void close_loose_object(int fd, const char *filename)
> +static void close_loose_object(int fd, const char *filename,
> +                              const unsigned oflags)
>  {
> +       int fsync_loose;
> +
>         if (the_repository->objects->odb->will_destroy)
>                 goto out;
>
> -       if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
> -               fsync_loose_object_bulk_checkin(fd, filename);
> -       else if (fsync_object_files > 0)
> +       fsync_loose = fsync_components & FSYNC_COMPONENT_LOOSE_OBJECT;
> +
> +       if (oflags & HASH_N_OBJECTS && fsync_loose &&
> +           fsync_method == FSYNC_METHOD_BATCH)
> +               sync_loose_object_batch(fd, filename, oflags);
> +       else if (fsync_object_files > 0 || fsync_loose)
>                 fsync_or_die(fd, filename);
> -       else
> -               fsync_component_or_die(FSYNC_COMPONENT_LOOSE_OBJECT, fd,
> -                                      filename);
>
>  out:
>         if (close(fd) != 0)
> @@ -1962,9 +1980,6 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
>         static struct strbuf tmp_file = STRBUF_INIT;
>         static struct strbuf filename = STRBUF_INIT;
>
> -       if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
> -               prepare_loose_object_bulk_checkin();
> -
>         loose_object_path(the_repository, &filename, oid);
>
>         fd = create_tmpfile(&tmp_file, filename.buf);
> @@ -2015,7 +2030,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
>                 die(_("confused by unstable object source data for %s"),
>                     oid_to_hex(oid));
>
> -       close_loose_object(fd, tmp_file.buf);
> +       close_loose_object(fd, tmp_file.buf, flags);
>
>         if (mtime) {
>                 struct utimbuf utb;
> --
> 2.35.1.1428.g1c1a0152d61
>

Fine. Doing this patch series as non-RFC, we could start from prior to
my fsyncMethod=batch series.
diff mbox series

Patch

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index ec40c6fd966..93da436581b 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -507,7 +507,6 @@  static void unpack_all(void)
 	if (!quiet)
 		progress = start_progress(_("Unpacking objects"), nr_objects);
 	CALLOC_ARRAY(obj_list, nr_objects);
-	plug_bulk_checkin();
 	oflags = nr_objects > 1 ? HASH_N_OBJECTS : 0;
 	for (i = 0; i < nr_objects; i++) {
 		int nth = i + 1;
@@ -517,7 +516,6 @@  static void unpack_all(void)
 		unpack_one(i, oflags | f);
 		display_progress(progress, nth);
 	}
-	unplug_bulk_checkin();
 	stop_progress(&progress);
 
 	if (delta_list)
diff --git a/builtin/update-index.c b/builtin/update-index.c
index cbd2b0d633b..95ed3c47b2e 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -1118,8 +1118,6 @@  int cmd_update_index(int argc, const char **argv, const char *prefix)
 	parse_options_start(&ctx, argc, argv, prefix,
 			    options, PARSE_OPT_STOP_AT_NON_OPTION);
 
-	/* optimize adding many objects to the object database */
-	plug_bulk_checkin();
 	while (ctx.argc) {
 		if (parseopt_state != PARSE_OPT_DONE)
 			parseopt_state = parse_options_step(&ctx, options,
@@ -1194,8 +1192,6 @@  int cmd_update_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
-	/* by now we must have added all of the new objects */
-	unplug_bulk_checkin();
 	if (split_index > 0) {
 		if (git_config_get_split_index() == 0)
 			warning(_("core.splitIndex is set to false; "
diff --git a/bulk-checkin.c b/bulk-checkin.c
index a0dca79ba6a..577b135e39c 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,20 +3,15 @@ 
  */
 #include "cache.h"
 #include "bulk-checkin.h"
-#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
-#include "string-list.h"
-#include "tmp-objdir.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
 
-static struct tmp_objdir *bulk_fsync_objdir;
-
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
@@ -85,40 +80,6 @@  static void finish_bulk_checkin(struct bulk_checkin_state *state)
 	reprepare_packed_git(the_repository);
 }
 
-/*
- * Cleanup after batch-mode fsync_object_files.
- */
-static void do_batch_fsync(void)
-{
-	struct strbuf temp_path = STRBUF_INIT;
-	struct tempfile *temp;
-
-	if (!bulk_fsync_objdir)
-		return;
-
-	/*
-	 * Issue a full hardware flush against a temporary file to ensure
-	 * that all objects are durable before any renames occur. The code in
-	 * fsync_loose_object_bulk_checkin has already issued a writeout
-	 * request, but it has not flushed any writeback cache in the storage
-	 * hardware or any filesystem logs. This fsync call acts as a barrier
-	 * to ensure that the data in each new object file is durable before
-	 * the final name is visible.
-	 */
-	strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
-	temp = xmks_tempfile(temp_path.buf);
-	fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
-	delete_tempfile(&temp);
-	strbuf_release(&temp_path);
-
-	/*
-	 * Make the object files visible in the primary ODB after their data is
-	 * fully durable.
-	 */
-	tmp_objdir_migrate(bulk_fsync_objdir);
-	bulk_fsync_objdir = NULL;
-}
-
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -313,26 +274,6 @@  static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
-void prepare_loose_object_bulk_checkin(void)
-{
-	if (bulk_checkin_plugged && !bulk_fsync_objdir)
-		bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
-}
-
-void fsync_loose_object_bulk_checkin(int fd, const char *filename)
-{
-	/*
-	 * If we have a plugged bulk checkin, we issue a call that
-	 * cleans the filesystem page cache but avoids a hardware flush
-	 * command. Later on we will issue a single hardware flush
-	 * before as part of do_batch_fsync.
-	 */
-	if (!bulk_fsync_objdir ||
-	    git_fsync(fd, FSYNC_WRITEOUT_ONLY) < 0) {
-		fsync_or_die(fd, filename);
-	}
-}
-
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -347,19 +288,6 @@  int index_bulk_checkin(struct object_id *oid,
 void plug_bulk_checkin(void)
 {
 	assert(!bulk_checkin_plugged);
-
-	/*
-	 * A temporary object directory is used to hold the files
-	 * while they are not fsynced.
-	 */
-	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT)) {
-		bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
-		if (!bulk_fsync_objdir)
-			die(_("Could not create temporary object directory for core.fsyncMethod=batch"));
-
-		tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0);
-	}
-
 	bulk_checkin_plugged = 1;
 }
 
@@ -369,6 +297,4 @@  void unplug_bulk_checkin(void)
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
-
-	do_batch_fsync();
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index 181d3447ff9..b26f3dc3b74 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,9 +6,6 @@ 
 
 #include "cache.h"
 
-void prepare_loose_object_bulk_checkin(void);
-void fsync_loose_object_bulk_checkin(int fd, const char *filename);
-
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
diff --git a/cache.h b/cache.h
index 72c91c91286..2f3831fa853 100644
--- a/cache.h
+++ b/cache.h
@@ -1772,11 +1772,6 @@  void fsync_or_die(int fd, const char *);
 int fsync_component(enum fsync_component component, int fd);
 void fsync_component_or_die(enum fsync_component component, int fd, const char *msg);
 
-static inline int batch_fsync_enabled(enum fsync_component component)
-{
-	return (fsync_components & component) && (fsync_method == FSYNC_METHOD_BATCH);
-}
-
 ssize_t read_in_full(int fd, void *buf, size_t count);
 ssize_t write_in_full(int fd, const void *buf, size_t count);
 ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset);
diff --git a/object-file.c b/object-file.c
index cd0ddb49e4b..dbeb3df502d 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1886,19 +1886,37 @@  void hash_object_file(const struct git_hash_algo *algo, const void *buf,
 	hash_object_file_literally(algo, buf, len, type_name(type), oid);
 }
 
+static void sync_loose_object_batch(int fd, const char *filename,
+				    const unsigned oflags)
+{
+	const int last = oflags & HASH_N_OBJECTS_LAST;
+
+	/*
+	 * We're doing a sync_file_range() (or equivalent) for 1..N-1
+	 * objects, and then a "real" fsync() for N. On some OS's
+	 * enabling core.fsync=loose-object && core.fsyncMethod=batch
+	 * improves the performance by a lot.
+	 */
+	if (last || (!last && git_fsync(fd, FSYNC_WRITEOUT_ONLY) < 0))
+		fsync_or_die(fd, filename);
+}
+
 /* Finalize a file on disk, and close it. */
-static void close_loose_object(int fd, const char *filename)
+static void close_loose_object(int fd, const char *filename,
+			       const unsigned oflags)
 {
+	int fsync_loose;
+
 	if (the_repository->objects->odb->will_destroy)
 		goto out;
 
-	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
-		fsync_loose_object_bulk_checkin(fd, filename);
-	else if (fsync_object_files > 0)
+	fsync_loose = fsync_components & FSYNC_COMPONENT_LOOSE_OBJECT;
+
+	if (oflags & HASH_N_OBJECTS && fsync_loose &&
+	    fsync_method == FSYNC_METHOD_BATCH)
+		sync_loose_object_batch(fd, filename, oflags);
+	else if (fsync_object_files > 0 || fsync_loose)
 		fsync_or_die(fd, filename);
-	else
-		fsync_component_or_die(FSYNC_COMPONENT_LOOSE_OBJECT, fd,
-				       filename);
 
 out:
 	if (close(fd) != 0)
@@ -1962,9 +1980,6 @@  static int write_loose_object(const struct object_id *oid, char *hdr,
 	static struct strbuf tmp_file = STRBUF_INIT;
 	static struct strbuf filename = STRBUF_INIT;
 
-	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
-		prepare_loose_object_bulk_checkin();
-
 	loose_object_path(the_repository, &filename, oid);
 
 	fd = create_tmpfile(&tmp_file, filename.buf);
@@ -2015,7 +2030,7 @@  static int write_loose_object(const struct object_id *oid, char *hdr,
 		die(_("confused by unstable object source data for %s"),
 		    oid_to_hex(oid));
 
-	close_loose_object(fd, tmp_file.buf);
+	close_loose_object(fd, tmp_file.buf, flags);
 
 	if (mtime) {
 		struct utimbuf utb;