
[v2,4/7] unpack-objects: use the bulk-checkin infrastructure

Message ID 6662e2dae0f5d65c158fba785d186885f9671073.1647760561.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Series core.fsyncmethod: add 'batch' mode for faster fsyncing of multiple objects

Commit Message

Neeraj Singh (WINDOWS-SFS) March 20, 2022, 7:15 a.m. UTC
From: Neeraj Singh <neerajsi@microsoft.com>

The unpack-objects functionality is used by fetch, push, and fast-import
to turn the transferred data into object database entries when there are
fewer objects than the 'unpacklimit' setting.

By enabling bulk-checkin when unpacking objects, we can take advantage
of batched fsyncs.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/unpack-objects.c | 3 +++
 1 file changed, 3 insertions(+)
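
For context, a minimal sketch of the batched-fsync idea the commit message
relies on (this is an editorial illustration, not git's bulk-checkin code;
the file names and the use of sync() are assumptions made to keep the
example self-contained and runnable): objects are written without a
per-file fsync while checkin is "plugged", and a single flush makes the
whole batch durable when it is "unplugged".

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write one file without fsyncing it; durability is deferred ("plugged"). */
static void write_file(const char *path, const char *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0 || write(fd, buf, len) != (ssize_t)len)
		exit(1);
	close(fd);
}

int main(void)
{
	const char payload[] = "dummy object payload\n";
	char path[64];
	int i, nr = 90;

	for (i = 0; i < nr; i++) {
		snprintf(path, sizeof(path), "obj-%04d.tmp", i);
		write_file(path, payload, strlen(payload));
	}

	/* One flush covers the whole batch ("unplugged"). */
	sync();
	return 0;
}

With one fsync per file in write_file() instead, the loop would pay the
flush latency 90 times; batching pays it roughly once, which is where the
win comes from.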

Comments

Junio C Hamano March 21, 2022, 5:55 p.m. UTC | #1
"Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Neeraj Singh <neerajsi@microsoft.com>
>
> The unpack-objects functionality is used by fetch, push, and fast-import
> to turn the transferred data into object database entries when there are
> fewer objects than the 'unpacklimit' setting.
>
> By enabling bulk-checkin when unpacking objects, we can take advantage
> of batched fsyncs.

This feels confused in that we dispatch to unpack-objects (instead
of index-pack) only when the number of loose objects is small enough
that it should not matter from a performance point of view, whereas
bulk-checkin should shine from a performance point of view only when
there are enough objects to batch.

Also if we ever add "too many small loose objects is wasteful, let's
send them into a single 'batch pack'" optimization, it would create
a funny situation where the caller sends the contents of a small
incoming packfile to unpack-objects, but the command chooses to
bunch them all together in a packfile anyway ;-)

So, I dunno.
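
For readers unfamiliar with the dispatch being discussed, a rough,
self-contained illustration (the helper name below is made up; the real
decision lives in fetch-pack.c and builtin/receive-pack.c and is more
involved): an incoming pack with fewer objects than the unpack limit is
exploded into loose objects via unpack-objects, while a larger one is kept
as a pack and indexed via index-pack.

#include <stdio.h>

/*
 * Illustration only: choose_unpacker() is a hypothetical helper, not a git
 * function.  The threshold plays the role of transfer.unpackLimit (an
 * assumed default of 100 is shown below).
 */
static const char *choose_unpacker(unsigned int nr_objects,
				   unsigned int unpack_limit)
{
	return nr_objects < unpack_limit ? "unpack-objects" : "index-pack";
}

int main(void)
{
	unsigned int unpack_limit = 100;	/* assumed default */

	printf("90 objects  -> git %s\n", choose_unpacker(90, unpack_limit));
	printf("500 objects -> git %s\n", choose_unpacker(500, unpack_limit));
	return 0;
}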


> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> ---
>  builtin/unpack-objects.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index dbeb0680a58..c55b6616aed 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -1,5 +1,6 @@
>  #include "builtin.h"
>  #include "cache.h"
> +#include "bulk-checkin.h"
>  #include "config.h"
>  #include "object-store.h"
>  #include "object.h"
> @@ -503,10 +504,12 @@ static void unpack_all(void)
>  	if (!quiet)
>  		progress = start_progress(_("Unpacking objects"), nr_objects);
>  	CALLOC_ARRAY(obj_list, nr_objects);
> +	plug_bulk_checkin();
>  	for (i = 0; i < nr_objects; i++) {
>  		unpack_one(i);
>  		display_progress(progress, i + 1);
>  	}
> +	unplug_bulk_checkin();
>  	stop_progress(&progress);
>  
>  	if (delta_list)
Neeraj Singh March 21, 2022, 11:02 p.m. UTC | #2
On Mon, Mar 21, 2022 at 10:55 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Neeraj Singh <neerajsi@microsoft.com>
> >
> > The unpack-objects functionality is used by fetch, push, and fast-import
> > to turn the transferred data into object database entries when there are
> > fewer objects than the 'unpacklimit' setting.
> >
> > By enabling bulk-checkin when unpacking objects, we can take advantage
> > of batched fsyncs.
>
> This feels confused in that we dispatch to unpack-objects (instead
> of index-pack) only when the number of loose objects is small enough
> that it should not matter from a performance point of view, whereas
> bulk-checkin should shine from a performance point of view only when
> there are enough objects to batch.
>
> Also if we ever add "too many small loose objects is wasteful, let's
> send them into a single 'batch pack'" optimization, it would create
> a funny situation where the caller sends the contents of a small
> incoming packfile to unpack-objects, but the command chooses to
> bunch them all together in a packfile anyway ;-)
>
> So, I dunno.
>

I'd be happy to just drop this patch.  I originally added it to answer Avarab's
question: how does batch mode compare to packfiles? [1] [2].

[1] https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com/
[2] https://lore.kernel.org/git/pull.1076.v5.git.git.1632514331.gitgitgadget@gmail.com/
Neeraj Singh March 22, 2022, 8:54 p.m. UTC | #3
On Mon, Mar 21, 2022 at 4:02 PM Neeraj Singh <nksingh85@gmail.com> wrote:
>
> On Mon, Mar 21, 2022 at 10:55 AM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> > > From: Neeraj Singh <neerajsi@microsoft.com>
> > >
> > > The unpack-objects functionality is used by fetch, push, and fast-import
> > > to turn the transferred data into object database entries when there are
> > > fewer objects than the 'unpacklimit' setting.
> > >
> > > By enabling bulk-checkin when unpacking objects, we can take advantage
> > > of batched fsyncs.
> >
> > This feels confused in that we dispatch to unpack-objects (instead
> > of index-pack) only when the number of loose objects is small enough
> > that it should not matter from a performance point of view, whereas
> > bulk-checkin should shine from a performance point of view only when
> > there are enough objects to batch.
> >
> > Also if we ever add "too many small loose objects is wasteful, let's
> > send them into a single 'batch pack'" optimization, it would create
> > a funny situation where the caller sends the contents of a small
> > incoming packfile to unpack-objects, but the command chooses to
> > bunch them all together in a packfile anyway ;-)
> >
> > So, I dunno.
> >
>
> I'd be happy to just drop this patch.  I originally added it to answer Avarab's
> question: how does batch mode compare to packfiles? [1] [2].
>
> [1] https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com/
> [2] https://lore.kernel.org/git/pull.1076.v5.git.git.1632514331.gitgitgadget@gmail.com/

Well, looking back again at the spreadsheet [3], at 90 objects, which
is below the default transfer.unpackLimit, we see a 3x difference in
performance between batch mode and the default fsync mode.  That's a
different interaction class (230 ms versus 760 ms).

I'll include a small table in the commit description with these
performance numbers to help justify it.

[3] https://docs.google.com/spreadsheets/d/1uxMBkEXFFnQ1Y3lXKqcKpw6Mq44BzhpCAcPex14T-QQ

Patch

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index dbeb0680a58..c55b6616aed 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -1,5 +1,6 @@ 
 #include "builtin.h"
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "object-store.h"
 #include "object.h"
@@ -503,10 +504,12 @@  static void unpack_all(void)
 	if (!quiet)
 		progress = start_progress(_("Unpacking objects"), nr_objects);
 	CALLOC_ARRAY(obj_list, nr_objects);
+	plug_bulk_checkin();
 	for (i = 0; i < nr_objects; i++) {
 		unpack_one(i);
 		display_progress(progress, i + 1);
 	}
+	unplug_bulk_checkin();
 	stop_progress(&progress);
 
 	if (delta_list)