diff mbox series

[v5,07/14] update-index: use the bulk-checkin infrastructure

Message ID 8cac94598a58704d9b625a9d8a593779f7adc30f.1648616734.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series core.fsyncmethod: add 'batch' mode for faster fsyncing of multiple objects | expand

Commit Message

Neeraj Singh (WINDOWS-SFS) March 30, 2022, 5:05 a.m. UTC
From: Neeraj Singh <neerajsi@microsoft.com>

The update-index functionality is used internally by 'git stash push' to
setup the internal stashed commit.

This change enables odb-transactions for update-index infrastructure to
speed up adding new objects to the object database by leveraging the
batch fsync functionality.

There is some risk with this change, since under batch fsync, the object
files will be in a tmp-objdir until update-index is complete, so callers
using the --stdin option will not see them until update-index is done.
This risk is mitigated by not keeping an ODB transaction open around
--stdin processing if in --verbose mode. Without --verbose mode,
a caller feeding update-index via --stdin wouldn't know when
update-index adds an object, event without an ODB transaction.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/update-index.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

Comments

Junio C Hamano March 30, 2022, 5:52 p.m. UTC | #1
"Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	/*
> +	 * Allow the object layer to optimize adding multiple objects in
> +	 * a batch.
> +	 */
> +	begin_odb_transaction();
>  	while (ctx.argc) {
>  		if (parseopt_state != PARSE_OPT_DONE)
>  			parseopt_state = parse_options_step(&ctx, options,
> @@ -1167,6 +1174,17 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>  		the_index.version = preferred_index_format;
>  	}
>  
> +	/*
> +	 * It is possible, though unlikely, that a caller could use the verbose
> +	 * output to synchronize with addition of objects to the object
> +	 * database. The current implementation of ODB transactions leaves
> +	 * objects invisible while a transaction is active, so end the
> +	 * transaction here if verbose output is enabled.
> +	 */
> +
> +	if (verbose)
> +		end_odb_transaction();
> +
>  	if (read_from_stdin) {
>  		struct strbuf buf = STRBUF_INIT;
>  		struct strbuf unquoted = STRBUF_INIT;
> @@ -1190,6 +1208,12 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>  		strbuf_release(&buf);
>  	}
>  
> +	/*
> +	 * By now we have added all of the new objects
> +	 */
> +	if (!verbose)
> +		end_odb_transaction();

If we had "flush" in addition to "begin" and "end", then we could,
instead of the above

    begin_transaction
	do things
    if condition:
	end_transaction
    loop:
	do thing
    if !condition:
	end_transaction


which is somewhat hard to follow and maintain, consider using a
different flow, which is

    begin_transaction
	do things
    loop:
	do thing
	if condition:
	    flush
    end_transaction

and that might make it easier to follow and maintain.  I am not 100%
sure if it is worth it, but I am leaning to say it would be.

Thanks.
Neeraj Singh March 30, 2022, 7:09 p.m. UTC | #2
On Wed, Mar 30, 2022 at 10:52 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +     /*
> > +      * Allow the object layer to optimize adding multiple objects in
> > +      * a batch.
> > +      */
> > +     begin_odb_transaction();
> >       while (ctx.argc) {
> >               if (parseopt_state != PARSE_OPT_DONE)
> >                       parseopt_state = parse_options_step(&ctx, options,
> > @@ -1167,6 +1174,17 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
> >               the_index.version = preferred_index_format;
> >       }
> >
> > +     /*
> > +      * It is possible, though unlikely, that a caller could use the verbose
> > +      * output to synchronize with addition of objects to the object
> > +      * database. The current implementation of ODB transactions leaves
> > +      * objects invisible while a transaction is active, so end the
> > +      * transaction here if verbose output is enabled.
> > +      */
> > +
> > +     if (verbose)
> > +             end_odb_transaction();
> > +
> >       if (read_from_stdin) {
> >               struct strbuf buf = STRBUF_INIT;
> >               struct strbuf unquoted = STRBUF_INIT;
> > @@ -1190,6 +1208,12 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
> >               strbuf_release(&buf);
> >       }
> >
> > +     /*
> > +      * By now we have added all of the new objects
> > +      */
> > +     if (!verbose)
> > +             end_odb_transaction();
>
> If we had "flush" in addition to "begin" and "end", then we could,
> instead of the above
>
>     begin_transaction
>         do things
>     if condition:
>         end_transaction
>     loop:
>         do thing
>     if !condition:
>         end_transaction
>
>
> which is somewhat hard to follow and maintain, consider using a
> different flow, which is
>
>     begin_transaction
>         do things
>     loop:
>         do thing
>         if condition:
>             flush
>     end_transaction
>
> and that might make it easier to follow and maintain.  I am not 100%
> sure if it is worth it, but I am leaning to say it would be.
>
> Thanks.

I thought about this, but I was somewhat worried about the extra cost
of "flushing" versus just making things visible immediately if the
top-level update-index command knows it's going to need objects to be
visible.  I'll go ahead and implement flush in the next iteration.
diff mbox series

Patch

diff --git a/builtin/update-index.c b/builtin/update-index.c
index aafe7eeac2a..50f9063e1c6 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@ 
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1116,6 +1117,12 @@  int cmd_update_index(int argc, const char **argv, const char *prefix)
 	 */
 	parse_options_start(&ctx, argc, argv, prefix,
 			    options, PARSE_OPT_STOP_AT_NON_OPTION);
+
+	/*
+	 * Allow the object layer to optimize adding multiple objects in
+	 * a batch.
+	 */
+	begin_odb_transaction();
 	while (ctx.argc) {
 		if (parseopt_state != PARSE_OPT_DONE)
 			parseopt_state = parse_options_step(&ctx, options,
@@ -1167,6 +1174,17 @@  int cmd_update_index(int argc, const char **argv, const char *prefix)
 		the_index.version = preferred_index_format;
 	}
 
+	/*
+	 * It is possible, though unlikely, that a caller could use the verbose
+	 * output to synchronize with addition of objects to the object
+	 * database. The current implementation of ODB transactions leaves
+	 * objects invisible while a transaction is active, so end the
+	 * transaction here if verbose output is enabled.
+	 */
+
+	if (verbose)
+		end_odb_transaction();
+
 	if (read_from_stdin) {
 		struct strbuf buf = STRBUF_INIT;
 		struct strbuf unquoted = STRBUF_INIT;
@@ -1190,6 +1208,12 @@  int cmd_update_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
+	/*
+	 * By now we have added all of the new objects
+	 */
+	if (!verbose)
+		end_odb_transaction();
+
 	if (split_index > 0) {
 		if (git_config_get_split_index() == 0)
 			warning(_("core.splitIndex is set to false; "