diff mbox series

[v5,02/14] bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'

Message ID d045b13795b38caa27f8e25340212f736b66bb05.1648616734.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series core.fsyncmethod: add 'batch' mode for faster fsyncing of multiple objects | expand

Commit Message

Neeraj Singh (WINDOWS-SFS) March 30, 2022, 5:05 a.m. UTC
From: Neeraj Singh <neerajsi@microsoft.com>

Make it clearer in the naming and documentation of the plug_bulk_checkin
and unplug_bulk_checkin APIs that they can be thought of as
a "transaction" to optimize operations on the object database. These
transactions may be nested so that subsystems like the cache-tree
writing code can optimize their operations without caring whether the
top-level code has a transaction active.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/add.c  |  4 ++--
 bulk-checkin.c | 20 ++++++++++++--------
 bulk-checkin.h | 14 ++++++++++++--
 3 files changed, 26 insertions(+), 12 deletions(-)

Comments

Junio C Hamano March 30, 2022, 5:17 p.m. UTC | #1
"Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Neeraj Singh <neerajsi@microsoft.com>
>
> Make it clearer in the naming and documentation of the plug_bulk_checkin
> and unplug_bulk_checkin APIs that they can be thought of as
> a "transaction" to optimize operations on the object database. These
> transactions may be nested so that subsystems like the cache-tree
> writing code can optimize their operations without caring whether the
> top-level code has a transaction active.

I can see that "checkin" part of the name is too limiting (you may
want to do more than optimize checkin, e.g. fsync), and that you may
prefer "begin/end" over "plug/unplug", but I am not sure if we want
to limit ourselves to "odb".  If we find our code doing things on
many instances of something that are not objects (e.g. refs, config
variables), don't we want to give them the same chance to be optimized
by batching them?

{begin,end}_bulk_transaction perhaps?  I dunno.
Neeraj Singh March 31, 2022, 5:51 a.m. UTC | #2
On Wed, Mar 30, 2022 at 10:17 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Neeraj Singh <neerajsi@microsoft.com>
> >
> > Make it clearer in the naming and documentation of the plug_bulk_checkin
> > and unplug_bulk_checkin APIs that they can be thought of as
> > a "transaction" to optimize operations on the object database. These
> > transactions may be nested so that subsystems like the cache-tree
> > writing code can optimize their operations without caring whether the
> > top-level code has a transaction active.
>
> I can see that "checkin" part of the name is too limiting (you may
> want to do more than optimize checkin, e.g. fsync), and that you may
> prefer "begin/end" over "plug/unplug", but I am not sure if we want
> to limit ourselves to "odb".  If we find our code doing things on
> many instances of something that are not objects (e.g. refs, config
> variables), don't we want to give them the same chance to be optimized
> by batching them?
>
> {begin,end}_bulk_transaction perhaps?  I dunno.

At least in the current code where the implementation of each
'database table' (odb, refs-db, config, index) is pretty separate, it
seems better to keep bulk-checkin.c and its plugging scoped to the
ODB.  Patrick's (who I previously misnamed as 'Peter') older patch at
https://lore.kernel.org/git/d9aa96913b1730f1d0c238d7d52e27c20bc55390.1636544377.git.ps@pks.im/
showed a pretty nice and concise implementation for refs tied to the
ref-transaction infrastructure.

Thanks,
Neeraj
diff mbox series

Patch

diff --git a/builtin/add.c b/builtin/add.c
index 3ffb86a4338..9bf37ceae8e 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -670,7 +670,7 @@  int cmd_add(int argc, const char **argv, const char *prefix)
 		string_list_clear(&only_match_skip_worktree, 0);
 	}
 
-	plug_bulk_checkin();
+	begin_odb_transaction();
 
 	if (add_renormalize)
 		exit_status |= renormalize_tracked_files(&pathspec, flags);
@@ -682,7 +682,7 @@  int cmd_add(int argc, const char **argv, const char *prefix)
 
 	if (chmod_arg && pathspec.nr)
 		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
-	unplug_bulk_checkin();
+	end_odb_transaction();
 
 finish:
 	if (write_locked_index(&the_index, &lock_file,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 577b135e39c..8b0fd5c7723 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -10,7 +10,7 @@ 
 #include "packfile.h"
 #include "object-store.h"
 
-static int bulk_checkin_plugged;
+static int odb_transaction_nesting;
 
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
@@ -280,21 +280,25 @@  int index_bulk_checkin(struct object_id *oid,
 {
 	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!bulk_checkin_plugged)
+	if (!odb_transaction_nesting)
 		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
-void plug_bulk_checkin(void)
+void begin_odb_transaction(void)
 {
-	assert(!bulk_checkin_plugged);
-	bulk_checkin_plugged = 1;
+	odb_transaction_nesting += 1;
 }
 
-void unplug_bulk_checkin(void)
+void end_odb_transaction(void)
 {
-	assert(bulk_checkin_plugged);
-	bulk_checkin_plugged = 0;
+	odb_transaction_nesting -= 1;
+	if (odb_transaction_nesting < 0)
+		BUG("Unbalanced ODB transaction nesting");
+
+	if (odb_transaction_nesting)
+		return;
+
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..69a94422ac7 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -10,7 +10,17 @@  int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
 
-void plug_bulk_checkin(void);
-void unplug_bulk_checkin(void);
+/*
+ * Tell the object database to optimize for adding
+ * multiple objects. end_odb_transaction must be called
+ * to make new objects visible.
+ */
+void begin_odb_transaction(void);
+
+/*
+ * Tell the object database to make any objects from the
+ * current transaction visible.
+ */
+void end_odb_transaction(void);
 
 #endif