diff mbox series

[1/3] refs/reftable: introduce "reftable.lockTimeout"

Message ID ca3eab99f7ef86d1b7a5b4d4bdb8d2b0a55566e1.1726578382.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series reftable: graceful concurrent writes | expand

Commit Message

Patrick Steinhardt Sept. 17, 2024, 1:43 p.m. UTC
When multiple concurrent processes try to update references in a
repository they may try to lock the same lockfiles. This can happen even
when the updates are non-conflicting and can both be applied, so it
doesn't always make sense to abort the transaction immediately. Both the
"loose" and "packed" backends thus have a grace period that they wait
for the lock to be released that can be controlled via the config values
"core.filesRefLockTimeout" and "core.packedRefsTimeout", respectively.

The reftable backend doesn't have such a setting yet and instead fails
immediately when it sees such a lock. But the exact same concepts apply
here as they do apply to the other backends.

Introduce a new "reftable.lockTimeout" config that controls how long we
may wait for a "tables.list" lock to be released. The default value of
this config is 100ms, which is the same default as we have it for the
"loose" backend.

Note that even though we also lock individual tables, this config really
only applies to the "tables.list" file. This is because individual
tables are only ever locked when we already hold the "tables.list" lock
during compaction. When we observe such a lock we in fact do not want to
compact the table at all because it is already in the process of being
compacted by a concurrent process. So applying the same timeout here
would not make any sense and only delay progress.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/reftable.txt |  7 +++++++
 refs/reftable-backend.c           |  4 ++++
 reftable/reftable-writer.h        |  8 ++++++++
 reftable/stack.c                  | 18 ++++++++++++------
 t/t0610-reftable-basics.sh        | 27 +++++++++++++++++++++++++++
 5 files changed, 58 insertions(+), 6 deletions(-)

Comments

karthik nayak Sept. 17, 2024, 5:46 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

[snip]

> +reftable.lockTimeout::
> +	Whenever the reftable backend appends a new table to the stack, it has
> +	to lock the central "tables.list" file before updating it. This config
> +	controls how long the process will wait to acquire the lock in case
> +	another process has already acquired it. Default is 1000 (i.e., retry
> +	for 1 second).

Isn't the default 100ms? As that was what was mentioned in the commit
message

[snip]

> +test_expect_success 'ref transaction: retry acquiring tables.list lock' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit initial &&
> +		LOCK=.git/reftable/tables.list.lock &&
> +		>$LOCK &&
> +		{
> +			( sleep 1 && rm -f $LOCK ) &
> +		} &&
> +		git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD
> +	)
> +'

Nit: This does stall the test for 1s. Which is slightly annoying when
running single tests locally. Couldn't we achieve this by doing `sleep
0.1`?
Eric Sunshine Sept. 17, 2024, 5:50 p.m. UTC | #2
On Tue, Sep 17, 2024 at 1:46 PM karthik nayak <karthik.188@gmail.com> wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > +test_expect_success 'ref transaction: retry acquiring tables.list lock' '
> > +     test_when_finished "rm -rf repo" &&
> > +     git init repo &&
> > +     (
> > +             cd repo &&
> > +             test_commit initial &&
> > +             LOCK=.git/reftable/tables.list.lock &&
> > +             >$LOCK &&
> > +             {
> > +                     ( sleep 1 && rm -f $LOCK ) &
> > +             } &&
> > +             git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD
> > +     )
> > +'
>
> Nit: This does stall the test for 1s. Which is slightly annoying when
> running single tests locally. Couldn't we achieve this by doing `sleep
> 0.1`?

`sleep 0.1` is neither POSIX nor portable.
Patrick Steinhardt Sept. 18, 2024, 4:31 a.m. UTC | #3
On Tue, Sep 17, 2024 at 01:50:27PM -0400, Eric Sunshine wrote:
> On Tue, Sep 17, 2024 at 1:46 PM karthik nayak <karthik.188@gmail.com> wrote:
> > Patrick Steinhardt <ps@pks.im> writes:
> > > +test_expect_success 'ref transaction: retry acquiring tables.list lock' '
> > > +     test_when_finished "rm -rf repo" &&
> > > +     git init repo &&
> > > +     (
> > > +             cd repo &&
> > > +             test_commit initial &&
> > > +             LOCK=.git/reftable/tables.list.lock &&
> > > +             >$LOCK &&
> > > +             {
> > > +                     ( sleep 1 && rm -f $LOCK ) &
> > > +             } &&
> > > +             git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD
> > > +     )
> > > +'
> >
> > Nit: This does stall the test for 1s. Which is slightly annoying when
> > running single tests locally. Couldn't we achieve this by doing `sleep
> > 0.1`?
> 
> `sleep 0.1` is neither POSIX nor portable.

The above test also verifies that the timeout can in fact be overridden.
If we only had `sleep 0.1` we wouldn't be able to reliably verify that,
as the default is a timeout of 100ms.

We also have essentially the same tests in t0601 for packed-refs, I
mostly just copied the logic over from there. It's not exactly pretty,
and I'm not a huge fan of introducing timing sensitive tests, but in the
end I guess it's okayish.

Patrick
Patrick Steinhardt Sept. 18, 2024, 4:31 a.m. UTC | #4
On Tue, Sep 17, 2024 at 01:46:39PM -0400, karthik nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> [snip]
> 
> > +reftable.lockTimeout::
> > +	Whenever the reftable backend appends a new table to the stack, it has
> > +	to lock the central "tables.list" file before updating it. This config
> > +	controls how long the process will wait to acquire the lock in case
> > +	another process has already acquired it. Default is 1000 (i.e., retry
> > +	for 1 second).
> 
> Isn't the default 100ms? As that was what was mentioned in the commit
> message

Oh, yeah. I ended up changing it. 1 second is the default timeout used
by packed-refs, 100ms is the default timeout used by loose refs. The
reason why I ultimately picked 100ms over 1s is that our usecase is
closer to loose refs, because it is our "normal" code path that we hit
whenever we write reftables.

Will fix.

Patrick
diff mbox series

Patch

diff --git a/Documentation/config/reftable.txt b/Documentation/config/reftable.txt
index 05157279778..6c9449d0232 100644
--- a/Documentation/config/reftable.txt
+++ b/Documentation/config/reftable.txt
@@ -46,3 +46,10 @@  reftable.geometricFactor::
 By default, the geometric sequence uses a factor of 2, meaning that for any
 table, the next-biggest table must at least be twice as big. A maximum factor
 of 256 is supported.
+
+reftable.lockTimeout::
+	Whenever the reftable backend appends a new table to the stack, it has
+	to lock the central "tables.list" file before updating it. This config
+	controls how long the process will wait to acquire the lock in case
+	another process has already acquired it. Default is 1000 (i.e., retry
+	for 1 second).
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 1c4b19e737f..e90ddfb98dd 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -256,6 +256,9 @@  static int reftable_be_config(const char *var, const char *value,
 		if (factor > UINT8_MAX)
 			die("reftable geometric factor cannot exceed %u", (unsigned)UINT8_MAX);
 		opts->auto_compaction_factor = factor;
+	} else if (!strcmp(var, "reftable.locktimeout")) {
+		unsigned long lock_timeout = git_config_ulong(var, value, ctx->kvi);
+		opts->lock_timeout_ms = lock_timeout;
 	}
 
 	return 0;
@@ -281,6 +284,7 @@  static struct ref_store *reftable_be_init(struct repository *repo,
 	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
 	refs->write_options.disable_auto_compact =
 		!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1);
+	refs->write_options.lock_timeout_ms = 100;
 
 	git_config(reftable_be_config, &refs->write_options);
 
diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
index 189b1f4144f..d5f2446220b 100644
--- a/reftable/reftable-writer.h
+++ b/reftable/reftable-writer.h
@@ -51,6 +51,14 @@  struct reftable_write_options {
 	 * tables to compact. Defaults to 2 if unset.
 	 */
 	uint8_t auto_compaction_factor;
+
+	/*
+	 * The number of milliseconds to wait when trying to lock "tables.list".
+	 * Note that this does not apply to locking individual tables, as these
+	 * should only ever be locked when already holding the "tables.list"
+	 * lock.
+	 */
+	unsigned lock_timeout_ms;
 };
 
 /* reftable_block_stats holds statistics for a single block type */
diff --git a/reftable/stack.c b/reftable/stack.c
index ce0a35216ba..5ccad2cff31 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -603,8 +603,10 @@  static int reftable_stack_init_addition(struct reftable_addition *add,
 
 	add->stack = st;
 
-	err = hold_lock_file_for_update(&add->tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&add->tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST) {
 			err = REFTABLE_LOCK_ERROR;
@@ -1056,8 +1058,10 @@  static int stack_compact_range(struct reftable_stack *st,
 	 * Hold the lock so that we can read "tables.list" and lock all tables
 	 * which are part of the user-specified range.
 	 */
-	err = hold_lock_file_for_update(&tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST)
 			err = REFTABLE_LOCK_ERROR;
@@ -1156,8 +1160,10 @@  static int stack_compact_range(struct reftable_stack *st,
 	 * "tables.list". We'll then replace the compacted range of tables with
 	 * the new table.
 	 */
-	err = hold_lock_file_for_update(&tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST)
 			err = REFTABLE_LOCK_ERROR;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 37510c2b2a5..62da3e37823 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -423,6 +423,33 @@  test_expect_success 'ref transaction: fails gracefully when auto compaction fail
 	)
 '
 
+test_expect_success 'ref transaction: timeout acquiring tables.list lock' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		>.git/reftable/tables.list.lock &&
+		test_must_fail git update-ref refs/heads/branch HEAD 2>err &&
+		test_grep "cannot lock references" err
+	)
+'
+
+test_expect_success 'ref transaction: retry acquiring tables.list lock' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		LOCK=.git/reftable/tables.list.lock &&
+		>$LOCK &&
+		{
+			( sleep 1 && rm -f $LOCK ) &
+		} &&
+		git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD
+	)
+'
+
 test_expect_success 'pack-refs: compacts tables' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&